PMID- 30944913
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20220408
IS  - 2574-2531 (Electronic)
IS  - 2574-2531 (Linking)
VI  - 2
IP  - 1
DP  - 2019 Apr
TI  - Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes.
PG  - 139-149
LID - 10.1093/jamiaopen/ooy061 [doi]
AB  - OBJECTIVES: Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients. METHODS: We obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who had undergone clinical next-generation sequencing (NGS) testing at Wake Forest Baptist Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural network (RNN), namely gated recurrent unit, long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi), were applied to classify documents into the treatment-change and no-treatment-change groups. Further, we compared the performance of the RNNs with that of 5 machine learning algorithms: Naive Bayes, K-nearest Neighbor, Support Vector Machine for classification, Random Forest, and Logistic Regression. RESULTS: Our results suggested that, overall, RNNs outperformed traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embedding can improve the accuracy of LSTM by 3.4% and reduce the training time by more than 60%. DISCUSSION AND CONCLUSION: NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.
FAU - Guan, Meijian
AU  - Guan M
AD  - Department of Computer Science, Wake Forest University, Winston-Salem, North Carolina, USA.
AD  - Wake Forest Baptist Comprehensive Cancer Center, Winston-Salem, North Carolina, USA.
FAU - Cho, Samuel
AU  - Cho S
AD  - Department of Computer Science, Wake Forest University, Winston-Salem, North Carolina, USA.
AD  - Department of Physics, Wake Forest University, Winston-Salem, North Carolina, USA.
FAU - Petro, Robin
AU  - Petro R
AD  - Wake Forest Baptist Comprehensive Cancer Center, Winston-Salem, North Carolina, USA.
FAU - Zhang, Wei
AU  - Zhang W
AD  - Wake Forest Baptist Comprehensive Cancer Center, Winston-Salem, North Carolina, USA.
AD  - Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA.
FAU - Pasche, Boris
AU  - Pasche B
AD  - Wake Forest Baptist Comprehensive Cancer Center, Winston-Salem, North Carolina, USA.
AD  - Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA.
FAU - Topaloglu, Umit
AU  - Topaloglu U
AD  - Wake Forest Baptist Comprehensive Cancer Center, Winston-Salem, North Carolina, USA.
AD  - Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA.
LA  - eng
GR  - P30 CA012197/CA/NCI NIH HHS/United States
GR  - UL1 TR001420/TR/NCATS NIH HHS/United States
PT  - Journal Article
DEP - 20190103
PL  - United States
TA  - JAMIA Open
JT  - JAMIA open
JID - 101730643
PMC - PMC6435007
OTO - NOTNLM
OT  - cancer
OT  - electronic health records
OT  - genomics
OT  - machine learning
OT  - natural language processing
EDAT- 2019/04/05 06:00
MHDA- 2019/04/05 06:01
PMCR- 2019/01/03
CRDT- 2019/04/05 06:00
PHST- 2018/08/19 00:00 [received]
PHST- 2018/11/26 00:00 [revised]
PHST- 2018/12/21 00:00 [accepted]
PHST- 2019/04/05 06:00 [entrez]
PHST- 2019/04/05 06:00 [pubmed]
PHST- 2019/04/05 06:01 [medline]
PHST- 2019/01/03 00:00 [pmc-release]
AID - ooy061 [pii]
AID - 10.1093/jamiaopen/ooy061 [doi]
PST - ppublish
SO  - JAMIA Open. 2019 Apr;2(1):139-149. doi: 10.1093/jamiaopen/ooy061. Epub 2019 Jan 3.