PMID- 38433189
OWN - NLM
STAT- MEDLINE
DCOM- 20240305
LR  - 20240306
IS  - 1472-6947 (Electronic)
IS  - 1472-6947 (Linking)
VI  - 22
IP  - Suppl 2
DP  - 2024 Mar 3
TI  - Natural language processing to identify lupus nephritis phenotype in electronic health records.
PG  - 348
LID - 10.1186/s12911-024-02420-7 [doi]
LID - 348
AB  - BACKGROUND: Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major manifestations of SLE contributing to organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to extract from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW). METHODS: We developed five algorithms: a rule-based algorithm using only structured data (baseline algorithm) and four algorithms using different NLP models. The first NLP model combined a simple regular-expression keyword search with structured data. The other three NLP models were based on regularized logistic regression and used different sets of features: positive mentions of concept unique identifiers (CUIs), the number of appearances of CUIs, and a mixture of three components (i.e., a curated list of CUIs, regular expression concepts, and structured data), respectively. The baseline algorithm and the best-performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC). RESULTS: Our best-performing NLP model incorporated features from structured data, regular expression concepts, and mapped CUIs, and it improved the F-measure relative to the baseline lupus nephritis algorithm in both the NMEDW (from 0.41 to 0.79) and VUMC (from 0.52 to 0.93) datasets. CONCLUSION: Our NLP MetaMap mixed model greatly improved the F-measure compared to the structured-data-only algorithm in both the internal and external validation datasets. The NLP algorithms can serve as powerful tools to accurately identify the lupus nephritis phenotype in EHRs for clinical research and better-targeted therapies.
CI  - (c) 2024. The Author(s).
FAU - Deng, Yu
AU  - Deng Y
AD  - Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
FAU - Pacheco, Jennifer A
AU  - Pacheco JA
AD  - Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA.
FAU - Ghosh, Anika
AU  - Ghosh A
AD  - Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
FAU - Chung, Anh
AU  - Chung A
AD  - Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
AD  - Department of Medicine/Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, USA.
FAU - Mao, Chengsheng
AU  - Mao C
AD  - Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
FAU - Smith, Joshua C
AU  - Smith JC
AD  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.
FAU - Zhao, Juan
AU  - Zhao J
AD  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.
FAU - Wei, Wei-Qi
AU  - Wei WQ
AD  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.
FAU - Barnado, April
AU  - Barnado A
AD  - Department of Medicine, Vanderbilt University Medical Center, Nashville, USA.
FAU - Dorn, Chad
AU  - Dorn C
AD  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.
FAU - Weng, Chunhua
AU  - Weng C
AD  - Department of Biomedical Informatics, Columbia University, New York City, USA.
FAU - Liu, Cong
AU  - Liu C
AD  - Department of Biomedical Informatics, Columbia University, New York City, USA.
FAU - Cordon, Adam
AU  - Cordon A
AD  - Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA.
FAU - Yu, Jingzhi
AU  - Yu J
AD  - Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
FAU - Tedla, Yacob
AU  - Tedla Y
AD  - Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
FAU - Kho, Abel
AU  - Kho A
AD  - Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
FAU - Ramsey-Goldman, Rosalind
AU  - Ramsey-Goldman R
AD  - Department of Medicine/Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, USA.
FAU - Walunas, Theresa
AU  - Walunas T
AD  - Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA. t-walunas@northwestern.edu.
FAU - Luo, Yuan
AU  - Luo Y
AUID- ORCID: 0000-0003-0195-7456
AD  - Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA. yuan.luo@northwestern.edu.
LA  - eng
GR  - U01HG008680/HG/NHGRI NIH HHS/United States
GR  - U01HG008673/HG/NHGRI NIH HHS/United States
GR  - U01HG008672/HG/NHGRI NIH HHS/United States
GR  - 5R21AR072262/AR/NIAMS NIH HHS/United States
GR  - 1K08 AR072757-01/AR/NIAMS NIH HHS/United States
GR  - R61 AR076824/AR/NIAMS NIH HHS/United States
GR  - U01 HG008680/HG/NHGRI NIH HHS/United States
PT  - Journal Article
DEP - 20240303
PL  - England
TA  - BMC Med Inform Decis Mak
JT  - BMC medical informatics and decision making
JID - 101088682
SB  - IM
MH  - Humans
MH  - *Lupus Nephritis/diagnosis
MH  - Electronic Health Records
MH  - Natural Language Processing
MH  - *Lupus Erythematosus, Systemic
MH  - Phenotype
MH  - Rare Diseases
PMC - PMC10910523
OTO - NOTNLM
OT  - Computational phenotyping
OT  - Electronic health records
OT  - Lupus nephritis
OT  - Natural language processing
COIS- The authors declare that they have no competing interests.
EDAT- 2024/03/04 00:43
MHDA- 2024/03/05 06:44
PMCR- 2024/03/03
CRDT- 2024/03/03 23:13
PHST- 2021/04/09 00:00 [received]
PHST- 2024/01/09 00:00 [accepted]
PHST- 2024/03/05 06:44 [medline]
PHST- 2024/03/04 00:43 [pubmed]
PHST- 2024/03/03 23:13 [entrez]
PHST- 2024/03/03 00:00 [pmc-release]
AID - 10.1186/s12911-024-02420-7 [pii]
AID - 2420 [pii]
AID - 10.1186/s12911-024-02420-7 [doi]
PST - epublish
SO  - BMC Med Inform Decis Mak. 2024 Mar 3;22(Suppl 2):348. doi: 10.1186/s12911-024-02420-7.