PMID- 37087111
OWN - NLM
STAT- MEDLINE
DCOM- 20230522
LR  - 20240423
IS  - 1527-974X (Electronic)
IS  - 1067-5027 (Print)
IS  - 1067-5027 (Linking)
VI  - 30
IP  - 6
DP  - 2023 May 19
TI  - quEHRy: a question answering system to query electronic health records.
PG  - 1091-1102
LID - 10.1093/jamia/ocad050 [doi]
AB  - OBJECTIVE: We propose a system, quEHRy, to retrieve precise, interpretable answers to natural language questions from structured data in electronic health records (EHRs). MATERIALS AND METHODS: We develop/synthesize the main components of quEHRy: concept normalization (MetaMap), time frame classification (new), semantic parsing (existing), visualization with question understanding (new), and query module for FHIR mapping/processing (new). We evaluate quEHRy on 2 clinical question answering (QA) datasets. We evaluate each component separately as well as holistically to gain deeper insights. We also conduct a thorough error analysis for a crucial subcomponent, medical concept normalization. RESULTS: Using gold concepts, the precision of quEHRy is 98.33% and 90.91% for the 2 datasets, while the overall accuracy was 97.41% and 87.75%. Precision was 94.03% and 87.79% even after employing an automated medical concept extraction system (MetaMap). Most incorrectly predicted medical concepts were broader in nature than gold-annotated concepts (representative of the ones present in EHRs), eg, Diabetes versus Diabetes Mellitus, Non-Insulin-Dependent. DISCUSSION: The primary performance barrier to deployment of the system is due to errors in medical concept extraction (a component not studied in this article), which affects the downstream generation of correct logical structures. This indicates the need to build QA-specific clinical concept normalizers that understand EHR context to extract the "relevant" medical concepts from questions. CONCLUSION: We present an end-to-end QA system that allows information access from EHRs using natural language and returns an exact, verifiable answer. Our proposed system is high-precision and interpretable, checking off the requirements for clinical use.
CI  - © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
FAU - Soni, Sarvesh
AU  - Soni S
AUID- ORCID: 0000-0003-1704-1039
AD  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
FAU - Datta, Surabhi
AU  - Datta S
AUID- ORCID: 0000-0002-5634-2005
AD  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
FAU - Roberts, Kirk
AU  - Roberts K
AUID- ORCID: 0000-0001-6525-5213
AD  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
LA  - eng
GR  - R01 LM011934/LM/NLM NIH HHS/United States
GR  - R00LM012104/NH/NIH HHS/United States
GR  - R21 EB029575/EB/NIBIB NIH HHS/United States
PT  - Journal Article
PT  - Research Support, N.I.H., Extramural
PT  - Research Support, Non-U.S. Gov't
PL  - England
TA  - J Am Med Inform Assoc
JT  - Journal of the American Medical Informatics Association : JAMIA
JID - 9430800
RN  - 7440-57-5 (Gold)
SB  - IM
MH  - *Electronic Health Records
MH  - *Natural Language Processing
MH  - Semantics
MH  - Access to Information
MH  - Gold
PMC - PMC10198534
OTO - NOTNLM
OT  - FHIR
OT  - artificial intelligence
OT  - electronic health records
OT  - machine learning
OT  - natural language processing
OT  - question answering
COIS- None declared.
EDAT- 2023/04/23 00:41
MHDA- 2023/05/22 06:42
PMCR- 2024/04/22
CRDT- 2023/04/22 20:32
PHST- 2022/09/30 00:00 [received]
PHST- 2023/01/19 00:00 [revised]
PHST- 2023/04/05 00:00 [accepted]
PHST- 2023/05/22 06:42 [medline]
PHST- 2023/04/23 00:41 [pubmed]
PHST- 2023/04/22 20:32 [entrez]
PHST- 2024/04/22 00:00 [pmc-release]
AID - 7136720 [pii]
AID - ocad050 [pii]
AID - 10.1093/jamia/ocad050 [doi]
PST - ppublish
SO  - J Am Med Inform Assoc. 2023 May 19;30(6):1091-1102. doi: 10.1093/jamia/ocad050.