PMID- 27539197
OWN - NLM
STAT- MEDLINE
DCOM- 20180115
LR  - 20230821
IS  - 1527-974X (Electronic)
IS  - 1067-5027 (Print)
IS  - 1067-5027 (Linking)
VI  - 24
IP  - e1
DP  - 2017 Apr 1
TI  - A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD).
PG  - e79-e86
LID - 10.1093/jamia/ocw109 [doi]
AB  - OBJECTIVE: The goal of this study was to develop a practical framework for recognizing and disambiguating clinical abbreviations, thereby improving current clinical natural language processing (NLP) systems' capability to handle abbreviations in clinical narratives. METHODS: We developed an open-source framework for clinical abbreviation recognition and disambiguation (CARD) that leverages our previously developed methods, including: (1) machine learning based approaches to recognize abbreviations from a clinical corpus, (2) clustering-based semiautomated methods to generate possible senses of abbreviations, and (3) profile-based word sense disambiguation methods for clinical abbreviations. We applied CARD to clinical corpora from Vanderbilt University Medical Center (VUMC) and generated 2 comprehensive sense inventories for abbreviations in discharge summaries and clinic visit notes. Furthermore, we developed a wrapper that integrates CARD with MetaMap, a widely used general clinical NLP system. RESULTS AND CONCLUSION: CARD detected 27 317 and 107 303 distinct abbreviations from discharge summaries and clinic visit notes, respectively. Two sense inventories were constructed for the 1000 most frequent abbreviations in these 2 corpora. Using the sense inventories created from discharge summaries, CARD achieved an F1 score of 0.755 for identifying and disambiguating all abbreviations in a corpus from the VUMC discharge summaries, which is superior to MetaMap and Apache's clinical Text Analysis Knowledge Extraction System (cTAKES). Using additional external corpora, we also demonstrated that the MetaMap-CARD wrapper improved MetaMap's performance in recognizing disorder entities in clinical notes. The CARD framework, 2 sense inventories, and the wrapper for MetaMap are publicly available at https://sbmi.uth.edu/ccb/resources/abbreviation.htm . We believe the CARD framework can be a valuable resource for improving abbreviation identification in clinical NLP systems.
CI  - (c) The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
FAU - Wu, Yonghui
AU  - Wu Y
AD  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston.
FAU - Denny, Joshua C
AU  - Denny JC
AD  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee.
AD  - Department of Medicine, Vanderbilt University School of Medicine.
FAU - Trent Rosenbloom, S
AU  - Trent Rosenbloom S
AD  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee.
AD  - Department of Medicine, Vanderbilt University School of Medicine.
FAU - Miller, Randolph A
AU  - Miller RA
AD  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee.
AD  - Department of Medicine, Vanderbilt University School of Medicine.
FAU - Giuse, Dario A
AU  - Giuse DA
AD  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee.
FAU - Wang, Lulu
AU  - Wang L
AD  - Department of Medicine, Vanderbilt University School of Medicine.
FAU - Blanquicett, Carmelo
AU  - Blanquicett C
AD  - Department of Medicine, University of Alabama at Birmingham, Birmingham.
FAU - Soysal, Ergin
AU  - Soysal E
AD  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston.
FAU - Xu, Jun
AU  - Xu J
AD  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston.
FAU - Xu, Hua
AU  - Xu H
AD  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston.
LA  - eng
GR  - R01 GM103859/GM/NIGMS NIH HHS/United States
GR  - R01 LM010681/LM/NLM NIH HHS/United States
PT  - Journal Article
PL  - England
TA  - J Am Med Inform Assoc
JT  - Journal of the American Medical Informatics Association : JAMIA
JID - 9430800
SB  - IM
MH  - *Abbreviations as Topic
MH  - *Electronic Health Records
MH  - Humans
MH  - *Machine Learning
MH  - *Natural Language Processing
MH  - Patient Discharge
PMC - PMC7651947
OTO - NOTNLM
OT  - clinical abbreviation
OT  - clinical natural language processing
OT  - machine learning
OT  - sense clustering
EDAT- 2016/08/20 06:00
MHDA- 2018/01/16 06:00
PMCR- 2017/08/18
CRDT- 2016/08/20 06:00
PHST- 2016/02/05 00:00 [received]
PHST- 2016/06/10 00:00 [accepted]
PHST- 2016/08/20 06:00 [pubmed]
PHST- 2018/01/16 06:00 [medline]
PHST- 2016/08/20 06:00 [entrez]
PHST- 2017/08/18 00:00 [pmc-release]
AID - ocw109 [pii]
AID - 10.1093/jamia/ocw109 [doi]
PST - ppublish
SO  - J Am Med Inform Assoc. 2017 Apr 1;24(e1):e79-e86. doi: 10.1093/jamia/ocw109.