PMID- 25841328
OWN - NLM
STAT- MEDLINE
DCOM- 20160303
LR  - 20220316
IS  - 1532-0480 (Electronic)
IS  - 1532-0464 (Print)
IS  - 1532-0464 (Linking)
VI  - 55
DP  - 2015 Jun
TI  - Building bridges across electronic health record systems through inferred phenotypic topics.
PG  - 82-93
LID - S1532-0464(15)00054-4 [pii]
LID - 10.1016/j.jbi.2015.03.011 [doi]
AB  - OBJECTIVE: Data in electronic health records (EHRs) is being increasingly leveraged for secondary uses, ranging from biomedical association studies to comparative effectiveness. To perform studies at scale and transfer knowledge from one institution to another in a meaningful way, we need to harmonize the phenotypes in such systems. Traditionally, this has been accomplished through expert specification of phenotypes via standardized terminologies, such as billing codes. However, this approach may be biased by the experience and expectations of the experts, as well as the vocabulary used to describe such patients. The goal of this work is to develop a data-driven strategy to (1) infer phenotypic topics within patient populations and (2) assess the degree to which such topics facilitate a mapping across populations in disparate healthcare systems.
      METHODS: We adapt a generative topic modeling strategy, based on latent Dirichlet allocation, to infer phenotypic topics. We utilize a variance analysis to assess the projection of a patient population from one healthcare system onto the topics learned from another system. The consistency of learned phenotypic topics was evaluated using (1) the similarity of topics, (2) the stability of a patient population across topics, and (3) the transferability of a topic across sites. We evaluated our approaches using four months of inpatient data from two geographically distinct healthcare systems: (1) Northwestern Memorial Hospital (NMH) and (2) Vanderbilt University Medical Center (VUMC).
      RESULTS: The method learned 25 phenotypic topics from each healthcare system.
      The average cosine similarity between matched topics across the two sites was 0.39, a remarkably high value given the very high dimensionality of the feature space. The average stability of VUMC and NMH patients across the topics of the two sites was 0.988 and 0.812, respectively, as measured by the Pearson correlation coefficient. In addition, the VUMC and NMH topics showed smaller variance in characterizing the patient populations of the two sites than standard clinical terminologies (e.g., ICD9), suggesting they may be more reliably transferred across hospital systems.
      CONCLUSIONS: Phenotypic topics learned from EHR data can be more stable and transferable than billing codes for characterizing the general status of a patient population. This suggests that EHR-based research may be able to leverage such phenotypic topics as variables when pooling patient populations in predictive models.
CI  - Copyright (c) 2015 Elsevier Inc. All rights reserved.
FAU - Chen, You
AU  - Chen Y
AD  - Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA. Electronic address: you.chen@vanderbilt.edu.
FAU - Ghosh, Joydeep
AU  - Ghosh J
AD  - Dept. of Electrical & Computer Engineering, University of Texas, Austin, TX, USA.
FAU - Bejan, Cosmin Adrian
AU  - Bejan CA
AD  - Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.
FAU - Gunter, Carl A
AU  - Gunter CA
AD  - Dept. of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
FAU - Gupta, Siddharth
AU  - Gupta S
AD  - Dept. of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
FAU - Kho, Abel
AU  - Kho A
AD  - School of Medicine, Northwestern University, Chicago, IL, USA.
FAU - Liebovitz, David
AU  - Liebovitz D
AD  - School of Medicine, Northwestern University, Chicago, IL, USA.
FAU - Sun, Jimeng
AU  - Sun J
AD  - School of Computational Science & Engineering, Georgia Institute of Technology, Atlanta, GA, USA.
FAU - Denny, Joshua
AU  - Denny J
AD  - Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA.
FAU - Malin, Bradley
AU  - Malin B
AD  - Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA; Dept. of Electrical Engineering & Computer Science, School of Engineering, Vanderbilt University, Nashville, TN, USA.
LA  - eng
GR  - R01 LM010207/LM/NLM NIH HHS/United States
GR  - K99 LM011933/LM/NLM NIH HHS/United States
GR  - R01 GM105688/GM/NIGMS NIH HHS/United States
GR  - R01 LM010685/LM/NLM NIH HHS/United States
GR  - R00 LM011933/LM/NLM NIH HHS/United States
GR  - R01LM010207/LM/NLM NIH HHS/United States
GR  - UL1 TR000003/TR/NCATS NIH HHS/United States
GR  - R01LM010685/LM/NLM NIH HHS/United States
PT  - Journal Article
PT  - Research Support, N.I.H., Extramural
PT  - Research Support, U.S. Gov't, Non-P.H.S.
DEP - 20150401
PL  - United States
TA  - J Biomed Inform
JT  - Journal of biomedical informatics
JID - 100970413
SB  - IM
MH  - Electronic Health Records/classification/*organization & administration
MH  - Information Storage and Retrieval/*methods
MH  - *Machine Learning
MH  - Medical Record Linkage/*methods
MH  - Natural Language Processing
MH  - Phenotype
MH  - United States
MH  - *Vocabulary, Controlled
PMC - PMC4464930
MID - NIHMS677489
OTO - NOTNLM
OT  - Clinical phenotype modeling
OT  - Computers and information processing
OT  - Data mining
OT  - Electronic medical records
OT  - Medical information systems
OT  - Pattern recognition
EDAT- 2015/04/07 06:00
MHDA- 2016/03/05 06:00
PMCR- 2016/06/01
CRDT- 2015/04/06 06:00
PHST- 2014/12/02 00:00 [received]
PHST- 2015/03/24 00:00 [revised]
PHST- 2015/03/25 00:00 [accepted]
PHST- 2015/04/06 06:00 [entrez]
PHST- 2015/04/07 06:00 [pubmed]
PHST- 2016/03/05 06:00 [medline]
PHST- 2016/06/01 00:00 [pmc-release]
AID - S1532-0464(15)00054-4 [pii]
AID - 10.1016/j.jbi.2015.03.011 [doi]
PST - ppublish
SO  - J Biomed Inform. 2015 Jun;55:82-93. doi: 10.1016/j.jbi.2015.03.011. Epub 2015 Apr 1.