PMID- 30474078
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20220330
IS  - 2574-2531 (Electronic)
IS  - 2574-2531 (Linking)
VI  - 1
IP  - 2
DP  - 2018 Oct
TI  - Learning relevance models for patient cohort retrieval.
PG  - 265-275
LID - 10.1093/jamiaopen/ooy010 [doi]
AB  - OBJECTIVE: We explored how judgements provided by physicians can be used to learn relevance models that enhance the quality of patient cohorts retrieved from Electronic Health Records (EHRs) collections. METHODS: A very large number of features were extracted from patient cohort descriptions as well as EHR collections. The features were used to investigate retrieving (1) neurology-specific patient cohorts from the de-identified Temple University Hospital electroencephalography (EEG) Corpus as well as (2) the more general cohorts evaluated in the TREC Medical Records Track (TRECMed) from the de-identified hospital records provided by the University of Pittsburgh Medical Center. The features informed a learning relevance model (LRM) that took advantage of relevance judgements provided by physicians. The LRM implements a pairwise learning-to-rank framework, which enables our learning patient cohort retrieval (L-PCR) system to learn from physicians' feedback. RESULTS AND DISCUSSION: We evaluated the L-PCR system against state-of-the-art traditional patient cohort retrieval systems, and observed a 27% improvement when operating on EEGs and a 53% improvement when operating on TRECMed EHRs, showing the promise of the L-PCR system. We also performed extensive feature analyses to reveal the most effective strategies for representing cohort descriptions as queries, encoding EHRs, and measuring cohort relevance. CONCLUSION: The L-PCR system has significant promise for reliably retrieving patient cohorts from EHRs in multiple settings when trained with relevance judgments. When provided with additional cohort descriptions, the L-PCR system will continue to learn, thus offering a potential solution to the performance barriers of current cohort retrieval systems.
FAU - Goodwin, Travis R
AU  - Goodwin TR
AD  - Department of Computer Science, Human Language Technology Research Institute, University of Texas at Dallas, Richardson, Texas, USA.
FAU - Harabagiu, Sanda M
AU  - Harabagiu SM
AD  - Department of Computer Science, Human Language Technology Research Institute, University of Texas at Dallas, Richardson, Texas, USA.
LA  - eng
SI  - Dryad/10.5061/dryad.pq0cs6h
GR  - U01 HG008468/HG/NHGRI NIH HHS/United States
PT  - Journal Article
DEP - 20180928
PL  - United States
TA  - JAMIA Open
JT  - JAMIA open
JID - 101730643
PMC - PMC6241510
OTO - NOTNLM
OT  - information storage and retrieval
OT  - machine learning
OT  - medical informatics
OT  - search engine
EDAT- 2018/11/27 06:00
MHDA- 2018/11/27 06:01
PMCR- 2018/09/28
CRDT- 2018/11/27 06:00
PHST- 2017/12/30 00:00 [received]
PHST- 2018/02/26 00:00 [revised]
PHST- 2018/09/05 00:00 [accepted]
PHST- 2018/11/27 06:00 [entrez]
PHST- 2018/11/27 06:00 [pubmed]
PHST- 2018/11/27 06:01 [medline]
PHST- 2018/09/28 00:00 [pmc-release]
AID - ooy010 [pii]
AID - 10.1093/jamiaopen/ooy010 [doi]
PST - ppublish
SO  - JAMIA Open. 2018 Oct;1(2):265-275. doi: 10.1093/jamiaopen/ooy010. Epub 2018 Sep 28.