PMID- 35927974
OWN - NLM
STAT- MEDLINE
DCOM- 20221012
LR  - 20230805
IS  - 1527-974X (Electronic)
IS  - 1067-5027 (Print)
IS  - 1067-5027 (Linking)
VI  - 29
IP  - 11
DP  - 2022 Oct 7
TI  - Keeping synthetic patients on track: feedback mechanisms to mitigate performance drift in longitudinal health data simulation.
PG  - 1890-1898
LID - 10.1093/jamia/ocac131 [doi]
AB  - OBJECTIVE: Synthetic data are increasingly relied upon to share electronic health record (EHR) data while maintaining patient privacy. Current simulation methods can generate longitudinal data, but the results are unreliable for several reasons. First, the synthetic data drifts from the real data distribution over time. Second, the typical approach to quality assessment, which is based on the extent to which real records can be distinguished from synthetic records using a critic model, often fails to recognize poor simulation results. In this article, we introduce a longitudinal simulation framework, called LS-EHR, which addresses these issues. MATERIALS AND METHODS: LS-EHR enhances simulation through conditional fuzzing and regularization, rejection sampling, and prior knowledge embedding. We compare LS-EHR to the state-of-the-art using data from 60 000 EHRs from Vanderbilt University Medical Center (VUMC) and the All of Us Research Program. We assess discrimination between real and synthetic data over time. We evaluate the generation process and critic model using the area under the receiver operating characteristic curve (AUROC). For the critic, a higher value indicates a more robust model for quality assessment. For the generation process, a lower value indicates better synthetic data quality. RESULTS: The LS-EHR critic improves discrimination AUROC from 0.655 to 0.909 and 0.692 to 0.918 for VUMC and All of Us data, respectively. By using the new critic, the LS-EHR generation model reduces the AUROC from 0.909 to 0.758 and 0.918 to 0.806. CONCLUSION: LS-EHR can substantially improve the usability of simulated longitudinal EHR data.
CI  - (c) The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
FAU - Zhang, Ziqi
AU  - Zhang Z
AD  - Department of Computer Science, Vanderbilt University, Nashville, Tennessee, USA.
FAU - Yan, Chao
AU  - Yan C
AUID- ORCID: 0000-0002-6719-1388
AD  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
FAU - Malin, Bradley A
AU  - Malin BA
AD  - Department of Computer Science, Vanderbilt University, Nashville, Tennessee, USA.
AD  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
AD  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
LA  - eng
GR  - UL1 TR002243/TR/NCATS NIH HHS/United States
GR  - U2COD023196/GF/NIH HHS/United States
PT  - Journal Article
PT  - Research Support, N.I.H., Extramural
PL  - England
TA  - J Am Med Inform Assoc
JT  - Journal of the American Medical Informatics Association : JAMIA
JID - 9430800
SB  - IM
MH  - Computer Simulation
MH  - Electronic Health Records
MH  - Feedback
MH  - Humans
MH  - *Population Health
PMC - PMC9552284
OTO - NOTNLM
OT  - electronic health records (EHRs)
OT  - generative adversarial networks (GANs)
OT  - longitudinal simulation
OT  - privacy
OT  - synthetic data
EDAT- 2022/08/06 06:00
MHDA- 2022/10/13 06:00
PMCR- 2023/08/04
CRDT- 2022/08/05 01:53
PHST- 2022/05/13 00:00 [received]
PHST- 2022/06/25 00:00 [revised]
PHST- 2022/07/22 00:00 [accepted]
PHST- 2022/08/06 06:00 [pubmed]
PHST- 2022/10/13 06:00 [medline]
PHST- 2022/08/05 01:53 [entrez]
PHST- 2023/08/04 00:00 [pmc-release]
AID - 6655786 [pii]
AID - ocac131 [pii]
AID - 10.1093/jamia/ocac131 [doi]
PST - ppublish
SO  - J Am Med Inform Assoc. 2022 Oct 7;29(11):1890-1898. doi: 10.1093/jamia/ocac131.