PMID- 30830398
OWN - NLM
STAT- MEDLINE
DCOM- 20190919
LR  - 20230928
IS  - 1573-3890 (Electronic)
IS  - 1573-3882 (Print)
IS  - 1573-3882 (Linking)
VI  - 14
IP  - 10
DP  - 2018 Sep 20
TI  - Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies.
PG  - 128
LID - 10.1007/s11306-018-1420-2 [doi]
LID - 128
AB  - BACKGROUND: Untargeted mass spectrometry (MS)-based metabolomics data often contain missing values that reduce statistical power and can introduce bias in biomedical studies. However, a systematic assessment of the various sources of missing values and strategies to handle these data has received little attention. Missing data can occur systematically, e.g. from run day-dependent effects due to limits of detection (LOD); or it can be random as, for instance, a consequence of sample preparation. METHODS: We investigated patterns of missing data in an MS-based metabolomics experiment of serum samples from the German KORA F4 cohort (n = 1750). We then evaluated 31 imputation methods in a simulation framework and biologically validated the results by applying all imputation approaches to real metabolomics data. We examined the ability of each method to reconstruct biochemical pathways from data-driven correlation networks, and the ability of the method to increase statistical power while preserving the strength of established metabolic quantitative trait loci. RESULTS: Run day-dependent LOD-based missing data accounts for most missing values in the metabolomics dataset. Although multiple imputation by chained equations performed well in many scenarios, it is computationally and statistically challenging. K-nearest neighbors (KNN) imputation on observations with variable pre-selection showed robust performance across all evaluation schemes and is computationally more tractable. CONCLUSION: Missing data in untargeted MS-based metabolomics data occur for various reasons.
      Based on our results, we recommend that KNN-based imputation is performed on observations with variable pre-selection since it showed robust results in all evaluation schemes.
FAU - Do, Kieu Trinh
AU  - Do KT
AD  - Institute of Computational Biology, Helmholtz-Zentrum Munchen, Neuherberg, Germany.
FAU - Wahl, Simone
AU  - Wahl S
AD  - Institute of Epidemiology II, German Research Center for Environmental Health, Helmholtz Zentrum Munchen, Neuherberg, Germany.
AD  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, Helmholtz Zentrum Munchen, Neuherberg, Germany.
AD  - German Center for Diabetes Research (DZD e.V.), Neuherberg, Germany.
FAU - Raffler, Johannes
AU  - Raffler J
AD  - Institute of Bioinformatics and Systems Biology, Helmholtz-Zentrum Munchen, Neuherberg, Germany.
FAU - Molnos, Sophie
AU  - Molnos S
AD  - Institute of Epidemiology II, German Research Center for Environmental Health, Helmholtz Zentrum Munchen, Neuherberg, Germany.
AD  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, Helmholtz Zentrum Munchen, Neuherberg, Germany.
AD  - German Center for Diabetes Research (DZD e.V.), Neuherberg, Germany.
FAU - Laimighofer, Michael
AU  - Laimighofer M
AD  - Institute of Computational Biology, Helmholtz-Zentrum Munchen, Neuherberg, Germany.
FAU - Adamski, Jerzy
AU  - Adamski J
AD  - Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Zentrum Munchen, Neuherberg, Germany.
AD  - Lehrstuhl fur Experimentelle Genetik, Technische Universitat Munchen, Freising, Germany.
AD  - German Center for Cardiovascular Disease Research (DZHK e.V.), Munich, Germany.
FAU - Suhre, Karsten
AU  - Suhre K
AD  - Department of Physiology and Biophysics, Weill Cornell Medical College in Qatar, Education City, Doha, Qatar.
FAU - Strauch, Konstantin
AU  - Strauch K
AD  - Institute of Genetic Epidemiology, Helmholtz Zentrum Munchen-German Research Center for Environmental Health, Neuherberg, Germany.
AD  - Chair of Genetic Epidemiology, Institute of Medical Informatics, Biometry and Epidemiology, Ludwig-Maximilians-University, Munich, Germany.
FAU - Peters, Annette
AU  - Peters A
AD  - Institute of Epidemiology II, German Research Center for Environmental Health, Helmholtz Zentrum Munchen, Neuherberg, Germany.
AD  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, Helmholtz Zentrum Munchen, Neuherberg, Germany.
FAU - Gieger, Christian
AU  - Gieger C
AD  - Institute of Epidemiology II, German Research Center for Environmental Health, Helmholtz Zentrum Munchen, Neuherberg, Germany.
AD  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, Helmholtz Zentrum Munchen, Neuherberg, Germany.
FAU - Langenberg, Claudia
AU  - Langenberg C
AD  - MRC Epidemiology Unit, University of Cambridge, Cambridge, UK.
FAU - Stewart, Isobel D
AU  - Stewart ID
AD  - MRC Epidemiology Unit, University of Cambridge, Cambridge, UK.
FAU - Theis, Fabian J
AU  - Theis FJ
AD  - Institute of Computational Biology, Helmholtz-Zentrum Munchen, Neuherberg, Germany.
AD  - Department of Mathematics, Technische Universitat Munchen, Garching, Germany.
FAU - Grallert, Harald
AU  - Grallert H
AD  - Institute of Epidemiology II, German Research Center for Environmental Health, Helmholtz Zentrum Munchen, Neuherberg, Germany.
AD  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, Helmholtz Zentrum Munchen, Neuherberg, Germany.
AD  - German Center for Diabetes Research (DZD e.V.), Neuherberg, Germany.
FAU - Kastenmuller, Gabi
AU  - Kastenmuller G
AD  - German Center for Diabetes Research (DZD e.V.), Neuherberg, Germany. g.kastenmueller@helmholtz-muenchen.de.
AD  - Institute of Bioinformatics and Systems Biology, Helmholtz-Zentrum Munchen, Neuherberg, Germany. g.kastenmueller@helmholtz-muenchen.de.
FAU - Krumsiek, Jan
AU  - Krumsiek J
AD  - Institute of Computational Biology, Helmholtz-Zentrum Munchen, Neuherberg, Germany. jan.krumsiek@helmholtz-muenchen.de.
AD  - German Center for Diabetes Research (DZD e.V.), Neuherberg, Germany. jan.krumsiek@helmholtz-muenchen.de.
AD  - Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, USA. jan.krumsiek@helmholtz-muenchen.de.
LA  - eng
GR  - 01ZX1313C/Bundesministerium fur Bildung und Forschung/International
GR  - 03IS2061B/Bundesministerium fur Bildung und Forschung/International
GR  - 305280/European Union's Seventh Framework Programme [FP7-Health-F5-2012]/International
GR  - LatentCauses/European Research Council/International
GR  - Biomedical Research Program funds/Weill Cornell Medical College Qatar/International
GR  - MC_PC_13048/Medical Research Council/United Kingdom
GR  - MC_UU_12015/1/Medical Research Council/United Kingdom
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
DEP - 20180920
PL  - United States
TA  - Metabolomics
JT  - Metabolomics : Official journal of the Metabolomic Society
JID - 101274889
SB  - IM
MH  - Chromatography, Liquid
MH  - Cohort Studies
MH  - Germany
MH  - *Mass Spectrometry
MH  - Metabolomics/*methods
PMC - PMC6153696
OTO - NOTNLM
OT  - Batch effects
OT  - K-nearest neighbor
OT  - Limit of detection
OT  - MICE
OT  - Mass spectrometry
OT  - Missing values imputation
OT  - Untargeted metabolomics
COIS- CONFLICT OF INTEREST: The authors declare that they have no conflict of interest. ETHICAL APPROVAL: The KORA study has been approved by the Bayerische Landesarztekammer and the EPIC-Norfolk study was approved by the Norfolk Local Research Ethics Committee. In all three cohorts written informed consent was obtained from all participants.
EDAT- 2019/03/05 06:00
MHDA- 2019/09/20 06:00
PMCR- 2018/09/20
CRDT- 2019/03/05 06:00
PHST- 2018/04/11 00:00 [received]
PHST- 2018/08/24 00:00 [accepted]
PHST- 2019/03/05 06:00 [entrez]
PHST- 2019/03/05 06:00 [pubmed]
PHST- 2019/09/20 06:00 [medline]
PHST- 2018/09/20 00:00 [pmc-release]
AID - 10.1007/s11306-018-1420-2 [pii]
AID - 1420 [pii]
AID - 10.1007/s11306-018-1420-2 [doi]
PST - epublish
SO  - Metabolomics. 2018 Sep 20;14(10):128. doi: 10.1007/s11306-018-1420-2.