PMID- 36406474
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20230228
IS  - 2624-8212 (Electronic)
IS  - 2624-8212 (Linking)
VI  - 5
DP  - 2022
TI  - An autoencoder-based deep learning method for genotype imputation.
PG  - 1028978
LID - 10.3389/frai.2022.1028978 [doi]
LID - 1028978
AB  - Genotype imputation has a wide range of applications in genome-wide association study (GWAS), including increasing the statistical power of association tests, discovering trait-associated loci in meta-analyses, and prioritizing causal variants with fine-mapping. In recent years, deep learning (DL) based methods, such as sparse convolutional denoising autoencoder (SCDA), have been developed for genotype imputation. However, it remains a challenging task to optimize the learning process in DL-based methods to achieve high imputation accuracy. To address this challenge, we have developed a convolutional autoencoder (AE) model for genotype imputation and implemented a customized training loop by modifying the training process with a single batch loss rather than the average loss over batches. This modified AE imputation model was evaluated using a yeast dataset, the human leukocyte antigen (HLA) data from the 1,000 Genomes Project (1KGP), and our in-house genotype data from the Louisiana Osteoporosis Study (LOS). Our modified AE imputation model has achieved comparable or better performance than the existing SCDA model in terms of evaluation metrics such as the concordance rate (CR), the Hellinger score, the scaled Euclidean norm (SEN) score, and the imputation quality score (IQS) in all three datasets. Taking the imputation results from the HLA data as an example, the AE model achieved an average CR of 0.9468 and 0.9459, Hellinger score of 0.9765 and 0.9518, SEN score of 0.9977 and 0.9953, and IQS of 0.9515 and 0.9044 at missing ratios of 10% and 20%, respectively. As for the results of LOS data, it achieved an average CR of 0.9005, Hellinger score of 0.9384, SEN score of 0.9940, and IQS of 0.8681 at the missing ratio of 20%. In summary, our proposed method for genotype imputation has a great potential to increase the statistical power of GWAS and improve downstream post-GWAS analyses.
CI  - Copyright (c) 2022 Song, Greenbaum, Luttrell, Zhou, Wu, Luo, Qiu, Zhao, Su, Tian, Shen, Hong, Gong, Shi, Deng and Zhang.
FAU - Song, Meng
AU  - Song M
AD  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States.
FAU - Greenbaum, Jonathan
AU  - Greenbaum J
AD  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.
FAU - Luttrell, Joseph 4th
AU  - Luttrell J 4th
AD  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States.
FAU - Zhou, Weihua
AU  - Zhou W
AD  - College of Computing, Michigan Technological University, Houghton, MI, United States.
FAU - Wu, Chong
AU  - Wu C
AD  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States.
FAU - Luo, Zhe
AU  - Luo Z
AD  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.
FAU - Qiu, Chuan
AU  - Qiu C
AD  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.
FAU - Zhao, Lan Juan
AU  - Zhao LJ
AD  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.
FAU - Su, Kuan-Jui
AU  - Su KJ
AD  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.
FAU - Tian, Qing
AU  - Tian Q
AD  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.
FAU - Shen, Hui
AU  - Shen H
AD  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.
FAU - Hong, Huixiao
AU  - Hong H
AD  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States.
FAU - Gong, Ping
AU  - Gong P
AD  - Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, United States.
FAU - Shi, Xinghua
AU  - Shi X
AD  - Department of Computer & Information Sciences, Temple University, Philadelphia, PA, United States.
FAU - Deng, Hong-Wen
AU  - Deng HW
AD  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.
FAU - Zhang, Chaoyang
AU  - Zhang C
AD  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States.
LA  - eng
SI  - figshare/10.6084/m9.figshare.21441078
GR  - P20 GM109036/GM/NIGMS NIH HHS/United States
GR  - R01 AG061917/AG/NIA NIH HHS/United States
GR  - R01 AR069055/AR/NIAMS NIH HHS/United States
GR  - U19 AG055373/AG/NIA NIH HHS/United States
PT  - Journal Article
DEP - 20221103
PL  - Switzerland
TA  - Front Artif Intell
JT  - Frontiers in artificial intelligence
JID - 101770551
PMC - PMC9671213
OTO - NOTNLM
OT  - GWAS
OT  - autoencoder
OT  - deep learning
OT  - genotype imputation
OT  - paired sample t-test
COIS- The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
EDAT- 2022/11/22 06:00
MHDA- 2022/11/22 06:01
PMCR- 2022/11/03
CRDT- 2022/11/21 04:18
PHST- 2022/08/26 00:00 [received]
PHST- 2022/09/29 00:00 [accepted]
PHST- 2022/11/21 04:18 [entrez]
PHST- 2022/11/22 06:00 [pubmed]
PHST- 2022/11/22 06:01 [medline]
PHST- 2022/11/03 00:00 [pmc-release]
AID - 10.3389/frai.2022.1028978 [doi]
PST - epublish
SO  - Front Artif Intell. 2022 Nov 3;5:1028978. doi: 10.3389/frai.2022.1028978. eCollection 2022.