PMID- 34814629
OWN - NLM
STAT- MEDLINE
DCOM- 20211125
LR  - 20211125
IS  - 0254-6450 (Print)
IS  - 0254-6450 (Linking)
VI  - 42
IP  - 10
DP  - 2021 Oct 10
TI  - [Simulation study on missing data imputation methods for longitudinal data in cohort studies].
PG  - 1889-1894
LID - 10.3760/cma.j.cn112338-20201130-01363 [doi]
AB  - Objective: Missing data are an unavoidable problem in cohort studies. This paper uses a simulation study to compare the imputation performance of eight common missing data imputation methods on longitudinal data, to provide a reference for handling missing data in longitudinal studies. Methods: The simulation study was carried out in R, and missing longitudinal data were generated by the Monte Carlo method. The methods were evaluated by comparing the average absolute deviation, the average relative deviation, and the Type I error of a subsequent regression analysis, which together capture both imputation performance and the influence on subsequent multivariate analysis. Results: Mean imputation, k-nearest neighbor (KNN), regression imputation, and random forest all had similar and stable imputation performance. Hot deck was inferior to these methods. K-means clustering and the expectation maximization (EM) algorithm were among the worst and were unstable. Mean imputation, the EM algorithm, random forest, KNN, and regression imputation controlled the Type I error, whereas multiple imputation, hot deck, and K-means clustering did not control it effectively. Conclusions: For missing data in longitudinal studies, mean imputation, KNN, regression imputation, and random forest are the better imputation methods under the missing at random mechanism. When the missing ratio is not too large, multiple imputation and hot deck can also perform well, but K-means clustering and the EM algorithm are not recommended.
FAU - Li, Y M
AU  - Li YM
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an Jiaotong University Health Science Center, Xi'an 710061, China.
FAU - Zhao, P
AU  - Zhao P
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an Jiaotong University Health Science Center, Xi'an 710061, China.
FAU - Yang, Y H
AU  - Yang YH
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an Jiaotong University Health Science Center, Xi'an 710061, China.
FAU - Wang, J X
AU  - Wang JX
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an Jiaotong University Health Science Center, Xi'an 710061, China.
FAU - Yan, H
AU  - Yan H
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an Jiaotong University Health Science Center, Xi'an 710061, China.
FAU - Chen, F Y
AU  - Chen FY
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an Jiaotong University Health Science Center, Xi'an 710061, China.
LA  - chi
GR  - 81703325/National Natural Science Foundation of China/
GR  - 2017YFC0907200, 2017YFC0907201/National Key Research and Development Program of China/
PT  - Journal Article
PL  - China
TA  - Zhonghua Liu Xing Bing Xue Za Zhi
JT  - Zhonghua liu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi
JID - 8208604
SB  - IM
MH  - *Algorithms
MH  - Cohort Studies
MH  - Computer Simulation
MH  - Humans
MH  - Longitudinal Studies
MH  - Regression Analysis
EDAT- 2021/11/25 06:00
MHDA- 2021/11/26 06:00
CRDT- 2021/11/24 02:19
PHST- 2021/11/24 02:19 [entrez]
PHST- 2021/11/25 06:00 [pubmed]
PHST- 2021/11/26 06:00 [medline]
AID - 10.3760/cma.j.cn112338-20201130-01363 [doi]
PST - ppublish
SO  - Zhonghua Liu Xing Bing Xue Za Zhi. 2021 Oct 10;42(10):1889-1894. doi: 10.3760/cma.j.cn112338-20201130-01363.
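
The simulation design summarized in the abstract (generate complete longitudinal data, delete values under a missing at random mechanism, impute, then score the imputed values against the truth) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the study used R, while this example uses Python's standard library. The data-generating model, the missingness rule, the sample sizes, the function names `simulate_once` and `run_simulation`, and the exact definitions of the average absolute and relative deviation are all assumptions, since the abstract does not specify them. Only mean imputation, one of the eight methods compared, is shown.

```python
import random
import statistics


def simulate_once(n_subjects=200, n_visits=4, base_rate=0.2, rng=None):
    """One Monte Carlo replicate: generate, delete, impute, and score.

    All parameter values here are hypothetical, chosen for illustration.
    """
    rng = rng or random.Random()

    # Complete longitudinal data: random intercept + linear time trend + noise.
    complete = []
    for _ in range(n_subjects):
        intercept = rng.gauss(50.0, 5.0)
        complete.append([intercept + 2.0 * t + rng.gauss(0.0, 2.0)
                         for t in range(n_visits)])

    # Assumed MAR rule: baseline (visit 0) is always observed, and the chance
    # of missing a later visit depends only on that observed baseline value.
    observed = [row[:] for row in complete]
    for row in observed:
        p_miss = base_rate * (1.5 if row[0] > 50.0 else 0.5)
        for t in range(1, n_visits):
            if rng.random() < p_miss:
                row[t] = None

    # Mean imputation: the imputed value for each gap is the observed mean of
    # its visit; score that value against the known true value.
    abs_devs, rel_devs = [], []
    for t in range(1, n_visits):
        seen = [row[t] for row in observed if row[t] is not None]
        if not seen:
            continue  # no observed values to estimate the visit mean from
        visit_mean = statistics.fmean(seen)
        for i, row in enumerate(observed):
            if row[t] is None:
                truth = complete[i][t]
                abs_devs.append(abs(visit_mean - truth))
                rel_devs.append(abs(visit_mean - truth) / abs(truth))

    if not abs_devs:
        return 0.0, 0.0
    return statistics.fmean(abs_devs), statistics.fmean(rel_devs)


def run_simulation(n_reps=200, seed=2021):
    """Average both deviation measures over n_reps Monte Carlo replicates."""
    rng = random.Random(seed)
    results = [simulate_once(rng=rng) for _ in range(n_reps)]
    mean_abs = statistics.fmean(r[0] for r in results)
    mean_rel = statistics.fmean(r[1] for r in results)
    return mean_abs, mean_rel


if __name__ == "__main__":
    mad, mrd = run_simulation()
    print(f"average absolute deviation: {mad:.3f}")
    print(f"average relative deviation: {mrd:.4f}")
```

Comparing methods in this framework would mean swapping the mean-imputation step for another imputer and rerunning the same replicates. Assessing Type I error would additionally require fitting a regression to each imputed data set under a true null effect, which is omitted here.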