PMID- 34814629
OWN - NLM
STAT- MEDLINE
DCOM- 20211125
LR  - 20211125
IS  - 0254-6450 (Print)
IS  - 0254-6450 (Linking)
VI  - 42
IP  - 10
DP  - 2021 Oct 10
TI  - [Simulation study on missing data imputation methods for longitudinal data in 
      cohort studies].
PG  - 1889-1894
LID - 10.3760/cma.j.cn112338-20201130-01363 [doi]
AB  - Objective: Missing data are an unavoidable problem in cohort studies. This 
      paper uses a simulation study to compare the performance of eight common 
      imputation methods on longitudinal data, to provide a reference for handling 
      missing data in longitudinal studies. 
      Methods: The simulation was implemented in R, and longitudinal data with 
      missing values were generated by the Monte Carlo method. The imputation 
      methods were compared on average absolute deviation, average relative 
      deviation, and Type I error in the subsequent regression analysis, to 
      evaluate both their imputation performance on missing longitudinal data and 
      their influence on subsequent multivariate analysis. Results: Mean imputation, k 
      nearest neighbor (KNN), regression imputation, and random forest performed 
      similarly and stably. Hot deck was inferior to these methods, and K-means 
      clustering and the expectation maximization (EM) algorithm were among the 
      worst performers and were unstable. Mean imputation, the EM algorithm, random 
      forest, KNN, and regression imputation controlled the Type I error, whereas 
      multiple imputation, hot deck, and K-means clustering did not control it 
      effectively. Conclusions: For missing data in longitudinal studies, 
      mean imputation, KNN, regression imputation, and random forest are preferable 
      imputation methods under the missing at random mechanism. When the proportion 
      of missing data is not too large, multiple imputation and hot deck can also 
      perform well, but K-means clustering and the EM algorithm are not 
      recommended.
FAU - Li, Y M
AU  - Li YM
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an 
      Jiaotong University Health Science Center, Xi'an 710061, China.
FAU - Zhao, P
AU  - Zhao P
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an 
      Jiaotong University Health Science Center, Xi'an 710061, China.
FAU - Yang, Y H
AU  - Yang YH
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an 
      Jiaotong University Health Science Center, Xi'an 710061, China.
FAU - Wang, J X
AU  - Wang JX
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an 
      Jiaotong University Health Science Center, Xi'an 710061, China.
FAU - Yan, H
AU  - Yan H
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an 
      Jiaotong University Health Science Center, Xi'an 710061, China.
FAU - Chen, F Y
AU  - Chen FY
AD  - Department of Epidemiology and Biostatistics, School of Public Health of Xi'an 
      Jiaotong University Health Science Center, Xi'an 710061, China.
LA  - chi
GR  - 81703325/National Natural Science Foundation of China/
GR  - 2017YFC0907200, 2017YFC0907201/National Key Research and Development Program of 
      China/
PT  - Journal Article
PL  - China
TA  - Zhonghua Liu Xing Bing Xue Za Zhi
JT  - Zhonghua liu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi
JID - 8208604
SB  - IM
MH  - *Algorithms
MH  - Cohort Studies
MH  - Computer Simulation
MH  - Humans
MH  - Longitudinal Studies
MH  - Regression Analysis
EDAT- 2021/11/25 06:00
MHDA- 2021/11/26 06:00
CRDT- 2021/11/24 02:19
PHST- 2021/11/24 02:19 [entrez]
PHST- 2021/11/25 06:00 [pubmed]
PHST- 2021/11/26 06:00 [medline]
AID - 10.3760/cma.j.cn112338-20201130-01363 [doi]
PST - ppublish
SO  - Zhonghua Liu Xing Bing Xue Za Zhi. 2021 Oct 10;42(10):1889-1894. doi: 
      10.3760/cma.j.cn112338-20201130-01363.