PMID- 29784893
OWN - NLM
STAT- MEDLINE
DCOM- 20181023
LR  - 20181023
IS  - 2228-7809 (Electronic)
IS  - 2228-7795 (Linking)
VI  - 18
IP  - 2
DP  - 2018 Apr 24
TI  - A Comparison between Decision Tree and Random Forest in Determining the Risk Factors Associated with Type 2 Diabetes.
PG  - e00412
AB  - BACKGROUND: We aimed to identify the risk factors associated with type 2 diabetes mellitus (T2DM) using two data mining techniques, decision tree and random forest, with data from the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) Study program. STUDY DESIGN: A cross-sectional study. METHODS: The MASHAD study started in 2010 and will continue until 2020. Two data mining tools, namely decision trees and random forests, were used to predict T2DM from other characteristics observed in 9528 subjects recruited from the MASHAD database. This paper compares the two models in terms of accuracy, sensitivity, specificity, and the area under the ROC curve. RESULTS: The prevalence rate of T2DM was 14% among these subjects. The decision tree model had 64.9% accuracy, 64.5% sensitivity, 66.8% specificity, and an area under the ROC curve of 68.6%, while the random forest model had 71.1% accuracy, 71.3% sensitivity, 69.9% specificity, and an area under the ROC curve of 77.3%. CONCLUSIONS: The random forest model, when used with demographic, clinical, anthropometric, and biochemical measurements, can provide a simple tool to identify associated risk factors for type 2 diabetes. Such identification can be of substantial use in managing health policy to reduce the number of subjects with T2DM.
FAU - Esmaily, Habibollah
AU  - Esmaily H
AD  - Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
FAU - Tayefi, Maryam
AU  - Tayefi M
AD  - Clinical Research Unit, Mashhad University of Medical Sciences, Mashhad, Iran.
FAU - Doosti, Hassan
AU  - Doosti H
AD  - Department of Statistics, Macquarie University, Sydney, NSW, Australia.
FAU - Ghayour-Mobarhan, Majid
AU  - Ghayour-Mobarhan M
AD  - Biochemistry of Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
FAU - Nezami, Hossein
AU  - Nezami H
AD  - Department of Basic Sciences, Faculty of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran.
FAU - Amirabadizadeh, Alireza
AU  - Amirabadizadeh A
AD  - Medical Toxicology and Drug Abuse Research Center (MTDRC), Birjand University of Medical Sciences, Moallem Avenue, Birjand, Iran. amirabadiza921@gmail.com.
LA  - eng
PT  - Comparative Study
PT  - Journal Article
DEP - 20180424
PL  - Iran
TA  - J Res Health Sci
JT  - Journal of research in health sciences
JID - 101480094
SB  - IM
MH  - Aged
MH  - Cross-Sectional Studies
MH  - Data Mining/*methods
MH  - Decision Trees
MH  - Diabetes Mellitus, Type 2/*etiology
MH  - Female
MH  - Humans
MH  - Iran
MH  - Male
MH  - Mass Screening/methods
MH  - Middle Aged
MH  - ROC Curve
MH  - Risk Factors
MH  - Sensitivity and Specificity
OTO - NOTNLM
OT  - Decision tree
OT  - Diabetes mellitus
OT  - Iran
OT  - Random forest
OT  - data mining
EDAT- 2018/05/23 06:00
MHDA- 2018/10/24 06:00
CRDT- 2018/05/23 06:00
PHST- 2017/12/24 00:00 [received]
PHST- 2018/04/17 00:00 [accepted]
PHST- 2018/04/16 00:00 [revised]
PHST- 2018/05/23 06:00 [entrez]
PHST- 2018/05/23 06:00 [pubmed]
PHST- 2018/10/24 06:00 [medline]
AID - 3777 [pii]
PST - epublish
SO  - J Res Health Sci. 2018 Apr 24;18(2):e00412.