PMID- 33322123
OWN - NLM
STAT- MEDLINE
DCOM- 20210208
LR  - 20210208
IS  - 1660-4601 (Electronic)
IS  - 1661-7827 (Print)
IS  - 1660-4601 (Linking)
VI  - 17
IP  - 24
DP  - 2020 Dec 13
TI  - Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees.
LID - 10.3390/ijerph17249322 [doi]
LID - 9322
AB  - Substances that do not degrade over time have proven to be harmful to the environment and are dangerous to living organisms. Being able to predict the biodegradability of substances without costly experiments is useful. Recently, the quantitative structure-activity relationship (QSAR) models have proposed effective solutions to this problem. However, the molecular descriptor datasets usually suffer from the problems of unbalanced class distribution, which adversely affects the efficiency and generalization of the derived models. Accordingly, this study aims at validating the performances of balanced random trees (RTs) and boosted C5.0 decision trees (DTs) to construct QSAR models to classify the ready biodegradation of substances and their abilities to deal with unbalanced data. The balanced RTs model algorithm builds individual trees using balanced bootstrap samples, while the boosted C5.0 DT is modeled using cost-sensitive learning. We employed the two-dimensional molecular descriptor dataset, which is publicly available through the University of California, Irvine (UCI) machine learning repository. The molecular descriptors were ranked according to their contributions to the balanced RTs classification process. The performance of the proposed models was compared with previously reported results. Based on the statistical measures, the experimental results showed that the proposed models outperform the classification results of the support vector machine (SVM), K-nearest neighbors (KNN), and discrimination analysis (DA). Classification measures were analyzed in terms of accuracy, sensitivity, specificity, precision, false positive rate, false negative rate, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUROC).
FAU - Elsayad, Alaa M
AU  - Elsayad AM
AUID- ORCID: 0000-0001-8053-9759
AD  - Department of Electrical Engineering, College of Engineering, Prince Sattam Bin Abdulaziz University, P.O. Box 54, Wadi Aldawaser 11991, Saudi Arabia.
AD  - Computers and Systems Department, Electronics Research Institute, Giza 12622, Egypt.
FAU - Nassef, Ahmed M
AU  - Nassef AM
AUID- ORCID: 0000-0001-9604-5737
AD  - Department of Electrical Engineering, College of Engineering, Prince Sattam Bin Abdulaziz University, P.O. Box 54, Wadi Aldawaser 11991, Saudi Arabia.
AD  - Department of Computers and Automatic Control Engineering, Faculty of Engineering, Tanta University, Tanta 31733, Egypt.
FAU - Al-Dhaifallah, Mujahed
AU  - Al-Dhaifallah M
AUID- ORCID: 0000-0002-8441-2146
AD  - Systems Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia.
FAU - Elsayad, Khaled A
AU  - Elsayad KA
AD  - Pharmacy Department, Cairo University Hospitals, Cairo University, Cairo 11662, Egypt.
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
DEP - 20201213
PL  - Switzerland
TA  - Int J Environ Res Public Health
JT  - International journal of environmental research and public health
JID - 101238455
SB  - IM
MH  - Algorithms
MH  - *Decision Trees
MH  - Discriminant Analysis
MH  - Humans
MH  - *Machine Learning
MH  - ROC Curve
MH  - *Support Vector Machine
PMC - PMC7763457
OTO - NOTNLM
OT  - C5.0 decision tree
OT  - K-nearest neighbors
OT  - QSAR
OT  - biodegradable substances
OT  - discrimination analysis
OT  - machine learning
OT  - random trees
OT  - support vector machine
COIS- The authors declare no conflict of interest.
EDAT- 2020/12/17 06:00
MHDA- 2021/02/09 06:00
PMCR- 2020/12/01
CRDT- 2020/12/16 01:02
PHST- 2020/10/18 00:00 [received]
PHST- 2020/11/28 00:00 [revised]
PHST- 2020/12/11 00:00 [accepted]
PHST- 2020/12/16 01:02 [entrez]
PHST- 2020/12/17 06:00 [pubmed]
PHST- 2021/02/09 06:00 [medline]
PHST- 2020/12/01 00:00 [pmc-release]
AID - ijerph17249322 [pii]
AID - ijerph-17-09322 [pii]
AID - 10.3390/ijerph17249322 [doi]
PST - epublish
SO  - Int J Environ Res Public Health. 2020 Dec 13;17(24):9322. doi: 10.3390/ijerph17249322.