PMID- 33552026
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20210210
IS  - 1664-302X (Print)
IS  - 1664-302X (Electronic)
IS  - 1664-302X (Linking)
VI  - 11
DP  - 2020
TI  - Tracking Major Sources of Water Contamination Using Machine Learning.
PG  - 616692
LID - 10.3389/fmicb.2020.616692 [doi]
LID - 616692
AB  - Current microbial source tracking techniques that rely on grab samples analyzed by individual endpoint assays are inadequate to explain microbial sources across space and time. Modeling and predicting host sources of microbial contamination could add a useful tool for watershed management. In this study, we tested and evaluated machine learning models to predict the major sources of microbial contamination in a watershed. We examined the relationship between microbial sources, land cover, weather, and hydrologic variables in a watershed in Northern California, United States. Six models, including K-nearest neighbors (KNN), Naive Bayes, Support vector machine (SVM), simple neural network (NN), Random Forest, and XGBoost, were built to predict major microbial sources using land cover, weather and hydrologic variables. The results showed that these models successfully predicted microbial sources classified into two categories (human and non-human), with the average accuracy ranging from 69% (Naive Bayes) to 88% (XGBoost). The area under curve (AUC) of the receiver operating characteristic (ROC) illustrated XGBoost had the best performance (average AUC = 0.88), followed by Random Forest (average AUC = 0.84), and KNN (average AUC = 0.74). The importance index obtained from Random Forest indicated that precipitation and temperature were the two most important factors to predict the dominant microbial source. These results suggest that machine learning models, particularly XGBoost, can predict the dominant sources of microbial contamination based on the relationship of microbial contaminants with daily weather and land cover, providing a powerful tool to understand microbial sources in water.
CI  - Copyright (c) 2021 Wu, Song, Dubinsky and Stewart.
FAU - Wu, Jianyong
AU  - Wu J
AD  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States.
FAU - Song, Conghe
AU  - Song C
AD  - Department of Geography, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States.
FAU - Dubinsky, Eric A
AU  - Dubinsky EA
AD  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States.
FAU - Stewart, Jill R
AU  - Stewart JR
AD  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States.
LA  - eng
PT  - Journal Article
DEP - 20210120
PL  - Switzerland
TA  - Front Microbiol
JT  - Frontiers in microbiology
JID - 101548977
PMC - PMC7854693
OTO - NOTNLM
OT  - XGBoost
OT  - fecal contamination
OT  - land use
OT  - machine learning
OT  - microbial source tracking
OT  - rainfall
COIS- The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
EDAT- 2021/02/09 06:00
MHDA- 2021/02/09 06:01
PMCR- 2021/01/20
CRDT- 2021/02/08 05:36
PHST- 2020/10/13 00:00 [received]
PHST- 2020/12/29 00:00 [accepted]
PHST- 2021/02/08 05:36 [entrez]
PHST- 2021/02/09 06:00 [pubmed]
PHST- 2021/02/09 06:01 [medline]
PHST- 2021/01/20 00:00 [pmc-release]
AID - 10.3389/fmicb.2020.616692 [doi]
PST - epublish
SO  - Front Microbiol. 2021 Jan 20;11:616692. doi: 10.3389/fmicb.2020.616692. eCollection 2020.