PMID- 16054254
OWN - NLM
STAT- MEDLINE
DCOM- 20060518
LR  - 20060320
IS  - 0167-7012 (Print)
IS  - 0167-7012 (Linking)
VI  - 65
IP  - 1
DP  - 2006 Apr
TI  - An eco-informatics tool for microbial community studies: supervised classification of Amplicon Length Heterogeneity (ALH) profiles of 16S rRNA.
PG  - 49-62
AB  - Support vector machines (SVM) and K-nearest neighbors (KNN) are two computational machine learning tools that perform supervised classification. This paper presents a novel application of such supervised analytical tools for profiling microbial communities and distinguishing patterns among ecosystems. Amplicon length heterogeneity (ALH) profiles from several hypervariable regions of the 16S rRNA gene of eubacterial communities from Idaho agricultural soil samples and from Chesapeake Bay marsh sediments were separately analyzed. The profiles from all available hypervariable regions were concatenated to obtain a combined profile, which was then provided to the SVM and KNN classifiers. Each profile was labeled with information about the location or time of its sampling. We hypothesized that after a learning phase using feature vectors from labeled ALH profiles, both classifiers would have the capacity to predict the labels of previously unseen samples. The resulting classifiers were able to predict the labels of the Idaho soil samples with high accuracy. The classifiers were less accurate for the classification of the Chesapeake Bay sediments, suggesting greater similarity among the Bay's microbial community patterns at the sampled sites. The profiles obtained from the V1+V2 region were more informative than those obtained from any other single region. However, combining them with profiles from the V1 region (with or without the profiles from the V3 region) resulted in the most accurate classification of the samples. The addition of profiles from the V9 region appeared to confound the classifiers. Our results show that SVM and KNN classifiers can be effectively applied to distinguish between eubacterial community patterns from different ecosystems based only on their ALH profiles.
FAU - Yang, Chengyong
AU  - Yang C
AD  - Bioinformatics Research Group (BioRG), School of Computer Science, Florida International University, Miami, Florida, 33199, USA.
FAU - Mills, DeEtta
AU  - Mills D
FAU - Mathee, Kalai
AU  - Mathee K
FAU - Wang, Yong
AU  - Wang Y
FAU - Jayachandran, Krish
AU  - Jayachandran K
FAU - Sikaroodi, Masoumeh
AU  - Sikaroodi M
FAU - Gillevet, Patrick
AU  - Gillevet P
FAU - Entry, Jim
AU  - Entry J
FAU - Narasimhan, Giri
AU  - Narasimhan G
LA  - eng
PT  - Journal Article
DEP - 20050727
PL  - Netherlands
TA  - J Microbiol Methods
JT  - Journal of microbiological methods
JID - 8306883
RN  - 0 (DNA, Bacterial)
RN  - 0 (RNA, Ribosomal, 16S)
SB  - IM
MH  - Artificial Intelligence
MH  - DNA, Bacterial/chemistry/*genetics
MH  - *Ecosystem
MH  - Geologic Sediments/*microbiology
MH  - Informatics/*methods
MH  - Pattern Recognition, Automated/methods
MH  - Polymerase Chain Reaction
MH  - RNA, Ribosomal, 16S/chemistry/*genetics
MH  - *Soil Microbiology
EDAT- 2005/08/02 09:00
MHDA- 2006/05/19 09:00
CRDT- 2005/08/02 09:00
PHST- 2005/01/18 00:00 [received]
PHST- 2005/04/22 00:00 [revised]
PHST- 2005/06/24 00:00 [accepted]
PHST- 2005/08/02 09:00 [pubmed]
PHST- 2006/05/19 09:00 [medline]
PHST- 2005/08/02 09:00 [entrez]
AID - S0167-7012(05)00179-X [pii]
AID - 10.1016/j.mimet.2005.06.012 [doi]
PST - ppublish
SO  - J Microbiol Methods. 2006 Apr;65(1):49-62. doi: 10.1016/j.mimet.2005.06.012. Epub 2005 Jul 27.