PMID- 36007229
OWN - NLM
STAT- MEDLINE
DCOM- 20220926
LR  - 20221007
IS  - 1477-4054 (Electronic)
IS  - 1467-5463 (Linking)
VI  - 23
IP  - 5
DP  - 2022 Sep 20
TI  - Principal microbial groups: compositional alternative to phylogenetic grouping of microbiome data.
LID - bbac328 [pii]
LID - 10.1093/bib/bbac328 [doi]
AB  - Statistical and machine learning techniques based on relative abundances have been used to predict health conditions and to identify microbial biomarkers. However, high dimensionality, sparsity and the compositional nature of microbiome data represent statistical challenges. On the other hand, the taxon grouping allows summarizing microbiome abundance with a coarser resolution in a lower dimension, but it presents new challenges when correlating taxa with a disease. In this work, we present a novel approach that groups Operational Taxonomical Units (OTUs) based only on relative abundances as an alternative to taxon grouping. The proposed procedure acknowledges the compositional data making use of principal balances. The identified groups are called Principal Microbial Groups (PMGs). The procedure reduces the need for user-defined aggregation of OTUs and offers the possibility of working with coarse group of OTUs, which are not present in a phylogenetic tree. PMGs can be used for two different goals: (1) as a dimensionality reduction method for compositional data, (2) as an aggregation procedure that provides an alternative to taxon grouping for construction of microbial balances afterward used for disease prediction. We illustrate the procedure with a cirrhosis study data. PMGs provide a coherent data analysis for the search of biomarkers in human microbiota. The source code and demo data for PMGs are available at: https://github.com/asliboyraz/PMGs.
CI  - (c) The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
FAU - Boyraz, Asli
AU  - Boyraz A
AD  - Department of Computer Programming, Recep Tayyip Erdogan University, Ardesen Vocational School, Rize, 53400, Turkey.
FAU - Pawlowsky-Glahn, Vera
AU  - Pawlowsky-Glahn V
AD  - Department of Computer Sciences, Applied Mathematics and Statistics, University of Girona, Campus Montilivi, 17003 Girona, Spain.
FAU - Egozcue, Juan Jose
AU  - Egozcue JJ
AD  - Department of Civil and Environmental Engineering, Universitat Politecnica de Catalunya, Barcelona, 08034, Spain.
FAU - Acar, Aybar Can
AU  - Acar AC
AD  - Department of Medical Informatics, Middle East Technical University, Ankara Turkey.
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
PL  - England
TA  - Brief Bioinform
JT  - Briefings in bioinformatics
JID - 100912837
SB  - IM
MH  - Data Analysis
MH  - Humans
MH  - *Microbiota/genetics
MH  - Phylogeny
OTO - NOTNLM
OT  - balance
OT  - compositional data
OT  - microbial biomarkers
OT  - microbiome
EDAT- 2022/08/26 06:00
MHDA- 2022/09/28 06:00
CRDT- 2022/08/25 16:22
PHST- 2022/02/01 00:00 [received]
PHST- 2022/07/19 00:00 [revised]
PHST- 2022/07/20 00:00 [accepted]
PHST- 2022/08/26 06:00 [pubmed]
PHST- 2022/09/28 06:00 [medline]
PHST- 2022/08/25 16:22 [entrez]
AID - 6675749 [pii]
AID - 10.1093/bib/bbac328 [doi]
PST - ppublish
SO  - Brief Bioinform. 2022 Sep 20;23(5):bbac328. doi: 10.1093/bib/bbac328.