PMID- 29949981
OWN - NLM
STAT- MEDLINE
DCOM- 20190827
LR  - 20240327
IS  - 1367-4811 (Electronic)
IS  - 1367-4803 (Print)
IS  - 1367-4803 (Linking)
VI  - 34
IP  - 13
DP  - 2018 Jul 1
TI  - Covariate-dependent negative binomial factor analysis of RNA sequencing data.
PG  - i61-i69
LID - 10.1093/bioinformatics/bty237 [doi]
AB  - MOTIVATION: High-throughput sequencing technologies, in particular RNA sequencing (RNA-seq), have become the basic practice for genomic studies in biomedical research. In addition to studying genes individually, for example, through differential expression analysis, investigating co-ordinated expression variations of genes may help reveal the underlying cellular mechanisms to derive better understanding and more effective prognosis and intervention strategies. Although there exists a variety of co-expression network based methods to analyze microarray data for this purpose, instead of blindly extending these methods for microarray data that may introduce unnecessary bias, it is crucial to develop methods well adapted to RNA-seq data to identify the functional modules of genes with similar expression patterns. RESULTS: We have developed a fully Bayesian covariate-dependent negative binomial factor analysis (dNBFA) method for RNA-seq count data, to capture coordinated gene expression changes, while considering effects from covariates reflecting different influencing factors. Unlike existing co-expression network based methods, our proposed model does not require multiple ad-hoc choices on data processing, transformation, as well as co-expression measures and can be directly applied to RNA-seq data. Furthermore, being capable of incorporating covariate information, the proposed method can tackle setups with complex confounding factors in different experiment designs. Finally, the natural model parameterization removes the need for a normalization preprocessing step, as commonly adopted to compensate for the effect of sequencing-depth variations.
      Efficient Bayesian inference of model parameters is derived by exploiting conditional conjugacy via novel data augmentation techniques. Experimental results on several real-world RNA-seq datasets on complex diseases suggest dNBFA as a powerful tool for discovering the gene modules with significant differential expression and meaningful biological insight. AVAILABILITY AND IMPLEMENTATION: dNBFA is implemented in R language and is available at https://github.com/siamakz/dNBFA.
FAU - Zamani Dadaneh, Siamak
AU  - Zamani Dadaneh S
AD  - Department of Electrical and Computer Engineering, TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, USA.
FAU - Zhou, Mingyuan
AU  - Zhou M
AD  - Department of Information, Risk, and Operations Management, The University of Texas at Austin, Austin, TX, USA.
FAU - Qian, Xiaoning
AU  - Qian X
AD  - Department of Electrical and Computer Engineering, TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, USA.
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
PT  - Research Support, U.S. Gov't, Non-P.H.S.
PL  - England
TA  - Bioinformatics
JT  - Bioinformatics (Oxford, England)
JID - 9808944
SB  - IM
MH  - Autistic Disorder/genetics
MH  - Bayes Theorem
MH  - Factor Analysis, Statistical
MH  - Gene Expression Profiling/*methods
MH  - *Gene Regulatory Networks
MH  - High-Throughput Nucleotide Sequencing/*methods
MH  - Humans
MH  - Neoplasms/genetics
MH  - Sequence Analysis, RNA/*methods
MH  - *Software
PMC - PMC6022606
EDAT- 2018/06/29 06:00
MHDA- 2019/08/28 06:00
PMCR- 2018/06/27
CRDT- 2018/06/29 06:00
PHST- 2018/06/29 06:00 [entrez]
PHST- 2018/06/29 06:00 [pubmed]
PHST- 2019/08/28 06:00 [medline]
PHST- 2018/06/27 00:00 [pmc-release]
AID - 5045747 [pii]
AID - bty237 [pii]
AID - 10.1093/bioinformatics/bty237 [doi]
PST - ppublish
SO  - Bioinformatics. 2018 Jul 1;34(13):i61-i69. doi: 10.1093/bioinformatics/bty237.