PMID- 34917425
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20220429
IS  - 2167-8359 (Print)
IS  - 2167-8359 (Electronic)
IS  - 2167-8359 (Linking)
VI  - 9
DP  - 2021
TI  - CNV-P: a machine-learning framework for predicting high confident copy number variations.
PG  - e12564
LID - 10.7717/peerj.12564 [doi]
LID - e12564
AB  - BACKGROUND: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data has been a strong demand for disease research. However, current software for detecting CNVs has high false-positive rates, which needs further improvement. METHODS: Here, we proposed a novel and post-processing approach for CNVs prediction (CNV-P), a machine-learning framework that could efficiently remove false-positive fragments from results of CNVs detecting tools. A series of CNVs signals such as read depth (RD), split reads (SR) and read pair (RP) around the putative CNV fragments were defined as features to train a classifier. RESULTS: The prediction results on several real biological datasets showed that our models could accurately classify the CNVs at over 90% precision rate and 85% recall rate, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different sizes of CNVs and the platforms of sequencing. CONCLUSIONS: Our framework for classifying high-confident CNVs could improve both basic research and clinical diagnosis of genetic diseases.
CI  - (c) 2021 Wang et al.
FAU - Wang, Taifu
AU  - Wang T
AUID- ORCID: 0000-0003-4674-4454
AD  - BGI-Shenzhen, Shenzhen, China.
FAU - Sun, Jinghua
AU  - Sun J
AD  - BGI-Shenzhen, Shenzhen, China.
AD  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
FAU - Zhang, Xiuqing
AU  - Zhang X
AD  - BGI-Shenzhen, Shenzhen, China.
AD  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
AD  - Guangdong Enterprise Key Laboratory of Human Disease Genomics, Beishan Industrial Zone, Shenzhen, China.
FAU - Wang, Wen-Jing
AU  - Wang WJ
AUID- ORCID: 0000-0002-4527-1168
AD  - BGI-Shenzhen, Shenzhen, China.
FAU - Zhou, Qing
AU  - Zhou Q
AD  - BGI-Shenzhen, Shenzhen, China.
LA  - eng
PT  - Journal Article
DEP - 20211202
PL  - United States
TA  - PeerJ
JT  - PeerJ
JID - 101603425
PMC - PMC8645205
OTO - NOTNLM
OT  - Copy number variant
OT  - Genome sequencing
OT  - Machine learning
COIS- The authors declare that they have no competing interests.
EDAT- 2021/12/18 06:00
MHDA- 2021/12/18 06:01
PMCR- 2021/12/02
CRDT- 2021/12/17 06:57
PHST- 2021/08/04 00:00 [received]
PHST- 2021/11/08 00:00 [accepted]
PHST- 2021/12/17 06:57 [entrez]
PHST- 2021/12/18 06:00 [pubmed]
PHST- 2021/12/18 06:01 [medline]
PHST- 2021/12/02 00:00 [pmc-release]
AID - 12564 [pii]
AID - 10.7717/peerj.12564 [doi]
PST - epublish
SO  - PeerJ. 2021 Dec 2;9:e12564. doi: 10.7717/peerj.12564. eCollection 2021.