PMID- 21328710
OWN - NLM
STAT- MEDLINE
DCOM- 20110614
LR  - 20211020
IS  - 1757-6334 (Electronic)
IS  - 0219-7200 (Print)
IS  - 0219-7200 (Linking)
VI  - 9
IP  - 1
DP  - 2011 Feb
TI  - A novel approach to DNA copy number data segmentation.
PG  - 131-48
AB  - DNA copy number (DCN) is the number of copies of DNA at a region of a genome.
      The alterations of DCN are highly associated with the development of
      different tumors. Recently, microarray technologies are being employed to
      detect DCN changes at many loci at the same time in tumor samples. The
      resulting DCN data are often very noisy, and the tumor sample is often
      contaminated by normal cells. The goal of computational analysis of
      array-based DCN data is to infer the underlying DCNs from raw DCN data.
      Previous methods for this task do not model the tumor/normal cell mixture
      ratio explicitly and they cannot output segments with DCN annotations. We
      developed a novel model-based method using the minimum description length
      (MDL) principle for DCN data segmentation. Our new method can output
      underlying DCN for each chromosomal segment, and at the same time, infer the
      underlying tumor proportion in the test samples. Empirical results show that
      our method achieves better accuracies on average as compared to three
      previous methods, namely Circular Binary Segmentation, Hidden Markov Model
      and Ultrasome.
FAU - Wang, Siling
AU  - Wang S
AD  - Department of Computer Science and Engineering, Southern Methodist
      University, Dallas, Texas 75205, USA. silingw@smu.edu
FAU - Wang, Yuhang
AU  - Wang Y
FAU - Xie, Yang
AU  - Xie Y
FAU - Xiao, Guanghua
AU  - Xiao G
LA  - eng
GR  - R33 DA027592/DA/NIDA NIH HHS/United States
GR  - 1R01CA152301-01/CA/NCI NIH HHS/United States
GR  - R21 DA027592-02/DA/NIDA NIH HHS/United States
GR  - 1R21DA027592/DA/NIDA NIH HHS/United States
GR  - R01 CA152301/CA/NCI NIH HHS/United States
GR  - R21 DA027592/DA/NIDA NIH HHS/United States
PT  - Comparative Study
PT  - Evaluation Study
PT  - Journal Article
PT  - Research Support, N.I.H., Extramural
PT  - Research Support, U.S. Gov't, Non-P.H.S.
PL  - Singapore
TA  - J Bioinform Comput Biol
JT  - Journal of bioinformatics and computational biology
JID - 101187344
RN  - 0 (DNA, Neoplasm)
SB  - IM
MH  - Algorithms
MH  - Computational Biology
MH  - Computer Simulation
MH  - *DNA Copy Number Variations
MH  - DNA, Neoplasm/genetics
MH  - Data Interpretation, Statistical
MH  - Databases, Nucleic Acid/statistics & numerical data
MH  - Humans
MH  - Markov Chains
MH  - Models, Statistical
MH  - Neoplasms/genetics
MH  - Oligonucleotide Array Sequence Analysis/statistics & numerical data
MH  - Software
PMC - PMC3084615
MID - NIHMS288631
EDAT- 2011/02/18 06:00
MHDA- 2011/06/15 06:00
PMCR- 2011/04/29
CRDT- 2011/02/18 06:00
PHST- 2010/02/10 00:00 [received]
PHST- 2010/11/02 00:00 [revised]
PHST- 2010/11/04 00:00 [accepted]
PHST- 2011/02/18 06:00 [entrez]
PHST- 2011/02/18 06:00 [pubmed]
PHST- 2011/06/15 06:00 [medline]
PHST- 2011/04/29 00:00 [pmc-release]
AID - S0219720011005343 [pii]
AID - 10.1142/s0219720011005343 [doi]
PST - ppublish
SO  - J Bioinform Comput Biol. 2011 Feb;9(1):131-48. doi: 10.1142/s0219720011005343.