PMID- 16761918 OWN - NLM STAT- MEDLINE DCOM- 20060830 LR - 20061115 IS - 1066-5277 (Print) IS - 1066-5277 (Linking) VI - 13 IP - 4 DP - 2006 May TI - An optimal algorithm for perfect phylogeny haplotyping. PG - 897-928 AB - Inferring haplotype data from genotype data is a crucial step in linking SNPs to human diseases. Given n genotypes over m SNP sites, the haplotype inference (HI) problem deals with finding a set of haplotypes so that each given genotype can be formed by a combining a pair of haplotypes from the set. The perfect phylogeny haplotyping (PPH) problem is one of the many computational approaches to the HI problem. Though it was conjectured that the complexity of the PPH problem was O(nm), the complexity of all the solutions presented until recently was O(nm (2)). In this paper, we make complete use of the column-ordering that was presented earlier and show that there must be some interdependencies among the pairwise relationships between SNP sites in order for the given genotypes to allow a perfect phylogeny. Based on these interdependencies, we introduce the FlexTree (flexible tree) data structure that represents all the pairwise relationships in O(m) space. The FlexTree data structure provides a compact representation of all the perfect phylogenies for the given set of genotypes. We also introduce an ordering of the genotypes that allows the genotypes to be added to the FlexTree sequentially. The column ordering, the FlexTree data structure, and the row ordering we introduce make the O(nm) OPPH algorithm possible. We present some results on simulated data which demonstrate that the OPPH algorithm performs quiet impressively when compared to the previous algorithms. The OPPH algorithm is one of the first O(nm) algorithms presented for the PPH problem. FAU - Vijayasatya, Ravi AU - Vijayasatya R AD - School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, 32816-2362, USA. FAU - Mukherjee, Amar AU - Mukherjee A LA - eng PT - Journal Article PT - Research Support, U.S. Gov't, Non-P.H.S. PL - United States TA - J Comput Biol JT - Journal of computational biology : a journal of computational molecular cell biology JID - 9433358 SB - IM MH - *Algorithms MH - *Computational Biology MH - Data Interpretation, Statistical MH - *Haplotypes MH - *Phylogeny EDAT- 2006/06/10 09:00 MHDA- 2006/08/31 09:00 CRDT- 2006/06/10 09:00 PHST- 2006/06/10 09:00 [pubmed] PHST- 2006/08/31 09:00 [medline] PHST- 2006/06/10 09:00 [entrez] AID - 10.1089/cmb.2006.13.897 [doi] PST - ppublish SO - J Comput Biol. 2006 May;13(4):897-928. doi: 10.1089/cmb.2006.13.897.