PMID- 16761918
OWN - NLM
STAT- MEDLINE
DCOM- 20060830
LR  - 20061115
IS  - 1066-5277 (Print)
IS  - 1066-5277 (Linking)
VI  - 13
IP  - 4
DP  - 2006 May
TI  - An optimal algorithm for perfect phylogeny haplotyping.
PG  - 897-928
AB  - Inferring haplotype data from genotype data is a crucial step in linking SNPs to 
      human diseases. Given n genotypes over m SNP sites, the haplotype inference (HI) 
      problem deals with finding a set of haplotypes so that each given genotype can be 
      formed by a combining a pair of haplotypes from the set. The perfect phylogeny 
      haplotyping (PPH) problem is one of the many computational approaches to the HI 
      problem. Though it was conjectured that the complexity of the PPH problem was 
      O(nm), the complexity of all the solutions presented until recently was O(nm 
      (2)). In this paper, we make complete use of the column-ordering that was 
      presented earlier and show that there must be some interdependencies among the 
      pairwise relationships between SNP sites in order for the given genotypes to 
      allow a perfect phylogeny. Based on these interdependencies, we introduce the 
      FlexTree (flexible tree) data structure that represents all the pairwise 
      relationships in O(m) space. The FlexTree data structure provides a compact 
      representation of all the perfect phylogenies for the given set of genotypes. We 
      also introduce an ordering of the genotypes that allows the genotypes to be added 
      to the FlexTree sequentially. The column ordering, the FlexTree data structure, 
      and the row ordering we introduce make the O(nm) OPPH algorithm possible. We 
      present some results on simulated data which demonstrate that the OPPH algorithm 
      performs quiet impressively when compared to the previous algorithms. The OPPH 
      algorithm is one of the first O(nm) algorithms presented for the PPH problem.
FAU - Vijayasatya, Ravi
AU  - Vijayasatya R
AD  - School of Electrical Engineering and Computer Science, University of Central 
      Florida, Orlando, 32816-2362, USA.
FAU - Mukherjee, Amar
AU  - Mukherjee A
LA  - eng
PT  - Journal Article
PT  - Research Support, U.S. Gov't, Non-P.H.S.
PL  - United States
TA  - J Comput Biol
JT  - Journal of computational biology : a journal of computational molecular cell 
      biology
JID - 9433358
SB  - IM
MH  - *Algorithms
MH  - *Computational Biology
MH  - Data Interpretation, Statistical
MH  - *Haplotypes
MH  - *Phylogeny
EDAT- 2006/06/10 09:00
MHDA- 2006/08/31 09:00
CRDT- 2006/06/10 09:00
PHST- 2006/06/10 09:00 [pubmed]
PHST- 2006/08/31 09:00 [medline]
PHST- 2006/06/10 09:00 [entrez]
AID - 10.1089/cmb.2006.13.897 [doi]
PST - ppublish
SO  - J Comput Biol. 2006 May;13(4):897-928. doi: 10.1089/cmb.2006.13.897.