PMID- 21903627
OWN - NLM
STAT- MEDLINE
DCOM- 20120313
LR  - 20240104
IS  - 1367-4811 (Electronic)
IS  - 1367-4803 (Print)
IS  - 1367-4803 (Linking)
VI  - 27
IP  - 21
DP  - 2011 Nov 1
TI  - A statistical framework for SNP calling, mutation discovery, association
      mapping and population genetical parameter estimation from sequencing data.
PG  - 2987-93
LID - 10.1093/bioinformatics/btr509 [doi]
AB  - MOTIVATION: Most existing methods for DNA sequence analysis rely on accurate
      sequences or genotypes. However, in applications of the next-generation
      sequencing (NGS), accurate genotypes may not be easily obtained (e.g.
      multi-sample low-coverage sequencing or somatic mutation discovery). These
      applications press for the development of new methods for analyzing sequence
      data with uncertainty. RESULTS: We present a statistical framework for calling
      SNPs, discovering somatic mutations, inferring population genetical parameters
      and performing association tests directly based on sequencing data without
      explicit genotyping or linkage-based imputation. On real data, we demonstrate
      that our method achieves comparable accuracy to alternative methods for
      estimating site allele count, for inferring allele frequency spectrum and for
      association mapping. We also highlight the necessity of using symmetric
      datasets for finding somatic mutations and confirm that for discovering rare
      events, mismapping is frequently the leading source of errors. AVAILABILITY:
      http://samtools.sourceforge.net. CONTACT: hengli@broadinstitute.org.
FAU - Li, Heng
AU  - Li H
AD  - Medical Population Genetics Program, Broad Institute, 7 Cambridge Center,
      Cambridge, MA 02142, USA. hengli@broadinstitute.org
LA  - eng
GR  - U01 HG005208/HG/NHGRI NIH HHS/United States
GR  - 1U01HG005208-01/HG/NHGRI NIH HHS/United States
PT  - Journal Article
PT  - Research Support, N.I.H., Extramural
DEP - 20110908
PL  - England
TA  - Bioinformatics
JT  - Bioinformatics (Oxford, England)
JID - 9808944
SB  - IM
MH  - Alleles
MH  - Data Interpretation, Statistical
MH  - Gene Frequency
MH  - Genetic Association Studies
MH  - Genetics, Population/methods
MH  - Genotype
MH  - Humans
MH  - *Mutation
MH  - *Polymorphism, Single Nucleotide
MH  - *Sequence Analysis, DNA
PMC - PMC3198575
EDAT- 2011/09/10 06:00
MHDA- 2012/03/14 06:00
PMCR- 2012/11/01
CRDT- 2011/09/10 06:00
PHST- 2011/09/10 06:00 [entrez]
PHST- 2011/09/10 06:00 [pubmed]
PHST- 2012/03/14 06:00 [medline]
PHST- 2012/11/01 00:00 [pmc-release]
AID - btr509 [pii]
AID - 10.1093/bioinformatics/btr509 [doi]
PST - ppublish
SO  - Bioinformatics. 2011 Nov 1;27(21):2987-93. doi: 10.1093/bioinformatics/btr509.
      Epub 2011 Sep 8.