PMID- 28407084
OWN - NLM
STAT- MEDLINE
DCOM- 20190409
LR  - 20221207
IS  - 1477-4054 (Electronic)
IS  - 1467-5463 (Print)
IS  - 1467-5463 (Linking)
VI  - 19
IP  - 5
DP  - 2018 Sep 28
TI  - Comparative analysis of de novo assemblers for variation discovery in personal
      genomes.
PG  - 893-904
LID - 10.1093/bib/bbx037 [doi]
AB  - Current variant discovery approaches often rely on an initial read mapping to the
      reference sequence. Their effectiveness is limited by the presence of gaps,
      potential misassemblies, regions of duplicates with a high-sequence similarity and
      regions of high-sequence divergence in the reference. Also, mapping-based
      approaches are less sensitive to large INDELs and complex variations and provide
      little phase information in personal genomes. A few de novo assemblers have been
      developed to identify variants through direct variant calling from the assembly
      graph, micro-assembly and whole-genome assembly, but mainly for whole-genome
      sequencing (WGS) data. We developed SGVar, a de novo assembly workflow for
      haplotype-based variant discovery from whole-exome sequencing (WES) data. Using
      simulated human exome data, we compared SGVar with five variation-aware de novo
      assemblers and with BWA-MEM together with three haplotype- or local de novo
      assembly-based callers. SGVar outperforms the other assemblers in sensitivity and
      tolerance of sequencing errors. We recapitulated the findings on whole-genome and
      exome data from a Utah residents with Northern and Western European ancestry (CEU)
      trio, showing that SGVar had high sensitivity both in the highly divergent human
      leukocyte antigen (HLA) region and in non-HLA regions of chromosome 6. In
      particular, SGVar is robust to sequencing error, k-mer selection, divergence level
      and coverage depth. Unlike mapping-based approaches, SGVar is capable of resolving
      long-range phase and identifying large INDELs from WES, more prominently from WGS.
      We conclude that SGVar represents an ideal platform for WES-based variant
      discovery in highly divergent regions and across the whole genome.
FAU - Tian, Shulan
AU  - Tian S
AD  - Division of Biomedical Statistics and Informatics, Department of Health Sciences
      Research, Mayo Clinic, Rochester, MN, USA.
FAU - Yan, Huihuang
AU  - Yan H
AD  - Division of Biomedical Statistics and Informatics, Department of Health Sciences
      Research, Mayo Clinic, Rochester, MN, USA.
FAU - Klee, Eric W
AU  - Klee EW
AD  - Division of Biomedical Statistics and Informatics, Department of Health Sciences
      Research, Mayo Clinic, Rochester, MN, USA.
AD  - Center for Individualized Medicine Bioinformatics Program, Mayo Clinic, USA.
FAU - Kalmbach, Michael
AU  - Kalmbach M
AD  - Division of Information Management and Analytics, Department of Information
      Technology, Mayo Clinic, USA.
FAU - Slager, Susan L
AU  - Slager SL
AD  - Division of Biomedical Statistics and Informatics, Department of Health Sciences
      Research, Mayo Clinic, Rochester, MN, USA.
LA  - eng
GR  - U01 CA118444/CA/NCI NIH HHS/United States
GR  - UL1 TR000135/TR/NCATS NIH HHS/United States
PT  - Comparative Study
PT  - Journal Article
PT  - Research Support, N.I.H., Extramural
PL  - England
TA  - Brief Bioinform
JT  - Briefings in bioinformatics
JID - 100912837
RN  - 0 (HLA Antigens)
SB  - IM
MH  - Chromosome Mapping/methods/statistics & numerical data
MH  - Chromosomes, Human, Pair 6/genetics
MH  - Computational Biology/methods
MH  - Computer Simulation
MH  - Female
MH  - *Genetic Variation
MH  - Genome, Human
MH  - HLA Antigens/genetics
MH  - Haplotypes
MH  - Humans
MH  - INDEL Mutation
MH  - Polymorphism, Single Nucleotide
MH  - Pregnancy
MH  - Exome Sequencing/*methods/statistics & numerical data
MH  - Whole Genome Sequencing/methods/statistics & numerical data
PMC - PMC6169673
EDAT- 2017/04/14 06:00
MHDA- 2019/04/10 06:00
PMCR- 2017/04/11
CRDT- 2017/04/14 06:00
PHST- 2016/01/12 00:00 [received]
PHST- 2017/03/08 00:00 [accepted]
PHST- 2017/04/14 06:00 [pubmed]
PHST- 2019/04/10 06:00 [medline]
PHST- 2017/04/14 06:00 [entrez]
PHST- 2017/04/11 00:00 [pmc-release]
AID - 3603524 [pii]
AID - bbx037 [pii]
AID - 10.1093/bib/bbx037 [doi]
PST - ppublish
SO  - Brief Bioinform. 2018 Sep 28;19(5):893-904. doi: 10.1093/bib/bbx037.