PMID- 25201145
OWN - NLM
STAT- MEDLINE
DCOM- 20150602
LR  - 20220321
IS  - 1756-0500 (Electronic)
IS  - 1756-0500 (Linking)
VI  - 7
DP  - 2014 Sep 8
TI  - SPANDx: a genomics pipeline for comparative analysis of large haploid whole
      genome re-sequencing datasets.
PG  - 618
LID - 10.1186/1756-0500-7-618 [doi]
LID - 618
AB  - BACKGROUND: Next-generation sequencing (NGS) is now a commonplace tool for
      molecular characterisation of virtually any species of interest. Despite
      the ever-increasing use of NGS in laboratories worldwide, analysis of
      whole genome re-sequencing (WGS) datasets from start to finish remains
      nontrivial due to the fragmented nature of NGS software and the lack of
      experienced bioinformaticists in many research teams. FINDINGS: We
      describe SPANDx (Synergised Pipeline for Analysis of NGS Data in Linux), a
      new tool for high-throughput comparative analysis of haploid WGS datasets
      comprising one through thousands of genomes. SPANDx consolidates several
      well-validated, open-source packages into a single tool, mitigating the
      need to learn and manipulate individual NGS programs. SPANDx incorporates
      BWA for alignment of raw NGS reads against a reference genome or
      pan-genome, followed by data filtering, variant calling and annotation
      using Picard, GATK, SAMtools and SnpEff. BEDTools has also been included
      for genetic locus presence/absence (P/A) determination to easily
      visualise the core and accessory genomes. Additional SPANDx features
      include construction of error-corrected single-nucleotide polymorphism
      (SNP) and insertion-deletion matrices, and P/A matrices, to enable
      user-friendly visualisation of genetic variants. The SNP matrices
      generated using VCFtools and GATK are directly importable into PAUP*,
      PHYLIP or RAxML for downstream phylogenetic analysis. SPANDx has been
      developed to handle NGS data from Illumina, Ion Personal Genome Machine
      (PGM) and 454 platforms, and we demonstrate that it has comparable
      performance across Illumina MiSeq/HiSeq2000 and Ion PGM data. CONCLUSION:
      SPANDx is an all-in-one tool for comprehensive haploid WGS analysis.
      SPANDx is open source and is freely available at:
      http://sourceforge.net/projects/spandx/.
FAU - Sarovich, Derek S
AU  - Sarovich DS
AD  - Global and Tropical Health Division, Menzies School of Health Research,
      Charles Darwin University, PO Box 41096, Casuarina 0811, NT, Australia.
      derek.sarovich@menzies.edu.au.
FAU - Price, Erin P
AU  - Price EP
LA  - eng
PT  - Comparative Study
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
DEP - 20140908
PL  - England
TA  - BMC Res Notes
JT  - BMC research notes
JID - 101462768
SB  - IM
MH  - *Genome
MH  - *Haploidy
MH  - Phylogeny
MH  - Polymorphism, Single Nucleotide
PMC - PMC4169827
EDAT- 2014/09/10 06:00
MHDA- 2015/06/03 06:00
PMCR- 2014/09/08
CRDT- 2014/09/10 06:00
PHST- 2014/08/14 00:00 [received]
PHST- 2014/08/27 00:00 [accepted]
PHST- 2014/09/10 06:00 [entrez]
PHST- 2014/09/10 06:00 [pubmed]
PHST- 2015/06/03 06:00 [medline]
PHST- 2014/09/08 00:00 [pmc-release]
AID - 1756-0500-7-618 [pii]
AID - 3161 [pii]
AID - 10.1186/1756-0500-7-618 [doi]
PST - epublish
SO  - BMC Res Notes. 2014 Sep 8;7:618. doi: 10.1186/1756-0500-7-618.