PMID- 26495297
OWN - NLM
STAT- MEDLINE
DCOM- 20160802
LR  - 20191210
IS  - 2314-6141 (Electronic)
IS  - 2314-6133 (Print)
VI  - 2015
DP  - 2015
TI  - Development of self-compressing BLSOM for comprehensive analysis of big sequence data.
PG  - 506052
LID - 10.1155/2015/506052 [doi]
LID - 506052
AB  - With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype solely dependent on oligonucleotide composition and applied to genome and metagenomic studies. BLSOM is suitable for high-performance parallel-computing and can analyze big data simultaneously, but a large-scale BLSOM needs a large computational resource. We have developed Self-Compressing BLSOM (SC-BLSOM) for reduction of computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to hierarchically construct BLSOMs according to data class, such as phylotype. The first-layer BLSOM was constructed with each of the divided input data pieces that represents the data subclass, such as phylotype division, resulting in compression of the number of data pieces. The second BLSOM was constructed with a total of weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and cluster the sequences according to phylotype with high accuracy, showing the method's suitability for efficient knowledge discovery from big sequence data.
FAU - Kikuchi, Akihito
AU  - Kikuchi A
AD  - Graduate School of Science and Technology, Niigata University, Niigata-shi, Niigata-ken 950-2181, Japan.
FAU - Ikemura, Toshimichi
AU  - Ikemura T
AD  - Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Shiga-ken 526-0829, Japan.
FAU - Abe, Takashi
AU  - Abe T
AD  - Graduate School of Science and Technology, Niigata University, Niigata-shi, Niigata-ken 950-2181, Japan.
LA  - eng
PT  - Evaluation Study
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
DEP - 20151001
PL  - United States
TA  - Biomed Res Int
JT  - BioMed research international
JID - 101600173
SB  - IM
MH  - *Algorithms
MH  - Chromosome Mapping/*methods
MH  - Data Compression/*methods
MH  - Genome, Bacterial/genetics
MH  - High-Throughput Nucleotide Sequencing/*methods
MH  - Pattern Recognition, Automated/*methods
MH  - Sequence Analysis, DNA/*methods
MH  - Software
PMC - PMC4606171
EDAT- 2015/10/27 06:00
MHDA- 2016/08/03 06:00
PMCR- 2015/10/01
CRDT- 2015/10/24 06:00
PHST- 2015/03/27 00:00 [received]
PHST- 2015/06/25 00:00 [revised]
PHST- 2015/07/12 00:00 [accepted]
PHST- 2015/10/24 06:00 [entrez]
PHST- 2015/10/27 06:00 [pubmed]
PHST- 2016/08/03 06:00 [medline]
PHST- 2015/10/01 00:00 [pmc-release]
AID - 10.1155/2015/506052 [doi]
PST - ppublish
SO  - Biomed Res Int. 2015;2015:506052. doi: 10.1155/2015/506052. Epub 2015 Oct 1.