PMID- 30101316
OWN - NLM
STAT- MEDLINE
DCOM- 20191231
LR  - 20191231
IS  - 1367-4811 (Electronic)
IS  - 1367-4803 (Linking)
VI  - 35
IP  - 5
DP  - 2019 Mar 1
TI  - Processing of big heterogeneous genomic datasets for tertiary analysis of Next Generation Sequencing data.
PG  - 729-736
LID - 10.1093/bioinformatics/bty688 [doi]
AB  - MOTIVATION: We previously proposed a paradigm shift in genomic data management, based on the Genomic Data Model (GDM) for mediating existing data formats and on the GenoMetric Query Language (GMQL) for supporting, at a high level of abstraction, data extraction and the most common data-driven computations required by tertiary data analysis of Next Generation Sequencing datasets. Here, we present a new GMQL-based system with enhanced accessibility, portability, scalability and performance. RESULTS: The new system has a well-designed modular architecture featuring: (i) an intermediate representation supporting many different implementations (including Spark, Flink and SciDB); (ii) a high-level technology-independent repository abstraction, supporting different repository technologies (e.g., local file system, Hadoop File System, database or others); (iii) several system interfaces, including a user-friendly Web-based interface, a Web Service interface, and a programmatic interface for Python language. Biological use case examples, using public ENCODE, Roadmap Epigenomics and TCGA datasets, demonstrate the relevance of our work. AVAILABILITY AND IMPLEMENTATION: The GMQL system is freely available for non-commercial use as open source project at: http://www.bioinformatics.deib.polimi.it/GMQLsystem/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
CI  - (c) The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
FAU - Masseroli, Marco
AU  - Masseroli M
AD  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy.
FAU - Canakoglu, Arif
AU  - Canakoglu A
AD  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy.
FAU - Pinoli, Pietro
AU  - Pinoli P
AD  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy.
FAU - Kaitoua, Abdulrahman
AU  - Kaitoua A
AD  - The German Research Center for Artificial Intelligence (DFKI), Berlin, Germany.
FAU - Gulino, Andrea
AU  - Gulino A
AD  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy.
FAU - Horlova, Olha
AU  - Horlova O
AD  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy.
FAU - Nanni, Luca
AU  - Nanni L
AD  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy.
FAU - Bernasconi, Anna
AU  - Bernasconi A
AD  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy.
FAU - Perna, Stefano
AU  - Perna S
AD  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy.
FAU - Stamoulakatou, Eirini
AU  - Stamoulakatou E
AD  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy.
FAU - Ceri, Stefano
AU  - Ceri S
AD  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy.
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
PL  - England
TA  - Bioinformatics
JT  - Bioinformatics (Oxford, England)
JID - 9808944
SB  - IM
MH  - Epigenomics
MH  - Genome
MH  - Genomics
MH  - *High-Throughput Nucleotide Sequencing
MH  - *Software
EDAT- 2018/08/14 06:00
MHDA- 2020/01/01 06:00
CRDT- 2018/08/14 06:00
PHST- 2018/03/31 00:00 [received]
PHST- 2018/08/01 00:00 [revised]
PHST- 2018/08/06 00:00 [accepted]
PHST- 2018/08/14 06:00 [pubmed]
PHST- 2020/01/01 06:00 [medline]
PHST- 2018/08/14 06:00 [entrez]
AID - 5067860 [pii]
AID - 10.1093/bioinformatics/bty688 [doi]
PST - ppublish
SO  - Bioinformatics. 2019 Mar 1;35(5):729-736. doi: 10.1093/bioinformatics/bty688.