PMID- 21370079
OWN - NLM
STAT- MEDLINE
DCOM- 20110609
LR  - 20110303
IS  - 1940-6029 (Electronic)
IS  - 1064-3745 (Linking)
VI  - 719
DP  - 2011
TI  - Omics data management and annotation.
PG  - 71-96
LID - 10.1007/978-1-61779-027-0_3 [doi]
AB  - Technological Omics breakthroughs, including next generation sequencing, bring
      avalanches of data which need to undergo effective data management to ensure
      integrity, security, and maximal knowledge-gleaning. Data management system
      requirements include flexible input formats, diverse data entry mechanisms and
      views, user friendliness, attention to standards, hardware and software platform
      definition, as well as robustness. Relevant solutions elaborated by the
      scientific community include Laboratory Information Management Systems (LIMS)
      and standardization protocols facilitating data sharing and managing. In project
      planning, special consideration has to be made when choosing relevant Omics
      annotation sources, since many of them overlap and require sophisticated
      integration heuristics. The data modeling step defines and categorizes the data
      into objects (e.g., genes, articles, disorders) and creates an application flow.
      A data storage/warehouse mechanism must be selected, such as file-based systems
      and relational databases, the latter typically used for larger projects. Omics
      project life cycle considerations must include the definition and deployment of
      new versions, incorporating either full or partial updates. Finally, quality
      assurance (QA) procedures must validate data and feature integrity, as well as
      system performance expectations. We illustrate these data management principles
      with examples from the life cycle of the GeneCards Omics project
      (http://www.genecards.org), a comprehensive, widely used compendium of
      annotative information about human genes. For example, the GeneCards
      infrastructure has recently been changed from text files to a relational
      database, enabling better organization and views of the growing data. Omics data
      handling benefits from the wealth of Web-based information, the vast amount of
      public domain software, increasingly affordable hardware, and effective use of
      data management and annotation principles as outlined in this chapter.
FAU - Harel, Arye
AU  - Harel A
AD  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.
FAU - Dalah, Irina
AU  - Dalah I
FAU - Pietrokovski, Shmuel
AU  - Pietrokovski S
FAU - Safran, Marilyn
AU  - Safran M
FAU - Lancet, Doron
AU  - Lancet D
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
PL  - United States
TA  - Methods Mol Biol
JT  - Methods in molecular biology (Clifton, N.J.)
JID - 9214969
SB  - IM
MH  - Animals
MH  - Computational Biology/*methods/standards
MH  - Data Display
MH  - Databases, Genetic
MH  - Humans
MH  - Information Management/*methods/standards
MH  - Molecular Sequence Annotation/*methods/standards
MH  - Quality Control
MH  - Research Personnel
MH  - Software
EDAT- 2011/03/04 06:00
MHDA- 2011/06/10 06:00
CRDT- 2011/03/04 06:00
PHST- 2011/03/04 06:00 [entrez]
PHST- 2011/03/04 06:00 [pubmed]
PHST- 2011/06/10 06:00 [medline]
AID - 10.1007/978-1-61779-027-0_3 [doi]
PST - ppublish
SO  - Methods Mol Biol. 2011;719:71-96. doi: 10.1007/978-1-61779-027-0_3.