PMID- 16339286
OWN - NLM
STAT- MEDLINE
DCOM- 20060331
LR  - 20191210
IS  - 1367-4803 (Print)
IS  - 1367-4803 (Linking)
VI  - 22
IP  - 4
DP  - 2006 Feb 15
TI  - Gene sequence signatures revealed by mining the UniGene affiliation network.
PG  - 385-91
AB  - BACKGROUND: In the post-genomic era, developing tools to decode biological information from genomic sequences is important. Inspired by affiliation network theory, we investigated gene sequences of two kinds of UniGene clusters (UCs): narrowly expressed transcripts (NETs), whose expression is confined to a few tissues; and prevalently expressed transcripts (PETs) that are expressed in many tissues. RESULTS: We explored the human and the mouse UniGene databases to compare NETs and PETs from different perspectives. We found that NETs were associated with smaller cluster size, shorter sequence length, a lower likelihood of having LocusLink annotations, and lower and more sporadic levels of expression. Significantly, the dinucleotide frequencies of NETs are similar to those of intergenic sequences in the genome, and they differ from those of PETs. We used these differences in dinucleotide frequencies to develop a discriminant analysis model to distinguish PETs from intergenic sequences. CONCLUSIONS: Our results show that most NETs resemble intergenic sequences, casting doubts on the quality of such UniGene clusters. However, we also noted that a fraction of NETs resemble PETs in terms of dinucleotide frequencies and other features. Such NETs may have fewer quality problems. This work may be helpful in the studies of non-coding RNAs and in the validation of gene sequence databases.
FAU - Zhang, Jiexin
AU  - Zhang J
AD  - Department of Biostatistics and Applied Mathematics, The University of Texas M.D. Anderson Cancer Center, 1515 Holcombe Boulevard, Box 447, Houston, TX 77030-4009, USA.
FAU - Zhang, Li
AU  - Zhang L
FAU - Coombes, Kevin R
AU  - Coombes KR
LA  - eng
PT  - Evaluation Study
PT  - Journal Article
DEP - 20051208
PL  - England
TA  - Bioinformatics
JT  - Bioinformatics (Oxford, England)
JID - 9808944
RN  - 0 (Transcription Factors)
SB  - IM
MH  - Animals
MH  - Chromosome Mapping/*methods
MH  - *Database Management Systems
MH  - *Databases, Genetic
MH  - Humans
MH  - Information Storage and Retrieval/*methods
MH  - Mice
MH  - Multigene Family/*genetics
MH  - Sequence Alignment/*methods
MH  - Sequence Analysis, DNA/*methods
MH  - Transcription Factors/genetics
EDAT- 2005/12/13 09:00
MHDA- 2006/04/01 09:00
CRDT- 2005/12/13 09:00
PHST- 2005/12/13 09:00 [pubmed]
PHST- 2006/04/01 09:00 [medline]
PHST- 2005/12/13 09:00 [entrez]
AID - bti796 [pii]
AID - 10.1093/bioinformatics/bti796 [doi]
PST - ppublish
SO  - Bioinformatics. 2006 Feb 15;22(4):385-91. doi: 10.1093/bioinformatics/bti796. Epub 2005 Dec 8.