PMID- 18348511
OWN - NLM
STAT- MEDLINE
DCOM- 20080922
LR  - 20191210
IS  - 1535-3893 (Print)
IS  - 1535-3893 (Linking)
VI  - 7
IP  - 5
DP  - 2008 May
TI  - PepLine: a software pipeline for high-throughput direct mapping of tandem mass spectrometry data on genomic sequences.
PG  - 1873-83
LID - 10.1021/pr070415k [doi]
AB  - PepLine is a fully automated software which maps MS/MS fragmentation spectra of trypsic peptides to genomic DNA sequences. The approach is based on Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF MS/MS spectra (first module). PSTs are then mapped on the six-frame translations of genomic sequences (second module) giving hits. Hits are then clustered to detect potential coding regions (third module). Our work aimed at optimizing the algorithms of each component to allow the whole pipeline to proceed in a fully automated manner using raw nucleic acid sequences (i.e., genomes that have not been "reduced" to a database of ORFs or putative exons sequences). The whole pipeline was tested on controlled MS/MS spectra sets from standard proteins and from Arabidopsis thaliana envelope chloroplast samples. Our results demonstrate that PepLine competed with protein database searching softwares and was fast enough to potentially tackle large data sets and/or high size genomes. We also illustrate the potential of this approach for the detection of the intron/exon structure of genes.
FAU - Ferro, Myriam
AU  - Ferro M
AD  - CEA, DSV, iRTSV, Laboratoire d'Etude de la Dynamique des Proteomes, Grenoble, F-38054, France.
FAU - Tardif, Marianne
AU  - Tardif M
FAU - Reguer, Erwan
AU  - Reguer E
FAU - Cahuzac, Romain
AU  - Cahuzac R
FAU - Bruley, Christophe
AU  - Bruley C
FAU - Vermat, Thierry
AU  - Vermat T
FAU - Nugues, Estelle
AU  - Nugues E
FAU - Vigouroux, Marielle
AU  - Vigouroux M
FAU - Vandenbrouck, Yves
AU  - Vandenbrouck Y
FAU - Garin, Jerome
AU  - Garin J
FAU - Viari, Alain
AU  - Viari A
LA  - eng
PT  - Evaluation Study
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
DEP - 20080319
PL  - United States
TA  - J Proteome Res
JT  - Journal of proteome research
JID - 101128775
RN  - 0 (Arabidopsis Proteins)
RN  - 0 (Peptides)
SB  - IM
MH  - Algorithms
MH  - Amino Acid Sequence
MH  - Animals
MH  - Arabidopsis/chemistry/cytology/genetics
MH  - Arabidopsis Proteins/*analysis/genetics
MH  - Base Sequence
MH  - Chloroplasts/chemistry/genetics
MH  - *Genome
MH  - *Mass Spectrometry/instrumentation/methods
MH  - Molecular Sequence Data
MH  - Peptides/*analysis/genetics
MH  - Sequence Alignment
MH  - *Software
EDAT- 2008/03/20 09:00
MHDA- 2008/09/23 09:00
CRDT- 2008/03/20 09:00
PHST- 2008/03/20 09:00 [pubmed]
PHST- 2008/09/23 09:00 [medline]
PHST- 2008/03/20 09:00 [entrez]
AID - 10.1021/pr070415k [doi]
PST - ppublish
SO  - J Proteome Res. 2008 May;7(5):1873-83. doi: 10.1021/pr070415k. Epub 2008 Mar 19.
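
The abstract's second and third modules (mapping tags onto six-frame translations, then clustering the hits) can be illustrated with a minimal Python sketch. This is not PepLine's implementation: a real Peptide Sequence Tag also carries flanking masses and mass tolerances, whereas this sketch assumes a tag is a plain amino-acid string matched exactly. The function names, the `max_gap` parameter, and the use of the standard genetic code are all assumptions made for illustration.

```python
# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Map (strand, offset) to the protein string for each of the six frames."""
    dna = dna.upper()
    strands = {"+": dna, "-": reverse_complement(dna)}
    return {(s, off): translate(seq[off:]) for s, seq in strands.items() for off in range(3)}


def map_tag(tag, dna):
    """Return sorted (strand, start) hits; start is a 0-based forward-strand coordinate.

    Simplification (assumption): exact substring match, no mass constraints.
    """
    hits = []
    span = 3 * len(tag)  # nucleotides covered by the tag
    for (strand, off), protein in six_frame_translations(dna).items():
        pos = protein.find(tag)
        while pos != -1:
            start = off + 3 * pos  # position on the strand that was translated
            if strand == "-":
                start = len(dna) - (start + span)  # convert to forward coordinates
            hits.append((strand, start))
            pos = protein.find(tag, pos + 1)  # allow overlapping occurrences
    return sorted(hits)


def cluster_hits(hits, max_gap):
    """Group same-strand hits whose starts lie within max_gap of the previous hit."""
    clusters = []
    for hit in sorted(hits):
        last = clusters[-1][-1] if clusters else None
        if last and last[0] == hit[0] and hit[1] - last[1] <= max_gap:
            clusters[-1].append(hit)
        else:
            clusters.append([hit])
    return clusters
```

For example, `map_tag("MKT", "CCATGAAAACTGG")` finds the tag on the forward strand starting at nucleotide 2. Single-linkage clustering on start coordinates is only one simple way to group nearby hits into candidate coding regions; the record does not say which clustering method PepLine uses.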