PMID- 23080114
OWN - NLM
STAT- MEDLINE
DCOM- 20130729
LR  - 20220316
IS  - 1367-4811 (Electronic)
IS  - 1367-4803 (Linking)
VI  - 28
IP  - 24
DP  - 2012 Dec 15
TI  - Discriminative modelling of context-specific amino acid substitution
      probabilities.
PG  - 3240-7
LID - 10.1093/bioinformatics/bts622 [doi]
AB  - MOTIVATION: Protein sequence searching and alignment are fundamental tools
      of modern biology. Alignments are assessed using their similarity scores,
      essentially the sum of substitution matrix scores over all pairs of aligned
      amino acids. We previously proposed a generative probabilistic method that
      yields scores that take the sequence context around each aligned residue
      into account. This method showed drastically improved sensitivity and
      alignment quality compared with standard substitution matrix-based
      alignment. RESULTS: Here, we develop an alternative discriminative approach
      to predict sequence context-specific substitution scores. We applied our
      approach to compute context-specific sequence profiles for Basic Local
      Alignment Search Tool (BLAST) and compared the new tool (CS-BLASTdis) to
      BLAST and the previous context-specific version (CS-BLASTgen). On a dataset
      filtered to 20% maximum sequence identity, CS-BLASTdis was 51% more
      sensitive than BLAST and 17% more sensitive than CS-BLASTgen in detecting
      remote homologues at 10% false discovery rate. At 30% maximum sequence
      identity, its alignments contain 21 and 12% more correct residue pairs than
      those of BLAST and CS-BLASTgen, respectively. Clear improvements are also
      seen when the approach is combined with PSI-BLAST and HHblits. We believe
      the context-specific approach should replace substitution matrices wherever
      sensitivity and alignment quality are critical.
FAU - Angermuller, Christof
AU  - Angermuller C
AD  - Gene Center Munich and Department of Biochemistry,
      Ludwig-Maximilians-Universitat Munchen, 81377 Munich, Germany.
FAU - Biegert, Andreas
AU  - Biegert A
FAU - Soding, Johannes
AU  - Soding J
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
DEP - 20121017
PL  - England
TA  - Bioinformatics
JT  - Bioinformatics (Oxford, England)
JID - 9808944
SB  - IM
MH  - Algorithms
MH  - *Amino Acid Substitution
MH  - Models, Statistical
MH  - Probability
MH  - Sequence Alignment/*methods
MH  - *Sequence Analysis, Protein
MH  - *Software
EDAT- 2012/10/20 06:00
MHDA- 2013/07/31 06:00
CRDT- 2012/10/20 06:00
PHST- 2012/10/20 06:00 [entrez]
PHST- 2012/10/20 06:00 [pubmed]
PHST- 2013/07/31 06:00 [medline]
AID - bts622 [pii]
AID - 10.1093/bioinformatics/bts622 [doi]
PST - ppublish
SO  - Bioinformatics. 2012 Dec 15;28(24):3240-7. doi: 10.1093/bioinformatics/bts622. Epub 2012 Oct 17.
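
The abstract describes the standard similarity score as essentially the sum of substitution matrix scores over all pairs of aligned amino acids. A minimal Python sketch of that baseline follows. It is an illustration only, not the CS-BLAST implementation: the names `TOY_SCORES`, `pair_score`, and `alignment_score` are hypothetical, the score values are invented for the example (they are not taken from BLOSUM62 or any real matrix), and gapped columns are simply skipped rather than penalized.

```python
from typing import Dict, Tuple

# Toy substitution scores for three residues. The values are made up for
# illustration and do NOT come from a real substitution matrix.
TOY_SCORES: Dict[Tuple[str, str], int] = {
    ("A", "A"): 4, ("A", "G"): 0, ("A", "L"): -1,
    ("G", "G"): 6, ("G", "L"): -4,
    ("L", "L"): 4,
}


def pair_score(x: str, y: str, scores: Dict[Tuple[str, str], int] = TOY_SCORES) -> int:
    """Look up a symmetric substitution score for the residue pair (x, y)."""
    if (x, y) in scores:
        return scores[(x, y)]
    return scores[(y, x)]  # raises KeyError for residues missing from the table


def alignment_score(seq_a: str, seq_b: str,
                    scores: Dict[Tuple[str, str], int] = TOY_SCORES,
                    gap: str = "-") -> int:
    """Sum substitution scores over all aligned residue pairs.

    Both inputs are rows of one alignment and must have equal length.
    Columns containing a gap are skipped (an assumption of this sketch;
    real tools apply gap penalties instead).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue
        total += pair_score(x, y, scores)
    return total
```

In this baseline, the score of a pair depends only on the two residues. The context-specific approach described in the abstract instead lets the score for each aligned residue depend on the surrounding sequence, which this sketch does not attempt to model.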