PMID- 32898222
OWN - NLM
STAT- MEDLINE
DCOM- 20210608
LR  - 20210608
IS  - 1367-4811 (Electronic)
IS  - 1367-4803 (Linking)
VI  - 37
IP  - 7
DP  - 2021 May 17
TI  - SMI-BLAST: a novel supervised search framework based on PSI-BLAST for protein
      remote homology detection.
PG  - 913-920
LID - 10.1093/bioinformatics/btaa772 [doi]
AB  - MOTIVATION: As one of the most important and widely used mainstream iterative
      search tool for protein sequence search, an accurate Position-Specific Scoring
      Matrix (PSSM) is the key of PSI-BLAST. However, PSSMs containing
      non-homologous information obviously reduce the performance of PSI-BLAST for
      protein remote homology. RESULTS: To further study this problem, we summarize
      three types of Incorrectly Selected Homology (ISH) errors in PSSMs. A new
      search tool Supervised-Manner-based Iterative BLAST (SMI-BLAST) is proposed
      based on PSI-BLAST for solving these errors. SMI-BLAST obviously outperforms
      PSI-BLAST on the Structural Classification of Proteins-extended (SCOPe)
      dataset. Compared with PSI-BLAST on the ISH error subsets of SCOPe dataset,
      SMI-BLAST detects 1.6-2.87 folds more remote homologous sequences, and
      outperforms PSI-BLAST by 35.66% in terms of ROC1 scores. Furthermore, this
      framework is applied to JackHMMER, DELTA-BLAST and PSI-BLASTexB, and their
      performance is further improved. AVAILABILITY AND IMPLEMENTATION:
      User-friendly webservers for SMI-BLAST, JackHMMER, DELTA-BLAST and
      PSI-BLASTexB are established at http://bliulab.net/SMI-BLAST/, by which the
      users can easily get the results without the need to go through the
      mathematical details. SUPPLEMENTARY INFORMATION: Supplementary data are
      available at Bioinformatics online.
CI  - (c) The Author(s) 2020. Published by Oxford University Press. All rights
      reserved. For permissions, please e-mail: journals.permissions@oup.com.
FAU - Jin, Xiaopeng
AU  - Jin X
AD  - School of Computer Science and Technology, Harbin Institute of Technology,
      Shenzhen, Guangdong 518055, China.
FAU - Liao, Qing
AU  - Liao Q
AD  - School of Computer Science and Technology, Harbin Institute of Technology,
      Shenzhen, Guangdong 518055, China.
FAU - Wei, Hang
AU  - Wei H
AD  - School of Computer Science and Technology, Harbin Institute of Technology,
      Shenzhen, Guangdong 518055, China.
FAU - Zhang, Jun
AU  - Zhang J
AD  - School of Computer Science and Technology, Harbin Institute of Technology,
      Shenzhen, Guangdong 518055, China.
FAU - Liu, Bin
AU  - Liu B
AD  - School of Computer Science and Technology, Harbin Institute of Technology,
      Shenzhen, Guangdong 518055, China.
AD  - School of Computer Science and Technology, Beijing Institute of Technology,
      Beijing 100081, China.
AD  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of
      Technology, Beijing 100081, China.
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
PL  - England
TA  - Bioinformatics
JT  - Bioinformatics (Oxford, England)
JID - 9808944
RN  - 0 (Proteins)
SB  - IM
MH  - *Algorithms
MH  - Amino Acid Sequence
MH  - Position-Specific Scoring Matrices
MH  - *Proteins
MH  - Sequence Alignment
MH  - Sequence Analysis, Protein
EDAT- 2020/09/09 06:00
MHDA- 2021/06/09 06:00
CRDT- 2020/09/08 17:13
PHST- 2020/05/21 00:00 [received]
PHST- 2020/08/14 00:00 [revised]
PHST- 2020/08/28 00:00 [accepted]
PHST- 2020/09/09 06:00 [pubmed]
PHST- 2021/06/09 06:00 [medline]
PHST- 2020/09/08 17:13 [entrez]
AID - 5902827 [pii]
AID - 10.1093/bioinformatics/btaa772 [doi]
PST - ppublish
SO  - Bioinformatics. 2021 May 17;37(7):913-920. doi: 10.1093/bioinformatics/btaa772.