PMID- 33446856
OWN - NLM
STAT- MEDLINE
DCOM- 20210916
LR  - 20240330
IS  - 2045-2322 (Electronic)
IS  - 2045-2322 (Linking)
VI  - 11
IP  - 1
DP  - 2021 Jan 14
TI  - Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins.
PG  - 1467
LID - 10.1038/s41598-021-81063-4 [doi]
LID - 1467
AB  - Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.
FAU - Boeckaerts, Dimitri
AU  - Boeckaerts D
AD  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium.
AD  - Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium.
FAU - Stock, Michiel
AU  - Stock M
AD  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium.
FAU - Criel, Bjorn
AU  - Criel B
AD  - Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium.
FAU - Gerstmans, Hans
AU  - Gerstmans H
AD  - Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium.
AD  - Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Leuven, Belgium.
AD  - MeBioS-Biosensors group, Department of BioSystems, KU Leuven, Leuven, Belgium.
FAU - De Baets, Bernard
AU  - De Baets B
AD  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium.
FAU - Briers, Yves
AU  - Briers Y
AD  - Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium. Yves.Briers@UGent.be.
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
DEP - 20210114
PL  - England
TA  - Sci Rep
JT  - Scientific reports
JID - 101563288
RN  - 0 (Viral Tail Proteins)
SB  - IM
MH  - Animals
MH  - Bacteria/genetics
MH  - Bacteriophages/*genetics
MH  - Clostridioides difficile/genetics
MH  - Escherichia coli/genetics
MH  - Host Specificity/*genetics
MH  - Humans
MH  - Machine Learning
MH  - Metagenomics/methods
MH  - Protein Binding/genetics
MH  - Salmonella enterica/genetics
MH  - Sequence Analysis, DNA/*methods
MH  - Viral Tail Proteins/genetics
MH  - Virion/genetics
PMC - PMC7809048
COIS- The authors declare no competing interests.
EDAT- 2021/01/16 06:00
MHDA- 2021/09/18 06:00
PMCR- 2021/01/14
CRDT- 2021/01/15 05:56
PHST- 2020/05/04 00:00 [received]
PHST- 2020/12/30 00:00 [accepted]
PHST- 2021/01/15 05:56 [entrez]
PHST- 2021/01/16 06:00 [pubmed]
PHST- 2021/09/18 06:00 [medline]
PHST- 2021/01/14 00:00 [pmc-release]
AID - 10.1038/s41598-021-81063-4 [pii]
AID - 81063 [pii]
AID - 10.1038/s41598-021-81063-4 [doi]
PST - epublish
SO  - Sci Rep. 2021 Jan 14;11(1):1467. doi: 10.1038/s41598-021-81063-4.