PMID- 38110863
OWN - NLM
STAT- MEDLINE
DCOM- 20231220
LR  - 20240210
IS  - 1471-2105 (Electronic)
IS  - 1471-2105 (Linking)
VI  - 24
IP  - 1
DP  - 2023 Dec 18
TI  - PEPMatch: a tool to identify short peptide sequence matches in large sets
      of proteins.
PG  - 485
LID - 10.1186/s12859-023-05606-4 [doi]
LID - 485
AB  - BACKGROUND: Numerous tools exist for biological sequence comparisons and
      search. One case of particular interest for immunologists is finding
      matches for linear peptide T cell epitopes, typically between 8 and 15
      residues in length, in a large set of protein sequences, either as exact
      matches or as matches that account for residue substitutions. Such tools
      are critical in applications ranging from identifying conservation across
      viral epitopes and identifying putative epitope targets for allergens to
      finding matches for cancer-associated neoepitopes to examine the role of
      tolerance in tumor recognition. RESULTS: We defined a set of benchmarks
      that reflect the different practical applications of short peptide
      sequence matching. We evaluated a suite of existing methods for speed and
      recall and developed a new tool, PEPMatch. The tool uses a deterministic
      k-mer mapping algorithm that preprocesses proteomes before searching,
      achieving a 50-fold increase in speed over methods such as the Basic
      Local Alignment Search Tool (BLAST) without compromising recall.
      PEPMatch's code and benchmark datasets are publicly available.
      CONCLUSIONS: PEPMatch offers significant speed and recall advantages for
      peptide sequence matching. While it is of immediate utility for
      immunologists, the developed benchmarking framework also provides a
      standard against which future tools can be evaluated for improvements.
      The tool is available at https://nextgen-tools.iedb.org, and the source
      code can be found at https://github.com/IEDB/PEPMatch.
CI  - (c) 2023. The Author(s).
FAU - Marrama, Daniel
AU  - Marrama D
AD  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La
      Jolla, San Diego, CA, USA.
FAU - Chronister, William D
AU  - Chronister WD
AD  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La
      Jolla, San Diego, CA, USA.
FAU - Westernberg, Luise
AU  - Westernberg L
AD  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La
      Jolla, San Diego, CA, USA.
FAU - Vita, Randi
AU  - Vita R
AD  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La
      Jolla, San Diego, CA, USA.
FAU - Kosaloglu-Yalcin, Zeynep
AU  - Kosaloglu-Yalcin Z
AD  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La
      Jolla, San Diego, CA, USA.
FAU - Sette, Alessandro
AU  - Sette A
AD  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La
      Jolla, San Diego, CA, USA.
AD  - University of California San Diego School of Medicine, La Jolla, San
      Diego, CA, USA.
FAU - Nielsen, Morten
AU  - Nielsen M
AD  - Department of Health Technology, Technical University of Denmark, Lyngby,
      Denmark.
FAU - Greenbaum, Jason A
AU  - Greenbaum JA
AD  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La
      Jolla, San Diego, CA, USA.
FAU - Peters, Bjoern
AU  - Peters B
AD  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La
      Jolla, San Diego, CA, USA. bpeters@lji.org.
AD  - University of California San Diego School of Medicine, La Jolla, San
      Diego, CA, USA. bpeters@lji.org.
LA  - eng
GR  - U24 CA248138/CA/NCI NIH HHS/United States
GR  - U24CA248138/NH/NIH HHS/United States
PT  - Journal Article
DEP - 20231218
PL  - England
TA  - BMC Bioinformatics
JT  - BMC bioinformatics
JID - 100965194
RN  - 0 (Peptides)
RN  - 0 (Epitopes, T-Lymphocyte)
RN  - 0 (Proteome)
SB  - IM
MH  - Humans
MH  - Amino Acid Sequence
MH  - *Software
MH  - Peptides/chemistry
MH  - Algorithms
MH  - *Neoplasms
MH  - Epitopes, T-Lymphocyte
MH  - Proteome
PMC - PMC10726511
OTO - NOTNLM
OT  - BLAST comparison
OT  - Benchmarking
OT  - Immunology
OT  - K-mer mapping
OT  - Peptide matching
OT  - Sequence searching
OT  - T-cell epitopes
COIS- The authors declare no competing interests.
EDAT- 2023/12/19 06:42
MHDA- 2023/12/20 06:43
PMCR- 2023/12/18
CRDT- 2023/12/19 00:24
PHST- 2023/09/30 00:00 [received]
PHST- 2023/12/06 00:00 [accepted]
PHST- 2023/12/20 06:43 [medline]
PHST- 2023/12/19 06:42 [pubmed]
PHST- 2023/12/19 00:24 [entrez]
PHST- 2023/12/18 00:00 [pmc-release]
AID - 10.1186/s12859-023-05606-4 [pii]
AID - 5606 [pii]
AID - 10.1186/s12859-023-05606-4 [doi]
PST - epublish
SO  - BMC Bioinformatics. 2023 Dec 18;24(1):485. doi: 10.1186/s12859-023-05606-4.
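
Illustrative note on the abstract: the record says only that the tool uses a deterministic k-mer mapping algorithm that preprocesses proteomes before searching. The Python sketch below shows the general idea of such an approach for exact matches, and it is not PEPMatch's code or API. The function names (build_kmer_index, find_exact_matches), the choice of k = 5, and the toy sequences are all assumptions made for illustration. The index is built once per protein set, and each peptide is then located by looking up its first k-mer and verifying the full peptide at each candidate position.

```python
from collections import defaultdict


def build_kmer_index(proteins, k):
    """Preprocessing step: map every k-mer to the (protein_id, position)
    pairs where it occurs. Hypothetical helper, not part of PEPMatch."""
    index = defaultdict(list)
    for protein_id, sequence in proteins.items():
        for pos in range(len(sequence) - k + 1):
            index[sequence[pos:pos + k]].append((protein_id, pos))
    return index


def find_exact_matches(peptide, proteins, index, k):
    """Search step: return sorted (protein_id, start) pairs where the
    peptide occurs exactly. Hypothetical helper, not part of PEPMatch."""
    if len(peptide) < k:
        raise ValueError("peptide must be at least k residues long")
    hits = []
    # Seed with the peptide's first k-mer, then verify the whole peptide
    # against the protein sequence at each candidate position.
    for protein_id, pos in index.get(peptide[:k], []):
        if proteins[protein_id].startswith(peptide, pos):
            hits.append((protein_id, pos))
    return sorted(hits)


# Toy data: identifiers and sequences are made up for illustration.
proteins = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "GGAYIAKQRQIAA",
}
index = build_kmer_index(proteins, k=5)
matches = find_exact_matches("AYIAKQRQI", proteins, index, k=5)
print(matches)
```

Because the index is built once and reused, each query costs a dictionary lookup plus a short verification per candidate, rather than a scan of every protein. Matching with residue substitutions, which the abstract also mentions, would need additional logic that this sketch does not attempt.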