PMID- 33407098
OWN - NLM
STAT- MEDLINE
DCOM- 20210115
LR  - 20210115
IS  - 1471-2105 (Electronic)
IS  - 1471-2105 (Linking)
VI  - 22
IP  - 1
DP  - 2021 Jan 6
TI  - MATHLA: a robust framework for HLA-peptide binding prediction integrating bidirectional LSTM and multiple head attention mechanism.
PG  - 7
LID - 10.1186/s12859-020-03946-z [doi]
LID - 7
AB  - BACKGROUND: Accurate prediction of binding between class I human leukocyte antigen (HLA) and neoepitope is critical for target identification within personalized T-cell based immunotherapy. Many recent prediction tools developed upon the deep learning algorithms and mass spectrometry data have indeed showed improvement on the average predicting power for class I HLA-peptide interaction. However, their prediction performances show great variability over individual HLA alleles and peptides with different lengths, which is particularly the case for HLA-C alleles due to the limited amount of experimental data. To meet the increasing demand for attaining the most accurate HLA-peptide binding prediction for individual patient in the real-world clinical studies, more advanced deep learning framework with higher prediction accuracy for HLA-C alleles and longer peptides is highly desirable.
      RESULTS: We present a pan-allele HLA-peptide binding prediction framework-MATHLA which integrates bi-directional long short-term memory network and multiple head attention mechanism. This model achieves better prediction accuracy in both fivefold cross-validation test and independent test dataset. In addition, this model is superior over existing tools regarding to the prediction accuracy for longer ligand ranging from 11 to 15 amino acids. Moreover, our model also shows a significant improvement for HLA-C-peptide-binding prediction. By investigating multiple-head attention weight scores, we depicted possible interaction patterns between three HLA I supergroups and their cognate peptides.
      CONCLUSION: Our method demonstrates the necessity of further development of deep learning algorithm in improving and interpreting HLA-peptide binding prediction in parallel to increasing the amount of high-quality HLA ligandome data.
FAU - Ye, Yilin
AU  - Ye Y
AD  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China.
AD  - School of Computer Science and Technology, Heilongjiang University, Harbin, 150080, China.
FAU - Wang, Jian
AU  - Wang J
AD  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China.
FAU - Xu, Yunwan
AU  - Xu Y
AD  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China.
FAU - Wang, Yi
AU  - Wang Y
AD  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China.
FAU - Pan, Youdong
AU  - Pan Y
AD  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China.
FAU - Song, Qi
AU  - Song Q
AD  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China.
FAU - Liu, Xing
AU  - Liu X
AD  - The Center for Microbes, Development and Health, Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, Shanghai, 200031, China.
FAU - Wan, Ji
AU  - Wan J
AUID- ORCID: 0000-0002-5279-0345
AD  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China. jiw@neocura.net.
LA  - eng
PT  - Journal Article
DEP - 20210106
PL  - England
TA  - BMC Bioinformatics
JT  - BMC bioinformatics
JID - 100965194
RN  - 0 (Histocompatibility Antigens Class I)
RN  - 0 (Peptides)
SB  - IM
MH  - Algorithms
MH  - Computational Biology/*methods
MH  - *Histocompatibility Antigens Class I/chemistry/metabolism
MH  - Humans
MH  - Models, Statistical
MH  - *Neural Networks, Computer
MH  - *Peptides/chemistry/metabolism
MH  - *Protein Binding
PMC - PMC7787246
OTO - NOTNLM
OT  - Cancer immunotherapy
OT  - Deep learning
OT  - HLA-peptide binding prediction
COIS- The authors declared that they have no competing interests.
EDAT- 2021/01/08 06:00
MHDA- 2021/01/16 06:00
PMCR- 2021/01/06
CRDT- 2021/01/07 05:42
PHST- 2020/07/30 00:00 [received]
PHST- 2020/12/21 00:00 [accepted]
PHST- 2021/01/07 05:42 [entrez]
PHST- 2021/01/08 06:00 [pubmed]
PHST- 2021/01/16 06:00 [medline]
PHST- 2021/01/06 00:00 [pmc-release]
AID - 10.1186/s12859-020-03946-z [pii]
AID - 3946 [pii]
AID - 10.1186/s12859-020-03946-z [doi]
PST - epublish
SO  - BMC Bioinformatics. 2021 Jan 6;22(1):7. doi: 10.1186/s12859-020-03946-z.