PMID- 37420249
OWN - NLM
STAT- MEDLINE
DCOM- 20230721
LR - 20231116
IS - 1756-994X (Electronic)
IS - 1756-994X (Linking)
VI - 15
IP - 1
DP - 2023 Jul 7
TI - Mutation-Attention (MuAt): deep representation learning of somatic mutations for tumour typing and subtyping.
PG - 47
LID - 10.1186/s13073-023-01204-4 [doi]
LID - 47
AB - BACKGROUND: Cancer genome sequencing enables accurate classification of tumours and tumour subtypes. However, prediction performance is still limited using exome-only sequencing and for tumour types with low somatic mutation burden such as many paediatric tumours. Moreover, the ability to leverage deep representation learning in discovery of tumour entities remains unknown. METHODS: We introduce here Mutation-Attention (MuAt), a deep neural network to learn representations of simple and complex somatic alterations for prediction of tumour types and subtypes. In contrast to many previous methods, MuAt utilizes the attention mechanism on individual mutations instead of aggregated mutation counts. RESULTS: We trained MuAt models on 2587 whole cancer genomes (24 tumour types) from the Pan-Cancer Analysis of Whole Genomes (PCAWG) and 7352 cancer exomes (20 types) from the Cancer Genome Atlas (TCGA). MuAt achieved prediction accuracy of 89% for whole genomes and 64% for whole exomes, and a top-5 accuracy of 97% and 90%, respectively. MuAt models were found to be well-calibrated and perform well in three independent whole cancer genome cohorts with 10,361 tumours in total. We show MuAt to be able to learn clinically and biologically relevant tumour entities including acral melanoma, SHH-activated medulloblastoma, SPOP-associated prostate cancer, microsatellite instability, POLE proofreading deficiency, and MUTYH-associated pancreatic endocrine tumours without these tumour subtypes and subgroups being provided as training labels. Finally, scrutiny of MuAt attention matrices revealed both ubiquitous and tumour-type specific patterns of simple and complex somatic mutations. CONCLUSIONS: Integrated representations of somatic alterations learnt by MuAt were able to accurately identify histological tumour types and identify tumour entities, with potential to impact precision cancer medicine.
CI - (c) 2023. The Author(s).
FAU - Sanjaya, Prima
AU - Sanjaya P
AD - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
AD - Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
AD - iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland.
FAU - Maljanen, Katri
AU - Maljanen K
AD - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
AD - Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
AD - iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland.
FAU - Katainen, Riku
AU - Katainen R
AD - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
AD - Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
AD - iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland.
AD - Department of Medical and Clinical Genetics, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
FAU - Waszak, Sebastian M
AU - Waszak SM
AD - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo and Oslo University Hospital, Oslo, Norway.
AD - Swiss Institute for Experimental Cancer Research School of Life Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland.
AD - Department of Neurology, University of California, San Francisco (UCSF), San Francisco, CA, USA.
CN - Genomics England Research Consortium
FAU - Aaltonen, Lauri A
AU - Aaltonen LA
AD - Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
AD - Department of Medical and Clinical Genetics, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
FAU - Stegle, Oliver
AU - Stegle O
AD - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
AD - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
FAU - Korbel, Jan O
AU - Korbel JO
AD - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
AD - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
AD - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.
FAU - Pitkanen, Esa
AU - Pitkanen E
AUID- ORCID: 0000-0002-9818-6370
AD - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland. esa.pitkanen@helsinki.fi.
AD - Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland. esa.pitkanen@helsinki.fi.
AD - iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland. esa.pitkanen@helsinki.fi.
AD - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany. esa.pitkanen@helsinki.fi.
LA - eng
GR - 322675/Academy of Finland/
GR - 328890/Academy of Finland/
GR - 187615/Norges Forskningsrad/
PT - Journal Article
DEP - 20230707
PL - England
TA - Genome Med
JT - Genome medicine
JID - 101475844
SB - IM
MH - *Neoplasms/genetics/pathology
MH - Humans
MH - Deep Learning
MH - Benchmarking
MH - *Mutation
PMC - PMC10326961
OTO - NOTNLM
OT - Cancer genomics
OT - Deep learning
OT - Deep neural networks
OT - Machine learning
OT - Molecular tumour subtyping
OT - Precision cancer medicine
OT - Somatic mutations
OT - Tumour type prediction
OT - Whole exome sequencing
OT - Whole genome sequencing
COIS- The authors declare that they have no competing interests.
FIR - Ambrose, J C
IR - Ambrose JC
FIR - Arumugam, P
IR - Arumugam P
FIR - Bevers, R
IR - Bevers R
FIR - Bleda, M
IR - Bleda M
FIR - Boardman-Pretty, F
IR - Boardman-Pretty F
FIR - Boustred, C R
IR - Boustred CR
FIR - Brittain, H
IR - Brittain H
FIR - Brown, M A
IR - Brown MA
FIR - Caulfield, M J
IR - Caulfield MJ
FIR - Chan, G C
IR - Chan GC
FIR - Giess, A
IR - Giess A
FIR - Griffin, J N
IR - Griffin JN
FIR - Hamblin, A
IR - Hamblin A
FIR - Henderson, S
IR - Henderson S
FIR - Hubbard, T J P
IR - Hubbard TJP
FIR - Jackson, R
IR - Jackson R
FIR - Jones, L J
IR - Jones LJ
FIR - Kasperaviciute, D
IR - Kasperaviciute D
FIR - Kayikci, M
IR - Kayikci M
FIR - Kousathanas, A
IR - Kousathanas A
FIR - Lahnstein, L
IR - Lahnstein L
FIR - Lakey, A
IR - Lakey A
FIR - Leigh, S E A
IR - Leigh SEA
FIR - Leong, I U S
IR - Leong IUS
FIR - Leong, F J
IR - Leong FJ
FIR - Maleady-Crowe, F
IR - Maleady-Crowe F
FIR - McEntagart, M
IR - McEntagart M
FIR - Minneci, F
IR - Minneci F
FIR - Mitchell, J
IR - Mitchell J
FIR - Moutsianas, L
IR - Moutsianas L
FIR - Mueller, M
IR - Mueller M
FIR - Murugaesu, N
IR - Murugaesu N
FIR - Need, A C
IR - Need AC
FIR - O'Donovan, P
IR - O'Donovan P
FIR - Odhams, C A
IR - Odhams CA
FIR - Patch, C
IR - Patch C
FIR - Perez-Gil, D
IR - Perez-Gil D
FIR - Perez-Gil, M B
IR - Perez-Gil MB
FIR - Pullinger, J
IR - Pullinger J
FIR - Rahim, T
IR - Rahim T
FIR - Rendon, A
IR - Rendon A
FIR - Rogers, T
IR - Rogers T
FIR - Savage, K
IR - Savage K
FIR - Sawant, K
IR - Sawant K
FIR - Scott, R H
IR - Scott RH
FIR - Siddiq, A
IR - Siddiq A
FIR - Siddiq, A
IR - Siddiq A
FIR - Smith, S C
IR - Smith SC
FIR - Sosinsky, A
IR - Sosinsky A
FIR - Stuckey, A
IR - Stuckey A
FIR - Tanguy, M
IR - Tanguy M
FIR - Taylor Tavares, A L
IR - Taylor Tavares AL
FIR - Thomas, E R A
IR - Thomas ERA
FIR - Thompson, S R
IR - Thompson SR
FIR - Tucci, A
IR - Tucci A
FIR - Welland, M J
IR - Welland MJ
FIR - Williams, E
IR - Williams E
FIR - Witkowska, K
IR - Witkowska K
FIR - Wood, S M
IR - Wood SM
FIR - Zarowiecki, M
IR - Zarowiecki M
EDAT- 2023/07/08 10:42
MHDA- 2023/07/10 06:42
PMCR- 2023/07/07
CRDT- 2023/07/07 23:38
PHST- 2022/06/19 00:00 [received]
PHST- 2023/06/21 00:00 [accepted]
PHST- 2023/07/10 06:42 [medline]
PHST- 2023/07/08 10:42 [pubmed]
PHST- 2023/07/07 23:38 [entrez]
PHST- 2023/07/07 00:00 [pmc-release]
AID - 10.1186/s13073-023-01204-4 [pii]
AID - 1204 [pii]
AID - 10.1186/s13073-023-01204-4 [doi]
PST - epublish
SO - Genome Med. 2023 Jul 7;15(1):47. doi: 10.1186/s13073-023-01204-4.