PMID- 38106027
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20231229
DP  - 2023 Dec 8
TI  - Large-Scale Information Retrieval and Correction of Noisy Pharmacogenomic Datasets through Residual Thresholded Deep Matrix Factorization.
LID - 2023.12.07.570723 [pii]
LID - 10.1101/2023.12.07.570723 [doi]
AB  - Pharmacogenomics studies are attracting an increasing amount of interest from researchers in precision medicine. The advances in high-throughput experiments and multiplexed approaches allow the large-scale quantification of drug sensitivities in molecularly characterized cancer cell lines (CCLs), resulting in a number of open drug sensitivity datasets for drug biomarker discovery. However, a significant inconsistency in drug sensitivity values among these datasets has been noted. Such inconsistency indicates the presence of substantial noise, subsequently hindering downstream analyses. To address the noise in drug sensitivity data, we introduce a robust and scalable deep learning framework, Residual Thresholded Deep Matrix Factorization (RT-DMF). This method takes a single drug sensitivity data matrix as its sole input and outputs a corrected and imputed matrix. Deep Matrix Factorization (DMF) excels at uncovering subtle patterns, due to its minimal reliance on data structure assumptions. This attribute significantly boosts DMF's ability to identify complex hidden patterns among nuisance effects in the data, thereby facilitating the detection of signals that are therapeutically relevant. Furthermore, RT-DMF incorporates an iterative residual thresholding (RT) procedure, which plays a crucial role in retaining signals more likely to hold therapeutic importance. Validation using simulated datasets and real pharmacogenomics datasets demonstrates the effectiveness of our approach in correcting noise and imputing missing data in drug sensitivity datasets (open source package available at https://github.com/tomwhoooo/rtdmf).
FAU - Hu, Zhiyue Tom
AU  - Hu ZT
AD  - Division of Biostatistics, University of California Berkeley, Berkeley, 94720, U.S.A.
FAU - Yu, Yaodong
AU  - Yu Y
AD  - Department of Electrical Engineering and Computer Science, University of California Berkeley, Berkeley, 94720, U.S.A.
FAU - Chen, Ruoqiao
AU  - Chen R
AD  - Department of Pharmacology and Toxicology, Michigan State University, 48824, U.S.A.
FAU - Yeh, Shan-Ju
AU  - Yeh SJ
AD  - School of Medicine, National Tsing Hua University, Hsinchu, 300044, Taiwan R.O.C.
FAU - Chen, Bin
AU  - Chen B
AD  - Department of Pharmacology and Toxicology, Michigan State University, 48824, U.S.A.
AD  - Department of Pediatrics and Human Development, Michigan State University, 48824, U.S.A.
FAU - Huang, Haiyan
AU  - Huang H
AD  - Department of Statistics, University of California Berkeley, Berkeley, 94720, U.S.A.
LA  - eng
GR  - R01 GM134307/GM/NIGMS NIH HHS/United States
PT  - Preprint
DEP - 20231208
PL  - United States
TA  - bioRxiv
JT  - bioRxiv : the preprint server for biology
JID - 101680187
PMC - PMC10723412
OTO - NOTNLM
OT  - Deep matrix factorization
OT  - Noisy data
OT  - Open-sourced
OT  - Pharmacogenomics datasets
OT  - drug sensitivity data
EDAT- 2023/12/18 06:41
MHDA- 2023/12/18 06:42
PMCR- 2023/12/15
CRDT- 2023/12/18 04:32
PHST- 2023/12/18 06:41 [pubmed]
PHST- 2023/12/18 06:42 [medline]
PHST- 2023/12/18 04:32 [entrez]
PHST- 2023/12/15 00:00 [pmc-release]
AID - 2023.12.07.570723 [pii]
AID - 10.1101/2023.12.07.570723 [doi]
PST - epublish
SO  - bioRxiv [Preprint]. 2023 Dec 8:2023.12.07.570723. doi: 10.1101/2023.12.07.570723.