PMID- 36416734
OWN - NLM
STAT- MEDLINE
DCOM- 20221125
LR  - 20221220
IS  - 1552-9924 (Electronic)
IS  - 0091-6765 (Print)
IS  - 0091-6765 (Linking)
VI  - 130
IP  - 11
DP  - 2022 Nov
TI  - Principal Component Pursuit for Pattern Identification in Environmental
      Mixtures.
PG  - 117008
LID - 10.1289/EHP10479 [doi]
LID - 117008
AB  - BACKGROUND: Environmental health researchers often aim to identify sources or
      behaviors that give rise to potentially harmful environmental exposures.
      OBJECTIVE: We adapted principal component pursuit (PCP), a robust and
      well-established technique for dimensionality reduction in computer vision and
      signal processing, to identify patterns in environmental mixtures. PCP
      decomposes the exposure mixture into a low-rank matrix containing consistent
      patterns of exposure across pollutants and a sparse matrix isolating unique or
      extreme exposure events. METHODS: We adapted PCP to accommodate nonnegative
      data, missing data, and values below a given limit of detection (LOD). We
      simulated data to represent environmental mixtures of two sizes with increasing
      proportions < LOD and three noise structures. We applied principal component
      pursuit with limit of detection (PCP-LOD) to the simulated data to evaluate its
      performance in comparison with principal component analysis (PCA). We next
      applied PCP-LOD to an exposure mixture of 21 persistent organic pollutants
      (POPs) measured in 1,000 U.S. adults from the 2001-2002 National Health and
      Nutrition Examination Survey (NHANES). We applied singular value decomposition
      to the estimated low-rank matrix to characterize the patterns. RESULTS: PCP-LOD
      recovered the true number of patterns through cross-validation for all
      simulations; based on an a priori specified criterion, PCA recovered the true
      number of patterns in 32% of simulations. PCP-LOD achieved lower relative
      predictive error than PCA for all simulated data sets with up to 50% of the
      data < LOD. When 75% of values were < LOD, PCP-LOD outperformed PCA only when
      noise was low.
      In the POP mixture, PCP-LOD identified a rank-three underlying structure and
      separated 6% of values as extreme events. One pattern represented comprehensive
      exposure to all POPs. The other patterns grouped chemicals based on known
      structure and toxicity. DISCUSSION: PCP-LOD serves as a useful tool to express
      multidimensional exposures as consistent patterns that, if found to be related
      to adverse health, are amenable to targeted public health messaging.
      https://doi.org/10.1289/EHP10479.
FAU - Gibson, Elizabeth A
AU  - Gibson EA
AUID- ORCID: 0000-0001-5119-5133
AD  - Department of Environmental Health Sciences, Columbia University Mailman School
      of Public Health, New York, New York, USA.
FAU - Zhang, Junhui
AU  - Zhang J
AD  - Department of Applied Physics and Applied Mathematics, Columbia University, New
      York, New York, USA.
FAU - Yan, Jingkai
AU  - Yan J
AUID- ORCID: 0000-0002-2094-2092
AD  - Department of Electrical Engineering, Columbia University Data Science
      Institute, New York, New York, USA.
FAU - Chillrud, Lawrence
AU  - Chillrud L
AUID- ORCID: 0000-0003-0727-0161
AD  - Department of Environmental Health Sciences, Columbia University Mailman School
      of Public Health, New York, New York, USA.
FAU - Benavides, Jaime
AU  - Benavides J
AUID- ORCID: 0000-0002-1851-5155
AD  - Department of Environmental Health Sciences, Columbia University Mailman School
      of Public Health, New York, New York, USA.
FAU - Nunez, Yanelli
AU  - Nunez Y
AD  - Department of Environmental Health Sciences, Columbia University Mailman School
      of Public Health, New York, New York, USA.
FAU - Herbstman, Julie B
AU  - Herbstman JB
AD  - Department of Environmental Health Sciences, Columbia University Mailman School
      of Public Health, New York, New York, USA.
FAU - Goldsmith, Jeff
AU  - Goldsmith J
AD  - Department of Biostatistics, Columbia University Mailman School of Public
      Health, New York, New York, USA.
FAU - Wright, John
AU  - Wright J
AD  - Department of Electrical Engineering, Columbia University Data Science
      Institute, New York, New York, USA.
FAU - Kioumourtzoglou, Marianthi-Anna
AU  - Kioumourtzoglou MA
AD  - Department of Environmental Health Sciences, Columbia University Mailman School
      of Public Health, New York, New York, USA.
LA  - eng
GR  - F31 ES030263/ES/NIEHS NIH HHS/United States
GR  - R01 ES028805/ES/NIEHS NIH HHS/United States
GR  - P30 ES009089/ES/NIEHS NIH HHS/United States
PT  - Journal Article
PT  - Research Support, N.I.H., Extramural
DEP - 20221123
PL  - United States
TA  - Environ Health Perspect
JT  - Environmental health perspectives
JID - 0330411
RN  - 0 (Environmental Pollutants)
SB  - IM
MH  - Nutrition Surveys
MH  - *Environmental Pollutants/toxicity
MH  - Environmental Exposure/analysis
MH  - Principal Component Analysis
MH  - Public Health
PMC - PMC9683097
EDAT- 2022/11/24 06:00
MHDA- 2022/11/26 06:00
PMCR- 2022/11/23
CRDT- 2022/11/23 10:33
PHST- 2022/11/23 10:33 [entrez]
PHST- 2022/11/24 06:00 [pubmed]
PHST- 2022/11/26 06:00 [medline]
PHST- 2022/11/23 00:00 [pmc-release]
AID - EHP10479 [pii]
AID - 10.1289/EHP10479 [doi]
PST - ppublish
SO  - Environ Health Perspect. 2022 Nov;130(11):117008. doi: 10.1289/EHP10479. Epub
      2022 Nov 23.