PMID- 27080919
OWN - NLM
STAT- MEDLINE
DCOM- 20161213
LR  - 20181202
IS  - 1098-2272 (Electronic)
IS  - 0741-0395 (Print)
IS  - 0741-0395 (Linking)
VI  - 40
IP  - 4
DP  - 2016 May
TI  - An Object-Oriented Regression for Building Disease Predictive Models with Multiallelic HLA Genes.
PG  - 315-32
LID - 10.1002/gepi.21968 [doi]
AB  - Recent genome-wide association studies confirm that human leukocyte antigen (HLA) genes have the strongest associations with several autoimmune diseases, including type 1 diabetes (T1D), providing an impetus to reduce this genetic association to practice through an HLA-based disease predictive model. However, conventional model-building methods tend to be suboptimal when predictors are highly polymorphic, with many rare alleles combined with complex patterns of sequence homology within and between genes. To circumvent this challenge, we describe an alternative methodology: treating complex genotypes of HLA genes as "objects" or "exemplars," one focuses on systemic associations of disease phenotype with "objects" via similarity measurements. Conceptually, this approach assigns disease risks based on complex genotype profiles instead of specific disease-associated genotypes or alleles. Effectively, it transforms large, discrete, and sparse HLA genotypes into a matrix of similarity-based covariates. Using the kernel representer theorem and machine learning techniques, it applies a penalized likelihood method to select disease-associated exemplars in building predictive models. To illustrate this methodology, we apply it to a T1D study with eight HLA genes (HLA-DRB1, HLA-DRB3, HLA-DRB4, HLA-DRB5, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1) to build a predictive model. The resulting predictive model has an area under the curve of 0.92 in the training set and 0.89 in the validation set, indicating that this methodology is useful for building predictive models with complex HLA genotypes.
CI  - (c) 2016 WILEY PERIODICALS, INC.
FAU - Zhao, Lue Ping
AU  - Zhao LP
AD  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America.
AD  - Department of Biostatistics, University of Washington School of Public Health, Seattle, Washington, United States of America.
FAU - Bolouri, Hamid
AU  - Bolouri H
AD  - Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America.
FAU - Zhao, Michael
AU  - Zhao M
AD  - Bellevue High School, Seattle, Washington, United States of America.
FAU - Geraghty, Daniel E
AU  - Geraghty DE
AD  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America.
FAU - Lernmark, Ake
AU  - Lernmark A
AD  - Department of Clinical Sciences, Lund University/CRC, Skane University Hospital, Malmo, Sweden.
CN  - Better Diabetes Diagnosis Study Group
LA  - eng
GR  - R01 DK026190/DK/NIDDK NIH HHS/United States
GR  - DK26190/DK/NIDDK NIH HHS/United States
GR  - P01 DK053004/DK/NIDDK NIH HHS/United States
GR  - UC4 DK063861/DK/NIDDK NIH HHS/United States
GR  - U01 DK063861/DK/NIDDK NIH HHS/United States
GR  - DK63861/DK/NIDDK NIH HHS/United States
PT  - Journal Article
PT  - Research Support, N.I.H., Extramural
PT  - Research Support, Non-U.S. Gov't
PL  - United States
TA  - Genet Epidemiol
JT  - Genetic epidemiology
JID - 8411723
RN  - 0 (HLA Antigens)
SB  - IM
MH  - *Alleles
MH  - Diabetes Mellitus, Type 1/*genetics
MH  - Genome-Wide Association Study
MH  - Genotype
MH  - HLA Antigens/*genetics
MH  - Humans
MH  - Likelihood Functions
MH  - Linear Models
MH  - *Models, Genetic
MH  - Reproducibility of Results
PMC - PMC4834870
MID - NIHMS766725
OTO - NOTNLM
OT  - generalized linear model
OT  - kernel machine
OT  - multiallelic genotypes
OT  - penalized regression
OT  - prediction
OT  - similarity regression
OT  - statistical learning
COIS- Conflict of Interest Authors declare that there is no conflict of interest with this work.
EDAT- 2016/04/16 06:00
MHDA- 2016/12/15 06:00
PMCR- 2017/05/01
CRDT- 2016/04/16 06:00
PHST- 2015/12/03 00:00 [received]
PHST- 2016/02/11 00:00 [revised]
PHST- 2016/02/17 00:00 [accepted]
PHST- 2016/04/16 06:00 [entrez]
PHST- 2016/04/16 06:00 [pubmed]
PHST- 2016/12/15 06:00 [medline]
PHST- 2017/05/01 00:00 [pmc-release]
AID - 10.1002/gepi.21968 [doi]
PST - ppublish
SO  - Genet Epidemiol. 2016 May;40(4):315-32. doi: 10.1002/gepi.21968.
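
The abstract in this record describes the method only in outline: multiallelic genotypes are turned into a matrix of similarity-based covariates, and a penalized likelihood then selects disease-associated exemplars. The Python sketch below illustrates that general idea and is not the authors' implementation. Everything specific in it is an assumption made for illustration: the allele-sharing similarity measure, the L1 penalty fitted by proximal gradient descent, the function names (`shared_fraction`, `allele_sharing_similarity`, `fit_l1_logistic`), the tuning values, and the toy genotypes and labels, which are invented and carry no biological meaning.

```python
import numpy as np


def shared_fraction(pair_a, pair_b):
    """Fraction (0, 0.5, or 1) of allele copies shared by two unordered allele pairs."""
    remaining = list(pair_b)
    shared = 0
    for allele in pair_a:
        if allele in remaining:
            remaining.remove(allele)  # each copy can be matched only once
            shared += 1
    return shared / 2.0


def allele_sharing_similarity(genotypes, exemplars):
    """Similarity matrix S[i, j]: mean shared fraction over genes between
    subject i and exemplar j. Each genotype is a list of (allele, allele)
    pairs, one pair per gene. This measure is an assumed stand-in."""
    S = np.zeros((len(genotypes), len(exemplars)))
    for i, g in enumerate(genotypes):
        for j, e in enumerate(exemplars):
            S[i, j] = np.mean([shared_fraction(a, b) for a, b in zip(g, e)])
    return S


def fit_l1_logistic(X, y, lam=0.05, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).
    The intercept b0 is not penalized. A nonzero coefficient marks the
    corresponding exemplar (column of X) as selected."""
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        resid = prob - y
        b0 -= lr * resid.mean()
        beta = beta - lr * (X.T @ resid) / n
        # Soft-thresholding: the proximal step for the L1 penalty.
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)
    return b0, beta


# Toy data (invented): four subjects, two genes, generic allele labels.
toy_genotypes = [
    [("a1", "a2"), ("b1", "b1")],
    [("a1", "a1"), ("b1", "b2")],
    [("a3", "a4"), ("b3", "b3")],
    [("a3", "a3"), ("b3", "b4")],
]
toy_labels = np.array([1, 1, 0, 0])

# Every subject serves as a candidate exemplar, so S is 4 x 4.
S = allele_sharing_similarity(toy_genotypes, toy_genotypes)
intercept, coefs = fit_l1_logistic(S, toy_labels)
selected_exemplars = np.flatnonzero(coefs)
```

Using each observed genotype as a candidate exemplar keeps the number of covariates equal to the number of subjects rather than the number of distinct alleles, and the penalty then shrinks most exemplar coefficients to exactly zero. The fixed step size `lr=0.1` is only reasonable for small, bounded similarity matrices like this one.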