PMID- 34902551
OWN - NLM
STAT- MEDLINE
DCOM- 20220202
LR  - 20230102
IS  - 1532-0480 (Electronic)
IS  - 1532-0464 (Print)
IS  - 1532-0464 (Linking)
VI  - 125
DP  - 2022 Jan
TI  - Deep graph convolutional network for US birth data harmonization.
PG  - 103974
LID - S1532-0464(21)00303-8 [pii]
LID - 10.1016/j.jbi.2021.103974 [doi]
AB  - In this paper, we developed a feasible and efficient deep-learning-based framework to combine the United States (US) natality data for the last five decades, with changing variables and factors, into a consistent database. We constructed a graph based on the properties and elements of the databases, including variables, and applied a graph convolutional network (GCN) to learn the embeddings of variables on the constructed graph, where the learned embeddings implied the similarity of variables. Specifically, we devised a loss function with a slack margin and a banlist mechanism (for a random walk) to learn the desired structure (two nodes sharing more information were more similar to each other), and developed an active learning mechanism to conduct the harmonization. For a total of 9,321 variables from 49 databases (i.e., 783 stemmed variables, from 1970 to 2018), we applied our model iteratively together with human reviews for four rounds, then obtained 323 hyperchains of variables. During the harmonization, the first round of our model achieved recall and precision of 87.56% and 57.70%, respectively. Our harmonized graph neural network (HGNN) method provides a feasible and efficient way to connect relevant databases at a meta-level. By adapting to the databases' properties and characteristics, HGNN can learn patterns globally, which makes it powerful for discovering the similarity between variables among databases. Our proposed method provides an effective way to reduce the manual effort in database harmonization and integration of fragmented data into useful databases for future research.
CI  - Copyright (c) 2021 Elsevier Inc. All rights reserved.
FAU - Yu, Lishan
AU  - Yu L
AD  - School of Biomedical Informatics, UTHealth, Houston, TX, USA; Yau Mathematical Sciences Center, Tsinghua University, Beijing, China; Beijing Institute Mathematical Sciences and Applications, Beijing, China.
FAU - Salihu, Hamisu M
AU  - Salihu HM
AD  - Department of Family and Community Medicine, Baylor College of Medicine, Houston, TX, USA; Center of Excellence in Health Equity, Training, and Research, Baylor College of Medicine, Houston, TX, USA.
FAU - Dongarwar, Deepa
AU  - Dongarwar D
AD  - Center of Excellence in Health Equity, Training, and Research, Baylor College of Medicine, Houston, TX, USA.
FAU - Chen, Luyao
AU  - Chen L
AD  - School of Biomedical Informatics, UTHealth, Houston, TX, USA.
FAU - Jiang, Xiaoqian
AU  - Jiang X
AD  - School of Biomedical Informatics, UTHealth, Houston, TX, USA.
LA  - eng
GR  - R01 AG066749/AG/NIA NIH HHS/United States
GR  - R01 GM114612/GM/NIGMS NIH HHS/United States
GR  - U01 TR002062/TR/NCATS NIH HHS/United States
PT  - Journal Article
PT  - Research Support, N.I.H., Extramural
PT  - Research Support, Non-U.S. Gov't
PT  - Research Support, U.S. Gov't, Non-P.H.S.
DEP - 20211210
PL  - United States
TA  - J Biomed Inform
JT  - Journal of biomedical informatics
JID - 100970413
SB  - IM
MH  - Databases, Factual
MH  - Humans
MH  - *Neural Networks, Computer
MH  - United States
PMC - PMC8766952
MID - NIHMS1766202
OTO - NOTNLM
OT  - Database harmonization
OT  - Deep learning
OT  - Graph neural network
OT  - Natality data
COIS- CONFLICT OF INTEREST Authors have no conflict of interest to disclose.
EDAT- 2021/12/14 06:00
MHDA- 2022/02/03 06:00
PMCR- 2023/01/01
CRDT- 2021/12/13 20:16
PHST- 2021/05/31 00:00 [received]
PHST- 2021/09/03 00:00 [revised]
PHST- 2021/12/04 00:00 [accepted]
PHST- 2021/12/14 06:00 [pubmed]
PHST- 2022/02/03 06:00 [medline]
PHST- 2021/12/13 20:16 [entrez]
PHST- 2023/01/01 00:00 [pmc-release]
AID - S1532-0464(21)00303-8 [pii]
AID - 10.1016/j.jbi.2021.103974 [doi]
PST - ppublish
SO  - J Biomed Inform. 2022 Jan;125:103974. doi: 10.1016/j.jbi.2021.103974. Epub 2021 Dec 10.
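
The abstract in this record describes learning variable embeddings with a graph convolutional network and a loss function with a slack margin, but gives no formulas. The sketch below is only an illustration of those two general ideas, not the authors' model: it assumes a standard symmetric-normalized graph convolution and a generic triplet-style hinge loss, and it omits the banlist random walk and the active learning loop. The function names (`normalized_adjacency`, `gcn_layer`, `margin_loss`), the toy graph, and the margin value are all hypothetical.

```python
import numpy as np

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, a common GCN normalization (assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm, features, weights):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weights, 0.0)

def margin_loss(emb, anchor, positive, negative, margin=0.5):
    """Generic hinge loss: zero once the positive node is closer to the
    anchor than the negative node by at least `margin` (an assumed form)."""
    d_pos = np.linalg.norm(emb[anchor] - emb[positive])
    d_neg = np.linalg.norm(emb[anchor] - emb[negative])
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical toy graph: nodes 0 and 1 both link to node 2; node 3 is isolated.
adj = np.array([[0, 0, 1, 0],
                [0, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)
rng = np.random.default_rng(0)
features = np.eye(4)                        # one-hot node features
weights = rng.normal(size=(4, 2))           # untrained projection to 2-D embeddings
emb = gcn_layer(normalized_adjacency(adj), features, weights)
loss = margin_loss(emb, anchor=0, positive=1, negative=3)
```

In a trained model the weights would be updated to reduce this loss over many sampled triplets, so that variables sharing more graph context end up closer in embedding space; here the weights are random and the loss value is only a placeholder.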