PMID- 33431009
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20210115
IS  - 1758-2946 (Print)
IS  - 1758-2946 (Electronic)
IS  - 1758-2946 (Linking)
VI  - 12
IP  - 1
DP  - 2020 Jan 22
TI  - Development of Natural Compound Molecular Fingerprint (NC-MFP) with the Dictionary of Natural Products (DNP) for natural product-based drug development.
PG  - 6
LID - 10.1186/s13321-020-0410-3 [doi]
LID - 6
AB  - Computer-aided research on the relationship between the molecular structures of natural compounds (NC) and their biological activities has been carried out extensively because the molecular structures of new drug candidates are usually analogous to or derived from the molecular structures of NC. To express this relationship in a physically realistic way using a computer, it is essential to have a molecular descriptor set that can adequately represent the characteristics of the molecular structures belonging to the NC's chemical space. Although several topological descriptors have been developed to describe the physical, chemical, and biological properties of organic molecules, especially synthetic compounds, and have been widely used in drug discovery research, these descriptors have limitations in expressing NC-specific molecular structures. To overcome this, we developed a novel molecular fingerprint, called Natural Compound Molecular Fingerprint (NC-MFP), for explaining NC structures related to biological activities and for applying it to natural product (NP)-based drug development. NC-MFP was developed to reflect the structural characteristics of NCs and the commonly used NP classification system. NC-MFP is a scaffold-based molecular fingerprint method comprising scaffolds, scaffold-fragment connection points (SFCP), and fragments. The scaffolds of the NC-MFP have a hierarchical structure. In this study, we introduce 16 structural classes of NPs in the Dictionary of Natural Products (DNP) database, and the hierarchical scaffolds of each class were calculated using the Bemis and Murcko (BM) method. The scaffold library in NC-MFP comprises 676 scaffolds. To evaluate how well NC-MFP represents the structural features of NCs relative to the molecular fingerprints that have been widely used for organic molecular representation, two kinds of binary classification tasks were performed. Task I is a binary classification of the compounds in commercially available library DB as either NC or synthetic compound. Task II is a binary classification of NCs as active or inactive with respect to inhibitory activity against seven biological target proteins. Both tasks were carried out with several molecular fingerprints, including NC-MFP, using the 1-nearest neighbor (1-NN) method. The results of task I showed that NC-MFP is a practical molecular fingerprint for classifying NC structures in the data set compared with other molecular fingerprints. In task II, NC-MFP outperformed the other molecular fingerprints, suggesting that NC-MFP is useful for explaining NC structures related to biological activities. In conclusion, NC-MFP is a robust molecular fingerprint for classifying NC structures and explaining the biological activities of NC structures. Therefore, we suggest NC-MFP as a potent molecular descriptor for the virtual screening of NCs for natural product-based drug development.
FAU - Seo, Myungwon
AU  - Seo M
AUID- ORCID: 0000-0002-1974-4902
AD  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea.
FAU - Shin, Hyun Kil
AU  - Shin HK
AUID- ORCID: 0000-0003-3665-0841
AD  - Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, Republic of Korea.
FAU - Myung, Yoochan
AU  - Myung Y
AUID- ORCID: 0000-0002-6763-9808
AD  - Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, 3010, Australia.
FAU - Hwang, Sungbo
AU  - Hwang S
AUID- ORCID: 0000-0002-1610-5259
AD  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea.
AD  - Bioinformatics and Molecular Design Research Center, Yonsei Engineering Research Park, Seoul, Republic of Korea.
FAU - No, Kyoung Tai
AU  - No KT
AUID- ORCID: 0000-0003-3187-8193
AD  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea. ktno@yonsei.ac.kr.
AD  - Bioinformatics and Molecular Design Research Center, Yonsei Engineering Research Park, Seoul, Republic of Korea. ktno@yonsei.ac.kr.
LA  - eng
PT  - Journal Article
DEP - 20200122
PL  - England
TA  - J Cheminform
JT  - Journal of cheminformatics
JID - 101516718
PMC - PMC6977316
OTO - NOTNLM
OT  - Dictionary of Natural Product database (DNP)
OT  - Molecular descriptor
OT  - Natural compound (NC)
OT  - Natural product (NP)
OT  - Natural product-based drug development
OT  - Virtual screening
COIS- The authors declare they have no competing interests.
EDAT- 2021/01/13 06:00
MHDA- 2021/01/13 06:01
PMCR- 2020/01/22
CRDT- 2021/01/12 05:44
PHST- 2019/10/16 00:00 [received]
PHST- 2020/01/11 00:00 [accepted]
PHST- 2021/01/12 05:44 [entrez]
PHST- 2021/01/13 06:00 [pubmed]
PHST- 2021/01/13 06:01 [medline]
PHST- 2020/01/22 00:00 [pmc-release]
AID - 10.1186/s13321-020-0410-3 [pii]
AID - 410 [pii]
AID - 10.1186/s13321-020-0410-3 [doi]
PST - epublish
SO  - J Cheminform. 2020 Jan 22;12(1):6. doi: 10.1186/s13321-020-0410-3.