PMID- 20121045
OWN - NLM
STAT- MEDLINE
DCOM- 20100628
LR  - 20220409
IS  - 1549-960X (Electronic)
IS  - 1549-9596 (Linking)
VI  - 50
IP  - 3
DP  - 2010 Mar 22
TI  - Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets.
PG  - 339-48
LID - 10.1021/ci900450m [doi]
AB  - Modern drug discovery organizations generate large volumes of SAR data. A promising methodology that can be used to mine this chemical data to identify novel structure-activity relationships is the matched molecular pair (MMP) methodology. However, before the full potential of the MMP methodology can be utilized, a MMP identification method that is capable of identifying all MMPs in large chemical data sets on modest computational hardware is required. In this paper we report an algorithm that is capable of systematically generating all MMPs in chemical data sets. Additionally, the algorithm is computationally efficient enough to be applied on large data sets. As an example the algorithm was used to identify the MMPs in the approximately 300k NIH MLSMR set. The algorithm identified approximately 5.3 million matched molecular pairs in the set. These pairs cover approximately 2.6 million unique molecular transformations.
FAU - Hussain, Jameed
AU  - Hussain J
AD  - Computational & Structural Chemistry, GlaxoSmithKline, Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, U.K. jameed.x.hussain@gsk.com
FAU - Rea, Ceara
AU  - Rea C
LA  - eng
PT  - Journal Article
PL  - United States
TA  - J Chem Inf Model
JT  - Journal of chemical information and modeling
JID - 101230060
SB  - IM
MH  - *Algorithms
MH  - *Databases, Factual
MH  - Drug Discovery/*methods
MH  - Structure-Activity Relationship
EDAT- 2010/02/04 06:00
MHDA- 2010/06/29 06:00
CRDT- 2010/02/04 06:00
PHST- 2010/02/04 06:00 [entrez]
PHST- 2010/02/04 06:00 [pubmed]
PHST- 2010/06/29 06:00 [medline]
AID - 10.1021/ci900450m [doi]
PST - ppublish
SO  - J Chem Inf Model. 2010 Mar 22;50(3):339-48. doi: 10.1021/ci900450m.