PMID- 31066711
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20200930
IS  - 1929-0748 (Print)
IS  - 1929-0748 (Electronic)
IS  - 1929-0748 (Linking)
VI  - 8
IP  - 5
DP  - 2019 May 7
TI  - The Adverse Drug Reactions From Patient Reports in Social Media Project: Protocol for an Evaluation Against a Gold Standard.
PG  - e11448
LID - 10.2196/11448 [doi]
LID - e11448
AB  - BACKGROUND: Social media is a potential source of information on postmarketing drug safety surveillance that still remains unexploited nowadays. Information technology solutions aiming at extracting adverse reactions (ADRs) from posts on health forums require a rigorous evaluation methodology if their results are to be used to make decisions. First, a gold standard, consisting of manual annotations of the ADR by human experts from the corpus extracted from social media, must be implemented and its quality must be assessed. Second, as for clinical research protocols, the sample size must rely on statistical arguments. Finally, the extraction methods must target the relation between the drug and the disease (which might be either treated or caused by the drug) rather than simple co-occurrences in the posts.
      OBJECTIVE: We propose a standardized protocol for the evaluation of a software extracting ADRs from the messages on health forums. The study is conducted as part of the Adverse Drug Reactions from Patient Reports in Social Media project.
      METHODS: Messages from French health forums were extracted. Entity recognition was based on Racine Pharma lexicon for drugs and Medical Dictionary for Regulatory Activities terminology for potential adverse events (AEs). Natural language processing-based techniques automated the ADR information extraction (relation between the drug and AE entities). The corpus of evaluation was a random sample of the messages containing drugs and/or AE concepts corresponding to recent pharmacovigilance alerts. A total of 2 persons experienced in medical terminology manually annotated the corpus, thus creating the gold standard, according to an annotator guideline. We will evaluate our tool against the gold standard with recall, precision, and f-measure. Interannotator agreement, reflecting gold standard quality, will be evaluated with hierarchical kappa. Granularities in the terminologies will be further explored.
      RESULTS: Necessary and sufficient sample size was calculated to ensure statistical confidence in the assessed results. As we expected a global recall of 0.5, we needed at least 384 identified ADR concepts to obtain a 95% CI with a total width of 0.10 around 0.5. The automated ADR information extraction in the corpus for evaluation is already finished. The 2 annotators already completed the annotation process. The analysis of the performance of the ADR information extraction module as compared with gold standard is ongoing.
      CONCLUSIONS: This protocol is based on the standardized statistical methods from clinical research to create the corpus, thus ensuring the necessary statistical power of the assessed results. Such evaluation methodology is required to make the ADR information extraction software useful for postmarketing drug safety surveillance.
      INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR1-10.2196/11448.
CI  - (c)Armelle Arnoux-Guenegou, Yannick Girardeau, Xiaoyi Chen, Myrtille Deldossi, Rim Aboukhamis, Carole Faviez, Badisse Dahamna, Pierre Karapetiantz, Sylvie Guillemin-Lanne, Agnes Lillo-Le Louet, Nathalie Texier, Anita Burgun, Sandrine Katsahian. Originally published in JMIR Research Protocols (http://www.researchprotocols.org), 07.05.2019.
FAU - Arnoux-Guenegou, Armelle
AU  - Arnoux-Guenegou A
AUID- ORCID: 0000-0003-3427-7086
AD  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.
FAU - Girardeau, Yannick
AU  - Girardeau Y
AUID- ORCID: 0000-0003-3980-0104
AD  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.
AD  - Departement d'Informatique Medicale, Hopital Europeen Georges-Pompidou, Assistance Publique - Hopitaux de Paris, Paris, France.
FAU - Chen, Xiaoyi
AU  - Chen X
AUID- ORCID: 0000-0002-7378-5158
AD  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.
FAU - Deldossi, Myrtille
AU  - Deldossi M
AUID- ORCID: 0000-0001-8268-3922
AD  - Innovative Projects - Text Mining, Expert System, Paris, France.
FAU - Aboukhamis, Rim
AU  - Aboukhamis R
AUID- ORCID: 0000-0003-3066-0125
AD  - Centre Regional de Pharmacovigilance, Hopital Europeen Georges-Pompidou, Assistance Publique - Hopitaux de Paris, Paris, France.
FAU - Faviez, Carole
AU  - Faviez C
AUID- ORCID: 0000-0002-1500-0236
AD  - Kappa Sante, Paris, France.
FAU - Dahamna, Badisse
AU  - Dahamna B
AUID- ORCID: 0000-0003-0762-2518
AD  - Service d'Informatique Biomedicale, D2IM, Centre Hospitalier Universitaire de Rouen, Rouen, France.
FAU - Karapetiantz, Pierre
AU  - Karapetiantz P
AUID- ORCID: 0000-0001-6486-9838
AD  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.
FAU - Guillemin-Lanne, Sylvie
AU  - Guillemin-Lanne S
AUID- ORCID: 0000-0003-3528-2514
AD  - Innovative Projects - Text Mining, Expert System, Paris, France.
FAU - Lillo-Le Louet, Agnes
AU  - Lillo-Le Louet A
AUID- ORCID: 0000-0001-8135-7340
AD  - Centre Regional de Pharmacovigilance, Hopital Europeen Georges-Pompidou, Assistance Publique - Hopitaux de Paris, Paris, France.
FAU - Texier, Nathalie
AU  - Texier N
AUID- ORCID: 0000-0003-3749-254X
AD  - Kappa Sante, Paris, France.
FAU - Burgun, Anita
AU  - Burgun A
AUID- ORCID: 0000-0001-6855-4366
AD  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.
AD  - Departement d'Informatique Medicale, Hopital Europeen Georges-Pompidou, Assistance Publique - Hopitaux de Paris, Paris, France.
AD  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Paris Descartes University, Sorbonne Paris Cite, Paris, France.
FAU - Katsahian, Sandrine
AU  - Katsahian S
AUID- ORCID: 0000-0002-7261-0671
AD  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.
AD  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Paris Descartes University, Sorbonne Paris Cite, Paris, France.
AD  - Clinical Research Unit Hopitaux Universitaires Paris Ouest, Hopital Europeen Georges-Pompidou, Assistance Publique - Hopitaux de Paris, Paris, France.
AD  - INSERM CIC1418, Clinical Epidemiology, Hopital Europeen Georges-Pompidou, Paris, France.
LA  - eng
PT  - Journal Article
DEP - 20190507
PL  - Canada
TA  - JMIR Res Protoc
JT  - JMIR research protocols
JID - 101599504
PMC - PMC6528435
OTO - NOTNLM
OT  - MedDRA
OT  - Racine Pharma
OT  - data mining
OT  - drug-related side effects and adverse reactions
OT  - natural language processing
OT  - social media
COIS- Conflicts of Interest: None declared.
EDAT- 2019/05/09 06:00
MHDA- 2019/05/09 06:01
PMCR- 2019/05/07
CRDT- 2019/05/09 06:00
PHST- 2018/06/29 00:00 [received]
PHST- 2018/12/21 00:00 [accepted]
PHST- 2018/11/16 00:00 [revised]
PHST- 2019/05/09 06:00 [entrez]
PHST- 2019/05/09 06:00 [pubmed]
PHST- 2019/05/09 06:01 [medline]
PHST- 2019/05/07 00:00 [pmc-release]
AID - v8i5e11448 [pii]
AID - 10.2196/11448 [doi]
PST - epublish
SO  - JMIR Res Protoc. 2019 May 7;8(5):e11448. doi: 10.2196/11448.
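Illustrative note on the sample-size figure in the abstract: the value of 384 ADR concepts is consistent with the usual normal-approximation (Wald) interval for a proportion, n = z^2 * p * (1 - p) / d^2, with p = 0.5 and half-width d = 0.05. The abstract does not state which formula the authors used, so the sketch below is an assumption, and the function names and the example counts are hypothetical. It also shows how recall, precision, and f-measure could be computed from counts of true positives, false positives, and false negatives.

```python
from statistics import NormalDist


def sample_size_for_proportion(p, total_width, confidence=0.95):
    """Unrounded n for a Wald CI of the given total width around proportion p.

    Assumption: normal approximation; the abstract does not name its method.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = total_width / 2
    return z ** 2 * p * (1 - p) / half_width ** 2


def precision_recall_f(tp, fp, fn):
    """Precision, recall, and balanced f-measure from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Expected recall 0.5, 95% CI of total width 0.10 (values from the abstract).
    n = sample_size_for_proportion(0.5, 0.10)
    print(f"required ADR concepts (unrounded): {n:.2f}")

    # Hypothetical counts, for illustration only; not results from the study.
    p, r, f = precision_recall_f(tp=50, fp=25, fn=50)
    print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

Under these assumptions the unrounded value falls just above 384, which matches the abstract's figure; rounding up strictly would give 385.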