Event: EuroDURG 2025, Uppsala, Sweden
Authors: Ghinwa Y. Hayek, Pascal Godbillot, Coriande Clemente, Sonia Zebachi, Gaëtan Pinon, Boris Kopin, Elisabeth Bakker, Sieta T. de Vries, Peter G.M. Mol, Billy Amzal
Abstract
The selection of appropriate real-world data (RWD) sources, particularly registries, is a primary concern for academia, industry, regulators, and health technology assessment bodies, since the success of regulatory submissions depends on the quality and relevance of the data. Identifying a suitable RWD source is a lengthy and burdensome process because of gaps in discoverability and accessibility, notably the multiplicity of data catalogues and the unavailability of metadata.
To address this gap, we developed an AI-driven tool to support the identification and selection of fit-for-use registries throughout the drug development lifecycle.
The tool was developed around three pillars: 1) centralisation of data sources; 2) identification of relevant sources; 3) assessment of available metadata. For pillar 1, registries were extracted from the HMA-EMA catalogues of real-world data sources, observational studies in ClinicalTrials.gov, and the published literature (PubMed and Semantic Scholar). The available metadata were obtained and harmonised into a “common metadata model” based on PICOTS (population, intervention, comparator, outcome, time, setting). For pillar 2, a hybrid text search algorithm was created to capture the context and intent behind the search query and so identify the most relevant sources. In addition, filters based on population characteristics such as age group, geographical area, and population size help users narrow down their search. For pillar 3, a large language model was used to extract relevant information, structured according to PICOTS, from unstructured data in publications.
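The pillar 2 workflow (filter on population characteristics, then rank by a hybrid score) can be illustrated with a minimal sketch. This is not the tool's implementation: the `RegistryRecord` fields, the `alpha` weighting, and all function names are hypothetical, and a character-trigram cosine similarity stands in for whatever semantic component the actual algorithm uses, which the abstract does not specify.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class RegistryRecord:
    # Hypothetical PICOTS-style fields; the real common metadata model is not described here.
    name: str
    population: str
    intervention: str
    outcome: str
    setting: str
    country: str
    age_group: str
    population_size: int

    def text(self) -> str:
        """Concatenate the free-text fields that the search runs over."""
        return " ".join([self.name, self.population, self.intervention,
                         self.outcome, self.setting])


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query: str, doc: str) -> float:
    """Lexical component: fraction of query tokens found in the document."""
    q, d = set(tokens(query)), set(tokens(doc))
    return len(q & d) / len(q) if q else 0.0


def trigram_vector(text: str) -> Counter:
    s = " ".join(tokens(text))
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def fuzzy_score(query: str, doc: str) -> float:
    """Assumed stand-in for semantic similarity: cosine over character trigrams."""
    a, b = trigram_vector(query), trigram_vector(doc)
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(records, query, alpha=0.5, country=None, age_group=None, min_size=0):
    """Apply metadata filters, then rank the survivors by a weighted hybrid score."""
    hits = []
    for r in records:
        if country is not None and r.country != country:
            continue
        if age_group is not None and r.age_group != age_group:
            continue
        if r.population_size < min_size:
            continue
        doc = r.text()
        score = alpha * keyword_score(query, doc) + (1 - alpha) * fuzzy_score(query, doc)
        hits.append((score, r))
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits
```

Calling `search(records, "type 1 diabetes insulin", age_group="paediatric")` would return the matching records as `(score, record)` pairs, best first. In a production system the trigram similarity would typically be replaced by embedding-based similarity; the weighted combination is only one of several ways to merge lexical and semantic rankings.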
This AI-powered tool provides a comprehensive platform for identifying suitable registries to address diverse research questions throughout the entire drug development lifecycle.