An AI-Powered Tool to Identify and Assess Fit-for-Use Registries for Drug Development and Evaluation

Event: ISPOR Europe 2025, Glasgow, Scotland, UK

Authors: Ghinwa Y. Hayek, Boris Kopin, Sonia Zebachi, Gaëtan Pinon, Basile Ferry, Elisabeth Bakker, Alexandre Macquin, Sieta T. de Vries, Peter G.M. Mol, Billy Amzal 

OBJECTIVES

Selecting appropriate real-world data (RWD) sources, particularly registries, is a primary challenge for academia, industry, regulators, and health technology assessment (HTA) bodies, as successful submissions rely on data quality and relevance. Identifying an RWD source is often lengthy and complex because of discoverability and accessibility issues, notably the multiplicity of data catalogues and the unavailability of metadata. Natural language processing techniques can help address these challenges. Within the public-private More-EUROPA consortium, we propose an AI-powered tool to support the identification and selection of fit-for-use registries across the drug development lifecycle.

METHODS

The tool was developed around three key pillars:

1) Centralisation of data sources,

2) Assessment of available metadata,

3) Identification of relevant sources.

For pillar 1, registries were extracted from the HMA-EMA catalogues of real-world data sources, observational studies in ClinicalTrials.gov, and published literature (PubMed and Semantic Scholar).

For pillar 2, metadata were normalised and harmonised into a “common metadata model” based on PICOTS (population, intervention, comparator, outcome, time, setting). A large language model (LLM) was applied to extract key information from unstructured publication data.
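As an illustration of what a PICOTS-based common metadata model could look like, the following is a minimal sketch in Python. It is not the consortium's implementation: the class name `PicotsRecord`, the source labels, and every field name in `FIELD_MAP` are hypothetical and chosen only to show how source-specific metadata might be mapped onto one shared structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PicotsRecord:
    """One registry described with PICOTS fields (hypothetical schema)."""
    source: str
    registry_name: str = ""
    population: Optional[str] = None
    intervention: Optional[str] = None
    comparator: Optional[str] = None
    outcome: Optional[str] = None
    time: Optional[str] = None
    setting: Optional[str] = None


# Assumed mapping from source-specific keys to PICOTS fields.
# The keys on the left are illustrative, not real catalogue field names.
FIELD_MAP = {
    "catalogue": {
        "name": "registry_name",
        "target_population": "population",
        "country": "setting",
    },
    "trial_registry": {
        "title": "registry_name",
        "conditions": "population",
        "interventions": "intervention",
        "primary_outcome": "outcome",
    },
}


def normalise(source: str, raw: dict) -> PicotsRecord:
    """Map a raw metadata dict from one source onto the common model."""
    mapping = FIELD_MAP[source]
    # Keep only mapped keys that are present and non-empty; trim whitespace.
    fields = {
        target: str(raw[key]).strip()
        for key, target in mapping.items()
        if raw.get(key)
    }
    return PicotsRecord(source=source, **fields)


record = normalise(
    "catalogue",
    {
        "name": " Example Registry ",
        "target_population": "Adults with type 2 diabetes",
        "country": "France",
        "unmapped_field": "ignored",
    },
)
```

Fields that a source does not provide stay `None`, which keeps missing metadata explicit instead of hiding it behind empty strings.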

For pillar 3, a machine learning algorithm was developed to de-duplicate and identify registries across the four data sources of pillar 1.
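The abstract does not describe the de-duplication algorithm itself. To make the idea concrete, the sketch below substitutes a much simpler heuristic: registry names are normalised and then grouped when their string similarity exceeds a threshold. The function names, the example records, and the 0.9 threshold are assumptions made for illustration only, and a trained model would replace `similarity` in practice.

```python
import re
from difflib import SequenceMatcher


def normalise_name(name: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(name.split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise_name(a), normalise_name(b)).ratio()


def deduplicate(records, threshold=0.9):
    """Greedily cluster (source, name) pairs that likely name one registry.

    Each record is compared with the first member of every existing
    cluster and joins the first cluster that meets the threshold.
    """
    clusters = []
    for source, name in records:
        for cluster in clusters:
            if similarity(name, cluster[0][1]) >= threshold:
                cluster.append((source, name))
                break
        else:
            # No sufficiently similar cluster found: start a new one.
            clusters.append([(source, name)])
    return clusters


# Made-up records standing in for entries from different sources.
records = [
    ("catalogue", "Example Diabetes Registry"),
    ("pubmed", "example diabetes registry."),
    ("trial_registry", "Sample Stroke Cohort"),
]
clusters = deduplicate(records)
```

Greedy clustering against the first cluster member is order-dependent and scales quadratically in the worst case, so it is suitable only as a small demonstration, not for the volumes reported in the results.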

RESULTS

The current AI-powered tool includes 245 registries extracted from the EMA catalogues, 8,300 observational studies, and 12,000 unique registries identified from 220,000 publications. During beta testing with consortium members, including regulators, HTA agencies, industry, and researchers, the trained LLM demonstrated consistent and accurate extraction of PICOTS-related metadata from publications.

CONCLUSIONS

The AI-powered tool is a comprehensive platform for identifying registries suited to diverse research questions across the drug development lifecycle. Future development phases will be co-designed with users, including regulators, HTA bodies, and industry, to ensure its practical adoption for evaluation purposes.
