Event: ELCC 2025, Paris, France
Authors: Antoine Movschin, Lise Bosquet, David Pérol, Melissa Rollot, Louise Dry, Coriande Clemente, Farah Al Nakib, Mathilde Berthelot, Margaux Törnqvist
Background
Advances in personalized medicine have driven targeted therapy for non-small cell lung cancer (NSCLC) through mutation identification, but the rarity of these mutations in real-world (RW) settings limits the feasibility of traditional randomized controlled trials (RCTs). External control arms (ECAs) offer an innovative way to complement RCTs, though their reliability and regulatory acceptance remain challenging. To address this, we developed an AI-driven framework to simulate and evaluate virtual NSCLC cohorts for potential ECA enrichment.
Methods
Study populations were selected from ESME, a French RW lung cancer database from Unicancer. Selected patients had stage IV advanced metastatic non-squamous NSCLC diagnosed between 2013 and 2023 and harbored either EGFR or KRAS mutations. Patients were characterized by clinical variables, medical history and treatment category in the first line of treatment (1L). A Cox Proportional Hazards (CPH) model was trained to predict patient-level progression-free survival in 1L and evaluated on an independent test set using metrics including the concordance index and the integrated Brier score. Patient baseline characteristics were then simulated using a Gaussian copula, and time-to-event outcomes were simulated using the trained CPH model and inverse transform sampling. Synthetic data were compared with real data using Kaplan-Meier plots for entire cohorts and ECOG-based subgroups. Statistical metrics assessed the fidelity of variable distributions and correlations, as well as data privacy preservation.
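The two simulation steps can be illustrated with a minimal sketch. This is not the authors' implementation: the toy covariate values, the correlation `rho`, the coefficients `beta_age` and `beta_ecog`, and the baseline cumulative hazard grid are all hypothetical, and the abstract does not state how the baseline hazard was estimated. The sketch assumes a step-function baseline cumulative hazard H0(t) and draws each event time by inverting S(t | x) = exp(-H0(t) * exp(linear predictor)). Applying a Gaussian copula to a discrete variable such as ECOG is only approximate.

```python
import bisect
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF: observed value at probability level u in [0, 1]."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def sample_copula_pair(sorted_a, sorted_b, rho, rng):
    """Draw one (a, b) pair with Gaussian-copula dependence and empirical marginals."""
    z1 = rng.gauss(0.0, 1.0)
    # Second normal draw, correlated with the first at level rho.
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    # Map correlated normals to uniforms, then to the observed marginals.
    u1, u2 = _STD_NORMAL.cdf(z1), _STD_NORMAL.cdf(z2)
    return empirical_quantile(sorted_a, u1), empirical_quantile(sorted_b, u2)


def sample_event_time(linear_predictor, base_times, base_cum_hazard, rng):
    """Inverse transform sampling from S(t | x) = exp(-H0(t) * exp(lp)).

    Returns (time, event_observed). event_observed is False when the draw
    falls beyond the last baseline time, which is treated as censoring.
    """
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is always finite
    target = -math.log(u) / math.exp(linear_predictor)
    # First grid time at which the baseline cumulative hazard reaches target.
    idx = bisect.bisect_left(base_cum_hazard, target)
    if idx >= len(base_times):
        return base_times[-1], False
    return base_times[idx], True


# Hypothetical toy inputs, for illustration only.
rng = random.Random(0)
ages = sorted([52, 58, 61, 64, 67, 70, 73, 79])
ecog = sorted([0, 0, 1, 1, 1, 2, 2, 3])
beta_age, beta_ecog = 0.01, 0.4                 # assumed CPH coefficients
base_times = [1, 3, 6, 9, 12, 18, 24]           # months
base_cum_hazard = [0.05, 0.15, 0.35, 0.6, 0.9, 1.4, 2.0]

synthetic = []
for _ in range(1000):
    age, ecog_score = sample_copula_pair(ages, ecog, rho=0.3, rng=rng)
    lp = beta_age * (age - 65) + beta_ecog * ecog_score
    time, event = sample_event_time(lp, base_times, base_cum_hazard, rng)
    synthetic.append((age, ecog_score, time, event))
```

Each synthetic patient is a tuple of simulated covariates, a progression time and an event indicator, which is the form needed to draw Kaplan-Meier curves for a synthetic cohort and compare them with the real one.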
Results
Synthetic cohorts closely matched the statistical properties of the original EGFR cohort (N = 1,476 patients) and KRAS cohort (N = 2,772 patients). Survival curves were accurately replicated at population and subgroup levels, but CPH models showed limited individual predictive performance, restricting this approach to subpopulations with sufficient sample sizes.
Conclusions
Our framework showed potential for disease modelling and virtual cohort simulation in key mutation-defined NSCLC subpopulations. This virtual cohort simulator could enrich ECAs and increase statistical power to strengthen efficacy evidence in situations where RCTs are not feasible and real-world data (RWD) are scarce or underrepresent certain subgroups.