At ISPOR Europe 2025, Antoine Movschin presented Quinten Health’s simulation-based comparison of machine-learning approaches for estimating individualized treatment effects (ITE). As healthcare increasingly relies on real-world data and personalized medicine, robust estimation of patient-level treatment effects is essential for research, clinical practice, and HTA decision-making.
Why ITE Estimation Matters
RCTs provide unbiased average treatment effects but lack the granularity to assess individual treatment responses. Several causal inference approaches can estimate ITE from observational data, but there is no consensus on how to select or evaluate them.
A Comprehensive Simulation Study
The study simulated a diverse set of realistic healthcare scenarios that varied in sample size, confounding strength, outcome imbalance, treatment prevalence, and the complexity of treatment effect heterogeneity.
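To make the idea of a simulated scenario concrete, here is a minimal sketch of a data-generating process in which the true individual effect is known by construction. It is not the study's simulation design: the function name `simulate`, the single covariate, the logistic treatment model, and every coefficient are assumptions chosen for illustration only.

```python
import math
import random

def simulate(n, confounding=1.0, treat_shift=0.0, seed=0):
    """Draw n patients with one covariate x, a treatment t, an outcome y,
    and the true effect tau (known only because the data are simulated)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Treatment probability depends on x: a larger `confounding` value
        # ties treatment assignment more strongly to the covariate.
        p_treat = 1.0 / (1.0 + math.exp(-(treat_shift + confounding * x)))
        t = 1 if rng.random() < p_treat else 0
        tau = 1.0 + 0.5 * x                  # hypothetical heterogeneous effect
        y0 = 2.0 * x + rng.gauss(0.0, 1.0)   # hypothetical untreated outcome
        y = y0 + tau * t
        rows.append({"x": x, "t": t, "y": y, "tau": tau})
    return rows

data = simulate(500, confounding=2.0, seed=42)
print(len(data), sum(r["t"] for r in data))
```

In this sketch, `n` controls sample size, `confounding` controls confounding strength, and `treat_shift` moves treatment prevalence up or down. A richer expression for `tau` would stand in for more complex effect heterogeneity.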
Methods evaluated included: meta-learners (S-, T-, X-, DR-, R-learners), Causal Forest, Bayesian approaches (BART, BCF), deep learning models (GANITE, CEVAE, CFR-Wass), and a baseline adjusted difference in means (ADM).
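As an illustration of the simplest family listed above, the following sketch shows the general idea behind a T-learner: fit one outcome model per treatment arm and take the difference of their predictions as the estimated ITE. It is a toy under stated assumptions, not the implementation evaluated in the study: the helper names `fit_line` and `t_learner`, the one-covariate linear outcome models, and the noiseless toy data are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def t_learner(xs, ts, ys):
    """Fit one outcome model per arm; return a function x -> estimated ITE."""
    a1, b1 = fit_line([x for x, t in zip(xs, ts) if t == 1],
                      [y for y, t in zip(ys, ts) if t == 1])
    a0, b0 = fit_line([x for x, t in zip(xs, ts) if t == 0],
                      [y for y, t in zip(ys, ts) if t == 0])
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)

# Toy noiseless data: untreated outcome is 2x, treated outcome adds 1 + 0.5x.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, -1.5, -0.5, 0.5, 1.5, 2.5]
ts = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
ys = [2.0 * x + t * (1.0 + 0.5 * x) for x, t in zip(xs, ts)]

ite = t_learner(xs, ts, ys)
print(round(ite(0.0), 6), round(ite(2.0), 6))
```

Because the toy outcomes are exactly linear and noise-free, the two fitted lines recover the effect 1 + 0.5x. With real data, the per-arm models would typically be flexible learners, and the other methods in the list differ mainly in how they share information across arms and handle confounding.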
Performance was assessed using “oracle” metrics (i.e., metrics that can only be computed when the true treatment effect is known), such as the precision in estimation of heterogeneous effects (PEHE), coverage, and policy risk, as well as “observable” metrics (i.e., metrics that can be computed from the observed data), such as R-loss, Brier score, and approximations of the PEHE (IF-PEHE, PEHEnn) and of the policy risk.
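The difference between the two kinds of metric can be sketched with one example of each. The code below uses the common textbook forms of PEHE (here in its root-mean-squared variant) and R-loss. The source does not say which exact variants the study used, so the function names, the choice of the root form, and the argument names `m_hat` and `e_hat` are assumptions.

```python
import math

def root_pehe(tau_hat, tau_true):
    """Oracle metric: root mean squared error between estimated and true ITE.
    Requires tau_true, which is only available in simulation."""
    n = len(tau_true)
    return math.sqrt(sum((h - t) ** 2 for h, t in zip(tau_hat, tau_true)) / n)

def r_loss(y, t, tau_hat, m_hat, e_hat):
    """Observable metric: mean squared residual-on-residual error.
    m_hat estimates E[Y | X] and e_hat estimates P(T = 1 | X); both are
    nuisance estimates fitted on observed data, passed in here as lists."""
    n = len(y)
    return sum(((yi - mi) - (ti - ei) * hi) ** 2
               for yi, ti, hi, mi, ei in zip(y, t, tau_hat, m_hat, e_hat)) / n

# Oracle side: needs the true effects.
print(root_pehe([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))

# Observable side: needs only outcomes, treatments, and fitted nuisances.
print(r_loss(y=[3.0, 1.0], t=[1, 0], tau_hat=[2.0, 2.0],
             m_hat=[2.0, 2.0], e_hat=[0.5, 0.5]))
```

Note that `root_pehe` takes the true effects as input while `r_loss` does not. That is why only the latter is usable outside simulation, and why agreement between the two cannot be taken for granted.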
Key Findings
Most ITE methods showed reasonable accuracy, but performance varied widely by scenario. In general, all methods handled complex treatment effect heterogeneity and small sample sizes poorly. No single method dominated across all settings. On average, the best method outperformed the baseline ADM by a wide margin on PEHE. However, confidence intervals were often overly wide. Importantly, observable metrics (the only ones usable in real-world settings) were often inconsistent with oracle metrics.
Implications for Decision-Makers
The study offers practical guidance for selecting ITE modelling approaches depending on data structure and analytical goals.
This has direct implications for:
- HTA bodies evaluating heterogeneous treatment effects,
- pharmaceutical teams conducting subgroup analyses and benefit–risk assessments,
- research partners designing personalized treatment strategies.
Conclusion
While many machine-learning methods can estimate individualized treatment effects, their reliability depends on context and evaluation metrics. Method selection must therefore be scenario-specific to ensure robust, interpretable, and clinically meaningful results.
Access the official ISPOR poster
You can view the official ISPOR abstract and poster page at the link below.