Comparative Evaluation of Sequence-Encoding Strategies for Clustering Inhaled Therapy Pathways in COPD Patients Using Real-World Data


At ISPOR Europe 2025, Marie Génin presented Quinten Health’s study comparing three sequence-encoding strategies for clustering inhaled therapy pathways in COPD patients, using real-world data from the CONSTANCES cohort linked to the SNDS, the French national health data system.

Marie Génin presenting Quinten Health’s poster at ISPOR Europe 2025.

Why Encoding Matters

COPD treatment pathways are complex and heterogeneous. Machine-learning algorithms require numerical representations of sequences to group patients meaningfully. The study evaluated three approaches: SeqMining, SeqToChar, and a deep-learning Autoencoder.
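To make the idea of a "numerical representation" concrete, here is a minimal sketch of one generic way to turn a treatment sequence into a fixed-length vector. It is not SeqMining, SeqToChar, or the Autoencoder from the study, whose details are not given in this post. The single-letter codes, the `to_string` and `encode` helpers, and the choice of features (per-class counts plus number of switches) are all assumptions made for illustration.

```python
from collections import Counter

# Treatment labels come from the article; the one-letter codes are an
# assumption made for this sketch, not the study's actual encoding.
CODES = {"ICS": "I", "LABA": "B", "LAMA": "M", "dual": "D", "triple": "T"}
ALPHABET = ["I", "B", "M", "D", "T"]  # fixed order for the output vector


def to_string(sequence):
    """Map a list of treatment labels to a compact character string."""
    return "".join(CODES[step] for step in sequence)


def encode(sequence):
    """Hypothetical encoding: per-class counts followed by the switch count."""
    s = to_string(sequence)
    counts = Counter(s)
    # A switch is any pair of consecutive periods with different treatments.
    switches = sum(1 for a, b in zip(s, s[1:]) if a != b)
    return [counts[c] for c in ALPHABET] + [switches]


# One made-up patient: two periods on LAMA, two on dual, then triple therapy.
patient = ["LAMA", "LAMA", "dual", "dual", "triple"]
print(to_string(patient))
print(encode(patient))
```

A vector like this can be fed to any distance-based clustering algorithm. Note that it discards most of the ordering information, which is exactly the kind of trade-off that differs between encoding strategies.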


Methods

A total of 1,926 COPD patients with five-year follow-up were included. Treatment sequences (ICS, LABA, LAMA, dual or triple therapy) were processed using temporal rules. Each encoding method generated numerical vectors that were clustered via k-means. Silhouette scores and UMAP projections were used to assess cluster quality.
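As a reminder of what the silhouette score measures, the sketch below computes it from scratch with Euclidean distance. It assumes cluster labels are already available (for example from k-means, which is not implemented here) and uses made-up 2-D points rather than study data. The function name `silhouette_score` is chosen for this illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy vectors: two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(good > bad)
```

Scores range from -1 to 1: values near 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or misassigned points. The score says nothing about whether the clusters are clinically meaningful.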


Results

SeqToChar achieved the highest silhouette score (0.63), compared with SeqMining (0.54) and the Autoencoder (0.58). Clusters were largely driven by the initial treatment, and clinical inspection showed heterogeneous patterns ranging from durable trajectories (dual or triple therapy) to low persistence and frequent switching.

The study emphasized that technical performance metrics alone are insufficient and should be complemented by clinical relevance assessments.


Implications

This work contributes to improving methodological choices for HTA, pharma, and research teams interpreting patient trajectories. Clinical interpretability remains essential to ensure meaningful insights for real-world decision-making.


Conclusion

Although SeqToChar achieved the highest silhouette score in this dataset, its margin over SeqMining and the Autoencoder was relatively small, and all three methods produced coherent, clinically interpretable clusters. Given this narrow performance margin, technical metrics alone do not clearly favor one method.

A more comprehensive evaluation – integrating predefined clinical relevance criteria, interpretability considerations, and the specificities of the therapeutic context – is therefore essential. Such a balanced assessment provides a stronger foundation for analyzing COPD treatment pathways and for generating robust evidence to support HTA and real-world decision-making.


Official ISPOR poster

You can view the official ISPOR Europe 2025 poster here: Comparative evaluation of sequence encoding strategies for COPD inhaled therapy pathways.