Comparative Evaluation of Sequence-Encoding Strategies for Clustering Inhaled Therapy Pathways in COPD Patients Using Real-World Data

Poster

10 November 2025

Event: ISPOR Europe 2025, Glasgow, Scotland, UK

Authors: Romane Péan, Nina Temam, Marie Génin, Diane Vincent, Rachel Nadif, Sofiane Kab, Nicolas Roche, Pauline Guilmin

OBJECTIVES

Identifying and comparing patient treatment pathways is critical to inform healthcare decision-making, yet the high variability and complexity of real-world sequences pose methodological challenges.

In clustering, sequence encoding plays a pivotal role, but no consensus exists on the optimal strategy.

This study compares three encoding approaches applied to real-world therapeutic sequences in COPD to assess their ability to generate clinically meaningful patient clusters.

METHODS

Data were derived from the French CONSTANCES cohort linked to the national health claims database (SNDS). Participants were classified as having COPD based on spirometry or questionnaires. Their five-year sequences of inhaled maintenance therapies (mono-, dual, or triple therapy), mapped using ATC level 7 codes, were then extracted.

Temporal rules captured therapy overlaps and durations. Three encoding approaches were applied:

A) SeqMining: frequent subsequence extraction via the SPADE algorithm, generating binary feature vectors.

B) SeqToChar: character string representation of sequences with Jaro distance for pairwise similarity.

C) Autoencoder: deep learning model producing continuous embeddings in a reduced latent space.
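
As an illustration of approach B, the sketch below encodes each therapy state as one character and derives a Jaro distance (1 minus Jaro similarity) between pathways. The alphabet (N, M, D, T), the one-character-per-period convention, and all function names are assumptions made for this example; the abstract does not specify them.

```python
def jaro_similarity(s: str, t: str) -> float:
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters match only if they lie within this window of each other.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_distance_matrix(strings):
    """Symmetric pairwise Jaro distance matrix (list of lists)."""
    n = len(strings)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - jaro_similarity(strings[i], strings[j])
            dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical alphabet: N = no therapy, M = mono, D = dual, T = triple.
# One character per observation period (the period length is an assumption).
pathways = ["MMDDTT", "MMDTTT", "NNNNMM"]
matrix = jaro_distance_matrix(pathways)
for row in matrix:
    print([round(d, 3) for d in row])
```

The resulting matrix can be passed to any clustering method that accepts precomputed dissimilarities.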

Each encoding fed a k-medoids clustering, with cluster validity assessed via silhouette scores and UMAP projections.
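
The clustering and validity step can be sketched on a precomputed distance matrix as follows. This is a minimal alternating (assign, then update) k-medoids variant with a farthest-first initialisation, not necessarily the implementation used in the study; the toy distance matrix and all function names are hypothetical.

```python
def init_medoids(dist, k):
    """Farthest-first initialisation (an assumption, chosen for determinism)."""
    n = len(dist)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(
            max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    return medoids


def assign(dist, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    medoids = init_medoids(dist, k)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member minimising total distance to the others.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


def silhouette_score(dist, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for o, other in clusters.items() if o != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


# Toy distance matrix (hypothetical): two tight pairs of pathways.
dist = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.05],
    [0.80, 0.90, 0.05, 0.00],
]
medoids, labels = k_medoids(dist, k=2)
score = silhouette_score(dist, labels)
print(medoids, labels, round(score, 3))
```

Because both functions take only a distance matrix, the same code applies to any encoding that yields pairwise dissimilarities, which keeps silhouette scores comparable across encodings only insofar as their distance scales are comparable.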

The best-performing method was further examined through trajectory visualizations to assess clinical interpretability.

RESULTS

Among 4,982 participants with COPD, 1,926 met the five-year follow-up and treatment criteria.

They had two therapeutic combinations on average, with 90% receiving inhaled corticosteroids.

All encoding methods yielded interpretable clusters. SeqToChar achieved the highest silhouette score (0.63 vs. 0.54 for SeqMining, 0.58 for Autoencoder), and visual inspection suggested coherent clinical patterns.

CONCLUSIONS

Although SeqToChar showed a slight advantage, performance differences across encoding methods remained limited.

Relying solely on technical metrics may not sufficiently support method selection.

Introducing predefined clinical relevance criteria could help assess clustering quality from both methodological and real-world perspectives, offering a more comprehensive basis for interpreting patient trajectories in HTA settings.
