Event: WORLDSymposium 2024 San Diego, USA
Authors: M. Domenica Cappellini, R. Giugliani, M. Törnqvist, P. Guilmin, C. Clémente, M. Montmerle, A. Chiorean, T. Reppelin, Stefaan Sansen, Alexandra Dumitriu, Neha Shah, Maja Gasparic
Abstract
Acid sphingomyelinase deficiency (ASMD) is a rare and debilitating lysosomal storage disease, and delays in diagnosis are common.
We applied machine learning (ML) to electronic health records (EHRs) to develop a data-driven decision tree (algorithm) that identifies patients at high risk of ASMD based on clinical and laboratory traits. EHRs from Optum’s de-identified Market Clarity Data (2007-2020) were used for algorithm training. To generate the decision tree, the available ASMD cohort (N=51) was enriched with 199 clinical characteristics and 11 laboratory measurements, and a matched control cohort was extracted at a 1:20 ratio (N=1,020).
The resulting decision tree made use of a combination of four laboratory measurements (HDL cholesterol, LDL cholesterol, platelet count and triglycerides) and one clinical feature (hydrocephalus). It distinguished ASMD from the matched control population with a sensitivity of ~80% and specificity of >99%.
We internally validated this decision tree and compared its performance with that of a published clinical algorithm (McGovern, 2017), using a more recent version of the same database (data coverage up to January 2023).
For this validation step, 5 newly diagnosed ASMD patients and 250,000 randomly sampled controls (non-overlapping with the training cohort) were included. The ML-derived decision tree correctly flagged 2/5 ASMD patients (40% sensitivity), while also flagging 1,763/250,000 controls (0.61%; specificity >99%).
The clinical algorithm matched the sensitivity of the decision tree but exhibited lower specificity, flagging 3,733/250,000 controls (specificity 98.5%). At the same sensitivity, the ML-derived decision tree therefore flagged fewer control patients.
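For readers who want to retrace the validation arithmetic, the following minimal sketch recomputes sensitivity and specificity from the counts reported above. The function and variable names are illustrative assumptions and are not taken from the study's code; only the counts come from the abstract.

```python
# Illustrative recomputation of the reported validation metrics.
# Names are hypothetical; only the counts are taken from the abstract.

def sensitivity(flagged_cases: int, total_cases: int) -> float:
    """Fraction of true ASMD patients that the algorithm flags."""
    return flagged_cases / total_cases


def specificity(flagged_controls: int, total_controls: int) -> float:
    """Fraction of controls that the algorithm does NOT flag."""
    return 1 - flagged_controls / total_controls


N_CASES = 5            # newly diagnosed ASMD patients in the validation cohort
N_CONTROLS = 250_000   # randomly sampled controls

ml_sens = sensitivity(2, N_CASES)            # ML-derived decision tree
ml_spec = specificity(1_763, N_CONTROLS)
clin_spec = specificity(3_733, N_CONTROLS)   # published clinical algorithm

# Additional controls flagged by the clinical algorithm at equal sensitivity
extra_flagged = 3_733 - 1_763

print(f"ML decision tree: sensitivity {ml_sens:.0%}, specificity {ml_spec:.2%}")
print(f"Clinical algorithm: specificity {clin_spec:.2%}")
print(f"Extra controls flagged by clinical algorithm: {extra_flagged}")
```

With only 5 cases in the validation cohort, each patient shifts the sensitivity estimate by 20 percentage points, so the sensitivity figure is a coarse estimate.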
However, given the limited number of ASMD patients available in the validation cohort, further assessment in another independent EHR database is advisable.
This study is funded by Sanofi.