Quinten Health has developed a machine learning algorithm to identify misdiagnosed or undiagnosed episodes of whooping cough in patients diagnosed with acute respiratory illness. The team used elements from clinician notes such as observed symptoms, and demographic information from electronic care records from Optum Humedica, a US population-based database.
Although tetanus-diphtheria-acellular pertussis (Tdap) vaccines for adolescents and adults were licensed in 2005 and immunization strategies proposed, the incidence of pertussis in this population remains poorly understood, mainly due to the atypical expression of the disease.
We used real-life data to develop an algorithm to identify episodes of pertussis. The aim was to better estimate the disease burden and risk in adolescents and adults. This will make it possible to assess the cost-effectiveness of vaccination strategies to optimize protection.
Two cohorts were used to develop the model. One in which patients presented with pertussis episodes, the other without. We used LightGBM as a machine learning model to identify these episodes. We validated the model using cohorts with or without laboratory confirmation of pertussis. As for the explicability of the model, Shapley values, a method for showing the relative impact of each characteristic was used. The presence of different types of cough was shown, in particular persistent, severe and long-lasting cough, as well as vomiting after coughing, were identified as major factors explaining the occurrence of a pertussis episode.
Developed from a US population database, our algorithm is likely to be biased towards US clinical practice. Generalization to other countries may be difficult. Furthermore, to better differentiate between episodes of pertussis and acute respiratory infections, other characteristics such as physician specialty and recurrence of visits due to persistent cough could improve the algorithm’s performance. As the minimum duration of cough is an important clinical feature of this pathology, including it as a feature could also optimize the algorithm.
As a conclusion, thanks to the algorithm developed in this project, it is now possible to better identify undiagnosed episodes, which in turn can help to correct under-reporting. Under-reporting is itself identified as a determining factor in the cost-effectiveness of the vaccination strategy. Protection through vaccination can be thus improved.