We developed a machine learning algorithm to identify undiagnosed pertussis episodes in adolescent and adult patients with reported acute respiratory disease (ARD), using clinician notes in an electronic healthcare record (EHR) database. Here, we used the algorithm to better estimate overall pertussis incidence within the Optum Humedica clinical repository from 1 January 2007 through 31 December 2019.
Patients and methods
The incidence of diagnosed pertussis episodes was 1–5 per 100,000 annually, consistent with data registered by the US Centers for Disease Control and Prevention (CDC) over the same time period. Among 18,573,496 ARD episodes assessed, the algorithm identified 1,053,946 as likely undiagnosed pertussis episodes (hereafter, algorithm-identified episodes).
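For readers who want to reproduce the arithmetic behind these figures, the following is a minimal sketch in Python. The two episode counts come from the text above; the function names and the population denominator in the usage note are hypothetical, because the abstract does not report the size of the population at risk.

```python
def incidence_per_100k(episodes: int, population: int) -> float:
    """Annual incidence expressed per 100,000 people at risk."""
    if population <= 0:
        raise ValueError("population must be positive")
    return episodes * 100_000 / population


def share_flagged(flagged: int, assessed: int) -> float:
    """Fraction of assessed episodes that were flagged."""
    if assessed <= 0:
        raise ValueError("assessed must be positive")
    return flagged / assessed


# Counts reported in the text above.
ARD_EPISODES_ASSESSED = 18_573_496
ALGORITHM_IDENTIFIED = 1_053_946

flagged_fraction = share_flagged(ALGORITHM_IDENTIFIED, ARD_EPISODES_ASSESSED)
print(f"Share of ARD episodes flagged: {flagged_fraction:.2%}")
```

As a usage illustration with a purely hypothetical denominator, `incidence_per_100k(50, 1_000_000)` returns 5.0, meaning 50 episodes in a population of one million corresponds to 5 per 100,000.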
Accounting for these undiagnosed pertussis episodes increased the estimated pertussis incidence 110-fold on average (34–474 per 100,000 annually). Risk factors for pertussis episodes (diagnosed and algorithm-identified) included asthma (odds ratio [OR] 2.14; 2.12–2.16), immunodeficiency (OR 1.85; 1.78–1.91), chronic obstructive pulmonary disease (OR 1.63; 1.61–1.65), obesity (OR 1.44; 1.43–1.45), Crohn’s disease (OR 1.39; 1.33–1.45), type 1 diabetes (OR 1.21; 1.17–1.24) and type 2 diabetes (OR 1.12; 1.10–1.13). Of note, all of these risk factors except Crohn’s disease also increased the likelihood of severe pertussis.
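To illustrate how an odds ratio with an interval of the kind quoted above can be obtained, the sketch below computes an unadjusted OR and a Wald 95% confidence interval from a 2x2 table. This is an assumption-laden illustration, not the study's method: the abstract does not state how the ORs were estimated or whether they were adjusted, the interval type is assumed, and the counts in the usage note are invented for demonstration.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed, with outcome      b: exposed, without outcome
    c: unexposed, with outcome    d: unexposed, without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With hypothetical counts, `odds_ratio_wald_ci(20, 80, 10, 90)` gives an OR of 2.25 with an interval that widens as any cell count shrinks. The very narrow intervals reported above are what one would expect from a data set with millions of episodes.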
In conclusion, the incidence of pertussis in the adolescent and adult population in the USA is likely substantial but considerably under-recognized, highlighting the need for improved clinical awareness of the disease and for improved control strategies in this population. These results will help to inform public health vaccination and booster programs, particularly for people with underlying comorbidities.