CONFERENCE UPDATE: AAN 2023
Machine learning predicts post-ischemic stroke seizure determinants
The incidence of stroke has been increasing over the years, and this trend is projected to persist through 2050.1 As a result, more patients are surviving stroke, and the incidence of long-term complications is rising with them.1 Up to 85% of strokes are ischemic, and about 1.5%-10% of those patients eventually develop seizures after stroke.1 Patients who develop seizures after stroke are more likely to die and generally have a poorer quality of life.1 The first step in disease management is to identify those who are more likely to develop seizures after a stroke, which creates the need for prediction models.1 Currently, one model from Switzerland readily fits the criteria of a good prediction model.1 However, it is not applicable in the United States (US) for several reasons, among them the lack of social determinants of health.1
One study leveraged electronic health record (EHR) data to predict seizures after an ischemic stroke without disrupting clinic workflow.1 Data were collected from the TriNetX Diamond Network, which comprised over 20 million patients across more than 90 health organizations, from 2015 to the inception of the study, targeting patients aged ≥18 years.1 Several classification models were applied to the data, with the light gradient boosting machine (LightGBM) model being the most prominent.1 A 5-fold nested cross-validation was conducted to develop the prediction model.1
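To make the idea of nested cross-validation concrete, here is a minimal, dependency-free sketch. It is not the study's pipeline: a tiny k-nearest-neighbors classifier stands in for the gradient boosting model, the data are synthetic, and the 3 inner folds and the hyperparameter grid are assumptions chosen only for illustration. The point it shows is that hyperparameters are selected inside each outer training split, so the outer test fold never influences model selection.

```python
import random

def k_folds(indices, k, seed=0):
    """Shuffle the indices and deal them into k roughly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Stand-in model: majority vote among the k nearest (feature, label) rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return int(2 * sum(label for _, label in nearest) > len(nearest))

def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def nested_cv(data, param_grid, outer_k=5, inner_k=3, seed=0):
    """Return one held-out score per outer fold (inner_k=3 is an assumption)."""
    outer = k_folds(range(len(data)), outer_k, seed)
    outer_scores = []
    for i, test_idx in enumerate(outer):
        train_idx = [j for f, fold in enumerate(outer) if f != i for j in fold]
        inner = k_folds(train_idx, inner_k, seed + 1)

        def inner_score(k):
            # Mean validation accuracy using only the outer training data.
            scores = []
            for m, val_idx in enumerate(inner):
                fit_idx = [j for f, fold in enumerate(inner) if f != m for j in fold]
                scores.append(accuracy([data[j] for j in fit_idx],
                                       [data[j] for j in val_idx], k))
            return sum(scores) / len(scores)

        best_k = max(param_grid, key=inner_score)
        # Refit on the full outer training split, score on the untouched fold.
        outer_scores.append(accuracy([data[j] for j in train_idx],
                                     [data[j] for j in test_idx], best_k))
    return outer_scores

# Synthetic one-feature data: 50 negatives around 0, 50 positives around 1.
rng = random.Random(42)
data = [(rng.gauss(label, 1.0), label) for label in (0, 1) for _ in range(50)]
scores = nested_cv(data, param_grid=(1, 5, 15))
print(len(scores), "outer-fold scores, mean =", sum(scores) / len(scores))
```

The mean of the outer-fold scores is the performance estimate; because selection happens in the inner loop, that estimate is less optimistic than tuning and scoring on the same folds.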
Time point t0 was defined as the time a patient received a diagnosis of ischemic stroke, identified by International Classification of Diseases (ICD) code I63 (i.e., cerebral infarction), excluding I63.6 (i.e., cerebral infarction due to cerebral venous thrombosis, nonpyogenic), without a previous occurrence of G40 (i.e., epilepsy and recurrent seizures) or G41 (i.e., status epilepticus).1 The outcome predictions were compared with any ICD G40 or G41 events recorded in the EHR within 1 year and 5 years after t0.1
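The cohort and outcome rules above can be sketched in a few lines. This is an illustrative reading, not the study's code: the function names and the event format are hypothetical, a year is approximated as 365 days, and a seizure code recorded on the same day as t0 is treated as neither a prior event nor an outcome, which is an assumption the source does not address.

```python
from datetime import date, timedelta

def is_index_stroke(code):
    """ICD-10 I63.x (cerebral infarction), excluding I63.6."""
    code = code.upper()
    return code.startswith("I63") and not code.startswith("I63.6")

def is_seizure(code):
    """ICD-10 G40.x (epilepsy and recurrent seizures) or G41.x (status epilepticus)."""
    return code.upper().startswith(("G40", "G41"))

def label_patient(events, horizon_years):
    """events: list of (date, icd_code) for one patient, in any order.

    Returns None if the patient is excluded, else (t0, seizure_within_horizon).
    """
    events = sorted(events)
    t0 = next((d for d, c in events if is_index_stroke(c)), None)
    if t0 is None:
        return None  # no qualifying ischemic stroke diagnosis
    if any(is_seizure(c) and d < t0 for d, c in events):
        return None  # G40/G41 before the index stroke
    cutoff = t0 + timedelta(days=365 * horizon_years)  # assumption: 365-day years
    outcome = any(is_seizure(c) and t0 < d <= cutoff for d, c in events)
    return t0, outcome

# Hypothetical patients, invented for illustration only.
late_seizure = [(date(2018, 1, 1), "I63.2"), (date(2020, 6, 1), "G41.0")]
prior_epilepsy = [(date(2015, 5, 1), "G40.909"), (date(2017, 1, 1), "I63.4")]

print(label_patient(late_seizure, 1))    # (datetime.date(2018, 1, 1), False)
print(label_patient(late_seizure, 5))    # (datetime.date(2018, 1, 1), True)
print(label_patient(prior_epilepsy, 1))  # None
```

The same patient can be a negative at 1 year and a positive at 5 years, which is why the two horizons are evaluated separately.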
Model performance was evaluated with well-known metrics, focusing mostly on the area under the receiver operating characteristic curve (AUROC), accuracy, and balanced accuracy, the last of which adjusts for the large share of patients who did not have seizures.1 Results were reported at the 1-year and 5-year time points, and further grouped into all-comers and those who received antiseizure medications (ASMs).1 The rates of seizure development were similar to those in the published literature, at 4.2% at year 1 increasing to 5.5% at year 5.1 AUROC was consistent at both 1 year and 5 years after diagnosis, regardless of ASMs.1 It was above 65% in all groups, reaching up to 78% in the model for all-comers at year 1.1 Accuracy ranged from 75% to 82%, with the 5-year no-medication group being the lowest and the 1-year all-comer group being the highest.1 When balanced accuracy was considered, performance was still >60% across all 4 groups.1
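A short sketch shows why balanced accuracy matters when only a small share of patients develop seizures. The labels and scores below are invented for illustration, with a 5% event rate chosen only to be in the neighborhood of the reported rates; the helper functions are plain-Python versions of the standard definitions, not the study's evaluation code.

```python
def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity, so each class counts equally."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

# Toy cohort: 5 of 100 patients develop seizures; the "model" predicts none.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5

# AUROC works on scores rather than hard labels.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

A model that never predicts a seizure looks 95% accurate here but has a balanced accuracy of 50%, no better than chance, which is why the two metrics are read together.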
Partial dependence plots (PDPs) were generated to identify the contribution of each variable.1 Five variables were included in the PDPs: age, the occurrence of stroke, and the use of aspirin, gabapentin, and levetiracetam.1 An inverse relationship between the likelihood of developing seizures and age was observed across ages 20-90 years.1 That is, younger people were more likely to develop seizures than older adults.1 The other variables were binary.1 The use of aspirin and gabapentin showed an inverse relationship with epilepsy, while the use of levetiracetam and the occurrence of stroke showed a strong positive relationship with seizures.1
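For readers unfamiliar with PDPs, the computation behind one curve is simple: fix a single variable at a grid value for every patient, average the model's predictions, and repeat across the grid. The sketch below assumes a hypothetical `toy_model` whose coefficients are invented purely to mimic the direction of the reported relationships; it is not the study's model, and the numbers it produces carry no clinical meaning.

```python
import math

def toy_model(age, stroke, aspirin):
    """Hypothetical risk model; coefficients are invented for illustration."""
    logit = -2.0 - 0.02 * (age - 50) + 1.0 * stroke - 0.5 * aspirin
    return 1 / (1 + math.exp(-logit))

def partial_dependence(model, rows, feature, grid):
    """For each grid value, overwrite `feature` in every row and average predictions."""
    curve = []
    for value in grid:
        preds = [model(**{**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

# Hypothetical patients; the other features keep their observed values.
rows = [
    {"age": 34, "stroke": 0, "aspirin": 1},
    {"age": 58, "stroke": 1, "aspirin": 0},
    {"age": 71, "stroke": 0, "aspirin": 1},
    {"age": 86, "stroke": 1, "aspirin": 1},
]

age_curve = partial_dependence(toy_model, rows, "age", [20, 40, 60, 80, 90])
aspirin_curve = partial_dependence(toy_model, rows, "aspirin", [0, 1])
print([round(p, 3) for p in age_curve])      # falls as age rises
print([round(p, 3) for p in aspirin_curve])  # lower with aspirin than without
```

For a binary variable the "plot" reduces to two points, so its PDP is read as the average predicted risk with the variable set to 1 versus 0.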
In summary, when those factors are combined into 1 plot, their effects are additive and can be used to predict seizures better than when used separately.1 However, because this was a retrospective study, more research is required before this information can be incorporated into the EHR.1