Processing of invasive neurophysiology data for machine learning based brain signal decoding
We developed an open, integrative and modularized platform (Fig. 1) for the standardized and reproducible implementation of machine learning based brain signal decoding algorithms (https://github.com/neuromodulation/py_neuromodulation).
The modularized feature estimation chains include a versatile set of state-of-the-art signal processing algorithms. In brief, oscillatory dynamics, waveform shape, interregional coherence6,7,10,11 and more can be extracted through the modular architecture, which further allows flexible extension with more advanced feature estimation metrics, such as direction of information flow with Granger causality or phase-amplitude coupling12 (Supplementary Table 1). A key problem for invasive brain signal decoding is the individualized localization of brain implants across patients, which significantly complicates the development of large-scale models that generalize across patients. To address this, we developed dedicated approaches for across-patient decoding based on normative MRI connectomics and latent embeddings from contrastive learning with CEBRA13. In the following, we highlight the utility of this platform across three invasive brain stimulation use-cases.
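To illustrate the modular feature-chain concept, the following minimal sketch shows how per-segment features can be computed and chained with plain NumPy/SciPy; the function names, band limits and sampling rate are assumptions for demonstration and do not reflect the library's actual API.

```python
import numpy as np
from scipy.signal import welch

FS = 1000  # assumed sampling rate in Hz
BANDS = {"theta": (4, 8), "beta": (13, 35), "gamma": (60, 90)}  # illustrative band limits

def fft_band_power(segment):
    # mean Welch power spectral density within each canonical band
    freqs, psd = welch(segment, fs=FS, nperseg=min(len(segment), FS))
    return {f"pow_{name}": float(psd[(freqs >= lo) & (freqs < hi)].mean())
            for name, (lo, hi) in BANDS.items()}

def line_length(segment):
    # simple time-domain feature sensitive to signal complexity
    return {"line_length": float(np.abs(np.diff(segment)).sum())}

# the "chain" is an ordered list of estimators; additional metrics
# (e.g. coherence, phase-amplitude coupling) would be added as further callables
FEATURE_CHAIN = [fft_band_power, line_length]

def compute_features(segment):
    features = {}
    for estimator in FEATURE_CHAIN:
        features.update(estimator(segment))
    return features

features = compute_features(np.random.randn(FS))  # 1 s of synthetic data
```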
Generalizable models for movement decoding across 1480 invasive recordings with prospective real-time validation
For the treatment of movement disorders, the decoding of volitional and pathological motor output has the potential to provide critical information for therapeutic adjustment in diseases such as Parkinson’s disease (PD) or essential tremor1,4. We implemented py_neuromodulation for electrocorticography (ECoG) based movement decoding with and without patient-individual training (56 patients, 1480 channels, mean age: 50.6 ± 17.8 years, 24 females) across three independent PD cohorts (n=38; Berlin/Pittsburgh/Beijing) and one epilepsy cohort (n=18 from Washington14) performing different upper limb movements (Fig. 2a, Supplementary Tables 2 and 3). In PD patients, ECoG strips were introduced through DBS burr-holes15. In a sub-cohort of six patients (Berlin), recordings were repeated during clinically effective STN-DBS.
First, patient-specific movement classification performance based on ECoG signals was analyzed. We trained ridge regularized logistic regression classifiers and evaluated their performance using the balanced accuracy metric with 3-fold cross-validation on consecutive data segments, both at the single-sample level (100 ms precision, indicating presence or absence of movement) and at the individual movement level (300 ms or more of movement time decoded consecutively). Performance was significantly above chance in every subject, with average balanced accuracies of 0.8 ± 0.07 for single-sample classification and 0.98 ± 0.04 for movement detection in the best channel per subject (Fig. 2b-d).
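As an illustration of this evaluation scheme, the following sketch combines L2-regularized logistic regression, 3-fold cross-validation over consecutive segments, balanced accuracy and a movement-level detection rate; the feature matrix, labels, helper names and the three-consecutive-sample criterion (3 × 100 ms = 300 ms) are assumptions for demonstration, not the exact study code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import KFold

def movement_detection_rate(y_true, y_pred, min_consecutive=3):
    """Fraction of movement episodes with >= min_consecutive correctly decoded samples."""
    onsets = np.where(np.diff(np.r_[0, y_true]) == 1)[0]
    offsets = np.where(np.diff(np.r_[y_true, 0]) == -1)[0] + 1
    detected, total = 0, 0
    for start, stop in zip(onsets, offsets):
        total += 1
        run, best = 0, 0
        for hit in (y_pred[start:stop] == 1):
            run = run + 1 if hit else 0
            best = max(best, run)
        if best >= min_consecutive:
            detected += 1
    return detected / total if total else np.nan

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))                                  # synthetic features, 100 ms samples
y = np.repeat((rng.random(60) > 0.5).astype(int), 10)          # synthetic blocky movement labels

cv = KFold(n_splits=3, shuffle=False)                          # consecutive data segments, no shuffling
for train, test in cv.split(X):
    clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X[train], y[train])
    y_pred = clf.predict(X[test])
    print(balanced_accuracy_score(y[test], y_pred),
          movement_detection_rate(y[test], y_pred))
```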
Importantly, we identified two key factors associated with relatively lower performance that will be crucial to account for in future clinical applications: disease severity and therapeutic DBS. First, we show that decoding performance is negatively correlated with clinical symptom severity in PD patients, as measured with the Unified Parkinson’s Disease Rating Scale (Spearman’s rho = -0.36; p = 0.02) (Fig. 2e). We reported a similar result earlier for the Pittsburgh cohort11 and therefore repeated the analysis for Berlin and Beijing alone, excluding all previously reported data, and reproduced the negative correlation across the new cohorts (rho = -0.43, p = 0.03). We speculate that neurodegeneration in PD impacts the neural encoding of movement, which in turn may impede machine learning based decoding performance. Second, and equally important, we show that therapeutic electrical stimulation (130 Hz STN-DBS) can significantly deteriorate sample-wise decoding in some but not all patients, and that models trained separately for OFF and ON stimulation conditions outperform models trained on either condition alone. Nevertheless, movement detection remained acceptable, with a detection rate of 0.88 ± 0.17 across movements even during high-frequency DBS (Fig. 2f,g). Our results highlight the necessity of accounting for therapy- and disease-related brain state changes in clinical brain-computer interfaces (BCI).
A more serious limitation of the above results is their dependence on patient-individual training. For a real-world clinical application, this means that every implanted patient would need to undergo tedious model training sessions, which could burden both patients and medical staff and may hinder broad clinical adoption. To address this critical limitation, we explored three computational approaches that do not depend on patient-individual training while accounting for individual differences in ECoG strip location. In a first approach, the data were spatially extrapolated to a manually defined 38-point cortical surface grid, similar to a previous study16 (Fig. 2h). The disadvantages of this approach, however, are that (a) extrapolation introduces inaccuracy in spatial estimation and (b) large amounts of data would be required to train a grid covering the whole brain, which limits the application to cohorts with very similar electrode locations.
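A minimal sketch of such a spatial extrapolation, assuming inverse-distance weighting from channel MNI coordinates to fixed grid points (the exact interpolation used in the study may differ):

```python
import numpy as np

def extrapolate_to_grid(channel_xyz, channel_features, grid_xyz, power=2, eps=1e-6):
    """Project per-channel feature vectors onto fixed cortical grid points.

    channel_xyz: (n_channels, 3) MNI coordinates
    channel_features: (n_channels, n_features)
    grid_xyz: (n_grid, 3) coordinates of the common grid
    Returns (n_grid, n_features) inverse-distance weighted features.
    """
    dist = np.linalg.norm(grid_xyz[:, None, :] - channel_xyz[None, :, :], axis=-1)
    weights = 1.0 / (dist ** power + eps)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ channel_features
```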
To overcome these shortcomings, we developed a connectomic approach for across-patient decoding that optimally accounts for the specific recording location while remaining generalizable across the whole brain. It builds on functional or structural connectivity fingerprints extracted from brain signal recording locations in normative space. In brief, voxel-wise correlations between decoding performance and whole-brain connectivity maps seeded from channel MNI coordinates were calculated to identify an optimal connectomic template fingerprint for movement decoding (the connectomic decoding network map) across all subjects (Fig. 2i). This allows for optimized a priori channel selection in real time, by identifying the individual recording channel with the greatest network overlap with the optimal template. Finally, we transformed neural features from the selected channel into a lower-dimensional embedding13. For this, a five-layer convolutional neural network with a temporal filter length of 1 s was trained using the InfoNCE (noise-contrastive estimation) contrastive loss function17. The resulting embeddings showed exceptionally high consistency across subjects, as assessed by linear identifiability (Fig. 2j,k).
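Conceptually, the a priori channel selection step can be sketched as a spatial correlation between each channel's normative connectivity fingerprint and the group-level decoding network map; the flattened-voxel representation and variable names below are assumptions for illustration.

```python
import numpy as np

def select_channel(channel_fingerprints: np.ndarray, template_map: np.ndarray) -> int:
    """Return the index of the channel whose fingerprint best matches the template.

    channel_fingerprints: (n_channels, n_voxels) connectivity maps in normative space
    template_map: (n_voxels,) connectomic decoding network map
    """
    fp = channel_fingerprints - channel_fingerprints.mean(axis=1, keepdims=True)
    tm = template_map - template_map.mean()
    # spatial (Pearson) correlation of each channel fingerprint with the template
    corr = fp @ tm / (np.linalg.norm(fp, axis=1) * np.linalg.norm(tm) + 1e-12)
    return int(np.argmax(corr))
```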
All three approaches reached significantly above-chance balanced accuracy and movement detection metrics (Table 1; Fig. 2l,m, all p<0.05) for cross-validation without patient-individual training within and across cohorts, and even when leaving entire cohorts out. This indicates high generalizability across movement types, neurological disorders, recording setups and individual implant trajectories. In addition to the conceptual advantage of accounting for the specific recording location and underlying brain network affiliation, the connectomic approach with contrastive learning (CEBRA) significantly outperformed the linear model for the most challenging across-cohort cross-validation in sample-wise balanced accuracy (p=0.001) and achieved the highest average performance overall (Table 1).
Table 1: Leave-one-subject-out and leave-one-cohort-out cross-validation results for generalized movement decoding across patients, diseases and movement types. Decoding performance is reported as balanced accuracy, which accounts for class imbalance. Single-sample estimates provide performance metrics at 100 ms precision. Movement detection estimates were defined as 300 ms of consecutive classification output with respect to presence or absence of movement. On average, the connectomic approach combined with contrastive learning (CEBRA) provided the best cross-validation performance.
| | Grid extrapolation (Sample) | Grid extrapolation (Movement) | Connectomics with LM (Sample) | Connectomics with LM (Movement) | Connectomics with CEBRA (Sample) | Connectomics with CEBRA (Movement) |
|---|---|---|---|---|---|---|
| Within cohort | 0.71 ± 0.08 | 0.84 ± 0.12 | 0.72 ± 0.08 | 0.86 ± 0.10 | 0.74 ± 0.1 | 0.86 ± 0.14 |
| Across cohorts | 0.64 ± 0.1 | 0.78 ± 0.2 | 0.66 ± 0.09 | 0.78 ± 0.19 | 0.68 ± 0.09 | 0.81 ± 0.16 |
| Leave one cohort out | 0.54 ± 0.11 | 0.61 ± 0.22 | 0.57 ± 0.1 | 0.68 ± 0.19 | 0.6 ± 0.11 | 0.7 ± 0.19 |
| Average | 0.63 ± 0.1 | 0.74 ± 0.18 | 0.65 ± 0.09 | 0.77 ± 0.16 | 0.67 ± 0.1 | 0.79 ± 0.16 |
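As an illustration of the cross-validation schemes summarized in Table 1, group-aware splitting with scikit-learn can implement leave-one-subject-out and leave-one-cohort-out evaluation; the array names and classifier choice below are assumptions for demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import LeaveOneGroupOut

def grouped_cv_score(X, y, groups):
    """Leave-one-group-out balanced accuracy; groups can be subject or cohort labels."""
    scores = []
    for train, test in LeaveOneGroupOut().split(X, y, groups):
        clf = LogisticRegression(penalty="l2", max_iter=1000).fit(X[train], y[train])
        scores.append(balanced_accuracy_score(y[test], clf.predict(X[test])))
    return float(np.mean(scores))

# within/across cohorts: groups = subject_ids (leave one subject out)
# leave one cohort out:  groups = cohort_ids (e.g. Berlin, Pittsburgh, Beijing, Washington)
```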
To investigate the individual variability of trainer vs. learner performance and potentially improve the training process according to diversity-related factors, we established a subject-to-subject decoding matrix. This allowed us to identify individual subjects whose recordings provide outstanding training data for other subjects (Fig. 2n). Models trained on just the best trainer, sub_002 (Berlin), achieved an average performance across the remaining 55 patients of 0.71 ± 0.14 for sample-wise prediction and 0.85 ± 0.2 for movement detection. Conceptually, this process could enable more fine-grained, individualized precision medicine approaches that account for diversity in all its human forms, e.g. by matching training data to the demographic or genetic profiles of patients receiving brain implants.
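A sketch of how such a subject-to-subject decoding matrix can be constructed, assuming per-subject feature/label pairs stored in a dictionary keyed by subject ID (variable names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

def subject_to_subject_matrix(data: dict) -> np.ndarray:
    """data: {subject_id: (X, y)}; returns a trainer-by-learner balanced accuracy matrix."""
    subjects = sorted(data)
    mat = np.full((len(subjects), len(subjects)), np.nan)
    for i, train_sub in enumerate(subjects):
        X_tr, y_tr = data[train_sub]
        clf = LogisticRegression(penalty="l2", max_iter=1000).fit(X_tr, y_tr)
        for j, test_sub in enumerate(subjects):
            if train_sub == test_sub:
                continue
            X_te, y_te = data[test_sub]
            mat[i, j] = balanced_accuracy_score(y_te, clf.predict(X_te))
    # row means rank "trainer" quality; column means rank how easily a subject is decoded
    return mat
```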
Finally, after successfully training across-patient models offline, we validated the approach prospectively in a newly recruited patient in Berlin (Supplementary Video 1). Importantly, pretrained patient-naïve models outperformed individually trained models within the same recording session: the best performance was achieved by the model trained on all patients from the Berlin cohort (sample-wise balanced accuracy: 0.71; movement detection rate: 0.97), with slightly lower performance for a model trained on patient-individual data from the same session (45 movements; ~9-minute recording; sample-wise balanced accuracy: 0.67; movement detection rate: 0.97) (Fig. 2o,p). In a third run, we further validated the patient-to-patient decoding results by demonstrating that a model trained on the single best trainer (sub_002) reached performance comparable to the model trained on the patient-individual data.
Decoding emotions from subgenual cingulate cortex in major depressive disorder
Invasive brain signal decoding has clinical utility beyond movement decoding. In the following we highlight the clinical potential of py_neuromodulation for brain circuit discovery in the neuropsychiatric domain. In the future, closed-loop therapies for affective disorders may adapt neuromodulation to concurrent mood or may support patients in the valuation of perceived emotion18,19. Here, we employed py_neuromodulation to investigate optimal circuits, target features and computational approaches to decode perceived emotion from the primary DBS target for major depressive disorder, the subgenual cingulate cortex (SCC).
Machine learning decoders were trained on local field potential (LFP) signals from the DBS electrodes in the SCC of eight patients undergoing DBS for treatment-resistant major depressive disorder as part of a clinical trial (mean age: 48 ± 11.4 years, 4 females; Supplementary Table 4). Neurophysiological recordings were acquired extraoperatively, while DBS leads were externalized, as patients participated in a visual emotion task (Fig. 3a). Visual stimuli included pleasant, unpleasant and neutral images from the International Affective Picture System (IAPS) database (for further information see20) and were presented for a duration of 1 s with an inter-stimulus interval of 6-8 s. We used py_neuromodulation to estimate a novel feature set that included temporal waveform features, such as discharge prominence, sharpness, decay and rise times, and peak and trough intervals, in addition to traditional oscillatory FFT features.
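The waveform features named above can be approximated per signal segment with standard peak detection; the definitions below (e.g. using prominence bases as rise/decay anchors and peak curvature as a sharpness proxy) are illustrative assumptions and may differ from the exact implementation used in the study.

```python
import numpy as np
from scipy.signal import find_peaks

FS = 1000  # assumed sampling rate in Hz

def waveform_features(segment: np.ndarray) -> dict:
    """Per-segment waveform shape features (illustrative definitions)."""
    peaks, props = find_peaks(segment, prominence=0)   # all local maxima, with prominences and bases
    troughs, _ = find_peaks(-segment)                  # all local minima
    if len(peaks) < 2 or len(troughs) < 1:
        return {}
    rise = peaks - props["left_bases"]                 # samples from preceding base to peak
    decay = props["right_bases"] - peaks               # samples from peak to following base
    nxt = np.searchsorted(troughs, peaks)              # nearest trough after each peak
    valid = nxt < len(troughs)
    peak_trough = troughs[nxt[valid]] - peaks[valid]
    curvature = np.abs(np.diff(segment, 2))            # crude sharpness proxy around peaks
    return {
        "prominence": float(props["prominences"].mean()),
        "sharpness": float(curvature[peaks - 1].mean()),
        "rise_time_ms": float(rise.mean() * 1000 / FS),
        "decay_time_ms": float(decay.mean() * 1000 / FS),
        "peak_trough_interval_ms": float(peak_trough.mean() * 1000 / FS) if valid.any() else np.nan,
    }

features = waveform_features(np.random.randn(FS))  # 1 s of synthetic LFP
```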
Logistic regression classifiers were trained to distinguish emotional valences using samples during stimulus presentation and were cross-validated within patients as described above (Fig. 3b). Above-chance performance was obtained in all patients, with performance rising at 150 ms and peaking at 600 ms post-stimulus onset (Fig. 3c). Sample-wise performance was above chance level for all classifications (balanced accuracy 0.62 ± 0.05; neutral vs pleasant: 0.64 ± 0.07, neutral vs unpleasant: 0.6 ± 0.03, pleasant vs unpleasant: 0.63 ± 0.01) (Fig. 3d). Decoding performance was driven by oscillatory features, including high-frequency (200-400 Hz) activity (HFA) and low- (60-90 Hz) and high-gamma (90-200 Hz) activity, followed by waveform shape features including rise time and prominence (Fig. 3e).
To investigate a potential relationship with clinical scores, we correlated decoding performance from the most predictive channel per patient (contrasting neutral vs. positive/negative) with Beck’s Depression Inventory (BDI) scores at the time of recording and after six months of chronic DBS. Decoding performance correlated with DBS-induced improvement in BDI scores (rho=0.79, p=0.01), but not with concurrent symptom severity (Fig. 3f). This correlation could be driven by optimal targeting rather than by depressive symptoms themselves, which inspired us to explore the underlying whole-brain networks. To this end, we used both dMRI and fMRI connectomics as above and an additional fiber filtering approach recently introduced in the context of DBS for OCD21,22. In all cases, connectivity fingerprints seeded from LFP channel locations were correlated with channel-specific decoder test-set performance. The identified fiber tracts and whole-brain dMRI fingerprints were predictive of decoding performance and robust to leave-one-channel-out (Supplementary Fig. 1; fiber filtering: rho=0.48, p<10⁻⁵; whole-brain dMRI fingerprint: rho=0.38, p=0.002) and leave-one-subject-out cross-validation (fiber filtering: rho=0.46, p<10⁻⁵; whole-brain dMRI fingerprint: rho=0.46, p<10⁻⁵). Functional connectivity was robust to leave-one-channel-out (rho=0.39, p=0.001) but not leave-one-subject-out validation (p>0.05).
A consistent left-lateralized prefrontal network emerged across modalities that directly overlaps with network targets for transcranial magnetic stimulation in depression and shows relevant similarity to networks associated with mood change23 and with alleviation of depression and affective changes under subthalamic DBS in Parkinson’s disease24,25 (Fig. 3g,h, Supplementary Fig. 2). In the future, emotion decoding may become relevant for adapting neurostimulation control and selecting optimal targets based on electrophysiological biomarkers and connectomics5,26,27.
Optimization of seizure detection parameters for responsive neuromodulation in epilepsy patients
Responsive neurostimulation (RNS; NeuroPace) is a closed-loop stimulation device for the treatment of medication-resistant epilepsy29. The device comprises a cranially mounted battery and processor connected to two electrodes, each of which can be either an ECoG strip or a stereoelectroencephalography (sEEG) depth electrode in proximity to the putative seizure focus. RNS promises a superior reduction in the number of disabling seizures by processing LFP signals with programmable detectors of seizure (i.e., ictal) states to deliver temporally targeted stimulation. However, the mechanisms of closed-loop stimulation for epilepsy are not well understood. For some patients this therapy can be life-changing, with complete remission of seizures, while for others the therapeutic success remains below expectations2. A key differentiating feature versus open-loop therapies is the ability to target stimulation to specific neurophysiological states (such as ictal, inter-ictal or quiescent) via physician-selected detection parameter settings. Identifying appropriate settings in this vast parameter space is a largely manual process that can take months or years, if optimal sensitivity and specificity are reached at all30,31. Improvements in parameter selection may lead to faster and better clinical outcomes and identify new features of interest.
Here, we aim to inspire new ways of improving seizure detection accuracy by constraining the decoding platform to the specifications of clinical brain implants and suggesting improved parameters from offline predictions that are implementable and testable through the clinical patient data management system (PDMS) provided by NeuroPace. For this, we analyzed over 100 hours of invasive human brain signals from neocortical depth electrodes (see Fig. 4a for an example) recorded with RNS devices in a cohort of nine epilepsy patients (mean age: 35.3 ± 8.2 years, 8 females, all focal epilepsy; mean number of available recordings: 636.9 ± 366.4, mean recording duration: 71.07 ± 10.78 s, Supplementary Table 5). With the aim of reducing the false-positive rate of the implemented seizure detector, we performed a systematic feature optimization in a simulation of the RNS bandpass detector algorithm. This embedded algorithm is normally programmed by the clinical team through the selection of a single ictal event (“SimpleStart”), which provides the foundation for a semi-automated parametrization of detection settings for this specified event in the programming environment.
We extracted brain signals as well as detection and stimulation parameters from RNS implants (Fig. 4b) using a previously described database access pipeline32. Individual recordings were annotated by a certified epileptologist for the electrographic presence and onset of seizure activity. The resulting annotations served as ground truth for machine learning based identification of embedded bandpass algorithm parameters with our platform. Three parameters need to be defined for the detector by the clinical team: i) the threshold direction, i.e. whether an increase or decrease in band power is associated with ictal activity, ii) the corresponding threshold amplitude and iii) the required duration for which the threshold must be crossed for a seizure event to be detected. We optimized these parameters for sample-wise seizure classification to maximize the F1 score. We focused on F1 scores instead of balanced accuracy because ‘seizure present’ (true positive) predictions are more critical in clinical scenarios than correct ‘seizure absent’ (true negative) predictions, which is why the F1 score is commonly used as a metric for RNS seizure detection performance33.
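The described parameter optimization can be sketched as an exhaustive grid search over threshold direction, threshold amplitude and minimum crossing duration that maximizes the sample-wise F1 score; the band-power input, helper names and candidate grids are assumptions for illustration, not the embedded detector implementation.

```python
import itertools
import numpy as np
from sklearn.metrics import f1_score

def detector_predictions(band_power, direction, threshold, min_duration):
    """Simulated bandpass detector: flag samples where band power crosses the threshold
    in the given direction for at least min_duration consecutive samples."""
    crossed = band_power > threshold if direction == "increase" else band_power < threshold
    pred = np.zeros_like(crossed, dtype=int)
    run_start = None
    for i, c in enumerate(np.r_[crossed, False]):   # sentinel closes a trailing run
        if c and run_start is None:
            run_start = i
        elif not c and run_start is not None:
            if i - run_start >= min_duration:
                pred[run_start:i] = 1
            run_start = None
    return pred

def optimize_detector(band_power, labels, thresholds, durations):
    """Grid search over (direction, threshold, duration) maximizing sample-wise F1."""
    best_params, best_f1 = None, -1.0
    for direction, thr, dur in itertools.product(("increase", "decrease"), thresholds, durations):
        score = f1_score(labels, detector_predictions(band_power, direction, thr, dur))
        if score > best_f1:
            best_params, best_f1 = (direction, thr, dur), score
    return best_params, best_f1
```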
Brain signals aligned to seizure onset revealed high-frequency synchronization followed by activity in lower frequency bands, as previously described (Fig. 4c)34,35. An exemplary grid-search matrix spanning the threshold direction, amplitude and duration of each feature demonstrates how py_neuromodulation can directly provide RNS detector parameters that can be used in the embedded framework and implemented through the patient data management system (Fig. 4d,e). The identified parameter combination significantly reduced false positives while maintaining stable true-positive rates, leading to an overall performance increase (F1 score for original RNS settings: 0.41 ± 0.12; for our optimized settings: 0.92 ± 0.06) (Fig. 4f). It is important to note that the embedded RNS programming environment does not implement cross-validation, which was mirrored in our simulation, and that RNS data are in part recorded because of true- or false-positive seizure detections (114.16 ± 86.42 of 6515 minutes of total recording time were classified as ictal by expert annotation). Thus, the presented data should be interpreted as a proof of concept that requires further validation in prospective clinical trials.
To highlight the utility of multivariate brain signal decoding for seizure detection beyond the RNS device limitations, we further evaluated an extensive set of 264 features (66 features per channel) (Fig. 4g). We then assessed the performance of linear models, support vector machines (SVM) and gradient boosted decision trees (XGBOOST) using 3-fold cross-validation as before. XGBOOST achieved the best performance (F1 score: 0.8 ± 0.2) and outperformed the linear model (0.56 ± 0.23) and the SVM (0.4 ± 0.3) (Fig. 4h), potentially by capturing non-linear feature interactions more robustly.
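A sketch of this model comparison, assuming a 264-dimensional feature matrix X and binary seizure labels y, the xgboost package, and an RBF kernel for the SVM (hyperparameters are illustrative defaults rather than the study's settings):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold
from sklearn.svm import SVC
from xgboost import XGBClassifier

MODELS = {
    "linear": LogisticRegression(penalty="l2", max_iter=1000),
    "svm": SVC(kernel="rbf"),
    "xgboost": XGBClassifier(n_estimators=200, eval_metric="logloss"),
}

def compare_models(X, y):
    """Mean and SD of the F1 score per model over 3-fold CV on consecutive segments."""
    results = {}
    cv = KFold(n_splits=3, shuffle=False)
    for name, model in MODELS.items():
        scores = [
            f1_score(y[test], model.fit(X[train], y[train]).predict(X[test]))
            for train, test in cv.split(X)
        ]
        results[name] = (float(np.mean(scores)), float(np.std(scores)))
    return results
```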
Our results emphasize two key aspects regarding the utility of our platform for the development of decoding algorithms for brain implants: i) it can be of direct use for the parametrization of algorithms embedded in currently available brain implants, and ii) it enables the discovery of optimized feature sets and machine learning methods for the next generation of clinical brain-computer interfaces for the treatment of epilepsy and other brain disorders.