Study Participants
Subjects were recruited from Ruijin Hospital, Shanghai Jiaotong University School of Medicine (Shanghai, China). Inclusion criteria for the surgery were: 1) men and women aged 18-65 years; 2) meeting the international Classification of diseases (ICD)-10 definition of non-psychotic major depressive disorder (MDD), 3) current episode ≥ 2 years duration and/or more than 4 repeated episodes with current episode ≥ 1 year duration and a minimum of 5 years since the onset of the first depressive episode45, 4) lack of antidepressant response to a minimum of three antidepressant treatments of adequate dose and duration, including at least two medications from two different classes, failure of adequate psychotherapy and electroconvulsive therapy (ECT) (either poor response, intolerance or rejection), 5) remaining stable with the current anti-depressive medicine for the last month and 6) Hamilton Depression Scale-17 (HAMD-17) score ≥ 17. Exclusion criteria were: 1) schizophrenia or psychosis unrelated to MDD; 2) severe personality disorder, neurological disorders or medical disorders; 3) history of brain surgery; 4) contraindications for anesthesia or stereotactic surgery. We enrolled 26 patients with MDD in the study. All included patients provided written informed consent.
The study was conducted in Ruijin Hospital, Shanghai Jiaotong University School of Medicine. Approval for the trial was obtained from the Ruijin Hospital Ethics Committee, Shanghai JiaoTong University School of Medicine, under the approval number 2021-52. The research use of the implanted components authorized by the Shanghai Testing & Inspection Institute for Medical Devices and approved by China Food and Drug Administration. A total of twenty-six patients were enrolled in the study from March 29th, 2021, to July 28th, 2023.
Surgical Procedure and Stimulation
All patients received bilateral implants of 8 contact electrodes (lead model SR1202-S; SceneRay, Suzhou, China) using stereotactic frame-based, magnetic resonance localization with the middle contact point in the bed nucleus of the stria terminalis (BNST) and a lower contact point in or close to the nucleus accumbens (NAc). The length of each contact is 1.5mm and the spacing between contacts is 0.5mm. Target coordinates for the electrode tip were approximately 4-8mm lateral to the midline, 1-3mm anterior to the anterior border of the anterior commissure and 5-8mm inferior to the anterior commissure. Electrodes were connected via subcutaneous extensions to a stimulator (SR1103, SceneRay).
Deep brain stimulation (DBS) settings were initiated one-week post-lead implantation, with parameter selection guided by immediate responses to stimulation and careful monitoring for transient side effects. Stimulation parameters, including electrode contacts, voltage, pulse width, and frequency, were systematically adjusted approximately every two weeks. Monopolar stimulation configuration was predominantly utilized across the patient cohort. In practice, we increased the amplitude upon observing diminished beneficial effects or limited clinical improvement, potentially reaching 6mA. Subsequently, we tailored frequency and pulse width as needed. If these adjustments were ineffective, we introduced the second and, if required, the third preselected contact for monopolar stimulation.
Study Design of Phase II Trial
The study started with a longitudinal, open-label trial followed by a randomized, double-blind crossover design. Following electrode implantation, patients underwent an initial open phase, during which they were assessed monthly for symptom severity with study questionnaires, with clinical adjustments to stimulation parameters approximately every two weeks. DBS optimization was deemed complete after parameter adjustments maintaining for at least six months.
Following the open label phase, patients were randomly assigned to either receive active DBS for two weeks followed by sham DBS for two weeks (the on-off group), or vice versa (the off-on group). Prior to each active and sham phase, a 2-day washout period was implemented. If deemed necessary by the research team or at the request of the patients, the phase was terminated, with patients progressing to the next phase upon termination. Throughout the crossover phase, medication and DBS parameters remained constant. Patients underwent evaluation at five time points: pre-randomization, following two washout phases, and after each active or sham DBS phase.
Randomization and Blinding
We employed an envelope procedure for randomization, wherein each randomly assigned group was enclosed within a sealed, opaque envelope. An independent researcher sequentially opened these envelopes and conducted the randomization by switching DBS on or off accordingly. Throughout the trial, patients, two assessors (YW, LD, psychiatrists), and other relevant researchers were kept blinded to the stimulation condition. This double-blinding protocol remained in effect until the completion of the crossover phase by the last patient.
Clinical assessments
Depression symptoms were assessed using the HAMD-17, with scores ranging from 0 to 52, where higher scores indicated more severe symptoms46. A patient was considered a responder if they exhibited a decrease of at least 50% in their HAMD score compared to baseline. Remission was defined as achieving a HAMD-17 score of less than 8. Additionally, we utilized the Montgomery-Asberg Depression Rating Scale (MADRS), which ranges from 0 to 60, to evaluate depression symptoms47. Anxiety severity was evaluated using the Hamilton Anxiety Scale (HAMA), which has a range of 0 to 5648. Anhedonia was self-reported using the Dimensional Anhedonia Rating Scale (DARS), ranging from 17 to 85, with higher scores indicating lower levels of anhedonia.
Other outcome measurements included the 16-item Self-Report Quick Inventory of Depression Symptomatology (QIDS-SR16), Depression and Somatic Symptoms Scale (DSSS), World Health Organization Quality of Life-BREF (WHOQOL-BREF), 36-Item Short Form Survey Instrument (SF-36), Sheehan Disability Scale (SDS) and Pittsburgh Sleep Quality Index (PSQI).
The primary outcome measure of the trial was the difference in HAMD scores between the active and sham stimulation phases in the randomized, double-blind crossover phase. Secondary outcome measures were the difference in scores on the other scales between active and sham stimulation phases.
Affective Task
Emotional behavior exhibits a structured organization across two psychophysiological dimensions: valence, varying from negative to positive, and arousal, varying from low to high. The individual assessment of these dimensions can be achieved via an affective task, wherein patients were asked to self-report valence and arousal in response to affective imagery. The affective imagery was sourced from the well-established International Affective Picture System (IAPS) and presented in three distinct conditions: positive, neutral, and negative49. Each condition encompassed a set of ten different images, with half rated for valence and half rated for arousal. Following the presentation of each image, participants rated their valence and arousal levels using sliding visual analog scales (VAS) that ranged from 0 to 100. A score of 0 represented very negative (valence) or not exciting at all (arousal), while a score of 100 represented very positive (valence) or very exciting (arousal). The intertrial interval lasted between 1 to 1.5 seconds. The affective task was performed at baseline visit.
Localization
DBS electrode localization for the purposes of physiological analyses utilized Lead-DBS v2 (http://www.lead-dbs.org)28. In brief, postoperative CT images were first linearly co-registered to preoperative MRI and normalized into ICBM 2009b NLIN asymmetric space using the SyN approach implemented in Advanced Normalization Tools. DBS electrodes were then localized using Lead-DBS and warped into the Montreal Neurological Institute (MNI) space using the PaCER algorithm after visual review and refinement of the co-registrations and normalizations.
Volume of Tissue Activated and Connectivity Estimation
Volumes of tissue activated (VTA) were modelled following the Baldermann et al.’s approach50. Specifically, the E-Field was estimated using a finite element method on a four-compartment mesh describing local grey and white matter, as well as electrode contact and insulating material. Subsequently, voxel-wised normative structural connectivity seeding from bilateral E-fields were estimated using normative data sets from the Human Connectome Project at Massachusetts General Hospital (32 subjects, multi-shell diffusion-weighted imaging data). E-field values were used as weights to construct connectivity profiles. For each patient, fibers passing through a non-zero voxel of the E-field were extracted from the normative connectome and mapped onto a standardized voxelized volume with 2mm resolution. Each fiber received the weight of the maximal E-field magnitude of its passage and fiber densities were weighted by these values. Spearman rank correlation analysis was then performed to examine the relationship between structural connectivity and the identified physiology in the patient cohort. The uncorrected significant fiber tracts were integrated to construct the correlated map (Rmap).
Perioperative Signal Recordings
Twenty-six patients underwent perioperative signal recording. Nine patients were excluded: three due to data noise exceeding 50%, three for incomplete 3-month follow-up during analysis, and three as we were unable to record from contacts in the BNST region although stimulation remained intact. Consequently, the reported results are based on the remaining 17 patients.
The surgery followed a staged procedure. We recorded signals between 2 and 4 days postoperatively, during the period when the leads were externalized. Resting state recordings were obtained for a minimum of 3 minutes (maximum 5 minutes) while patients sat in a relaxed and wakeful state. These recordings were conducted under two distinct conditions: with eyes open and with eyes closed.
Scalp electroencephalogram (EEG) and local field potential (LFP) data were simultaneously recorded using a BrainAmp MR amplifier (Brain Products) at a 500 Hz sample rate. EEG was obtained from 7 frontal electrodes (Fp1, Fp2, F3, F4, F7, F8, Fz) using the 10-20 placement system. To ensure that the desired signals were not contaminated by blinks or saccades, electrooculogram (EOG) recordings were performed. The data was reference online using a left mastoid electrode. The ground was placed at the apex nasi.
Wireless Data Streaming
Ten patients participated in the daily in-lab electrophysiological recordings and momentary self-report behavioral assessments during the randomized, double-blind crossover phase. One patient was excluded due to insufficient data available during the active stimulation phase, leaving a total of nine patients included in the final analysis.
Patients completed 1-3 trials per day during the washout period or in the post-active and post-sham stimulation phases, while stimulation was turned off. In each trial, participants engaged in 5 minutes of wireless data streaming, followed by self-reporting on their mood or anxiety states using a Visual Analog Scale (VAS). Patients maintained an open-eye resting state throughout the trial. The intertrial interval was more than 4 hours.
Wireless data streaming was sampled at a rate of 415Hz or 1000Hz (for P3). LFPs were sensed in bipolar configuration where a pair of sensing contacts were recorded with one contact referenced to the other. The contact pairs within (defined as less than 1mm of distance from the target) or closest to the BNST were preselected for recording. Patients rated their mood or anxiety states on the VAS immediately after data streaming, where 0 corresponds to “very happy” or “not anxious at all”, 100 corresponds to “very sad” or “very anxious”, and 50 corresponds to “moderate states”.
Signal Processing
All data were analyzed offline using MATLAB R2023a with EEGLAB toolbox v2022.1 and custom-written scripts.
Perioperative LFP data applied offline bipolar referencing to reduce volume conduction effects. The data were high-pass filtered at 1Hz using a 2nd order Butterworth filter (pop_basicfilter) and band-stop filtered at 50Hz and its harmonics (pop_eegfiltnew). Independent Component Analysis (ICA) and then visual inspection were utilized for artifact rejection. Data epochs or electrodes containing artefactual activity (e.g., blinks, saccades, or cardiac activity or line noises) identified during visual inspection were omitted from further analyses. Following artefact rejection, we analyzed a total of 17 patients' eye-closed datasets, yielding a mean duration of 230 ± 55s (range: 140-308s); and a total of 15 patients' eye-open datasets, yielding a mean duration of 227 ± 52s (range: 94-282s).
Based on the atlas-defined lead reconstruction in the MNI space using Lead-DBS v2 toolbox, the LFP contact pairs within or closest to the one of the three brain structures (BNST, NAc, hypothalamus) was selected from each recorded hemisphere for final analysis. The BNST and hypothalamus contacts (n = 17) were all within or <1mm from the structures. The NAc contacts (n = 11) were within <2mm distance. To facilitate subsequent individual-level analysis, we computed the LFP activity in the BNST, NAc, and hypothalamus by averaging across both hemispheres. EEG data were averaged across 7 channels.
Wireless LFPs followed a preprocessing procedure akin to perioperative LFPs, with the addition of band-stop filtering at 40Hz and its harmonics (pop_eegfiltnew) to cancel out the sensing noise. To maintain consistency, wireless data obtained from P3 were down-sampled at 415Hz.
Spectral analysis was performed using the Welch’s power spectrum density (PSD) estimation (pwelch). Non-overlapping windows of 512 data points were analyzed, affording a spectral resolution at ~1Hz. LFP power at frequencies < 4Hz was excluded from further analysis to eliminate the movement artifacts and the influence of 1/f nature of the signal. Our primary area of interest was power within the 4-90Hz range. To mitigate the effects of individual variance in targeting and impedance, we computed the relative (%) powers by expressing the power within this range as a percentage of the total power. We summed power into canonical frequency bands: θ (4–8Hz), α (8–12Hz), lβ (13–20Hz), hβ (21-35Hz), γ (40–90Hz).
We fit spectral power features of regions or functional connectivity between regions into a cross-validated penalized ridge regression model with permutation feature selection to identify the electrophysiological features predictive of treatment response. The modelling pipeline consisted of built-in 3-fold cross-validation for model hyperparameter optimization and repeated 10000 iterations for stability and better performance. Permutation feature importance was then computed for the model based on the decrease in a model score when a single feature value is randomly shuffled.
Non-directed spectral connectivity was computed using coherence analysis. The continuous data were divided into 10-second epochs with a 50% stepwise overlap. The magnitude-squared coherence was then calculated for each epoch using the Welch's overlapped averaged periodogram method with blocks of 512 samples, a Hanning window, a 50% overlap, and frequency resolution of ~1Hz in 4-90Hz range (mscohere) and scaled by 100 for expressing percentages. The coherence values were summed across frequency bins and then averaged across epochs. These averaged values were subsequently grouped into five predefined frequency bands.
Directional spectral connectivity (estimates of both A -> B and B -> A connectivity) was estimated using spectrally resolved Granger causality51. This analysis utilized the MVGC multivariate Granger causality toolbox. To maintain data stationarity, LFP signals were initially segmented into 4-second time-windows with 50% overlap. A multivariate vector autoregressive (MVAR) model was fit to the full “universe of data” time series, with model order optimized using Bayesian information criterion (BIC). The corresponding MVAR model parameters for the selected model order was estimated and the resulting model was checked for stability. Then, the frequency conditional spectral causalities were computed for signal pairs and the significance of the resulting causalities was tested via nonparametric permutation testing. The subsequent analysis focused on the summed frequency-based connectivity within the theta (4-8Hz) band, based on the prior findings.
Statistical Analysis
The sample size was based on the assumption that a mean reduction of 9 points on HAMD score with a standard deviation of 8 when compared active DBS with sham DBS7. Consequently, it was estimated that 17 patients in the crossover phase would be sufficient to evaluate the efficacy of the trial with a type I error rate of 0.05 and a type II error rate of 0.9.
For the open-label outcome analysis, participants missing one or two assessments at months 3, 6, or 12 had data from adjacent month visits used. In the crossover phase outcome analysis, participants who terminated early during sham stimulation included data at the time of exit. For participants transitioning directly from the sham DBS phase to active DBS without a washout, scores during the second washout phase were considered equivalent to those during the sham DBS phase.
Data are presented as mean (standard deviation, SD) in tables and as mean (standard error, SE) in figures, and statistical significance was set at a 2-tailed 5% level. Data normality was determined by Shapiro-Wilk test. Correlational analyses were performed utilized Spearman rank correlation or Pearson correlation. Group differences were inferred using Wilcoxon signed-rank sum test or paired student t-tests. Linear regression was used to model the relationship between a dependent variable and one or more independent variables using the stepwise method. We applied false discovery rate (FDR) correction to account for multiple comparisons where necessary.
We performed intention-to-treat analyses (ITT) to results in this prospective randomized controlled trial (RCT). In the open-label phase, outcome measures were analyzed using paired t-tests with a Bonferroni correction applied. Descriptive statistics included the percentage of responders, non-responders, and remission cases. In the crossover phase, analyses of the primary efficacy outcome involved testing three effects: carryover, period (T3 and T5), and treatment (active and sham) effects. This was accomplished using a general linear mixed model analysis, with two-paired HAMD scores as the dependent variables, group (on-off and off-on group) and treatment as independent variables, and HAMD scores at T1 as a covariate52. Carryover effects were assessed examining the between-group main effect. Period effects were assessed through the interaction term of treatment × group effects. Additionally, treatment effects were assessed using the within-group main effects. A similar procedure was applied to analyze other measures.
Mediation model was used to test the indirect effect of negative emotional bias on clinical outcomes via theta biomarkers. It was performed using Model 4 in the PROCESS Marco (SPSS). Note that the negative emotional bias did not follow a normal distribution and therefore a pre-applied rank-based inverse normal transformation was performed. The mediation model investigated two paths: the indirect path from (a) negative emotional bias to theta biomarkers and (b) theta biomarkers to clinical outcomes, as well as the direct path from (c’) negative emotional bias to clinical outcomes. Model fit was evaluated based on a non-significant X2 (p > 0.05). The indirect effects were estimated using 5000-sample bootstrap procedure with bias-corrected confidence intervals.
Data analyses were performed using SPSS 26.0, Python 3.11, or Matlab R2023a. The figures were generated using R version 4.3.1.