2.2 fMRI acquisition and preprocessing
The T1w and functional MRI data were collected on the scanners of two vendors (Siemens and GE scanners), across which a harmonized MRI acquisition protocol with comparable acquisition parameters was established. The T1w images were acquired with the following parameters: matrix of 256\(\times\)256, 176 slices (Siemens) and 208 slices (GE), field of view (FOV) size = 256\(\times\)256, voxel size = 1.0mm3, repetition time (TR) = 2500ms, echo time (TE) = 2.88ms (Siemens) and 2ms (GE), inversion time (TI) = 1060ms, flip angle (FA) = 8°, and acquisition time 7’12" (Siemens) and 6’09" (GE). The fMRI data were acquired with: matrix of 90\(\times\)90, 60 slices, FOV size = 216\(\times\)216, voxel size = 2.4\(\times\)2.4\(\times\)2.4mm3, TR = 800ms, TE = 30ms, and FA = 52°. All MRI data have been run through standard modality-specific preprocessing stages, including raw file compression, distortion correction, movement correction, alignment to standard space, initial quality control, etc. More details about the MRI acquisition, scanning parameters, and preprocessing pipelines are reported in prior work50.
2.3 Extraction of fMRI measures
The minimally preprocessed fMRI data were further processed using the Data Processing Assistant for Resting-State fMRI (DPARSF v5.1) software (http://rfmri.org/DPARSF), which is based on the toolbox for Data Processing & Analysis of Brain Imaging (DPABI, http://rfmri.org/DPABI) and Statistical Parametric Mapping (SPM, http://www.fil.ion.ucl.ac.uk/spm)51. The first ten frames were removed to make data reach equilibrium. The time series of images for each subject were realigned and averaged to a mean volume. Then individual structural images (T1w) were co-registered to the mean fMRI and segmented into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). After resampling to 3\(\times\)3\(\times\)3mm3 of voxel size, fMRI images were transformed from individual native space to MNI space. Nuisance signals were removed by a linear regression model with head motion parameters, linear trends, and WM and CSF signals included as regressors. Finally, images were spatially smoothed with a 4 mm FWHM Gaussian kernel (except for ReHo) and temporally filtered to preserve the signals of 0.01–0.1 Hz (except for ALFF and fALFF) and remove the high-frequency physiological noise.
We calculated the traditional voxel-wise ALFF, fALFF, and ReHo with DPARSF v5.1. For ROI-wise measures, we first extracted the average time series within each ROI based on the automated anatomical labeling (AAL) template10 (detailed in Supplementary Table 3). Then the PC-derived LOFC and HOFC (tHOFC and dHOFC), as well as SR-derived FC (SR, GSR, SLR, and SSGSR), were measured with BrainNetClass toolbox v1.144. After that, the effects of main covariates, including sex, manufacturer, site, and head motion, were regressed, and the residuals were used as inputs for the classification models.
As for PC-derived HOFC, tHOFC can be calculated using the following Eq. 19, 44:
$$\begin{array}{c}\begin{array}{c}tHOF{C}_{ij}=Corr\left(F{C}_{i},F{C}_{j}\right)\\ =\frac{{\sum }_{k=1}^{N}\left({w}_{ik}-\overline{{w}_{i\bullet }}\right)\left({w}_{jk}-\overline{{w}_{j\bullet }}\right)}{\sqrt{{\sum }_{k=1}^{N}{\left({w}_{ik}-\overline{{w}_{i\bullet }}\right)}^{2}}\sqrt{{\sum }_{k=1}^{N}{\left({w}_{jk}-\overline{{w}_{j\bullet }}\right)}^{2}}}\end{array}\#\left(1.\right)\end{array}$$
where \({w}_{ik}\) is the PC between \({i}^{th}\) and \({k}^{th}\) ROI signals, and \(i,j,k\in \{\text{1,2},\dots ,N\}, k\ne i,j\), \(N\) is the number of ROIs. Note that tHOFC indicates the similarity of the LOFC topographical profiles by measuring the correlation of correlation rather than that of the BOLD time series, which reflects the high-level property of the brain network and makes providing supplementary information to the traditional LOFC promising. The calculation of dHOFC is defined as19, 44:
$$\begin{array}{c}\begin{array}{c}dHOF{C}_{ij,pq}=Corr\left(F{C}_{ij},F{C}_{pq}\right)\\ =\frac{{\sum }_{\theta =1}^{{\Theta }}\left({w}_{ij}\left(\theta \right)-\overline{{w}_{ij}\left(\bullet \right)}\right)\left({w}_{pq}\left(\theta \right)-\overline{{w}_{pq}\left(\bullet \right)}\right)}{\sqrt{{\sum }_{\theta =1}^{{\Theta }}{\left({w}_{ij}\left(\theta \right)-\overline{{w}_{ij}\left(\bullet \right)}\right)}^{2}}\sqrt{{\sum }_{\theta =1}^{{\Theta }}{\left({w}_{pq}\left(\theta \right)-\overline{{w}_{pq}\left(\bullet \right)}\right)}^{2}}}\end{array}\#\left(2.\right)\end{array}$$
where \({\Theta }\) is the number of sliding windows which depends on whole time length (T) and window length (L), \({w}_{ij}\left(\theta \right)\) means PC between \({i}^{th}\) and \({j}^{th}\) ROIs in the \({\theta }^{th}\) sliding window. By calculating the correlation between two correlation time series, \(dHOF{C}_{ij,pq}\) shows how the correlation between the \({i}^{th}\) and the \({i}^{th}\) ROIs influence the correlation between the \({p}^{th}\) and the \({q}^{th}\) ROIs. In this way, the total number of elements of dHOFC is proportional to \({N}^{2}\times {N}^{2}\), much larger than that of PC and tHOFC (\(N\times N\)). An unsupervised clustering algorithm can be applied to group the resulting FCs into K clusters to reduce the dimension of the large-scale network. Each cluster contains several FCs showing a similar pattern of variation along the time. Finally, the averaged correlation time series in each cluster can be calculated to construct a \(K\times K\) HOFC network. Thus, the final low-scale dHOFC characterizes the temporal synchronization of dynamic FC time series.
As for the SR method, a sparse FC matrix can be generated by minimizing the loss function:
$$\begin{array}{c}\underset{W}{\text{min}}\frac{1}{2}{‖X-XW‖}_{F}^{2}+\lambda {‖W‖}_{1}\#\left(3.\right)\end{array}$$
where \(X\) is the original fMRI data matrix, \(W\) is the FC matrix, \(\lambda >0\) controls network sparsity with an l1-norm penalty. Another SR-based approach named GSR is formulated as the following21, 44.
$$\begin{array}{c}\underset{{W}_{i}}{\text{min}}{\sum }_{s=1}^{S}\left(\frac{1}{2}{‖{x}_{i}^{s}-{X}_{i}^{s}{w}_{i}^{s}‖}_{2}^{2}\right)+\lambda {‖{W}_{i}‖}_{\text{2,1}}\#\left(4.\right)\end{array}$$
where \({w}_{i}\) is the regional one-to-all LOFC of the \({i}_{th}\) ROI, and \({W}_{i}=\left[{w}_{i}^{1},{w}_{i}^{2},\dots ,{w}_{i}^{S}\right]\), \(S\) is the number of subjects. Controlled by an l2,1-penalty, GSR resultant FC matrices have less inter-subject variability than SR. SSGSR, as an optimized GSR, is to minimize the following loss function26:
$$\begin{array}{c}\underset{{W}_{i}}{\text{min}}{\sum }_{s=1}^{S}\left(\frac{1}{2}{‖{x}_{i}^{s}-{X}_{i}^{s}{w}_{i}^{s}‖}_{2}^{2}\right)+{\lambda }_{1}{‖{B}_{i}\odot {W}_{i}‖}_{\text{2,1}}+{\lambda }_{2}{\sum }_{p,q=1}^{S}{l}_{i}^{p,q}{‖{w}_{i}^{p}-{w}_{i}^{q}‖}_{2}^{2}\#\left(5.\right)\end{array}$$
where \({B}_{i}=\left[{b}_{i}^{1},{b}_{i}^{2},\dots ,{b}_{i}^{S}\right]\) and \({b}_{i}={e}^{-{w0}_{i,j}^{2}}\), \(w{0}_{i,j}\) is the LOFC between the \({i}_{th}\) and the \({j}_{th}\) ROIs, hence \({B}_{i}\) is to penalize weak LOFC. \({l}_{i}^{p,q}={e}^{-{‖{w0}_{i}^{p}-{w0}_{i}^{q}‖}_{2}^{2}}\) defines the similarity between the \({p}_{th}\) and the \({q}_{th}\) subjects in terms of their one-to-all LOFC patterns for the \({i}_{th}\) ROI. Therefore, \({\lambda }_{1}\) and \({\lambda }_{2}\) can control the weighted sparsity and inter-subject variability separately. Formula (6) shows the loss function of SLR22, 44:
$$\begin{array}{c}\underset{W}{\text{min}}\frac{1}{2}{‖X-XW‖}_{F}^{2}+{\lambda }_{1}{‖W‖}_{1}+{\lambda }_{2}{‖W‖}_{*}\#\left(6.\right)\end{array}$$
where \({\lambda }_{1}\) controls sparsity with an l1-norm and \({\lambda }_{2}\) controls modularity with a trace norm. Thus, SLR resulting FC matrices are sparse and low-rank, i.e., modularized and more biologically meaningful52.
The aforementioned HOFCs and SR methods were proposed and applied to various mental diseases, including MCI22, 26, 42 and ASD43, etc. Concerning ADHD, a study53 built a diagnostic model based on the temporal variability of dynamic functional connectivity, but it did not depend on sliding windows. Another study utilized dynamic functional network connectivity (dFNC) to access FC differences among child, adolescent, and adult ADHD patients rather than between patients and healthy controls. Moreover, the dFNC analysis process is different from dHOFC54. To our knowledge, no other study applies tHOFC, dHOFC, and three SR-derived FC (GSR, SSGSR, and SLR) to the classification of ADHD. For the first time, we extracted these features combined with traditional voxel-wise measures and ROI-wise PC and SR to perform a classification task on ADHD and compare the discriminative power of different measures.
2.4 ADHD Classification
We tested the performance of nine classification algorithms, including four basic classifiers (Logistic Regression (LR), K-Nearest Neighbors (KNN), Ridge Classifier, and Gaussian Naïve Bayes (GaussianNB)), two SVM-based classifiers (Linear SVM and non-linear SVM), and three tree-based ensemble classifiers (Random Forest (RF), Light Gradient Boosting Model (LGBM) and Adaptive Boosting (AdaBoost)). A nested CV was applied, including an outer 10-fold CV and an inner 5-fold CV. In detail, the inner CV with grid search was to tune the hyperparameters, while the outer CV was to minimize the overfitting. Owing to the high dimension and redundancy of the candidate features, we used Boruta47 to identify the features with discriminative power on the training set. This RF-based feature selection method has been proven effective in our previous study39. Noting that indices including dHOFC and SR-derived FCs are based on parameter-required methods, we tested the parameter sensitivity and chose the suggested parameters according to a subsequent classification of 10-fold CV for \(M\) times (\(M\) is the number of parameter combinations), in which the non-linear SVM with default configuration (kernel=‘rbf’ and C = 1.0) was used as classifier. The parameters with the highest average AUC were chosen. The details of parameter combinations associated with different indices are shown in Table 2. Finally, we applied two multi-modal classification methods (model-agnostic early fusion strategy and model-based Multiple Kernel Learning (MKL) algorithm55, 56) to fusing the discriminative features from different modalities and improving the classification performance.
Table 2
Parameter combinations and optimal parameters for different measures
Measure | Parameter | Values | Suggested value |
dHOFC | K | \([100, 200, \dots , 600]\) | 600 |
L | \([20, 30, \dots , 70]\) | 50 |
SR | \(\lambda\) | \([0.01, 0.02,\dots ,0.1]\) | 0.01 |
GSR | \(\lambda\) | \([0.01, 0.02,\dots ,0.1]\) | 0.01 |
SSGSR | \({\lambda }_{1}\) | \([0.01, 0.02,\dots ,0.1]\) | 0.06 |
\({\lambda }_{2}\) | \([0.01, 0.02,\dots ,0.1]\) | 0.01 |
SLR | \({\lambda }_{1}\) | \([0.01, 0.02,\dots ,0.1]\) | 0.05 |
\({\lambda }_{2}\) | \([0.01, 0.02,\dots ,0.1]\) | 0.08 |
K: Cluster Number; L: Window Length. |
The classification accuracy (ACC), precision, recall, and F1 score can be calculated as follows:
$$ACC=\frac{TP+TN}{TP+TN+FN+FP}$$
$$precision=\frac{TP}{TP+FP}$$
$$recall=\frac{TP}{TP+FN}$$
$$F1 score=\frac{2\bullet precision\bullet recall}{precision+recall}$$
where \(TP\) and \(TN\) are the counts of correctly classified ADHD patients and healthy subjects respectively, while \(FN\) and \(FP\) are counts of falsely classified ADHD patients and healthy subjects respectively. In addition, classification AUC represents the area under the receiver operating characteristic (ROC) curve drawn when the discrimination cutoff varies.