This paper introduces a method for distinguishing individuals with autism from neurotypical controls. Accurate classification supports early intervention and can enhance the lives of those affected. The structure and flowchart of the proposed method are depicted in Fig. 1.
To begin, the EEG signal must be obtained or prepared. In this study, the signal was acquired by downloading it from [49]. The dataset comprises EEG recordings gathered with the Biosemi ActiveTwo EEG system and includes data from 28 individuals diagnosed with autism spectrum conditions and 28 neurotypical controls, aged 18 to 68 years. The recordings were made during a 2.5-minute (150-second) eyes-closed resting period. Permission for data collection and sharing was obtained from the Health Research Authority under IRAS ID 212171 [49].
The raw EEG signals underwent several preprocessing stages to eliminate unwanted noise and artifacts. First, a Butterworth bandpass filter with cutoff frequencies of 0.5 and 100 Hz was applied to minimize noise. Subsequently, independent component analysis (ICA) was used to separate the independent components associated with brain activity. Components identified as unrelated to brain function, such as those caused by blinking, electromyography, 50 Hz powerline interference, or auditory artifacts, were removed from the component list, yielding a refined signal. For more detailed insight into the techniques used to identify artifact and brain-related components, refer to [50–52].
Once preprocessing is complete, the filtered signals are subjected to a source localization technique, which estimates the active brain sources or regions. Its primary goal is to reduce complexity by identifying a limited number of active sources, yielding the active brain regions and their corresponding spatial coordinates as the final output. In source localization, the aim is to minimize the following function:
\(F={‖{V}_{K}-G{J}_{K}‖}^{2}+\alpha {‖{J}_{K}‖}^{2}\) (Eq. 1)
where \({V}_{K}\left(m\times 1\right)\) is the recorded EEG signal at the Kth sample, \({J}_{K}\left(3n\times 1\right)\) is the source activity of the brain at the Kth sample, and G \(\left(m\times 3n\right)\) is the leadfield matrix, computed by solving the forward problem with methods such as the finite element method (FEM) or the boundary element method (BEM) [53, 54]. This function consists of two parts: the first measures the estimation error, while the second controls noise amplification and suppresses sharp variations in the source activities. The regularization parameter α determines the relative weight of these two terms; it can be chosen by methods such as Tikhonov regularization or the L-curve technique [55].
Various approaches exist for minimizing Eq. 1; one of them is sLORETA, known for its ability to achieve zero localization error [55, 56]. In sLORETA, an explicit solution can be obtained from the known values of G and \({V}_{K}\):
\(\widehat{{J}_{K}}=T{V}_{K}\) (Eq. 2)
where \(\widehat{{J}_{K}}\left(3n\times 1\right)\) is the estimated source activity of the brain and T \(\left(3n\times m\right)\) is the inverse operator that maps the EEG signals (\({V}_{K}\)) to source activity estimates; in sLORETA it is calculated from the following equation:
\(T={G}^{T}H{\left[HG{G}^{T}H+\alpha H\right]}^{+}\) (Eq. 3)
where \({\left[\right]}^{+}\) denotes the Moore–Penrose pseudoinverse and H is the centering (average-reference) matrix used for smoothness control, defined as follows:
\(H=I-1{1}^{T}/{1}^{T}1\) (Eq. 4)
Here I denotes the \(\left(m\times m\right)\) identity matrix and 1 denotes the \(\left(m\times 1\right)\) vector of ones. These constructs re-reference the data and thereby facilitate the subsequent active region selection process.
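For illustration, the inverse operator of Eqs. 2–4 can be sketched in NumPy. The leadfield G and the regularization parameter α are placeholders here; in practice G comes from the forward solution and α from a method such as the L-curve:

```python
import numpy as np

def sloreta_operator(G, alpha):
    """Compute T = G^T H [H G G^T H + alpha H]^+ (Eq. 3).

    G : (m, 3n) leadfield matrix; alpha : regularization parameter.
    H = I - 1 1^T / (1^T 1) is the (m, m) centering matrix of Eq. 4.
    """
    m = G.shape[0]
    ones = np.ones((m, 1))
    H = np.eye(m) - (ones @ ones.T) / m          # Eq. 4
    return G.T @ H @ np.linalg.pinv(H @ G @ G.T @ H + alpha * H)

def estimate_sources(T, V_k):
    """Eq. 2: J_hat = T V_k for one EEG sample V_k of shape (m,)."""
    return T @ V_k
```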
To initiate the active region selection, the source activity estimation technique (Eq. 2) is applied to the time-varying EEG signals. For each sample, the estimation identifies certain sources as active. The final selection is based on the regions or sources that exhibit a higher probability of being active across samples; in essence, the brain sources that are active more frequently than the others throughout the analysis window are selected.
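The per-sample selection rule described above can be sketched as follows. Treating the source with the largest absolute activity as "active" in each sample is our reading of the text, not a detail the paper specifies:

```python
import numpy as np

def select_active_regions(J_hat, n_regions):
    """Pick the regions most frequently active across samples.

    J_hat : (n_sources, n_samples) estimated source activity.  A source is
    counted as 'active' in a sample when it carries the largest absolute
    activity (an assumption).  Returns the indices of the n_regions sources
    that were active most often.
    """
    winners = np.argmax(np.abs(J_hat), axis=0)            # most active source per sample
    counts = np.bincount(winners, minlength=J_hat.shape[0])
    return np.argsort(counts)[::-1][:n_regions]
```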
Subsequently, a linear dynamic model is fitted to the selected active sources. This process involves aligning the parameters of the model with the extracted active sources to establish a meaningful representation of their dynamics.
\({J}_{K}={{F}_{K}J}_{K-1}+{\eta }_{K}\) (Eq. 5)
where \({\eta }_{K}\) is the state noise and \({F}_{K}\) is the relation matrix at the Kth sample. \({F}_{K}\) characterizes the interdependencies among the active brain regions, as well as each region's dependence on its own activity at sample K-1. The relationship between the active sources and the EEG signals, in turn, is governed by the leadfield matrix obtained by solving the forward problem; this matrix encapsulates the spatial sensitivity patterns connecting the active sources to the measured EEG signals. The combined state-space model can be expressed as follows:
\(\left\{\begin{array}{c}{J}_{K}={{F}_{K}J}_{K-1}+{\eta }_{K} \\ \\ {V}_{K}=G{J}_{K}+{\epsilon }_{K}\end{array}\right.\) (Eq. 6)
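A minimal sketch of the state-estimation half of a Kalman recursion for the model in Eq. 6 is given below; a full dual Kalman filter alternates this with an analogous step that updates \({F}_{K}\), and the noise covariances Q and R are assumed known here:

```python
import numpy as np

def kalman_state_step(J_prev, P_prev, F, G, V_k, Q, R):
    """One predict/update step for the state J_k of Eq. 6, with F fixed.

    J_prev : (n,) previous state estimate; P_prev : (n, n) its covariance;
    F : (n, n) relation matrix; G : (m, n) leadfield; V_k : (m,) EEG sample;
    Q, R : state- and measurement-noise covariances (assumed known).
    """
    # Predict from the dynamic model J_k = F J_{k-1} + eta_k
    J_pred = F @ J_prev
    P_pred = F @ P_prev @ F.T + Q
    # Update with the observation model V_k = G J_k + eps_k
    S = G @ P_pred @ G.T + R                     # innovation covariance
    K = P_pred @ G.T @ np.linalg.inv(S)          # Kalman gain
    J_new = J_pred + K @ (V_k - G @ J_pred)
    P_new = (np.eye(len(J_prev)) - K @ G) @ P_pred
    return J_new, P_new
```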
In this problem, the source activity over time (\({J}_{K}\)) and the relation matrix in time and space (\({F}_{K}\)) are the unknown parameters, and they are estimated with a dual Kalman filter; the estimation equations and formulas are described in [57, 58]. After the dynamic source activity (\({J}_{K}\)) has been computed, and with \({V}_{K}\) known as the recorded EEG signal, the next step is to fit a multivariate autoregressive moving average model to the EEG channels, modeling \({V}_{K}\) in terms of previous samples of V and J. This multivariate ARMA(p,q) model can be expressed in its general form as follows:
\({V}_{K}=\sum _{i=1}^{p}{a}_{i}{V}_{K-i}+ \sum _{i=0}^{q}{b}_{i}{J}_{K-i}\) (Eq. 7)
where \({a}_{i}\) and \({b}_{i}\) are the model parameter matrices for lag i. Given that V is a vector of size m and J is a vector of size n, each coefficient matrix \({a}_{i}\) is of size m×m (with p such matrices) and each coefficient matrix \({b}_{i}\) is of size m×n (with q+1 such matrices).
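The paper does not state how the ARMA parameter matrices are estimated; one simple possibility, sketched below, is ordinary least squares over the stacked regressors of Eq. 7:

```python
import numpy as np

def fit_arma_xy(V, J, p, q):
    """Least-squares fit of the multivariate model of Eq. 7:
    V_k = sum_{i=1..p} a_i V_{k-i} + sum_{i=0..q} b_i J_{k-i}.

    V : (m, N) EEG channels; J : (n, N) source activity.
    Returns lists of m x m matrices a_1..a_p and m x n matrices b_0..b_q.
    """
    m, N = V.shape
    n = J.shape[0]
    start = max(p, q)
    rows, targets = [], []
    for k in range(start, N):
        past_V = [V[:, k - i] for i in range(1, p + 1)]   # lagged EEG samples
        past_J = [J[:, k - i] for i in range(0, q + 1)]   # current/lagged sources
        rows.append(np.concatenate(past_V + past_J))
        targets.append(V[:, k])
    X = np.array(rows)                                    # (N-start, m*p + n*(q+1))
    Y = np.array(targets)                                 # (N-start, m)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    coef = coef.T                                         # (m, m*p + n*(q+1))
    a = [coef[:, i * m:(i + 1) * m] for i in range(p)]
    b = [coef[:, p * m + i * n: p * m + (i + 1) * n] for i in range(q + 1)]
    return a, b
```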
In addition to the aforementioned multivariate ARMA model, our method also incorporates a second time series model that accounts for nonstationary EEG signals. This model is known as the autoregressive integrated moving average (ARIMA) model. The ARIMA(p,d,q) model is defined as follows:
\({V}_{K}=\sum _{i=1}^{p}{a}_{i}{V}_{K-i}+ \sum _{i=0}^{q}{b}_{i}{J}_{K-i}+\sum _{i=1}^{d}{c}_{i}{\left(1-Z\right)}^{d}{Y}_{i}\) (Eq. 8)
where Z is the delay (backshift) operator, defined as follows:
\(\left(1-Z\right){Y}_{i}={Y}_{i}-{Y}_{i-1}\)
\({\left(1-Z\right)}^{2}{Y}_{i}={Y}_{i}-2{Y}_{i-1}+{Y}_{i-2}\) (Eq. 9)
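The differencing operator of Eq. 9 is straightforward to implement; a minimal sketch:

```python
import numpy as np

def difference(Y, d):
    """Apply the differencing operator (1 - Z)^d of Eq. 9, where Z is the
    delay operator: (1 - Z) Y_i = Y_i - Y_{i-1}.  Each pass shortens the
    series by one sample."""
    out = np.asarray(Y, dtype=float)
    for _ in range(d):
        out = out[1:] - out[:-1]
    return out
```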
In this section, the source activities are considered as the input component of the model, while EEG signals are considered as the output component. The aim is to model each sample of the EEG signal using its delayed samples and source activities. To achieve this, ARMA matrices (denoted as 'a' and 'b') and ARIMA matrices (denoted as 'a', 'b', and 'c') are calculated for each sample. The size of these matrices depends on the model orders.
Then, various features are extracted from these model parameters for the purpose of classification. Firstly, to reduce complexity, these parameters are divided into two classes, and several features are extracted from each class:
- Class 1: Parameters with small variation, i.e., a standard deviation lower than 20% of the mean value of the array. The mean values of these parameters contribute to the feature vectors.
- Class 2: Parameters with greater variation than those in Class 1. For this class, the following statistical features are selected for the feature vector:
  - Mean value of the array signal.
  - Standard deviation of the array signal.
  - Kurtosis of the array signal.
  - Skewness of the array signal.
  - Entropy of the array signal.
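The class split and per-array features above might be implemented as follows; the histogram-based Shannon entropy estimator is our assumption, since the paper does not specify how entropy is computed:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def class_of(x):
    """Class 1 if the array varies little (std < 20% of |mean|), else Class 2."""
    return 1 if np.std(x) < 0.2 * abs(np.mean(x)) else 2

def statistical_features(x, n_bins=16):
    """Mean, std, kurtosis, skewness, and Shannon entropy of one parameter
    time series.  Entropy uses a histogram estimate (an assumption)."""
    hist, _ = np.histogram(x, bins=n_bins)
    p = hist[hist > 0] / len(x)
    entropy = -np.sum(p * np.log2(p))
    return np.array([np.mean(x), np.std(x), kurtosis(x), skew(x), entropy])
```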
Next, the horizontal [59] and natural visibility graphs [60, 61] of each parameter time series are constructed, and the following features of these graphs are added to the feature vectors. For a detailed treatment of visibility graphs, readers are referred to [60, 62, 63].
- The mean of the graph node degrees
- The standard deviation of the graph node degrees
- The mean length of the shortest path from each node to the other nodes
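A minimal construction of the natural visibility graph and its degree-based features is sketched below; the horizontal visibility graph differs only in its visibility criterion, and the shortest-path feature can be computed from the adjacency matrix with a breadth-first search:

```python
import numpy as np

def natural_visibility_graph(y):
    """Adjacency matrix of the natural visibility graph of series y:
    samples i and j are connected when every intermediate sample k lies
    strictly below the straight line joining (i, y_i) and (j, y_j)."""
    N = len(y)
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(i + 1, N):
            visible = all(
                y[k] < y[i] + (y[j] - y[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                A[i, j] = A[j, i] = 1
    return A

def graph_degree_features(A):
    """Mean and standard deviation of the node degrees."""
    deg = A.sum(axis=1)
    return deg.mean(), deg.std()
```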
Following feature extraction and the creation of the feature vector, it becomes important to reduce the dimensionality of this vector to simplify complexity and enhance classification accuracy. Principal Component Analysis (PCA) is a highly recognized and effective method for feature vector dimension reduction. PCA aims to transform a set of potentially correlated high-dimensional features into a new set of uncorrelated variables known as principal components. This transformation retains the most important information from the original features while decreasing their dimensionality. PCA begins by normalizing or centering each feature by subtracting its mean value. Then, a correlation matrix is formed from these normalized features, and the eigenvalues of this matrix are calculated. The eigenvectors corresponding to high-magnitude eigenvalues are selected and arranged to construct a transformation matrix. This matrix helps project the original feature vector onto a lower-dimensional space defined by these selected eigenvectors. By discarding eigenvectors with low eigenvalues, PCA effectively reduces dimensionality while maintaining critical information.
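A compact sketch of this PCA procedure via eigen-decomposition is given below; it works on the covariance of the centered features, whereas the correlation-matrix variant described above additionally divides each feature by its standard deviation:

```python
import numpy as np

def pca_transform(X, n_components):
    """PCA via eigen-decomposition of the covariance of the centered
    feature matrix X (samples x features).  Returns the projected data
    and the transformation matrix of top eigenvectors."""
    Xc = X - X.mean(axis=0)                       # center each feature
    C = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)          # ascending eigenvalues
    W = eigvecs[:, ::-1][:, :n_components]        # top-variance directions
    return Xc @ W, W
```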
In the final phase of the algorithm, a Support Vector Machine (SVM) is used as the classifier to distinguish individuals with autism from neurotypical controls. SVM is selected for its strong capability in classifying high-dimensional features: it finds the hyperplane that maximizes the margin between the training classes, determined by the closest points, known as support vectors. Although only briefly described here, comprehensive accounts of SVM's mechanisms and optimization techniques are available in the literature. SVM offers robust classification performance and is widely applied across fields owing to its ability to handle complex feature spaces and manage classification tasks efficiently.
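As an illustration of this final stage, the sketch below trains an SVM on hypothetical reduced feature vectors; the RBF kernel and C value are illustrative choices, since the paper does not report its SVM hyperparameters:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical reduced feature vectors: one row per subject, with labels
# 1 = autism and 0 = neurotypical control (synthetic stand-in data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 10)),
               rng.normal(1.5, 1.0, (20, 10))])
y = np.array([0] * 20 + [1] * 20)

# Standardize features, then fit an RBF-kernel SVM (illustrative settings).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)
train_acc = clf.score(X, y)
```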