This section reviews two categories of methods used for classifying Autism Spectrum Disorder (ASD) and Typically Developing (TD) MRI images. The first category examines machine learning approaches for classification, while the second category focuses on deep learning methods utilized in ASD classification.
2.1. Machine Learning-based Methods
Machine learning has become a valuable asset in the early detection of Autism, although challenges arise with the increasing complexity of image features and massive datasets. Traditional ML algorithms, such as Support Vector Machines (SVM), Random Forest (RF), Decision Tree (DT), and K-Nearest Neighbors (KNN), are commonly employed for ASD vs. TD classification.
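To make the workflow concrete, the following is a minimal, hedged sketch (not any cited study's pipeline) of training two of the classifiers named above, an SVM and a Random Forest, on synthetic feature vectors standing in for extracted MRI features; all dataset sizes and labels are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects, n_features = 200, 50          # placeholder dataset dimensions
X = rng.normal(size=(n_subjects, n_features))
y = rng.integers(0, 2, size=n_subjects)   # 0 = TD, 1 = ASD (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Two of the traditional classifiers commonly used for ASD vs. TD
svm = SVC(kernel="rbf").fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

svm_acc = svm.score(X_te, y_te)
rf_acc = rf.score(X_te, y_te)
```

On real data, the feature matrix `X` would come from a manual extraction step such as those surveyed below.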
Traditional ML methods often necessitate manual feature extraction. For instance, Xu et al. proposed an approach using features such as Gray Matter Volume, White Matter Volume, Fractional Anisotropy, Mean Diffusivity, Regional Homogeneity, and Amplitude of Low-Frequency Fluctuation, and applied machine learning techniques such as Decision Tree classifiers, SVM, and Logistic Regression for ASD classification [8]. Similarly, Liu et al. utilized MRI-based features, including resting-state, task-based, dynamic, and high-order functional connectivity features, employing SVM, Logistic Regression, Random Forest, and Gaussian Naïve Bayes classifiers [9].
Haweel et al. introduced a method in which features such as the mean of GLM parameter estimates, mean z-statistic, mean, standard deviation, and counts of values above a threshold were computed, capturing different aspects of brain activity; these features were then fed into a Random Forest classifier [10]. Kazeminejad et al. built functional connectivity networks from ROI time series using correlation and mutual information metrics. Graph metrics were calculated, and features were selected via a sequential forward floating algorithm. Classification was performed using a Gaussian SVM with 10-fold cross-validation, validated with Welch's t-tests, and FDR-corrected p-values were reported for comparison [11].
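A rough sketch of the connectivity-based part of such a pipeline is shown below. This is an assumption-laden toy, not Kazeminejad et al.'s exact method: it builds correlation-based functional connectivity features from synthetic ROI time series (omitting the mutual-information metrics, graph metrics, and feature selection), then evaluates a Gaussian (RBF) SVM with 10-fold cross-validation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_subjects, n_rois, n_timepoints = 60, 10, 120   # placeholder dimensions

features = []
for _ in range(n_subjects):
    ts = rng.normal(size=(n_rois, n_timepoints))  # one subject's ROI signals
    fc = np.corrcoef(ts)                          # ROI-by-ROI correlation matrix
    iu = np.triu_indices(n_rois, k=1)             # unique off-diagonal edges
    features.append(fc[iu])                       # flatten to a feature vector
X = np.array(features)
y = rng.integers(0, 2, size=n_subjects)           # synthetic ASD/TD labels

# Gaussian (RBF) SVM evaluated with 10-fold cross-validation
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)
mean_acc = scores.mean()
```

With 10 ROIs, each subject contributes 45 edge weights as features; real studies typically use far larger atlases.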
Sharif et al. proposed a technique based on Corpus Callosum features and Witelson's (W) parameters, with Information Gain and Chi-Square for feature selection, utilizing LDA (Linear Discriminant Analysis), SVM, Random Forest, Multi-Layer Perceptron, and KNN [12]. Abdullah et al. applied feature selection methods such as Chi-square and LASSO (Least Absolute Shrinkage and Selection Operator) with supervised machine learning algorithms including Random Forest, Logistic Regression, and KNN [13].
Chaitra et al. utilized Functional Connectivity and Complex Network Measures for feature extraction, employing the Recursive-Cluster-Elimination SVM Algorithm for classification [14]. Yassin et al. used features such as Cortical Thickness and Subcortical Volume with classifiers such as SVM, Logistic Regression, and Decision Tree, comparing model performance [15]. Eslami et al. introduced a method in which morphometric features, including volume, area, thickness, curvature, and folding index of distinct brain regions, are fed into SVM and Random Forest classifiers [16].
While traditional machine learning methods have demonstrated good performance in distinguishing between ASD and TD data, they rely on manual feature extraction. This process is time-consuming and may not capture the most relevant aspects of image data accurately. Additionally, conventional algorithms often struggle with managing the intricate relationships and high dimensionality inherent in image data, leading to potential misclassification.
On the other hand, deep learning models can automatically learn these features directly from the image data. This eliminates the need for manual feature crafting and helps ensure that critical aspects of the data are captured effectively. Moreover, deep learning models often achieve higher classification accuracy than traditional machine learning methods due to their ability to capture complex relationships within the image data.
2.2. Deep Learning-based Methods
For the classification of Autism, different types of deep learning approaches are used. Among these, CNNs are widely utilized in image-based disease diagnosis for the following reasons: (1) they excel at extracting and identifying complex features from input images, enabling them to detect minute patterns; (2) they employ a hierarchical processing approach, which allows them to analyze images at multiple levels of abstraction and facilitates the detection of complex structures and relationships within the image data; (3) they leverage spatial relationships within images, enabling them to understand the spatial arrangement of features; (4) they utilize specialized operations such as convolution and pooling, which are computationally efficient and well-suited for processing the large volumes of image data commonly encountered in medical imaging tasks.
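The convolution and pooling operations mentioned in point (4) can be illustrated with a toy NumPy sketch; this is a didactic stand-in, not a real CNN layer, and the 6×6 "image" and edge filter are arbitrary placeholders.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over the image ('valid' padding), summing elementwise products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Downsample a feature map by taking the max over non-overlapping 2x2 blocks."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)    # stand-in for an image patch
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # simple vertical-edge filter
fmap = conv2d_valid(image, edge_kernel)             # 5x5 feature map
pooled = max_pool2x2(fmap)                          # pooled 2x2 summary
```

Stacking such convolution and pooling stages is what produces the hierarchical, multi-level analysis described in point (2).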
Transfer learning involves utilizing pre-trained CNN models, where parameters learned on one task are reused in another model for making predictions. This approach is highly effective when the target data resembles the original training data. Significant research has also been conducted in Alzheimer's Disease classification using transfer learning techniques. Ahmad et al. utilized pre-trained CNNs such as ResNet34, ResNet50, AlexNet, MobileNetV2, VGG16, and VGG19 for ASD diagnosis, with ResNet34 as the proposed model [17]. Kanimozhi A et al. used ResNet50, a deep convolutional neural network, to predict ASD [18]. Reddy et al. used pre-trained CNN models, VGG16, VGG19, and EfficientNetB0, to detect autism from facial images using deep learning [19]. It was inferred that ResNet gave better results in terms of accuracy for ASD classification.
Custom convolutional neural network models are used because they offer flexibility in layer design and can produce better results than pre-trained models. Sherkatghanad et al. developed a CNN model that used fewer parameters to detect ASD [20]. M. F. Rabbi et al. employed five different algorithms, Multilayer Perceptron, Random Forest, Gradient Boosting Machine, AdaBoost, and a Convolutional Neural Network, with the CNN proposed for early detection of Autism Spectrum Disorder in children [21]. In medical image analysis, DL techniques, particularly CNNs, are essential for MRI segmentation; techniques such as data augmentation and transfer learning are used to overcome problems like data scarcity, which ultimately improves patient care outcomes and advances the field of medical imaging [22]. Wenjing Jiang et al. introduced a novel ASD classification network called CNNG, combining a Convolutional Neural Network and a Gated Recurrent Unit to detect ASD [23]. Using MRI data, other research investigates CNNs and traditional classifiers for predicting the progression from MCI to AD, addressing issues with generalization and accuracy; recent work explores feature fusion and age correction, and combining CNNs with structural brain features improves accuracy and AUC in predicting MCI-to-AD conversion [24]. To improve classification accuracy, recent research has also examined new training criteria for deep neural networks, comparing loss functions such as cross-entropy and M3CE; deep CNNs are increasingly used for image categorization and implicit feature extraction, and comparative studies of loss functions and classifiers, including SVM and KNN, have been carried out [25]. Deep learning, particularly with CNNs, has revolutionized image classification and has been extended to semantic segmentation and object detection.
CNN-based techniques also perform strongly in remote-sensing scene classification. One survey highlights attention mechanisms, lightweight designs, hyperparameter optimization, and challenges in semi-supervised learning, covering the evolution from classic models to state-of-the-art architectures [26]. CNNs are likewise used in computer vision to classify images automatically; accuracy is improved through feature fusion from multiple layers, and MatConvNet shortens training times and allows CPU training. The demonstrated effectiveness of CNNs in image recognition supports further developments in neural network approaches as well as real-time applications [27].
NLP transformers divide input sequences into tokens, use self-attention mechanisms to assess token relevance, and iteratively refine representations through stacked transformer layers to capture complex linguistic patterns and distant relationships. In image classification, transformers interpret images as sequences of patches or tokens, allowing them to model global spatial relationships and dependencies; this is important for tasks that require context-based predictions and holistic image understanding. The Swin Transformer is widely used for image classification, especially for biomedical disorders, because: (1) it can handle large and varying image sizes and input resolutions; (2) its hierarchical attention scheme enables efficient image processing by reducing computational complexity; (3) shifted windowing enhances efficiency by confining self-attention to local, non-overlapping windows while adding cross-window connections; (4) it effectively captures both local and global characteristics within images. For instance, Hyaung et al. proposed a resizer Swin Transformer to extract multi-scale and cross-channel features from structural MRI (sMRI) brain images [28]. T. Illakiya et al. proposed a hybrid model of the Swin Transformer, the Dimension Centric Proximity Aware Attention Network (DCPAN), and the Age Deviation Factor (ADF), which combines global, local, proximal, and dimensional dependencies [29]. Asiri et al. used the Swin Transformer to classify four different brain tumors, outperforming models such as CNN, DCNN, ViT, and their variants [30].
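The window partitioning and shifted-window steps behind points (2) and (3) can be sketched in NumPy as follows; this is a simplified illustration under stated assumptions (a single-channel 8×8 feature map, window size 4), and the self-attention computed inside each window in the real Swin Transformer is omitted.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W) feature map into non-overlapping ws x ws windows."""
    H, W = x.shape
    return (x.reshape(H // ws, ws, W // ws, ws)
             .transpose(0, 2, 1, 3)
             .reshape(-1, ws, ws))

x = np.arange(64).reshape(8, 8)     # toy 8x8 feature map
ws = 4                              # window size
windows = window_partition(x, ws)   # 4 windows of shape 4x4; attention would
                                    # be computed within each window only

# Shifted windows: cyclically roll the map by ws//2 before partitioning, so
# the next layer's windows straddle the previous window borders, creating
# the cross-window connections mentioned in point (3).
shifted = np.roll(x, shift=(-ws // 2, -ws // 2), axis=(0, 1))
shifted_windows = window_partition(shifted, ws)
```

Restricting attention to these small windows is what reduces the quadratic cost of global self-attention while the shift step still lets information propagate across the whole image.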
Hybrid deep learning models enhance classification accuracy by leveraging the strengths of various deep learning models that extract distinct features. This collaborative approach improves model performance by combining the unique capabilities of different models, thereby enhancing overall accuracy and robustness. Additionally, hybrid models can effectively handle complex data structures and patterns that may be challenging for a single model to capture comprehensively.
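The feature-fusion idea behind such hybrid models can be sketched minimally as follows; the two "extractors" here are random placeholders rather than real CNN or transformer features, and the simple concatenation-plus-classifier design is an illustrative assumption, not any cited study's architecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_subjects = 100
feats_a = rng.normal(size=(n_subjects, 32))  # placeholder for CNN-style features
feats_b = rng.normal(size=(n_subjects, 16))  # placeholder for transformer-style features
y = rng.integers(0, 2, size=n_subjects)      # synthetic ASD/TD labels

# Late fusion: concatenate the two feature sets, then train one classifier
X_fused = np.concatenate([feats_a, feats_b], axis=1)
clf = LogisticRegression(max_iter=1000).fit(X_fused, y)
acc = clf.score(X_fused, y)
```

Because each extractor captures distinct characteristics of the data, the fused representation can support decisions that neither feature set enables alone.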
For example, Kumar et al. proposed a deep hybrid model combining heterogeneous pre-existing convolutional neural networks, VGG16 and Xception, through fusion techniques [31]. Ulaganathan et al. extracted features from fMRI images using a Wiener filter, followed by ROI (region of interest) extraction; these features were then refined using a driving training political optimizer (DTPO) and classified with a DQN (Deep Q-Learning Network) and SpinalNet [32]. Jain et al. extracted features with the VGG-16 network for ROI-based functional connectivity and classified them with the DM-ResNet (Dwarf Mongoose optimized Residual Network) for binary classification [33]. Deep learning methods such as CNNs, transfer learning, and transformers play crucial roles in Autism classification: CNNs excel at robust feature extraction, transfer learning reduces computational costs, transformers capture complex relationships between pixels, and Swin Transformers adapt well to medical imaging data. Hybrid models further enhance accuracy by combining these strengths.