This section reviews two categories of methods used for classifying Autism Spectrum Disorder (ASD) and Typically Developing (TD) MRI images. The first category examines machine learning approaches for classification, while the second category focuses on deep learning methods utilized in ASD classification.
2.1. Machine Learning-based Methods
Machine learning has become a valuable asset in the early detection of Autism, although challenges arise with the increasing complexity of image features and massive datasets. Traditional ML algorithms, such as Support Vector Machines (SVM), Random Forest (RF), Decision Tree (DT), and K-Nearest Neighbors (KNN), are commonly employed for ASD vs. TD classification.
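To make the workflow concrete, the following is a minimal, hedged sketch (not any cited study's pipeline) of training two of the classifiers named above, an SVM and a Random Forest, on synthetic feature vectors standing in for extracted MRI features; all dataset sizes and labels are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects, n_features = 200, 50          # placeholder dataset dimensions
X = rng.normal(size=(n_subjects, n_features))
y = rng.integers(0, 2, size=n_subjects)   # 0 = TD, 1 = ASD (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Two of the traditional classifiers commonly used for ASD vs. TD
svm = SVC(kernel="rbf").fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

svm_acc = svm.score(X_te, y_te)
rf_acc = rf.score(X_te, y_te)
```

On real data, the feature matrix `X` would come from a manual extraction step such as those surveyed below.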
Traditional ML methods often necessitate manual feature extraction. For instance, Xu et al. proposed an approach using features such as Gray Matter Volume, White Matter Volume, Fractional Anisotropy, Mean Diffusivity, Regional Homogeneity, and Amplitude of Low-Frequency Fluctuation, and applied machine learning techniques such as Decision Tree classifiers, SVM, and Logistic Regression for ASD classification [8]. Similarly, Liu et al. utilized MRI-based features, including resting-state, task-based, dynamic, and high-order functional connectivity features, employing SVM, Logistic Regression, Random Forest, and Gaussian Naïve Bayes classifiers [9].
Haweel et al. introduced a method in which features such as the mean of GLM parameter estimates, mean z-statistic, mean, standard deviation, and counts of values above a threshold were computed, capturing different aspects of brain activity; these features were then fed into a Random Forest classifier [10]. Kazeminejad et al. built functional connectivity networks from ROI time series using correlation and mutual information metrics. Graph metrics were calculated, and features were selected via a sequential forward floating algorithm. Classification was performed using a Gaussian SVM with 10-fold cross-validation, validated with Welch's t-tests, and FDR-corrected p-values were reported for comparison [11].
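A rough sketch of the connectivity-based part of such a pipeline is shown below. This is an assumption-laden toy, not Kazeminejad et al.'s exact method: it builds correlation-based functional connectivity features from synthetic ROI time series (omitting the mutual-information metrics, graph metrics, and feature selection), then evaluates a Gaussian (RBF) SVM with 10-fold cross-validation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_subjects, n_rois, n_timepoints = 60, 10, 120   # placeholder dimensions

features = []
for _ in range(n_subjects):
    ts = rng.normal(size=(n_rois, n_timepoints))  # one subject's ROI signals
    fc = np.corrcoef(ts)                          # ROI-by-ROI correlation matrix
    iu = np.triu_indices(n_rois, k=1)             # unique off-diagonal edges
    features.append(fc[iu])                       # flatten to a feature vector
X = np.array(features)
y = rng.integers(0, 2, size=n_subjects)           # synthetic ASD/TD labels

# Gaussian (RBF) SVM evaluated with 10-fold cross-validation
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)
mean_acc = scores.mean()
```

With 10 ROIs, each subject contributes 45 edge weights as features; real studies typically use far larger atlases.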
Sharif et al. proposed a technique based on Corpus Callosum features and Witelson's (W) parameters, with Information Gain and Chi-Square for feature selection, utilizing LDA (Linear Discriminant Analysis), SVM, Random Forest, Multi-Layer Perceptron, and KNN [12]. Abdullah et al. applied feature selection methods such as Chi-square and LASSO (Least Absolute Shrinkage and Selection Operator) with supervised machine learning algorithms including Random Forest, Logistic Regression, and KNN [13].
Chaitra et al. utilized Functional Connectivity and Complex Network Measures for feature extraction, employing the Recursive-Cluster-Elimination SVM Algorithm for classification [14]. Yassin et al. used features such as Cortical Thickness and Subcortical Volume with classifiers such as SVM, Logistic Regression, and Decision Tree, comparing model performance [15]. Eslami et al. introduced a method in which morphometric features, including volume, area, thickness, curvature, and folding index of distinct brain regions, are fed into SVM and Random Forest classifiers [16].
While traditional machine learning methods have demonstrated good performance in distinguishing between ASD and TD data, they rely on manual feature extraction. This process is time-consuming and may not capture the most relevant aspects of image data accurately. Additionally, conventional algorithms often struggle with managing the intricate relationships and high dimensionality inherent in image data, leading to potential misclassification.
On the other hand, deep learning models can automatically learn these features directly from the image data. This eliminates the need for manual feature crafting and helps ensure that critical aspects of the data are captured effectively. Moreover, deep learning models often achieve higher classification accuracy than traditional machine learning methods due to their ability to capture complex relationships within the image data.
2.2. Deep Learning-based Methods
For the classification of Autism, different types of deep learning approaches are used. Among these, CNNs are widely utilized in image-based disease diagnosis for the following reasons: (1) they excel at extracting and identifying complex features from input images, enabling them to detect minute patterns; (2) they employ a hierarchical processing approach, which allows them to analyze images at multiple levels of abstraction and facilitates the detection of complex structures and relationships within the image data; (3) they leverage spatial relationships within images, enabling them to understand the spatial arrangement of features; (4) they utilize specialized operations such as convolution and pooling, which are computationally efficient and well-suited for processing the large volumes of image data commonly encountered in medical imaging tasks.
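The convolution and pooling operations mentioned in point (4) can be illustrated with a toy NumPy sketch; this is a didactic stand-in, not a real CNN layer, and the 6×6 "image" and edge filter are arbitrary placeholders.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over the image ('valid' padding), summing elementwise products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Downsample a feature map by taking the max over non-overlapping 2x2 blocks."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)    # stand-in for an image patch
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # simple vertical-edge filter
fmap = conv2d_valid(image, edge_kernel)             # 5x5 feature map
pooled = max_pool2x2(fmap)                          # pooled 2x2 summary
```

Stacking such convolution and pooling stages is what produces the hierarchical, multi-level analysis described in point (2).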
Transfer learning involves utilizing pre-trained CNN models, where parameters learned on one task are reused in another model for making predictions. This approach is highly effective when the target data resembles the original training data. Significant research has also been conducted in Alzheimer's Disease classification using transfer learning techniques. Ahmad et al. utilized pre-trained CNNs such as ResNet34, ResNet50, AlexNet, MobileNetV2, VGG16, and VGG19 for ASD diagnosis, with ResNet34 as the proposed model [17]. Kanimozhi A et al. used ResNet50, a deep convolutional neural network, to predict ASD [18]. Reddy et al. used pre-trained CNN models, VGG16, VGG19, and EfficientNetB0, to detect autism from facial images using deep learning [19]. It was inferred that ResNet gave better results in terms of accuracy for ASD classification.
Custom convolutional neural network models are used because they offer flexibility in layer design and can produce better results than pre-trained models. Sherkatghanad et al. developed a CNN model that used fewer parameters to detect ASD [20]. M. F. Rabbi et al. employed five different algorithms, Multilayer Perceptron, Random Forest, Gradient Boosting Machine, AdaBoost, and a Convolutional Neural Network, with the CNN proposed for early detection of Autism Spectrum Disorder in children [21]. In medical image analysis, DL techniques, particularly CNNs, are essential for MRI segmentation; techniques such as data augmentation and transfer learning are used to overcome problems like data scarcity, which ultimately improves patient care outcomes and advances the field of medical imaging [22]. Wenjing Jiang et al. introduced a novel ASD classification network called CNNG, combining a Convolutional Neural Network and a Gated Recurrent Unit to detect ASD [23]. Using MRI data, other research investigates CNNs and traditional classifiers for predicting the progression from MCI to AD, addressing issues with generalization and accuracy; recent work explores feature fusion and age correction, and combining CNNs with structural brain features improves accuracy and AUC in predicting MCI-to-AD conversion [24]. To improve classification accuracy, recent research has also examined new training criteria for deep neural networks, comparing loss functions such as cross-entropy and M3CE; deep CNNs are increasingly used for image categorization and implicit feature extraction, and comparative studies of loss functions and classifiers, including SVM and KNN, have been carried out [25]. Deep learning, particularly with CNNs, has revolutionized image classification and has been extended to semantic segmentation and object detection.
CNN-based techniques also perform strongly in remote-sensing scene classification. One survey highlights attention mechanisms, lightweight designs, hyperparameter optimization, and challenges in semi-supervised learning, covering the evolution from classic models to state-of-the-art architectures [26]. CNNs are likewise used in computer vision to classify images automatically; accuracy is improved through feature fusion from multiple layers, and MatConvNet shortens training times and allows CPU training. The demonstrated effectiveness of CNNs in image recognition supports further developments in neural network approaches as well as real-time applications [27].
NLP transformers divide input sequences into tokens, use self-attention mechanisms to assess token relevance, and iteratively refine representations through stacked transformer layers to capture complex linguistic patterns and distant relationships. In image classification, transformers interpret images as sequences of patches or tokens, allowing them to model global spatial relationships and dependencies; this is important for tasks that require context-based predictions and holistic image understanding. The Swin Transformer is widely used for image classification, especially for biomedical disorders, because: (1) it can handle large and varying image sizes and input resolutions; (2) its hierarchical attention scheme enables efficient image processing by reducing computational complexity; (3) shifted windowing enhances efficiency by confining self-attention to local, non-overlapping windows while adding cross-window connections; (4) it effectively captures both local and global characteristics within images. For instance, Hyaung et al. proposed a resizer Swin Transformer to extract multi-scale and cross-channel features from structural MRI (sMRI) brain images [28]. T. Illakiya et al. proposed a hybrid model of the Swin Transformer, the Dimension Centric Proximity Aware Attention Network (DCPAN), and the Age Deviation Factor (ADF), which combines global, local, proximal, and dimensional dependencies [29]. Asiri et al. used the Swin Transformer to classify four different brain tumors, outperforming models such as CNN, DCNN, ViT, and their variants [30].
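The window partitioning and shifted-window steps behind points (2) and (3) can be sketched in NumPy as follows; this is a simplified illustration under stated assumptions (a single-channel 8×8 feature map, window size 4), and the self-attention computed inside each window in the real Swin Transformer is omitted.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W) feature map into non-overlapping ws x ws windows."""
    H, W = x.shape
    return (x.reshape(H // ws, ws, W // ws, ws)
             .transpose(0, 2, 1, 3)
             .reshape(-1, ws, ws))

x = np.arange(64).reshape(8, 8)     # toy 8x8 feature map
ws = 4                              # window size
windows = window_partition(x, ws)   # 4 windows of shape 4x4; attention would
                                    # be computed within each window only

# Shifted windows: cyclically roll the map by ws//2 before partitioning, so
# the next layer's windows straddle the previous window borders, creating
# the cross-window connections mentioned in point (3).
shifted = np.roll(x, shift=(-ws // 2, -ws // 2), axis=(0, 1))
shifted_windows = window_partition(shifted, ws)
```

Restricting attention to these small windows is what reduces the quadratic cost of global self-attention while the shift step still lets information propagate across the whole image.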
Hybrid deep learning models enhance classification accuracy by leveraging the strengths of various deep learning models that extract distinct features. This collaborative approach improves model performance by combining the unique capabilities of different models, thereby enhancing overall accuracy and robustness. Additionally, hybrid models can effectively handle complex data structures and patterns that may be challenging for a single model to capture comprehensively.
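The feature-fusion idea behind such hybrid models can be sketched minimally as follows; the two "extractors" here are random placeholders rather than real CNN or transformer features, and the simple concatenation-plus-classifier design is an illustrative assumption, not any cited study's architecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_subjects = 100
feats_a = rng.normal(size=(n_subjects, 32))  # placeholder for CNN-style features
feats_b = rng.normal(size=(n_subjects, 16))  # placeholder for transformer-style features
y = rng.integers(0, 2, size=n_subjects)      # synthetic ASD/TD labels

# Late fusion: concatenate the two feature sets, then train one classifier
X_fused = np.concatenate([feats_a, feats_b], axis=1)
clf = LogisticRegression(max_iter=1000).fit(X_fused, y)
acc = clf.score(X_fused, y)
```

Because each extractor captures distinct characteristics of the data, the fused representation can support decisions that neither feature set enables alone.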
For example, Kumar et al. proposed a deep hybrid model combining heterogeneous pre-existing convolutional neural networks, VGG16 and Xception, through fusion techniques [31]. Ulaganathan et al. extracted features from fMRI images using a Wiener filter, followed by ROI (region of interest) extraction; these features were then refined using a driving training political optimizer (DTPO) and classified with a DQN (Deep Q-Learning Network) and SpinalNet [32]. Jain et al. extracted features with the VGG-16 network for ROI-based functional connectivity and classified them with the DM-ResNet (Dwarf Mongoose optimized Residual Network) for binary classification [33]. Deep learning methods such as CNNs, transfer learning, and transformers play crucial roles in Autism classification: CNNs excel at robust feature extraction, transfer learning reduces computational costs, transformers capture complex relationships between pixels, and Swin Transformers adapt well to medical imaging data. Hybrid models further enhance accuracy by combining these strengths.