Detection of an Autism EEG Signature Through a New Processing Method Based on a Topological Approach

doi:10.21203/rs.3.rs-878499/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1


Abstract

A new pre-processing approach for detecting topological EEG features was applied to continuous, artifact-free EEG segments lasting 10 minutes, exported in ASCII format, from 50 children with ASD and 50 children with other neuropsychiatric disorders (NPD), matched for age and male/female ratio.

Each EEG was processed with a Cin-Cin algorithm, based on an input vector characterized by a linear composition of city-block matrix distances among the 19 electrodes. From the resulting triangular matrix of 171 numbers, which expresses all pairwise distances among the 19 electrodes, a minimum spanning tree (MST) is calculated. The electrode identification codes, sorted by decreasing number of links in the MST, together with the corresponding numbers of links, are taken as input vectors for machine learning systems. With this method, the entire content of an EEG is transformed into 38 numbers, which form the input vector for machine learning classifiers.

Machine learning systems were then applied to build a predictive model to distinguish between the two diagnostic classes.

The best machine learning system (KNN algorithm) obtained a global accuracy of 93.2% (92.37% sensitivity and 94.03% specificity) in differentiating ASD subjects from NPD subjects.

In conclusion, the results of this study suggest that the two new pre-processing methods introduced, in particular the MST algorithm, have great potential to allow a machine learning system to discriminate EEGs obtained from subjects with autism from EEGs obtained from subjects affected by other psychiatric disorders.

Neurology

EEG data

triangular matrix

NPD subjects

minimum spanning tree (MST)

KNN algorithm

Introduction

Many different mathematical approaches have been tested in the last few years to disentangle the complexity of EEG data and determine whether it is possible to distinguish children with ASD from typically developing children or children with other neuropsychiatric disorders. An electroencephalogram (EEG) records the electrical activity of the brain through electrodes attached to the scalp, capturing the electrical impulses of different frequencies that neurons use to communicate. The substantial alteration of cortical circuitry explains the unique pattern of deficits and strengths that characterizes cognitive functioning in ASD. Therefore, EEG recordings can be potential biomarkers of these abnormalities. EEG signals are random, non-stationary, and non-linear. The most delicate phase in the overall EEG analysis is preprocessing, which aims to extract relevant features that are passed to powerful classifiers, generally based on machine learning techniques.

The native EEG signal contains noise due to various factors such as involuntary hand and eye movements or heartbeat interference [1]. These interferences increase the complexity of EEG signal processing and destabilize the mathematical calculations in the later stages of processing; they must therefore be eliminated before analysis. Good preprocessing also reduces the cardinality of the input vectors for machine learning systems, reducing computation time and the risk of overtraining. As mentioned in a recent review [2], many different pre-processing methods have been described in the literature, such as Common Spatial Patterns (CSP), Principal Component Analysis (PCA) [1], Common Average Referencing (CAR) [1][3], Surface Laplacian (SL), adaptive filtering [1][4], Independent Component Analysis (ICA), digital filters [5], and MS-ROM/I-FAST [6]. Each method has advantages and disadvantages. PCA, for example, is a powerful dimensionality reduction technique but involves discarding non-principal components with small variance, which could contain useful information [7]. Digital filters process EEG signals in the frequency domain and are widely used for artifact removal; however, they require EEG signals and artifact signals to occupy different frequency bands, which is rarely the case in practice. Our group has proposed a new technique based on artificial neural networks, the MS-ROM/I-FAST system, to extract features from the EEG for the differential diagnosis of children with autism, and it has achieved valid results [6]. The assessment requires only a few minutes of EEG data and no data preprocessing. The drawback of this approach is the long computation time required to complete the task.

In this paper, we present an alternative pre-processing approach for EEG data, based on a novel algorithm applied to raw data to detect topological EEG features. Our assumption is that brain connection abnormalities can be detected through a specific mathematical topological approach that compares the minimal structure of the functional networks beneath the scalp electrodes. Additionally, functional interconnections between different brain areas can be assessed by measuring the interdependence of the time-series electrical signals recorded by scalp electrodes using distance functions (e.g., the Euclidean distance, the Manhattan distance, the Minkowski distance, or the cosine similarity). Many clustering methods are available, such as Principal Component Analysis, hierarchical agglomerative clustering, the nearest-neighbor test, autocorrelation, and Cuzick-and-Edwards’ test. In our study, we have decided to rely on the minimum spanning tree (MST) algorithm as the basis for clustering the electrodes. An MST is a spanning tree of a connected, undirected graph that connects all the vertices with the minimal total edge weight.

The MST algorithm was originally described by the Czech scientist Otakar Borůvka in 1926 to optimize the planning of electrical connections among cities, and was later refined by Kruskal with a specific deterministic algorithm.

An MST is a spanning tree whose weight is less than or equal to the weight of every other spanning tree. In practical terms, the MST shows the best way to connect the variables in a tree with the shortest possible total length, allowing the data to be presented as a simplified graph.
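To make the definition concrete, the following is a minimal sketch of Kruskal's algorithm in plain Python. The function name `kruskal_mst`, the edge format, and the toy graph are illustrative assumptions; the software actually used in the study is not described at this level of detail.

```python
def kruskal_mst(n_nodes, edges):
    """Return the MST edges of a connected, undirected, weighted graph.

    edges: iterable of (weight, u, v) with 0 <= u, v < n_nodes.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):      # lightest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # the edge closes no cycle
            parent[root_u] = root_v
            tree.append((u, v, weight))
            if len(tree) == n_nodes - 1:    # a spanning tree has n - 1 edges
                break
    return tree


# Toy graph with 4 nodes (weights are illustrative, not EEG data).
toy_edges = [(1.0, 0, 1), (4.0, 0, 2), (3.0, 1, 2), (2.0, 2, 3), (5.0, 1, 3)]
toy_tree = kruskal_mst(4, toy_edges)
toy_total = sum(weight for _, _, weight in toy_tree)
```

On the toy graph the tree keeps the three lightest edges that do not form a cycle, so no other spanning tree of that graph has a smaller total weight.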

In the biomedical field, the MST has been used particularly in microarray clustering. Although MST-based clustering is formally equivalent to the dendrograms produced by hierarchical clustering under certain conditions, the two can look extremely different. Our assumption is that the MST is a valuable approach for synthesizing the interconnection scheme of the time-series electrical signals recorded by scalp electrodes, a scheme that is expected to differ between subjects with autism and those affected by other disorders. The main advantage of the MST algorithm is that it gives a synthetic view of the variable ensemble and makes clustering easy to understand, through links that directly connect variables that are very close to each other. The importance of a variable in the graph is related to its number of links. Hubs may be defined as the variables with the maximum number of connections in the graph.

To test this hypothesis, the EEG data of 50 subjects with autism and 50 subjects with other neuropsychiatric disorders were pre-processed with the MST. Machine learning systems were then applied to build a predictive model to distinguish between the two diagnostic classes.

Patients and methods

EEG data from 50 subjects diagnosed with ASD and 50 control subjects diagnosed with other neuropsychiatric disorders, matched for age and gender, were obtained from a clinical archive in the United States. Both groups had the same age range (4–10 years) and the same gender distribution (m = 39, f = 11). None of the subjects was affected by genetic conditions, cerebral malformations, or epilepsy. In the control group, the primary diagnoses were ADHD (n = 41), mood disorders (n = 4), anxiety disorders (n = 16), sleep disorders (n = 12), ODD (n = 6), and TBI (n = 5); the counts sum to more than 50 because some subjects had multiple diagnoses.

Multiple diagnoses in control group:

ADHD: n = 41

ADHD & anxiety disorders: n = 11

ADHD & mood disorders: n = 4

ADHD & sleep disorders: n = 9

ADHD & ODD: n = 6

ADHD & TBI: n = 3

Methods

The EEG data were recorded at a psychiatric center in the US, at rest with eyes closed. EEG acquisition was performed using Mitsar-EEG-10/70–201 equipment, with impedance kept below 10 kΩ. The patients were seated in a slightly reclining chair in a silent, dimly lit environment. An Electrocap was used to collect the data according to the international 10–20 system with a linked-ears montage (Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, and O2). A minimum of 20 minutes of data was recorded per patient, in eyes-open (10 minutes) and eyes-closed (10 minutes) resting conditions; the order of the two conditions could vary among patients. This study used only the eyes-closed data, to be consistent with our pilot study.

The EEG trace was then saved in the database. Subsequently, ten minutes of recording were exported as ASCII files through the same acquisition program, SystemPlus Evolution, so that the data could be read in numerical format.

Preprocessing phase

In a common EEG recording, 19 electrodes register the activity of different regions of the cerebral cortex. It is reasonable to assume that these regions are interconnected, forming a complete matrix of mutual relationships. Measuring the similarity between the time series recorded in an EEG is a way to establish how coherent the parts of the brain under given electrodes are with each other. There are many types of similarity measures. One of the most popular is the Manhattan distance, also known as the city-block distance, so named because it is the distance in blocks between any two points in a city (e.g., down 3 blocks and over 1 for a total of 4 blocks). This distance was calculated for each pair of the 19 time series from the EEG electrode derivations. The resulting matrix can be visualized as shown in Fig. 1.

Fig. 1. Distance matrix of the 19 EEG electrodes according to their Manhattan distance, for the EEG of one study participant taken as an example. Data processing through PST software, Semeion, Rome.

The cells contain values expressing the time-series distance for each pair of channels according to a specific metric (Manhattan distance). The value in each cell is proportional to the distance between the respective electrodes. Higher values indicate that the two electrode time series are more distant, that is, that the two brain areas are more disconnected.
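As an illustration of how such a matrix can be built, the sketch below computes the city-block distance for every unordered pair of channels. The helper names and the synthetic signals are assumptions made for this example; any scaling or linear composition applied in the actual pipeline is not reproduced here.

```python
from itertools import combinations


def manhattan_distance(x, y):
    """City-block distance between two equally long time series."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pairwise_distances(channels):
    """Map each unordered channel pair (i, j), with i < j, to its distance."""
    return {
        (i, j): manhattan_distance(channels[i], channels[j])
        for i, j in combinations(range(len(channels)), 2)
    }


# Synthetic stand-in for 19 channels of 5 samples each (not real EEG).
channels = [[(ch + 1) * t for t in range(5)] for ch in range(19)]
distances = pairwise_distances(channels)
n_pairs = len(distances)  # 19 * 18 / 2 unordered pairs
```

With 19 channels the dictionary holds 19 × 18 / 2 = 171 entries, which is the size of the triangular matrix mentioned in the abstract.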

From each EEG Manhattan distance matrix, an MST was derived.

The following two figures summarize the MSTs of nine subjects with ASD and nine subjects without ASD. To make the information contained in the MST computable, the electrode names are numbered (Fp1 = 1, Fp2 = 2, F7 = 3, F3 = 4, Fz = 5, F4 = 6, F8 = 7, T3 = 8, C3 = 9, Cz = 10, C4 = 11, T4 = 12, T5 = 13, P3 = 14, Pz = 15, P4 = 16, T6 = 17, O1 = 18, O2 = 19).

The electrodes are then listed in order of decreasing number of links in the minimum spanning tree (see an example in the supplementary materials). By joining the electrode numbers and their numbers of links in a single row, the entire content of an EEG file is transformed into just 38 numbers, to which machine learning systems can be applied to develop a classification model. The MST algorithm was applied independently to pre-process each of the 100 EEGs, and machine learning systems were then applied to build a predictive model to distinguish between the two diagnostic classes.
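The encoding described above can be sketched as follows. The function name `mst_features`, the tie-breaking rule (ascending electrode code), and the layout of the row (19 codes followed by 19 link counts) are assumptions made for illustration; the text does not specify them.

```python
from collections import Counter


def mst_features(mst_edges, n_electrodes=19):
    """Turn MST edges (pairs of 1-based electrode codes) into one flat row."""
    degree = Counter()
    for u, v in mst_edges:
        degree[u] += 1
        degree[v] += 1
    # Assumption: ties in link count are broken by ascending electrode code.
    ranked = sorted(range(1, n_electrodes + 1),
                    key=lambda code: (-degree[code], code))
    # Assumption: codes first, then the matching link counts.
    return ranked + [degree[code] for code in ranked]


# Hypothetical tree: a simple chain 1-2-3-...-19, for illustration only.
chain = [(code, code + 1) for code in range(1, 19)]
features = mst_features(chain)
```

Any spanning tree over 19 electrodes has 18 links, so the 19 link counts always sum to 36; only their distribution across electrodes, and hence the ranking, changes from one EEG to another.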

Predictive modelling

The set of 38 MST-related features was used as input for machine learning classifiers. The KNN algorithm was used to develop a predictive model to distinguish subjects belonging to the two diagnostic classes (autism vs. other disorders). Model performance was tested with a training/testing cross-validation procedure.
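For readers unfamiliar with KNN, a minimal sketch of the classification rule follows. The distance metric (squared Euclidean) and the value of k are assumptions, since neither is reported in the text, and the two-feature records are made up for the example.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    # Assumption: squared Euclidean distance; the metric and k are not
    # stated in the paper.
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Tiny made-up 2-feature example (real inputs would have 38 features).
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["NPD", "NPD", "NPD", "ASD", "ASD", "ASD"]
prediction = knn_predict(train_x, train_y, [5, 4])
```

Because KNN stores the training records rather than fitting weights, its behavior depends strongly on how the features are scaled, which is one reason a compact, homogeneous input vector is convenient.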

Training-testing protocol

These classification tools were applied to predict the diagnostic class using the Training and Testing validation protocol, with the following steps:

1. Subdivision of the dataset into two sub-samples, A and B, each containing 50% of the records and an equal proportion of cases and controls. The split is not obtained by random extraction but by the TWIST algorithm, provided by the Semeion Research Centre, Rome, which aims to create two subsamples with a similar probability density for all input variables. We performed a homogeneity check, which confirmed the substantial equivalence of the two subsets with respect to the distribution of variable values. In the first run, A is used as the Training Set and B as the Testing Set.

2. Application of ANN on the Training Set. In this phase, the ANN learns to associate the input variables with those indicated as targets.

3. After the training phase, the weights matrix, produced by the algorithm, is saved and frozen together with all of the other parameters used for the training.

4. The Testing Set is then shown to an untrained twin ANN (same architecture and base parameters) loaded with the weights matrix of the trained ANN, which acts as the final classifier. This operation takes place for all records, and the results (right or wrong classification) are not communicated to the classifier. This allows us to assess the generalization ability of the trained ANN.

5. In a second run, another untrained ANN is trained on subset B and then tested on subset A.

6. The results therefore refer to two training-testing sequences: A-B and B-A.

Results are expressed in terms of sensitivity (correct classification of positive patients), specificity (correct classification of negative patients), and global accuracy (arithmetic mean of sensitivity and specificity). Overall results are expressed as the average of the two experiments.
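These definitions can be written out as a short sketch. The function name and the toy labels are illustrative assumptions; the last line only re-derives the global accuracy from the sensitivity and specificity reported in the abstract.

```python
def classification_metrics(true_labels, predicted_labels, positive="ASD"):
    """Return (sensitivity, specificity, global accuracy) as fractions."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)   # assumes at least one positive record
    specificity = tn / (tn + fp)   # assumes at least one negative record
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Made-up labels: one ASD record out of four is misclassified.
truth = ["ASD"] * 4 + ["NPD"] * 4
guess = ["ASD", "ASD", "ASD", "NPD"] + ["NPD"] * 4
sens, spec, acc = classification_metrics(truth, guess)

# Consistency check on the reported figures: mean of 92.37% and 94.03%.
reported_accuracy = round((92.37 + 94.03) / 2, 2)
```

The mean of the reported sensitivity and specificity is 93.2%, which matches the global accuracy given in the abstract.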

This crossover procedure allows us to blindly classify all records with the trained algorithm, so that the generalization capability of the model is assessed on records it has never seen before.
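The A-B / B-A sequence can be sketched as below. The names `crossover_evaluate`, `fit`, and `score` are hypothetical, the stand-in learner is a one-feature nearest-neighbor rule chosen only to keep the example short, and the subsets are hand-made rather than produced by the TWIST algorithm.

```python
def crossover_evaluate(subset_a, subset_b, fit, score):
    """Run the A-B and B-A sequences and average the two test scores.

    fit(train) returns a fresh model; score(model, test) returns a number.
    """
    score_ab = score(fit(subset_a), subset_b)   # train on A, test on B
    score_ba = score(fit(subset_b), subset_a)   # train on B, test on A
    return score_ab, score_ba, (score_ab + score_ba) / 2


# Stand-in learner: the "model" is simply the stored (value, label) pairs.
def fit(train):
    return list(train)


def score(model, test):
    hits = 0
    for value, label in test:
        nearest = min(model, key=lambda rec: abs(rec[0] - value))
        hits += nearest[1] == label
    return hits / len(test)


subset_a = [(0.1, "NPD"), (0.3, "NPD"), (0.9, "ASD"), (1.1, "ASD")]
subset_b = [(0.2, "NPD"), (0.4, "NPD"), (0.8, "ASD"), (0.55, "ASD")]
score_ab, score_ba, overall = crossover_evaluate(subset_a, subset_b, fit, score)
```

Every record is used once for testing and is never scored by a model that saw it during training, which is the property the protocol relies on.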

Natural clustering of records

The Pick and Squash Tracking (PST) system, an unsupervised machine learning system developed at the Semeion Research Centre and based on the evolutionary algorithm GenD [8], was used to cluster records according to the features selected by the TWIST system. Such a system can find the best spatial distribution of a given number of points with respect to the maximum degree of their reciprocal Euclidean distances without exploring all possible combinations, by adaptively evolving toward the optimal solution.

The PST system places the points of the dataset in a 2D space while minimizing the projection error, so that the original distances between the points suffer only minimal distortion. The algorithm is particularly useful when the distance matrix of the points of interest is imprecise, for whatever reason, and the map consequently does not correspond precisely to reality.

The PST algorithm carries out a multidimensional scaling from an N-dimensional to an L-dimensional space (where N >> L), typically with L = 2 or L = 3. In this dimensional reduction, PST ensures that the original distances between points undergo a minimal amount of distortion in the L-dimensional space.
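One common way to quantify the distortion introduced by such a projection is a normalized stress measure, sketched below. This is not the PST algorithm, whose error function is not given in the text; the function name and the toy points are assumptions for illustration.

```python
from itertools import combinations
from math import dist, sqrt


def projection_stress(points_high, points_low):
    """Normalized stress between pairwise distances before and after projection."""
    num = den = 0.0
    for i, j in combinations(range(len(points_high)), 2):
        d_high = dist(points_high[i], points_high[j])   # original distance
        d_low = dist(points_low[i], points_low[j])      # distance on the map
        num += (d_high - d_low) ** 2
        den += d_high ** 2
    return sqrt(num / den)


# Three made-up points that already lie in the plane z = 0.
high = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 0.0, 0.0)]
flat = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]    # drops z: nothing is lost
lossy = [(0.0, 0.0), (3.0, 0.0), (3.0, 0.0)]   # also drops y: distances shrink

flat_stress = projection_stress(high, flat)
lossy_stress = projection_stress(high, lossy)
```

A stress of zero means every pairwise distance is preserved exactly; larger values indicate that the 2D map distorts the original geometry more.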

Results

Acting on the MST-related features, the KNN algorithm reached the best predictive capability in distinguishing autistic cases from NPD subjects, with an overall accuracy of 93% (Table 1).

Table 1. Predictive performance of machine learning systems

The natural clustering of subjects with the PST system allowed an almost perfect separation of records according to their diagnostic classes (Fig. 4).

Discussion

Several papers have been published recently using EEG data processed by advanced mathematical techniques (often based on machine learning) to distinguish children with autism from typically developing children.

Table 2 summarizes the studies published in articles of international journals or congress proceedings.

Almost all studies have employed machine learning systems acting as classifiers after suitable data preprocessing. Among the preprocessing methods, the most prevalent appears to be discrete wavelet transform followed by Fast Fourier Transform.

Table 2. Summary of published studies on autism diagnosis through digital EEG

Table legend. DWT = discrete wavelet transform; AOI & MRMR = area of interest & minimum redundancy/maximum relevance; STFT = short-time Fourier transform; EMD = empirical mode decomposition; FFT = fast Fourier transform; MS-ROM/IFAST = Multi-Scale Ranked Organizing Map/Implicit Function As Squashing Time; MST = minimum spanning tree.

In our study, the minimum spanning tree was applied to the electrode distance matrix as a robust pre-processing method, representing a novel application of this technique in the biomedical field.

As in variable clustering, the MST captures the implicit complexity of a data set and returns a synthetic representation of it while still retaining that complexity. When processing EEG data, it is very important to avoid overwhelming the machine learning system with extraneous data. Data that contain no pertinent information, when fed to the model, increase the noise and therefore make it harder for the machine learning system to generalize correctly to new cases not seen during the training phase. The results obtained are promising and introduce a new philosophy in handling this kind of data.

As Table 2 shows, few studies with a consistent sample size have focused on distinguishing autism from other neuropsychiatric disorders. From this point of view, this is the largest study published so far that aims at differential diagnosis rather than simply distinguishing children with autism from typically developing children. This is important because, in the real world, these diagnostic techniques will be applied only to subjects seeking medical care for some symptoms, rather than for simple screening.

Table 2 also makes it quite clear that we are still in a research phase with proof-of-concept efforts. The next step is to validate these results in large cohorts through multicentric studies in which clinicians employ different technical apparatus and different protocols, to ensure that EEG data processing methods are robust enough to withstand a certain degree of heterogeneity.

Further studies with more robust data and less potential bias are probably required.

Research in this area is vital to the well-being of those diagnosed with ASD. Many disorders, such as epilepsy, are commonly misdiagnosed as ASD. As a result, misdiagnosed patients, especially children, tend to be prescribed medications that worsen their symptoms [26]. By adding a biological basis to the diagnosis of ASD through the recognition of specific EEG patterns, we can reduce the misdiagnosis of certain neuropsychiatric disorders.

There is also the need to increase adaptability in the systems, enabling the incorporation of new medical knowledge as new technology appears. A further step will be to engineer machine learning systems to make them work automatically on commercial EEG machines, with the intervention of EEG companies able to embed these trained systems in their technical devices.

Declarations

Disclosure of potential conflicts of interest: The authors declare that they have no conflict of interest to disclose.
Research involving Human Participants and/or Animals: The data were collected over a 5-year period from patients referred for an EEG assessment. The data were submitted to an institutional review board and granted a “waiver of approval,” meeting the exemption categories set forth by federal regulation 45 CFR 46.101(b).
Informed consent: Informed consent was obtained from all human subjects participating in the study.

References

1. Lakshmi, M. R., Prasad, D. T. V., & Prakash, D. V. C. Survey on EEG signal processing methods. International Journal of Advanced Research in Computer Science and Software Engineering, 2014, 4, 84-91.

2. Xie, Y. & Oniga, S. A review of processing methods and classification algorithm for EEG signal. Carpathian Journal of Electronic and Computer Engineering, 2020, 13(1), 23-29.

3. Alhaddad, M. J. Common average reference (CAR) improves P300 speller. International Journal of Engineering and Technology, 2012, 2, 451-489.

4. Ahirwal, M. K., Kumar, A., & Singh, G. K. Adaptive filtering of EEG/ERP through bounded range artificial bee colony (BR-ABC) algorithm. Digital Signal Processing, 2014, 25(2), 164-172.

5. Suto, J. & Oniga, S. Music stimuli recognition in electroencephalogram signal. Elektronika ir Elektrotechnika, 2018, 24(4), 68-71. http://dx.doi.org/10.5755/j01.eie.24.4.21482

6. Grossi, E., Olivieri, C., & Buscema, M. Diagnosis of autism through EEG processed by advanced computational algorithms: A pilot study. Computer Methods and Programs in Biomedicine, 2017, 142, 73-79.

7. Liu, T. & Yao, D. Removal of the ocular artifacts from EEG data using a cascaded spatio-temporal processing. Computer Methods and Programs in Biomedicine, 2006, 83, 95–103.

8. Buscema, M. & Terzi, S. PST: An Evolutionary Approach to the Problem of Multi-Dimensional Scaling. WSEAS Transactions on Information Science and Applications, 2006, 3(9), 1704-1710.

9. Ahmadlou, M., Adeli, H., & Adeli, A. Fractality and a wavelet-chaos-neural network methodology for EEG-based diagnosis of autistic spectrum disorder. J Clin Neurophysiol, 2010, 27(5): 328-33. doi: 10.1097/WNP.0b013e3181f40dc8. PMID: 20844443.

10. Bosl, W., Tierney, A., Tager-Flusberg, H., & Nelson, C. EEG complexity as a biomarker for autism spectrum disorder risk. BMC Med, 2011, 9, 18. doi: 10.1186/1741-7015-9-18. PMID: 21342500; PMCID: PMC3050760.

11. Ahmadlou, M., Adeli, H., & Adeli, A. Fuzzy Synchronization Likelihood-wavelet methodology for diagnosis of autism spectrum disorder. J Neurosci Methods, 2012, 211(2): 203-9. doi: 10.1016/j.jneumeth.2012.08.020. Epub 2012 Aug 28. PMID: 22968137.

12. Sheikhani, A., Behnam, H., Mohammadi, M. R., Noroozian, M., & Mohammadi, M. Detection abnormalities for diagnosing of children with autism disorders using of quantitative electroencephalography analysis. J Med Syst, 2012, 36(2): 957-63. doi: 10.1007/s10916-010-9560-6. Epub 2010 Aug 14. PMID: 20711644.

13. Jamal, W., Das, S., Oprescu, I. A., Maharatna, K., Apicella, F., & Sicca, F. Classification of autism spectrum disorder using supervised learning of brain connectivity measures extracted from synchrostates. J Neural Eng, 2014, 11(4): 046019. doi: 10.1088/1741-2560/11/4/046019. PMID: 24981017.

14. Alsaggaf, E. A. & Kamel, M. I. Using EEGs to Diagnose Autism Disorder by Classification Algorithm. Life Science Journal, 2014, 11(6).

15. Cheong, L. C., Sudirman, R., & Hussin, S. S. Feature Extraction of EEG Signal Using Wavelet Transform for Autism Classification. ARPN Journal of Engineering and Applied Sciences, 2015, 10(19): 8533-8540.

16. Grossi, E., Olivieri, C., & Buscema, M. Diagnosis of autism through EEG processed by advanced computational algorithms: A pilot study. Comput Methods Programs Biomed, 2017, 142, 73-79. doi: 10.1016/j.cmpb.2017.02.002. PMID: 28325448.

17. Djemal, R., AlSharabi, K., Ibrahim, S., & Alsuwailem, A. EEG-Based Computer Aided Diagnosis of Autism Spectrum Disorder Using Wavelet, Entropy, and ANN. Biomed Res Int, 2017, 9816591. doi: 10.1155/2017/9816591. PMID: 28484720; PMCID: PMC5412163.

18. Bosl, W. J., Tager-Flusberg, H., & Nelson, C. A. EEG Analytics for Early Detection of Autism Spectrum Disorder: A data-driven approach. Sci Rep, 2018, 8(1): 6828. doi: 10.1038/s41598-018-24318-x. PMID: 29717196; PMCID: PMC5931530.

19. Thapaliya, S., Jayarathna, S., & Jaime, M. Evaluating the EEG and Eye Movements for Autism Spectrum Disorder. IEEE International Conference on Big Data (Big Data), 2018, 2328-2336.

20. Grossi, E., Buscema, M., Della-Torre, F., & Swatzyna, R. J. The "MS-ROM/IFAST" Model, a Novel Parallel Nonlinear EEG Analysis Technique, Distinguishes ASD Subjects From Children Affected With Other Neuropsychiatric Disorders With High Degree of Accuracy. Clin EEG Neurosci, 2019, 50(5): 319-331. doi: 10.1177/1550059419861007. PMID: 31296052.

21. Haputhanthri, D., Brihadiswaran, G., Gunathilaka, S., Meedeniya, D., Jayawardena, Y., Jayarathna, S., & Jaime, M. An EEG based Channel Optimized Classification Approach for Autism Spectrum Disorder. Moratuwa Engineering Research Conference (MERCon), 2019, 123–128. https://doi.org/10.1109/MERCon.2019.8818814

22. Hadoush, H., Alafeef, M., & Abdulhay, E. Automated identification for autism severity level: EEG analysis using empirical mode decomposition and second order difference plot. Behav Brain Res, 2019, 362, 240-248. doi: 10.1016/j.bbr.2019.01.018. PMID: 30641159.

23. Kang, J., Han, X., Song, J., Niu, Z., & Li, X. The identification of children with autism spectrum disorder by SVM approach on EEG and eye-tracking data. Comput Biol Med, 2020, 120, 103722. doi: 10.1016/j.compbiomed.2020.103722. PMID: 32250854.

24. Grossi, E., Valbusa, G., & Buscema, M. Detection of an autism EEG signature from only two EEG channels through features extraction and advanced machine learning analysis. Clin EEG Neurosci, 2020, 1550059420982424. doi: 10.1177/1550059420982424. PMID: 33349054.

25. Abdolzadegan, D., Moattar, M. H., & Ghoshuni, M. A robust method for early diagnosis of autism spectrum disorder from EEG signals based on feature selection and DBSCAN method. Biocybernetics and Biomedical Engineering, 2020, 40(1): 482-493.

26. Swatzyna, R. J., Tarnow, J. D., Turner, R. P., Roark, A. J., MacInerney, E. K., & Kozlowski, G. P. Integration of EEG into psychiatric practice: A step toward precision medicine for autism spectrum disorder. Journal of Clinical Neurophysiology, 2017, 34(3): 230-235. doi: 10.1097/wnp.0000000000000365

Supplementarymaterials.docx
