Many different mathematical approaches have been tested in the last few years to disentangle the EEG data complexity and determine if it is possible to distinguish children with ASD from typically developing children or children with other neuropsychiatric disorders. An electroencephalogram (EEG) records the electrical activity of the brain by recording the electrical impulses of different frequencies used by neurons for communications through electrodes attached to the scalp. The relevant involvement of the cerebral cortex in substantially altering cortical circuitry explains the unique pattern of deficits and strengths that characterize cognitive functioning. Therefore, EEG recordings can be potential biomarkers of these abnormalities. EEG signals are random, non-stationary, and non-linear. The most delicate phase in the overall EEG process is the preprocessing phase, which aims to extract relevant features that are offered to potent classifiers, generally based on machine learning techniques.
The native EEG signal contains noise due to various factors such as involuntary hand and eye movement or heartbeat interference [1]. These interferences increase the complexity of EEG signal processing and make the quality of mathematical calculations unstable in the later stages of processing, and must, therefore, be eliminated before analysis. A good preprocessing will also reduce the cardinality of the input vectors for machine learning systems, reducing the computation time and the risks of overtraining. As mentioned in a recent review (2), many different pre-processing methods have been described in the literature as Common Spatial Patterns (CSP), Principal Component Analysis (PCA) [1], Common Average Referencing (CAR) [1][3], Surface Laplacian (SL), adaptive filtering [1][4], Independent Component Analysis (ICA) and digital filter [5], MS-ROM IFAST (6). Each method has advantages and disadvantages. PCA, for example, is a potent dimensionality reduction technology but involves discarding non-principal components with small variance, which could potentially contain useful information (7). Digital filters process EEG signals from the frequency domain and are broadly utilized in artifact processing of EEG signals; however, it is required that EEG signals and artifact signals have different frequency bands, which rarely exist in practical situations. Our group has proposed a new technique artificial neural networks based called MS-ROM / I-FAST system to extract desired features from EEG to achieve the differential diagnosis of children with autism and achieve valid results (6). The data assessment only requires a few minutes of EEG data collection and does not require any data preprocessing. The drawback of this approach is the large computational time required to achieve the final task.
In this paper, we present an alternative pre-processing approach of EEG data based on a novel algorithm applied to raw data to detect topological EEG features. Our assumption is that brain connection abnormalities can be detected through a specific mathematical topological approach, which is able to compare the minimal structure of functional networks beneath scalp electrodes. Additionally, functional interconnections of different brain areas can be assessed by measuring the interdependence of time-series electrical signals recorded by scalp electrodes using distance functions (i.e., the Euclidean distance, the Manhattan distance, the Minkowski distance, the Cosine similarity, etc.)There are many clustering methods available, such as Principal Component Analysis, Hierarchical agglomerative clustering, Nearest-neighbor test, autocorrelation, Cuzick-and-Edwards’. In our study, we have decided to rely on the minimum spanning tree (MST) algorithm as a base to perform electrodes clustering. A minimum spanning tree (MST) is a spanning tree of a connected, undirected graph. It connects all the vertices together with the minimal total weighting for its edges.
The MST algorithm described originally by the Czech scientist, Otakar Boruvka, in 1926, aims to optimize the planning of electrical connections among cities and later on refined by Kruskal’s with a specific deterministic algorithm.
A MST is a spanning tree with weight less than or equal to the weight of every other spanning tree. In practical terms, MST shows the best way to connect the variables in a tree and the shortest possible combination allowing the presentation of the data in a simplified graph.
In the bio-medical field, the MST has been used particularly in microarray clustering. Although MST-based clustering is formally equivalent to the dendrograms produced by hierarchical clustering under certain conditions, visually they can be extremely different. Our assumption is that MST is a valuable approach to synthetize the interconnection scheme of time-series electrical signals recorded by scalp electrodes which are expected to be different in subjects with autism in comparison with those affected by other disease. The main advantage of MST algorithm is that it gives a synthetic view of the variable ensemble and allows an easy understanding of clustering through links that directly connect variables that are very close to each other. The importance of the variables in the graph is related to the number of links. Hubs may be defined as the variables with the maximum number of connections in the graph.
To prove this hypothesis the EEG data of fifty subjects with autism and 50 subjects with other neuropsychiatric disorders have been pre-processed with MST. Machine learning systems have been applied subsequently to build up a predictive model to distinguish between the two diagnostic classes.