4.1 PRISMA flow diagram
Figure 3 contains a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram (Moher et al., 2009), which summarizes how we arrived at the final selection of papers containing EEG open datasets.
After eliminating duplicates, the search, applying the criteria described above, returned a total of 568 papers (ScienceDirect 264, Scopus 269, PubMed 14, and Web of Science 21). The next step was to analyze the abstracts to check that each paper met criteria 1 and 2; this step eliminated 515 papers, leaving 53. For this set, we checked that the full text was available and that there was a link to download the dataset. This check eliminated five papers, including one whose dataset was unavailable for download, MERTI-Apps (Maeng et al., 2020). After this process, the final set contained 47 items.
4.2 Paper summary
Table 2 shows the selected set of papers with the following information: the paper's reference, the deep learning model used, the metrics applied during the experimentation, the tasks performed by the individuals while compiling the data, and the year of publication of the work.
Table 2. Summary of the primary papers reviewed.

| Paper | Deep learning model | Metrics | Use case | Year |
| --- | --- | --- | --- | --- |
| (G. Xu et al., 2019) | CNN | 74.2% Accuracy | Motor imagery (MI) electroencephalogram (EEG) signal classification | 2019 |
| (Zhao et al., 2019) | CNN | 93.53% and 95.86% Accuracy | Classify four emotional regions from the arousal-valence plane | 2019 |
| (Yao et al., 2019) | RNN | 88.80% Sensitivity, 88.60% Specificity, 88.69% Precision | Seizure/nonseizure classification | 2019 |
| (Wu et al., 2019) | CNN | 75.8% and 84.3% Accuracy | MI classification | 2019 |
| (Schirrmeister et al., 2017) | CNN | 91.15% Sensitivity | Classification of imagined or executed movements | 2018 |
| (Kwon et al., 2018) | CNN | 73.4% Accuracy | Classify emotion based on multimodal data | 2018 |
| (Wang & Shang, 2013) | MLP | 60.9%, 51.2%, and 68.4% Accuracy | Predict the levels of arousal, valence, and liking based on the learned features | 2013 |
| (X. Li et al., 2015) | MLP | 58.4%, 64.2%, 65.8%, and 66.9% Accuracy | EEG-based emotion recognition task | 2015 |
| (Barsim et al., 2018) | CNN | 76.5% and 98.5% Accuracy | Detection and attended target recognition in attention-based speller systems | 2018 |
| (Yisi Liu et al., 2019) | CNN | 63.98% and 59.84% Accuracy | Classify two levels of fatigue | 2019 |
| (Podmore et al., 2019) | CNN | 86% and 77% Accuracy | Extract stimulus pattern features | 2019 |
| (Pedoeem et al., 2020) | Autoencoder | 12.37% Sensitivity | Predict seizures | 2020 |
| (Cui et al., 2021) | CNN | 73.22% Accuracy | Detect drivers' drowsy states | 2021 |
| (X. Zhang et al., 2018) | CNN+RNN | 95.53% Accuracy | Brain typing system to convert user's thoughts to texts | 2017 |
| (L. Xu et al., 2020) | CNN | 71%, 72%, 70%, and 72% Accuracy | MI classification | 2020 |
| (Miao et al., 2020) | CNN | 90% Accuracy | Classification of motor imagery EEG | 2020 |
| (Abdelhameed & Bayoumi, 2021) | Autoencoder+LSTM | 98.79% Accuracy, 98.72% Sensitivity, 98.86% Specificity | Detecting seizures in pediatric patients | 2021 |
| (Frassineti et al., 2020) | CNN | 85% Accuracy, 37% Sensitivity, 94% Specificity and 80% Accuracy, 57% Sensitivity, 82% Specificity | Diagnosis of neonatal epileptic seizures | 2020 |
| (Loza & Colgin, 2021) | MLP | 41.9% and 45.9% Accuracy | Classify sleep stages | 2021 |
| (Schirrmeister et al., 2017) | CNN | 85.4% Accuracy, 75.1% Sensitivity, 94.1% Specificity and 84.5% Accuracy, 77.3% Sensitivity, 90.5% Specificity | Distinguishing pathological EEG | 2018 |
| (Yisi Liu et al., 2020) | CNN | 73.01% and 68% Accuracy | Mental fatigue recognition | 2020 |
| (Lee et al., 2021) | GCNN | 44.33% and 44.40% Accuracy | Motor imagery EEG classification | 2021 |
| (Nasiri & Clifford, 2020) | GAN | 75% Precision, 76% Sensitivity, 90% Accuracy and 78% Precision, 79% Sensitivity, 91% Accuracy | Classify sleep stages | 2020 |
| (Yukang Liu, n.d.) | MLP | 85% and 31.9% Accuracy | EEG-based alcoholism detection | 2021 |
| (Shalash, 2021) | CNN | 94.33%, 92.57%, and 93% Accuracy | Detect drivers' fatigue | 2021 |
| (D. Zhang et al., 2018) | CNN+RNN | 98.3% Accuracy | EEG-based intention recognition | 2017 |
| (Normandeau, 2013) | MLP | 80% Accuracy | Classification of motor activities (executed and imagery) | 2015 |
| (X. Zhang et al., 2017) | Autoencoder+RNN | 98.2% Accuracy | Biometric identification | 2021 |
| (E et al., 2020) | LSTM | 96.2% and 98.5% Accuracy | Epilepsy prediction | 2020 |
| (Korkalainen et al., 2019) | CNN+LSTM | 83.9% and 83.7% Accuracy | Estimation of the sleep stages | 2019 |
| (Abdelhameed & Bayoumi, 2021) | RNN | 95.54% Accuracy and 95.82% AUC | Classification of epileptic signals | 2018 |
| (Sarmiento et al., 2021) | CNN | 65.62% and 85.66% Accuracy | Recognize EEG signals in imagined vowel tasks | 2021 |
| (Bassi et al., 2021) | CNN | 82.2% Accuracy and 82.5% F1-score | BCI classification | 2021 |
| (Guillot & Thorey, 2021) | Autoencoder | 97% F1-score | Sleep stage classification | 2021 |
| (F. Li et al., 2021) | CNN | 66.5% Sensitivity, 97.9% Specificity and 67.9% Sensitivity, 97.0% Specificity | Classify sleep staging | 2019 |
| (Banville et al., 2021) | CNN | 72.3% and 79.4% Accuracy | EEG-based sleep staging and pathology detection | 2020 |
| (Yan et al., 2021) | CNN+LSTM | 87%, 86%, and 86% Accuracy | Automatic sleep scoring | 2020 |
| (Eldele et al., 2021) | CNN | 84.4%, 81.3%, and 86.7% Accuracy | Sleep stage classification | 2021 |
| (Huang et al., 2020) | CNN | 90.89% Accuracy | Sleep stage classification | 2020 |
| (J. Liu et al., 2020) | Autoencoder | 89.49%, 92.86%, and 96.77% Accuracy | EEG-based emotion classification | 2020 |
| (San-Segundo et al., 2019) | CNN | 99.5%, 96.5%, and 95.7% Accuracy | Classification of epileptic EEG recordings | 2019 |
| (Islam et al., 2021) | CNN | 78.22% and 74.92% Accuracy | Emotion recognition | 2021 |
| (Partovi et al., 2021) | CNN | 95% Accuracy | Three grasp motion classes (cylindrical, spherical, and lumbrical) of one hand | 2020 |
| (Das et al., 2020) | CNN, MLP | 34.46% Accuracy | Classification task of digit recognition | 2020 |
| (Y. Zhang et al., 2021) | CNN | 70.15% Accuracy, 70.18% F1-score and 77.07% Accuracy, 75.48% F1-score | Detection of Attention Deficit Hyperactivity Disorder (ADHD) | 2021 |
| (R. Li et al., 2020) | GCNN | 70.6% Accuracy | Fatigue-related situation awareness recognition | 2020 |
| (Z. Li et al., 2022) | GAN | 79.45% and 76.3% Accuracy | Emotion recognition | 2021 |
4.3 Statistics and analysis of the studies included
This section provides graphs and statistics derived from the analysis of the selected papers and their datasets. Figure 4 shows a bar chart distributing the 47 papers by year of publication. It confirms the growing trend of deep learning papers mentioned above. The increase is more significant in recent years: from 2017 to 2020, the number of papers multiplied by seven. These numbers suggest that there is room for research in this field, as more papers on EEGs and deep learning are likely to appear in the coming years.
Another relevant piece of information obtained from this preliminary analysis is the type of DL model used. This knowledge helps researchers determine which models are the most powerful for processing EEGs.
As can be seen in Figure 5, the most commonly used DL model, by a wide margin, is the CNN, which appears in 55.3% of the cases, either as a 1-dimensional CNN (EEGs are processed channel by channel) or a 2-dimensional CNN (EEGs are processed as a whole). Next come papers using MLPs (10.6%) and Autoencoders (6.38%). RNN, CNN+RNN, GCNN, and GAN each appear in 4.26% of the cases. Finally, CNN+LSTM, Autoencoder+RNN, and LSTM each account for 2.13%. RNNs and LSTMs belong to the first group and are closely related to the signal processing field. We should also highlight the use of four hybrid models: CNN or Autoencoder combined with RNN or LSTM. These numbers suggest two ideas: first, using CNNs is successful but less innovative; second, using hybrid models seems an opportunity to make new contributions to the field.
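As a quick arithmetic check, the shares quoted above follow directly from per-model counts over the 47 papers. The counts below are our own tally from Table 2 (the text reports only percentages), so treat them as illustrative:

```python
# Reproduce the model shares in Figure 5 from raw counts over 47 papers.
# The counts are our own tally from Table 2, shown here as an assumption.
counts = {"CNN": 26, "MLP": 5, "Autoencoder": 3, "RNN": 2}
TOTAL_PAPERS = 47

def share(model: str, ndigits: int = 1) -> float:
    """Percentage of the 47 papers using `model`, rounded for display."""
    return round(counts[model] / TOTAL_PAPERS * 100, ndigits)

print(share("CNN"))             # 55.3 -> matches the 55.3% quoted in the text
print(share("MLP"))             # 10.6
print(share("Autoencoder", 2))  # 6.38
print(share("RNN", 2))          # 4.26
```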
EEG data can support several use cases. This information helps identify which application fields are less exploited and therefore offer scope for further research. We have classified the datasets into 11 general categories:
- Motor imagery (MI) classification. This field aims to recognize a subject's intention (Lu et al., 2016).
- Seizure management. EEGs of patients with epilepsy, a brain disorder characterized by abnormal cerebral activity.
- Classify sleep stages. Datasets collect the five possible stages a human can experience while sleeping.
- EEG-based alcoholism detection. Detection of brain patterns distinguishing an alcoholic from a healthy person.
- Biometric identification. This application relies on a person's unique characteristics, such as fingerprints; in this case, the study considers brain signals.
- Recognize emotions. This task consists of classifying human emotional states along the dimensions of arousal and valence.
- Classify levels of fatigue. Mental fatigue arises when a subject has paid attention to a task for a long time. These datasets measure different levels of fatigue, in some cases while driving.
- Disease diagnosis. In the medical field, we typically find datasets of epilepsy, but others can diagnose diseases such as Attention Deficit Hyperactivity Disorder (ADHD).
Figure 6 uses a pie chart to describe this information. The most frequent use case is MI EEG classification, with more than 30% of the cases. This fact is related to BCI competition IV[1], a famous data resource in the field comprising a set of datasets for signal processing and BCI classification. Four other use cases stand out: sleep stage classification, emotion recognition, seizure management, and fatigue classification. The remaining use cases occur only once or twice: disease diagnosis, alcoholism detection, and biometric identification. We can conclude from this analysis that these last three use cases remain underexploited, so a new dataset targeting them would bring value to the field.
Figure 7 combines the results of both previous analyses in a bubble diagram where the X-axis represents the deep learning model and the Y-axis the possible use cases. This information is useful when a scientist needs to decide which DL models could be applied to the use case they are working on. The bubble size and color depend on the number of instances. The biggest bubble, MI EEG classification with CNN, makes sense because both are the most popular in their category. Next come papers using CNNs for fatigue and sleep stages. The chart can be used to identify which models suit a given dataset, and to find combinations that have not been applied before, enabling new contributions to science.
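The cross-tabulation behind such a bubble chart is straightforward to build: count how often each (model, use case) pair occurs. The sketch below uses a small illustrative subset of pairs rather than the full Table 2:

```python
from collections import Counter

# Sketch of the cross-tabulation behind a model x use-case bubble chart.
# The rows are an illustrative subset of Table 2, not the full listing.
rows = [
    ("CNN", "MI classification"),
    ("CNN", "MI classification"),
    ("CNN", "Sleep stage classification"),
    ("RNN", "Seizure management"),
    ("CNN+LSTM", "Sleep stage classification"),
]

# Bubble size = number of papers per (model, use case) pair.
bubbles = Counter(rows)

for (model, use_case), n in bubbles.most_common():
    print(f"{model:10s} x {use_case}: {n}")
```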
The following is a brief description of the information represented in the datasets used by the previous papers.
- BCI competition IV 2a[2]: the imagination of movement of the left hand, right hand, both feet, and tongue.
- BCI competition IV 2b[3]: motor imagery of left hand and right hand.
- DEAP and video signals[4]: emotion recognition of low arousal and low valence (LALV), high arousal and low valence (HALV), low arousal and high valence (LAHV), and high arousal and high valence (HAHV).
- EEG Motor Movement/Imagery Dataset[5]: the active task of closing the eyes, MI of both feet, fists, left fist, and right fist.
- Multichannel EEG sustained attention driving task[6]: fatigue and non-fatigued during driving.
- Temple University EEG Corpus[7]: a compilation of different neural diseases.
- CHB-MIT Scalp EEG Database[8]: seizure and nonseizure states in epileptic patients.
- MAHNOB-HCI[9]: a scale of valence and arousal.
- High Gamma Dataset[10]: MI of the left hand, right hand, and resting.
- Sleep EDF[11]: sleep stages after temazepam intake and after placebo intake.
- AMIGOS[12]: valence, arousal, dominance, familiarity and liking, and selected basic emotions.
- Motor Imagery dataset from Weibo et al. 2014[13]: simple MI (left hand, right hand, and feet) and compound MI (both hands, left hand combined with the right foot, right hand combined with the left foot).
- The DREAMS Databases[14]: sleep spindles expert scores.
- PhysioNet/CinC Challenge 2018[15]: wakefulness, stage 1, stage 2, stage 3, rapid eye movement (REM), and undefined.
- Open source SSVEP dataset[16]: healthy subjects focused on 40 characters flickering at different frequencies.
- BCI Competition IVa[17]: MI of the left hand, right hand, and right foot.
- NMED-H[18]: rated the pleasantness, musicality, order, and level of interest of the musical stimulus.
- EEG data for driver fatigue detection[19]: drivers suffering fatigue or not.
- Neonatal EEG recordings with seizures[20]: seizure or nonseizure states of epilepsy in neonates.
- University of Bonn[21]: seizure and nonseizure states.
- Motor Imagery dataset from Zhou et al. 2016[22]: MI of the left hand, right hand, and feet.
- Sleep Heart Health Study[23]: sleep scores.
- EEG datasets for motor imagery brain-computer interface[24]: data for non-task-related and task-related states.
- SEED[25]: report of emotional reactions.
- EEG Motor Movement/Imagery Dataset a subset[26]: MI of open and close left or right fist, opening and closing left or right fist, open and close both fists or both feet, and opening and closing both fists or both feet.
- DOD-O[27]: scored apnea patients.
- DOD-H[28]: scored sleep stages.
- International BCI Competition[29]: imagine three different grasps, cylindrical, spherical, and lumbrical.
- CAP sleep database[30]: activity during NREM sleep.
- ISRUC-Sleep[31]: healthy subjects, subjects with sleep disorders, and subjects under the effect of sleep medication.
- NIRS brain-computer interface[32]: MI experiments: left and right hand or mental arithmetic and resting state.
- Bern-Barcelona EEG database[33]: patients with pharmacoresistant focal-onset epilepsy.
- BD1[34]: imagine vowels.
- MrOS Sleep[35]: sleep study.
- Database-Imaged-Vowels-1[36]: pronounce the five main vowels "a", "e", "i", "o", and "u" and six Spanish words.
- MindBigData MUSE[37]: the subject is allowed to see a digit from 0 to 9.
All the information about the datasets has been collected in the following table. The most used datasets are part of the BCI Competition, which confirms the point made above: in first position we find BCI Competition IV 2a (8 papers) and 2b (5 papers). The number of subjects ranges from 4 to 40,983 and is directly related to model behavior; (Roy et al., 2019) show that models increase their performance when the number of subjects exceeds 15. Test durations range from seconds to minutes or hours (the longest typically being sleep studies or recordings of patients with epilepsy). The number of channels is also a critical decision depending on the use case. (Jasper, 1958) states that a minimum of 21 channels should be used to examine an adult brain; no analysis has validated this recommendation for deep learning studies, so it could be developed as future work. Other features for which no minimum standard has been studied are the electrode system and the sampling frequency.
Table 3. Summary of the selected datasets.

| Dataset | Number of subjects | Total tests | Length per test | Electrode system | Nº channels | Sampling frequency | Format | Papers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BCI Competition IV 2a | 9 | 2,591 | 5 minutes | 10-20 system | 22 | 250 Hz | GDF | 8 |
| BCI Competition IV 2b | 9 | 45 | 5 minutes | No system | 3 | 250 Hz | GDF | 5 |
| DEAP and video signals | 32 | 32 | 40 minutes | 10-20 system | 45 | 512 Hz | BDF | 6 |
| EEG Motor Movement/Imagery Dataset | 109 | 3,145,160 | 2 minutes | 10-10 system | 64 | 160 Hz | EDF | 4 |
| Multichannel EEG sustained attention driving task | 27 | 62 | 90 minutes | 10-20 system | 32 | 500 Hz | SET | 3 |
| Temple University EEG Corpus | 16,986 | 10,874 | 20 minutes | 10-20 system | 31 | 250 Hz (87%), 256 Hz (8.3%), 400 Hz (3.8%), and 512 Hz (1%) | EDF | 4 |
| CHB-MIT Scalp EEG Database | 22 | 664 | 1 to 4 hours | 10-20 system | 23 | 256 Hz | EDF | 2 |
| MAHNOB-HCI | 30 | 120 | 30 seconds | 10-20 system | 32 | 256 Hz | BDF | 2 |
| High Gamma Dataset | 14 | 1,040 | 52 seconds | 10-20 system | 44 | 512 Hz | MAT | 2 |
| Sleep EDF | 78 | 197 | 9 hours | No system | 2 | 100 Hz | EDF | 5 |
| AMIGOS | 77 | 77 | Variable | 10-20 system | 17 | 256 Hz | MAT | 1 |
| Weibo et al., 2014 | 10 | 320 | 8 seconds | 10-20 system | 60 | 100 Hz | TXT | 1 |
| The DREAMS Databases | 8 | 8 | 30 minutes | No system | 1 | 200 Hz | EDF | 1 |
| PhysioNet/CinC Challenge 2018 | 1,985 | 1,985 | 7.7 hours average | 10-20 system | 6 | 200 Hz | MAT | 2 |
| Open source SSVEP dataset | 35 | 35 | 4 minutes | 10-20 system | 64 | 1000 Hz | MAT | 2 |
| BCI Competition IVa | 5 | 10 | 980 seconds | 10-20 system | 118 | 1000 Hz | MAT | 1 |
| NMED-H | 48 | 97 | 36 minutes | EGI 128-channel | 125 | 125 Hz | MAT | 1 |
| EEG data for driver fatigue detection | 12 | 12 | 5 minutes | 10-20 system | 40 | 1000 Hz | CNT | 1 |
| Neonatal EEG recordings with seizures | 79 | 79 | 74 minutes average | 10-20 system | 19 | 256 Hz | EDF | 1 |
| University of Bonn | 5 | 500 | 23.6 seconds | 10-20 system | 1 | 173.61 Hz | TXT | 2 |
| Zhou et al., 2016 | 4 | 12 | 750 seconds | 10-20 system | 14 | 250 Hz | CNT | 1 |
| Sleep Heart Health Study | 3,295 | 2,651 | About 8 hours | No system | 2 | 125 Hz | EDF | 5 |
| EEG datasets for motor imagery brain-computer interface | 52 | 52 | 51 minutes | 10-10 system | 64 | 512 Hz | MAT | 2 |
| EEG Database Data Set | 122 | 122 | 420 seconds | No system | 64 | 512 Hz | RD | 1 |
| SEED | 15 | 15 | 4,575 seconds | 10-20 system | 62 | 200 Hz | MAT | 2 |
| EEG Motor Movement/Imagery Dataset (subset) | 109 | >1,500 | 8 minutes | 10-10 system | 64 | 160 Hz | EDF | 1 |
| DOD-O | 55 | 55 | 387 minutes | No system | 8 | 250 Hz | H5 | 1 |
| DOD-H | 25 | 25 | 427 minutes | No system | 8 | 250 Hz | H5 | 1 |
| 2020 International BCI Competition | 15 | 30 | 500 seconds | No system | 60 | 60 Hz | MAT | 1 |
| CAP Sleep Database | 108 | 108 | 410 minutes | 10-20 system | 3 | 128 to 512 Hz | EDF | 1 |
| ISRUC-Sleep | 118 | 121 | 8 hours | 10-20 system | 7 | 200 Hz | EDF | 1 |
| NIRS brain-computer interface (BCI) | 26 | 26 | 62 seconds | 10-5 system | 30 | 200 Hz | MAT | 1 |
| Bern-Barcelona EEG database | 5 | 3,740 | 20 seconds | 10-20 system | 64 | 512 or 1024 Hz | TXT | 1 |
| BD1 | 15 | 15 | 480 seconds | 10-20 system | 18 | 1024 Hz | MAT | 1 |
| MrOS Sleep | 1,026 | 586 | 341 minutes | 10-20 system | 5 | 256 Hz | EDF | 1 |
| Database-Imaged-Vowels | 15 | 15 | 110 seconds | 10-20 system | 18 | 1024 Hz | MAT | 1 |
| MindBigData MUSE | 40,983 | 40,983 | 8 seconds | 10-20 system | 4 | 220 Hz | TXT | 1 |
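As an illustration of how a channel-count criterion such as the 21-channel minimum from (Jasper, 1958) could be checked programmatically, the sketch below filters a few Table 3 entries by their number of channels; the entries are a hand-picked subset, not the full table:

```python
# Check which datasets meet the minimum of 21 channels that (Jasper, 1958)
# recommends for examining an adult brain. The (name, n_channels) entries
# below are a small subset of Table 3, used for illustration only.
datasets = [
    ("BCI Competition IV 2a", 22),
    ("BCI Competition IV 2b", 3),
    ("Sleep EDF", 2),
    ("CHB-MIT Scalp EEG Database", 23),
    ("MindBigData MUSE", 4),
]

JASPER_MIN_CHANNELS = 21
meets = [name for name, n_channels in datasets if n_channels >= JASPER_MIN_CHANNELS]
print(meets)  # ['BCI Competition IV 2a', 'CHB-MIT Scalp EEG Database']
```

The same filter over the full Table 3 would show how few open EEG datasets actually satisfy this clinical recommendation.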
Figure 8 shows a pie chart with the distribution of datasets according to the system used to place the electrodes on the scalp. As can be seen, the 10-20 and 10-10 systems are the most used, which makes sense: they are international recommendations (S. Yang & Deravi, 2017), and (Association & others, 2013) highlights that they are also the most widely used.
The following pie chart (Figure 9) represents the distribution of studies according to the sampling frequency used to collect the data. In first position we find three values: 200, 250, and 256 Hz. The percentages of datasets using 512 and 1000 Hz are also noteworthy. This parameter is directly determined by the machine used to collect the data. Regarding the minimum frequency needed for good DL performance, (Wen et al., 2021) demonstrate that a higher frequency does not provide better results.
Finally, we have a pie chart that compiles the file formats used. Figure 10 shows that European Data Format (EDF) and MAT are the most common. The former is a standard for storing multichannel biological and physiological signals (Kemp & Olivan, 2003). The latter relates to EEGLAB[38], a well-known Matlab tool for brain signal processing.