4.1 PRISMA flow diagram
Figure 3 contains a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram (Moher et al., 2009), which summarizes how we arrived at the final selection of papers containing EEG open datasets.
After eliminating duplicates, the search, applying the criteria described above, returned a total of 568 papers (ScienceDirect 264, Scopus 269, PubMed 14, and Web of Science 21). The next step was to analyze the abstracts to check that each paper met criteria 1 and 2; this step eliminated 515 papers, leaving 53. For this set, we checked that the full text was available and that there was a link to download the dataset. This check eliminated five papers, including one whose dataset was unavailable for download, MERTI-Apps (Maeng et al., 2020). After this process, the final set contained 47 items.
4.2 Paper summary
Table 2 shows the selected set of papers with the following information: the paper's reference, the deep learning model used, the metrics applied during the experimentation, the tasks performed by the individuals while compiling the data, and the year of publication of the work.
Table 2. Summary of the primary papers reviewed.

| Paper | Deep learning model | Metrics | Use case | Year |
| --- | --- | --- | --- | --- |
| (G. Xu et al., 2019) | CNN | 74.2% Accuracy | Motor imagery (MI) electroencephalogram (EEG) signal classification | 2019 |
| (Zhao et al., 2019) | CNN | 93.53% and 95.86% Accuracy | Classify four emotional regions from the arousal-valence plane | 2019 |
| (Yao et al., 2019) | RNN | 88.80% Sensitivity, 88.60% Specificity, 88.69% Precision | Seizure/nonseizure classification | 2019 |
| (Wu et al., 2019) | CNN | 75.8% and 84.3% Accuracy | MI classification | 2019 |
| (Schirrmeister et al., 2017) | CNN | 91.15% Sensitivity | Classification of imagined or executed movements | 2018 |
| (Kwon et al., 2018) | CNN | 73.4% Accuracy | Classify emotion based on multimodal data | 2018 |
| (Wang & Shang, 2013) | MLP | 60.9%, 51.2%, and 68.4% Accuracy | Predict the levels of arousal, valence, and liking based on the learned features | 2013 |
| (X. Li et al., 2015) | MLP | 58.4%, 64.2%, 65.8%, and 66.9% Accuracy | EEG-based emotion recognition task | 2015 |
| (Barsim et al., 2018) | CNN | 76.5% and 98.5% Accuracy | Detection and attended target recognition in attention-based speller systems | 2018 |
| (Yisi Liu et al., 2019) | CNN | 63.98% and 59.84% Accuracy | Classify two levels of fatigue | 2019 |
| (Podmore et al., 2019) | CNN | 86% and 77% Accuracy | Extract stimulus pattern features | 2019 |
| (Pedoeem et al., 2020) | Autoencoder | 12.37% Sensitivity | Predict seizures | 2020 |
| (Cui et al., 2021) | CNN | 73.22% Accuracy | Detect drivers' drowsy states | 2021 |
| (X. Zhang et al., 2018) | CNN+RNN | 95.53% Accuracy | Brain typing system to convert user's thoughts to texts | 2017 |
| (L. Xu et al., 2020) | CNN | 71%, 72%, 70%, and 72% Accuracy | MI classification | 2020 |
| (Miao et al., 2020) | CNN | 90% Accuracy | Classification of motor imagery EEG | 2020 |
| (Abdelhameed & Bayoumi, 2021) | Autoencoder+LSTM | 98.79% Accuracy, 98.72% Sensitivity, 98.86% Specificity | Detecting seizures in pediatric patients | 2021 |
| (Frassineti et al., 2020) | CNN | 85% Accuracy, 37% Sensitivity, 94% Specificity and 80% Accuracy, 57% Sensitivity, 82% Specificity | Diagnosis of neonatal epileptic seizures | 2020 |
| (Loza & Colgin, 2021) | MLP | 41.9% and 45.9% Accuracy | Classify sleep stages | 2021 |
| (Schirrmeister et al., 2017) | CNN | 85.4% Accuracy, 75.1% Sensitivity, 94.1% Specificity and 84.5% Accuracy, 77.3% Sensitivity, 90.5% Specificity | Distinguishing pathological EEG | 2018 |
| (Yisi Liu et al., 2020) | CNN | 73.01% and 68% Accuracy | Mental fatigue recognition | 2020 |
| (Lee et al., 2021) | GCNN | 44.33% and 44.40% Accuracy | Motor imagery EEG classification | 2021 |
| (Nasiri & Clifford, 2020) | GAN | 75% Precision, 76% Sensitivity, 90% Accuracy and 78% Precision, 79% Sensitivity, 91% Accuracy | Classify sleep stages | 2020 |
| (Yukang Liu, n.d.) | MLP | 85% and 31.9% Accuracy | EEG-based alcoholism detection | 2021 |
| (Shalash, 2021) | CNN | 94.33%, 92.57%, and 93% Accuracy | Detect drivers' fatigue | 2021 |
| (D. Zhang et al., 2018) | CNN+RNN | 98.3% Accuracy | EEG-based intention recognition | 2017 |
| (Normandeau, 2013) | MLP | 80% Accuracy | Classification of motor activities (executed and imagery) | 2015 |
| (X. Zhang et al., 2017) | Autoencoder+RNN | 98.2% Accuracy | Biometric identification | 2021 |
| (E et al., 2020) | LSTM | 96.2% and 98.5% Accuracy | Epilepsy prediction | 2020 |
| (Korkalainen et al., 2019) | CNN+LSTM | 83.9% and 83.7% Accuracy | Estimation of the sleep stages | 2019 |
| (Abdelhameed & Bayoumi, 2021) | RNN | 95.54% Accuracy and 95.82% AUC | Classification of epileptic signals | 2018 |
| (Sarmiento et al., 2021) | CNN | 65.62% and 85.66% Accuracy | Recognize EEG signals in imagined vowel tasks | 2021 |
| (Bassi et al., 2021) | CNN | 82.2% Accuracy and 82.5% F1-score | BCI classification | 2021 |
| (Guillot & Thorey, 2021) | Autoencoder | 97% F1-score | Sleep stage classification | 2021 |
| (F. Li et al., 2021) | CNN | 66.5% Sensitivity, 97.9% Specificity and 67.9% Sensitivity, 97.0% Specificity | Classify sleep staging | 2019 |
| (Banville et al., 2021) | CNN | 72.3% and 79.4% Accuracy | EEG-based sleep staging and pathology detection | 2020 |
| (Yan et al., 2021) | CNN+LSTM | 87%, 86%, and 86% Accuracy | Automatic sleep scoring | 2020 |
| (Eldele et al., 2021) | CNN | 84.4%, 81.3%, and 86.7% Accuracy | Sleep stage classification | 2021 |
| (Huang et al., 2020) | CNN | 90.89% Accuracy | Sleep stage classification | 2020 |
| (J. Liu et al., 2020) | Autoencoder | 89.49%, 92.86%, and 96.77% Accuracy | EEG-based emotion classification | 2020 |
| (San-Segundo et al., 2019) | CNN | 99.5%, 96.5%, and 95.7% Accuracy | Classification of epileptic EEG recordings | 2019 |
| (Islam et al., 2021) | CNN | 78.22% and 74.92% Accuracy | Emotion recognition | 2021 |
| (Partovi et al., 2021) | CNN | 95% Accuracy | Three grasp motion classes (cylindrical, spherical, and lumbrical) of one hand | 2020 |
| (Das et al., 2020) | CNN, MLP | 34.46% Accuracy | Classification task of digit recognition | 2020 |
| (Y. Zhang et al., 2021) | CNN | 70.15% Accuracy, 70.18% F1-score and 77.07% Accuracy, 75.48% F1-score | Detection of Attention Deficit Hyperactivity Disorder (ADHD) | 2021 |
| (R. Li et al., 2020) | GCNN | 70.6% Accuracy | Fatigue-related situation awareness recognition | 2020 |
| (Z. Li et al., 2022) | GAN | 79.45% and 76.3% Accuracy | Emotion recognition | 2021 |
4.3 Statistics and analysis of the studies included
This section provides graphs and statistics derived from the analysis of the selected papers and their datasets. Figure 4 shows a bar chart distributing the 47 papers by year of publication. It confirms the growing trend of deep learning papers mentioned above. The increase is more significant in recent years: from 2017 to 2020, the number of papers multiplied by seven. These numbers suggest that there is room for research in this field, as more papers on EEGs and deep learning are likely to appear in the coming years.
Another relevant piece of information obtained from this preliminary analysis is the type of DL model used. This knowledge helps researchers determine which models are the most powerful for processing EEGs.
As can be seen in Figure 5, the most commonly used DL model, by a wide margin, is the CNN, which appears in 55.3% of the cases, either as a 1-dimensional CNN (EEGs are processed channel by channel) or a 2-dimensional CNN (EEGs are processed as a whole). Next come papers using MLPs (10.6%) and Autoencoders (6.38%). RNN, CNN+RNN, GCNN, and GAN each appear in 4.26% of the cases. Finally, CNN+LSTM, Autoencoder+RNN, and LSTM each account for 2.13%. RNNs and LSTMs belong to the first group and are closely related to the signal processing field. We should also highlight the use of four hybrid models: CNN or Autoencoder combined with RNN or LSTM. These numbers suggest two ideas: first, using CNNs is successful but less innovative; second, using hybrid models seems an opportunity to make new contributions to the field.
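As a quick arithmetic check, the shares quoted above follow directly from per-model counts over the 47 papers. The counts below are our own tally from Table 2 (the text reports only percentages), so treat them as illustrative:

```python
# Reproduce the model shares in Figure 5 from raw counts over 47 papers.
# The counts are our own tally from Table 2, shown here as an assumption.
counts = {"CNN": 26, "MLP": 5, "Autoencoder": 3, "RNN": 2}
TOTAL_PAPERS = 47

def share(model: str, ndigits: int = 1) -> float:
    """Percentage of the 47 papers using `model`, rounded for display."""
    return round(counts[model] / TOTAL_PAPERS * 100, ndigits)

print(share("CNN"))             # 55.3 -> matches the 55.3% quoted in the text
print(share("MLP"))             # 10.6
print(share("Autoencoder", 2))  # 6.38
print(share("RNN", 2))          # 4.26
```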
EEG data can support several use cases. This information helps identify which application fields are less exploited and therefore offer scope for further research. We have classified the datasets into 11 general categories:
- Motor imagery (MI) classification. This field aims to recognize a subject's intention (Lu et al., 2016).
- Seizure management. EEGs of patients with epilepsy, a brain disorder characterized by abnormal cerebral activity.
- Classify sleep stages. Datasets collect the five possible stages a human can experience while sleeping.
- EEG-based alcoholism detection. Detection of brain patterns distinguishing an alcoholic from a healthy person.
- Biometric identification. This application relies on a person's unique characteristics, such as fingerprints; in this case, the study considers brain signals.
- Recognize emotions. This task consists of classifying human emotional states along the dimensions of arousal and valence.
- Classify levels of fatigue. Mental fatigue arises when a subject has paid attention to a task for a long time. These datasets measure different levels of fatigue, in some cases while driving.
- Disease diagnosis. In the medical field, we typically find datasets of epilepsy, but others can diagnose diseases such as Attention Deficit Hyperactivity Disorder (ADHD).
Figure 6 uses a pie chart to describe this information. The most frequent use case is MI EEG classification, with more than 30% of the cases. This fact is related to BCI competition IV[1], a famous data resource in the field comprising a set of datasets for signal processing and BCI classification. Four other use cases stand out: sleep stage classification, emotion recognition, seizure management, and fatigue classification. The remaining use cases occur only once or twice: disease diagnosis, alcoholism detection, and biometric identification. We can conclude from this analysis that these last three use cases remain underexploited, so a new dataset targeting them would bring value to the field.
Figure 7 combines the results of both previous analyses in a bubble diagram where the X-axis represents the deep learning model and the Y-axis the possible use cases. This information is useful when a scientist needs to decide which DL models could be applied to the use case they are working on. The bubble size and color depend on the number of instances. The biggest bubble, MI EEG classification with CNN, makes sense because both are the most popular in their category. Next come papers using CNNs for fatigue and sleep stages. The chart can be used to identify which models suit a given dataset, and to find combinations that have not been applied before, enabling new contributions to science.
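The cross-tabulation behind such a bubble chart is straightforward to build: count how often each (model, use case) pair occurs. The sketch below uses a small illustrative subset of pairs rather than the full Table 2:

```python
from collections import Counter

# Sketch of the cross-tabulation behind a model x use-case bubble chart.
# The rows are an illustrative subset of Table 2, not the full listing.
rows = [
    ("CNN", "MI classification"),
    ("CNN", "MI classification"),
    ("CNN", "Sleep stage classification"),
    ("RNN", "Seizure management"),
    ("CNN+LSTM", "Sleep stage classification"),
]

# Bubble size = number of papers per (model, use case) pair.
bubbles = Counter(rows)

for (model, use_case), n in bubbles.most_common():
    print(f"{model:10s} x {use_case}: {n}")
```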
The following is a brief description of the information represented in the datasets used by the previous papers.
- BCI competition IV 2a[2]: the imagination of movement of the left hand, right hand, both feet, and tongue.
- BCI competition IV 2b[3]: motor imagery of left hand and right hand.
- DEAP and video signals[4]: emotion recognition of low arousal and low valence (LALV), high arousal and low valence (HALV), low arousal and high valence (LAHV), and high arousal and high valence (HAHV).
- EEG Motor Movement/Imagery Dataset[5]: the active task of closing the eyes, MI of both feet, fists, left fist, and right fist.
- Multichannel EEG sustained attention driving task[6]: fatigue and non-fatigued during driving.
- Temple University EEG Corpus[7]: a compilation of different neural diseases.
- CHB-MIT Scalp EEG Database[8]: seizure and nonseizure states in epileptic patients.
- MAHNOB-HCI[9]: a scale of valence and arousal.
- High Gamma Dataset[10]: MI of the left hand, right hand, and resting.
- Sleep EDF[11]: sleep stages after temazepam intake and after placebo intake.
- AMIGOS[12]: valence, arousal, dominance, familiarity and liking, and selected basic emotions.
- Motor Imagery dataset from Weibo et al. 2014[13]: simple MI (left hand, right hand, and feet) and compound MI (both hands, left hand combined with the right foot, right hand combined with the left foot).
- The DREAMS Databases[14]: sleep spindles expert scores.
- PhysioNet/CinC Challenge 2018[15]: wakefulness, stage 1, stage 2, stage 3, rapid eye movement (REM), and undefined.
- Open source SSVEP dataset[16]: healthy subjects focused on 40 characters flickering at different frequencies.
- BCI Competition IVa[17]: MI of the left hand, right hand, and right foot.
- NMED-H[18]: rated the pleasantness, musicality, order, and level of interest of the musical stimulus.
- EEG data for driver fatigue detection[19]: drivers suffering fatigue or not.
- Neonatal EEG recordings with seizures[20]: seizure or nonseizure states of epilepsy in neonates.
- University of Bonn[21]: seizure and nonseizure states.
- Motor Imagery dataset from Zhou et al. 2016[22]: MI of the left hand, right hand, and feet.
- Sleep Heart Health Study[23]: sleep scores.
- EEG datasets for motor imagery brain-computer interface[24]: data for non-task-related and task-related states.
- SEED[25]: report of emotional reactions.
- EEG Motor Movement/Imagery Dataset a subset[26]: MI of open and close left or right fist, opening and closing left or right fist, open and close both fists or both feet, and opening and closing both fists or both feet.
- DOD-O[27]: scored apnea patients.
- DOD-H[28]: scored sleep stages.
- International BCI Competition[29]: imagine three different grasps, cylindrical, spherical, and lumbrical.
- CAP sleep database[30]: activity during NREM sleep.
- ISRUC-Sleep[31]: healthy subjects, subjects with sleep disorders, and subjects under the effect of sleep medication.
- NIRS brain-computer interface[32]: MI experiments: left and right hand or mental arithmetic and resting state.
- Bern-Barcelona EEG database[33]: patients with pharmacoresistant focal-onset epilepsy.
- BD1[34]: imagine vowels.
- MrOS Sleep[35]: sleep study.
- Database-Imaged-Vowels-1[36]: pronounce the five main vowels "a", "e", "i", "o", and "u" and six Spanish words.
- MindBigData MUSE[37]: the subject is allowed to see a digit from 0 to 9.
All the information about the datasets has been collected in the following table. The most used datasets are part of the BCI Competition, which confirms the point made above: in first position we find BCI Competition IV 2a (8 papers) and 2b (5 papers). The number of subjects ranges from 4 to 40,983 and is directly related to model behavior; (Roy et al., 2019) show that models increase their performance when the number of subjects exceeds 15. Test durations range from seconds to minutes or hours (the longest typically being sleep studies or recordings of patients with epilepsy). The number of channels is also a critical decision depending on the use case. (Jasper, 1958) states that a minimum of 21 channels should be used to examine an adult brain; no analysis has validated this recommendation for deep learning studies, so it could be developed as future work. Other features for which no minimum standard has been studied are the electrode system and the sampling frequency.
Table 3. Summary of the selected datasets.

| Dataset | Number of subjects | Total tests | Length per test | Electrode system | Nº channels | Sampling frequency | Format | Papers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BCI Competition IV 2a | 9 | 2,591 | 5 minutes | 10-20 system | 22 | 250 Hz | GDF | 8 |
| BCI Competition IV 2b | 9 | 45 | 5 minutes | No system | 3 | 250 Hz | GDF | 5 |
| DEAP and video signals | 32 | 32 | 40 minutes | 10-20 system | 45 | 512 Hz | BDF | 6 |
| EEG Motor Movement/Imagery Dataset | 109 | 3,145,160 | 2 minutes | 10-10 system | 64 | 160 Hz | EDF | 4 |
| Multichannel EEG sustained attention driving task | 27 | 62 | 90 minutes | 10-20 system | 32 | 500 Hz | SET | 3 |
| Temple University EEG Corpus | 16,986 | 10,874 | 20 minutes | 10-20 system | 31 | 250 Hz (87%), 256 Hz (8.3%), 400 Hz (3.8%), and 512 Hz (1%) | EDF | 4 |
| CHB-MIT Scalp EEG Database | 22 | 664 | 1 to 4 hours | 10-20 system | 23 | 256 Hz | EDF | 2 |
| MAHNOB-HCI | 30 | 120 | 30 seconds | 10-20 system | 32 | 256 Hz | BDF | 2 |
| High Gamma Dataset | 14 | 1,040 | 52 seconds | 10-20 system | 44 | 512 Hz | MAT | 2 |
| Sleep EDF | 78 | 197 | 9 hours | No system | 2 | 100 Hz | EDF | 5 |
| AMIGOS | 77 | 77 | Variable | 10-20 system | 17 | 256 Hz | MAT | 1 |
| Weibo et al., 2014 | 10 | 320 | 8 seconds | 10-20 system | 60 | 100 Hz | TXT | 1 |
| The DREAMS Databases | 8 | 8 | 30 minutes | No system | 1 | 200 Hz | EDF | 1 |
| PhysioNet/CinC Challenge 2018 | 1,985 | 1,985 | 7.7 hours average | 10-20 system | 6 | 200 Hz | MAT | 2 |
| Open source SSVEP dataset | 35 | 35 | 4 minutes | 10-20 system | 64 | 1000 Hz | MAT | 2 |
| BCI Competition IVa | 5 | 10 | 980 seconds | 10-20 system | 118 | 1000 Hz | MAT | 1 |
| NMED-H | 48 | 97 | 36 minutes | EGI 128-channel | 125 | 125 Hz | MAT | 1 |
| EEG data for driver fatigue detection | 12 | 12 | 5 minutes | 10-20 system | 40 | 1000 Hz | CNT | 1 |
| Neonatal EEG recordings with seizures | 79 | 79 | 74 minutes average | 10-20 system | 19 | 256 Hz | EDF | 1 |
| University of Bonn | 5 | 500 | 23.6 seconds | 10-20 system | 1 | 173.61 Hz | TXT | 2 |
| Zhou et al., 2016 | 4 | 12 | 750 seconds | 10-20 system | 14 | 250 Hz | CNT | 1 |
| Sleep Heart Health Study | 3,295 | 2,651 | About 8 hours | No system | 2 | 125 Hz | EDF | 5 |
| EEG datasets for motor imagery brain-computer interface | 52 | 52 | 51 minutes | 10-10 system | 64 | 512 Hz | MAT | 2 |
| EEG Database Data Set | 122 | 122 | 420 seconds | No system | 64 | 512 Hz | RD | 1 |
| SEED | 15 | 15 | 4,575 seconds | 10-20 system | 62 | 200 Hz | MAT | 2 |
| EEG Motor Movement/Imagery Dataset (subset) | 109 | >1,500 | 8 minutes | 10-10 system | 64 | 160 Hz | EDF | 1 |
| DOD-O | 55 | 55 | 387 minutes | No system | 8 | 250 Hz | H5 | 1 |
| DOD-H | 25 | 25 | 427 minutes | No system | 8 | 250 Hz | H5 | 1 |
| 2020 International BCI Competition | 15 | 30 | 500 seconds | No system | 60 | 60 Hz | MAT | 1 |
| CAP Sleep Database | 108 | 108 | 410 minutes | 10-20 system | 3 | 128 to 512 Hz | EDF | 1 |
| ISRUC-Sleep | 118 | 121 | 8 hours | 10-20 system | 7 | 200 Hz | EDF | 1 |
| NIRS brain-computer interface (BCI) | 26 | 26 | 62 seconds | 10-5 system | 30 | 200 Hz | MAT | 1 |
| Bern-Barcelona EEG database | 5 | 3,740 | 20 seconds | 10-20 system | 64 | 512 or 1024 Hz | TXT | 1 |
| BD1 | 15 | 15 | 480 seconds | 10-20 system | 18 | 1024 Hz | MAT | 1 |
| MrOS Sleep | 1,026 | 586 | 341 minutes | 10-20 system | 5 | 256 Hz | EDF | 1 |
| Database-Imaged-Vowels | 15 | 15 | 110 seconds | 10-20 system | 18 | 1024 Hz | MAT | 1 |
| MindBigData MUSE | 40,983 | 40,983 | 8 seconds | 10-20 system | 4 | 220 Hz | TXT | 1 |
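As an illustration of how a channel-count criterion such as the 21-channel minimum from (Jasper, 1958) could be checked programmatically, the sketch below filters a few Table 3 entries by their number of channels; the entries are a hand-picked subset, not the full table:

```python
# Check which datasets meet the minimum of 21 channels that (Jasper, 1958)
# recommends for examining an adult brain. The (name, n_channels) entries
# below are a small subset of Table 3, used for illustration only.
datasets = [
    ("BCI Competition IV 2a", 22),
    ("BCI Competition IV 2b", 3),
    ("Sleep EDF", 2),
    ("CHB-MIT Scalp EEG Database", 23),
    ("MindBigData MUSE", 4),
]

JASPER_MIN_CHANNELS = 21
meets = [name for name, n_channels in datasets if n_channels >= JASPER_MIN_CHANNELS]
print(meets)  # ['BCI Competition IV 2a', 'CHB-MIT Scalp EEG Database']
```

The same filter over the full Table 3 would show how few open EEG datasets actually satisfy this clinical recommendation.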
Figure 8 shows a pie chart with the distribution of datasets according to the system used to place the electrodes on the scalp. As can be seen, the 10-20 and 10-10 systems are the most used, which makes sense: they are international recommendations (S. Yang & Deravi, 2017), and (Association & others, 2013) highlights that they are also the most widely used.
The following pie chart (Figure 9) represents the distribution of studies according to the sampling frequency used to collect the data. In first position we find three values: 200, 250, and 256 Hz. The percentages of datasets using 512 and 1000 Hz are also noteworthy. This parameter is directly determined by the machine used to collect the data. Regarding the minimum frequency needed for good DL performance, (Wen et al., 2021) demonstrate that a higher frequency does not provide better results.
Finally, we have a pie chart that compiles the file formats used. Figure 10 shows that European Data Format (EDF) and MAT are the most common. The former is a standard for storing multichannel biological and physiological signals (Kemp & Olivan, 2003). The latter relates to EEGLAB[38], a well-known Matlab tool for brain signal processing.