1.1 Data acquisition
The public ISRUC-Sleep dataset [10], scored according to the AASM standard, was used. It comprises two sleep-disorder subsets (subgroup-I and subgroup-II) and one healthy subset (subgroup-III). The database was provided by the Sleep Medicine Centre of Coimbra and can be downloaded freely from “http://sleeptight.isr.uc.pt/ISRUC_Sleep/”. It provides 126 PSG records with sleep-stage labels from two experts, and most PSG records contain 19 channels of physiological data. Record ‘8’ and record ‘40’ of subgroup-I were excluded from the analysis: the former does not provide the F3 and F4 EEG channels, and the latter has electrode problems. As a result, 124 records were used in this research: 98 from subgroup-I, 16 from subgroup-II and 10 from subgroup-III.
Only six electroencephalogram (EEG) channels, two electrooculogram (EOG) channels and one electromyogram (EMG) channel were used in this paper. The sampling frequency is 200 Hz for each channel. All of these channels had already been filtered by the dataset providers to remove noise and undesired background, which improves the PSG signal quality and increases the signal-to-noise ratio (SNR). The filtering stage comprised: (1) a notch filter to eliminate the 50 Hz electrical noise; (2) a band-pass Butterworth filter with cutoff frequencies of 0.3 Hz and 35 Hz for the EEG and EOG channels, and of 10 Hz and 70 Hz for the EMG channels.
According to the AASM rules, the sleep stages of each subject in the dataset were labeled by two experts independently, so small differences exist between the two sets of annotations. If the scores from only one expert were used, that rater’s scoring style would introduce a bias. Therefore, only the 30-s epochs with consistent annotations in both hypnograms were extracted for analysis in this paper.
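The consensus selection described above can be sketched as follows. This is a minimal illustration, assuming that each expert's hypnogram is available as one label per 30-s epoch; the function name `consensus_epochs` and the toy labels are hypothetical and do not come from the dataset.

```python
import numpy as np

def consensus_epochs(labels_expert1, labels_expert2):
    """Return indices and labels of the epochs scored identically by both experts."""
    a = np.asarray(labels_expert1)
    b = np.asarray(labels_expert2)
    if a.shape != b.shape:
        raise ValueError("both hypnograms must cover the same epochs")
    keep = np.flatnonzero(a == b)  # positions where the two annotations agree
    return keep, a[keep]

# Toy hypnograms, one label per 30-s epoch (illustrative values only)
expert1 = ["W", "N1", "N2", "N2", "N3", "R"]
expert2 = ["W", "N2", "N2", "N2", "N3", "R"]
idx, stages = consensus_epochs(expert1, expert2)
```

Only the epochs indexed by `idx` would then be passed on to feature extraction.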
Table 1
Records used in this paper from ISRUC-Sleep
| Group | Type of participants | Number of records | Number of subjects (gender) | Age (years) |
| --- | --- | --- | --- | --- |
| subgroup-I | Participants with sleep disorder | 98* | 98 subjects (54 male, 44 female) | 20–85, Avg. = 50.7, std. = 15.9 |
| subgroup-II | Participants with sleep disorder | 16 | 8 subjects (6 male, 2 female) | 26–79, Avg. = 46.9, std. = 18.7 |
| subgroup-III | Healthy participants | 10 | 10 subjects (9 male, 1 female) | 30–58, Avg. = 39.6, std. = 10.1 |

*Note: record ‘8’ and record ‘40’ were excluded from the analysis. The former record does not provide the F3 and F4 EEG channels, while the latter has some electrode problems.
1.2 Feature extraction
1.2.1 Features from single-channel EOG
The two EOG channels, 'LOC-A2' and 'ROC-A1', are unipolar. An FFT is applied to each EOG channel to obtain the power spectral density (PSD). For each 30-s period, the sums of energy in the sub-bands delta (1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz) and beta (13–30 Hz) are defined as Edelta, Etheta, Ealpha and Ebeta, and their total is defined as Esum. The entropy derived from Edelta, Etheta, Ealpha and Ebeta is defined as EEntropy. Similarly, the sums of the absolute values in these sub-bands in each 30-s period are defined as Sdelta, Stheta, Salpha and Sbeta, and their total is defined as Ssum. The entropy derived from Sdelta, Stheta, Salpha and Sbeta is defined as SEntropy. The feature vector for the four sub-bands is defined as
EogBand4_Ft16= [EEntropy Ebeta/Edelta Edelta/Esum Etheta/Esum Ealpha/Esum Ebeta/Esum SEntropy Sbeta/Sdelta Sdelta/Ssum Stheta/Ssum Salpha/Ssum Sbeta/Ssum Sdelta Stheta Salpha Sbeta] (1)
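A minimal sketch of formula (1) is given here for illustration. Several details are assumptions, because the text does not fix them: the energy is taken as the sum of squared FFT magnitudes, the absolute-value sum as the sum of FFT magnitudes, the entropy as the Shannon entropy of the normalized band values, and each band as a half-open interval [low, high). The function names and the synthetic epoch are hypothetical.

```python
import numpy as np

FS = 200  # sampling frequency in Hz, as stated for the dataset
BANDS = [(1, 4), (4, 8), (8, 13), (13, 30)]  # delta, theta, alpha, beta

def shannon_entropy(values):
    """Shannon entropy of the normalized band values (assumed definition)."""
    p = np.asarray(values, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def eog_band4_ft16(epoch, fs=FS):
    """Return the 16 band features of one 30-s epoch, ordered as in formula (1)."""
    spectrum = np.abs(np.fft.rfft(epoch))
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)
    E, S = [], []
    for lo, hi in BANDS:
        mask = (freqs >= lo) & (freqs < hi)    # half-open band (assumption)
        E.append(np.sum(spectrum[mask] ** 2))  # band energy
        S.append(np.sum(spectrum[mask]))       # sum of absolute values
    E, S = np.array(E), np.array(S)
    feats = [shannon_entropy(E), E[3] / E[0], *(E / E.sum()),
             shannon_entropy(S), S[3] / S[0], *(S / S.sum()), *S]
    return np.array(feats)

# Synthetic stand-in for one 30-s EOG epoch (not real data)
rng = np.random.default_rng(0)
epoch = rng.standard_normal(30 * FS)
ft16 = eog_band4_ft16(epoch)
```

The same loop, run over a different band list, would give the per-band quantities needed for a finer band partition.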
In the same way, for the eleven sub-bands (0.4–4) Hz, (4–8) Hz, (8–10) Hz, (10–13) Hz, (13–18) Hz, (18–25) Hz, (25–30) Hz, (30–36) Hz, (36–41) Hz, (41–46) Hz and (46–50) Hz [11], there are 11 ratios of the 2-norm within each band to the sum over all bands, 11 ratios of the 1-norm within each band to the sum over all bands, and the 11 band energies themselves. Consequently, the feature vector with 33 features for the eleven sub-bands is defined as EogBand11_Ft33. The number of features for a single EOG channel is therefore 16 + 33 = 49, and the feature vector is as follows,
OneLeadEog=[ EogBand4_Ft16 EogBand11_Ft33] (2)
where OneLeadEog stands for LOC_LeadEog in the case of lead 'LOC-A2' and for ROC_LeadEog in the case of lead 'ROC-A1'.
1.2.2 Correlation features between two-channel EOG
Temporal signals in the sub-bands delta (1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz) and beta (13–30 Hz) are obtained by passing the original signal of each channel through a separate FIR band-pass filter for each band. The correlation coefficients [12] between the two EOG channels in the four sub-bands during each 30-s period are defined as rdelta, rtheta, ralpha and rbeta, respectively. The correlation coefficient between the original waveforms of the two EOG channels during each 30-s period is defined as rorg. In the same way, the phase-locking values (PLV) are obtained, namely PLVbeta, PLValpha, PLVtheta, PLVdelt and PLVorg.
The number of features between two-channel EOG is 10, and the feature vector is as follows,
EogBtwn= [rbeta ralpha rtheta rdelta rorg PLVbeta PLValpha PLVtheta PLVdelt PLVorg ] (3)
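The ten features of formula (3) could be computed along the following lines. This is a sketch under stated assumptions: the FIR filters are designed with `scipy.signal.firwin` using 401 taps and applied with zero-phase filtering (`filtfilt`), and the PLV is taken as the magnitude of the mean phase-difference phasor, with instantaneous phases from the Hilbert transform. None of these choices is specified in the text, and the function names are hypothetical.

```python
import numpy as np
from scipy.signal import firwin, filtfilt, hilbert

FS = 200  # sampling frequency in Hz
# Band order follows formula (3): beta, alpha, theta, delta
BANDS = {"beta": (13, 30), "alpha": (8, 13), "theta": (4, 8), "delta": (1, 4)}

def bandpass(x, lo, hi, fs=FS, numtaps=401):
    """Zero-phase FIR band-pass filter (filter length is an assumption)."""
    taps = firwin(numtaps, [lo, hi], pass_zero=False, fs=fs)
    return filtfilt(taps, [1.0], x)

def corr(x, y):
    """Pearson correlation coefficient between two signals."""
    return float(np.corrcoef(x, y)[0, 1])

def plv(x, y):
    """Phase-locking value from Hilbert instantaneous phases."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return float(np.abs(np.mean(np.exp(1j * dphi))))

def eog_btwn(loc, roc, fs=FS):
    """Return [r per band, r original, PLV per band, PLV original]."""
    pairs = [(bandpass(loc, lo, hi, fs), bandpass(roc, lo, hi, fs))
             for lo, hi in BANDS.values()]
    pairs.append((loc, roc))  # original waveforms
    r = [corr(a, b) for a, b in pairs]
    p = [plv(a, b) for a, b in pairs]
    return np.array(r + p)

# Synthetic stand-ins for one 30-s epoch of the two EOG channels
rng = np.random.default_rng(0)
loc = rng.standard_normal(30 * FS)
roc = -loc + 0.5 * rng.standard_normal(30 * FS)
feats = eog_btwn(loc, roc)
```

Because the function only takes two signals, it can be applied unchanged to any channel pair.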
Each EOG channel yields 49 features, and the channel pair yields 10 features. Therefore, the total number of features for the two EOG channels 'LOC-A2' and 'ROC-A1' is 49×2 + 10 = 108, and the whole EOG vector is defined as,
EogFeat=[ LOC_LeadEog ROC_LeadEog EogBtwn] (4)
1.2.3 Features from single-channel EMG
For each 30-s period, the fractal dimension of the EMG is defined as EmgFD, and the root mean square is defined as EmgStd.
The Hilbert transform is applied to the EMG signal of each 30-s period to obtain its envelope. The mean of the envelope is defined as EnvlpMean, its maximum as EnvlpMax, its root mean square as EnvlpStd, and the ratio of EnvlpMax to EnvlpMean as RtMaxdMean. The total number of features for the single-channel EMG is 6, and the whole EMG vector is defined as,
EmgFeat=[EmgFD EmgStd RtMaxdMean EnvlpMean EnvlpMax EnvlpStd] (5)
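The envelope-based features can be illustrated with a short sketch. It assumes that the envelope is the magnitude of the analytic signal returned by `scipy.signal.hilbert`. EmgFD is deliberately left out, because the text does not state which fractal-dimension estimator was used. The function names and the test tone are hypothetical.

```python
import numpy as np
from scipy.signal import hilbert

def rms(x):
    """Root mean square of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))

def emg_envelope_features(epoch):
    """Five of the six features of formula (5); EmgFD is not computed here."""
    envelope = np.abs(hilbert(epoch))  # magnitude of the analytic signal
    envlp_mean = float(envelope.mean())
    envlp_max = float(envelope.max())
    return {
        "EmgStd": rms(epoch),
        "RtMaxdMean": envlp_max / envlp_mean,
        "EnvlpMean": envlp_mean,
        "EnvlpMax": envlp_max,
        "EnvlpStd": rms(envelope),
    }

# Synthetic 30-s epoch at 200 Hz: a 20 Hz tone of amplitude 2 (not real EMG)
t = np.arange(30 * 200) / 200.0
emg = 2.0 * np.sin(2 * np.pi * 20 * t)
emg_feats = emg_envelope_features(emg)
```

For a constant-amplitude tone the envelope is flat, so RtMaxdMean stays close to 1; bursts of muscle activity would raise it.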
1.2.4 Features from six-channel EEG
To compare the proposed method with the AASM standard, EEG features are also calculated. The six EEG channels are divided into three groups, {F3, F4}, {C3, C4} and {O1, O2}. For each group, 108 features are obtained in the same way as in formula (4); the resulting vectors are defined as F34Feat, C34Feat and O12Feat, respectively. Consequently, there are 108×3 = 324 features for the six EEG channels.
1.2.5 Whole feature vector
For classification from EOG + EMG, the whole feature vector with 114 features from two-channel EOG (108 features) and one single-channel EMG (6 features) is defined as follows,
Feat1=[ EogFeat EmgFeat ] (6)
For comparison, the whole feature vector with 438 features from two-channel EOG (108 features), one single-channel EMG (6 features) and six-channel EEG (324 features) is defined as follows,
Feat2=[ EogFeat EmgFeat F34Feat C34Feat O12Feat ] (7)
1.3 Characteristic normalization
Physiological signals often show significant individual characteristics. For example, although the lowest EMG amplitudes in most subjects occurred during deep or REM sleep, a few subjects behaved differently and had the highest EMG amplitudes during wakefulness.
A normal night of sleep in adults may last 8 hours. Over such a long period, the recording conditions can vary, for example through changes in skin humidity, body temperature or body movements, or, even worse, through loss of electrode contact. Besides, the discriminant information for sleep stage classification lies in the relative amplitudes rather than the absolute amplitudes.
If the maximum and minimum values of the feature sequence are taken as the reference for normalization, errors may result, because both extremes may be noise points. For example, suppose most values in a feature sequence are near 1, but one noise point is 100 and another is -10. If the normalization scale is the range between the maximum and the minimum, i.e., 100 + 10 = 110, then most of the normalized values are clustered around (1 + 10)/110 = 0.1, while the former noise point maps to 1 and the latter to 0. This is obviously not the expected result of normalization.
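The effect can be reproduced with a few lines of code. The sketch assumes ordinary min-max scaling, (x - min)/(max - min), and uses a made-up sequence of eight typical values plus the two noise points from the example.

```python
import numpy as np

def min_max(x):
    """Ordinary min-max scaling to the interval [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Eight typical values near 1, plus the two noise points of the example
feature = np.array([1.0] * 8 + [100.0, -10.0])
scaled = min_max(feature)
# The two noise points occupy the ends of the range, and all typical
# values are squeezed into a narrow band near the low end.
```

Printing `scaled` makes the compression of the typical values visible.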
A new ‘quasi-normalization’ method is designed in this paper. First, the original feature sequence {a(n)} is sorted in ascending order, and the sorted sequence is defined as {f(n)}. Let n1, n2 and n3 be the indices in {f(n)} located at 10%, 50% and 90% of its length from the beginning, respectively.
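The standard deviation of the sub-sequence {f(n1:n3)} is defined as Sd, the distances from the middle value f(n2) to f(n3) and to f(n1) are defined as Ku and Kd, and the scale s is twice the smallest of these three quantities.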
Sd = std( f(n1:n3) ) (8)
Ku = f(n3)- f(n2) (9)
Kd = f(n2)-f(n1) (10)
s = 2*min([Sd Ku Kd]) (11)
b(n)=(a(n)- f(n2))/s (12)
After applying formula (12) for ‘quasi-normalization’, most elements of {b(n)} lie in the interval [-1, 1], but a few fall outside that range. To confine all elements to the interval [-2, 2], the following truncation is applied,
c(n) = max(-2, min(2, b(n))) (13)
Finally, the feature sequences {c(n)} are used for classification.
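A minimal sketch of the quasi-normalization, following formulas (8) to (12) and the confinement to [-2, 2], is given here. Two details are assumptions: the percentile positions are rounded to the nearest zero-based index, and the standard deviation uses the sample (N - 1) normalization. The function name and the test sequence are hypothetical.

```python
import numpy as np

def quasi_normalize(a):
    """Return the scaled sequence b(n) and the truncated sequence c(n)."""
    a = np.asarray(a, dtype=float)
    f = np.sort(a)                       # ascending order
    last = len(f) - 1
    # Indices at 10%, 50% and 90% of the length (rounding is an assumption)
    n1, n2, n3 = (int(round(q * last)) for q in (0.10, 0.50, 0.90))
    sd = np.std(f[n1:n3 + 1], ddof=1)    # formula (8), sample std assumed
    ku = f[n3] - f[n2]                   # formula (9)
    kd = f[n2] - f[n1]                   # formula (10)
    s = 2.0 * min(sd, ku, kd)            # formula (11)
    if s <= 0:
        raise ValueError("degenerate feature sequence: scale s is not positive")
    b = (a - f[n2]) / s                  # formula (12)
    c = np.clip(b, -2.0, 2.0)            # confine all elements to [-2, 2]
    return b, c

# 99 regular values between 0 and 1, plus two noise points (made-up data)
a = np.concatenate([np.linspace(0.0, 1.0, 99), [100.0, -10.0]])
b, c = quasi_normalize(a)
```

In this toy sequence the two noise points are truncated to the ends of [-2, 2], while the regular values keep most of their spread, in contrast to scaling by the raw maximum and minimum.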
Figure 1 is an example of the quasi-normalization for EmgStd of EMG. In the original index EmgStd (Fig. 1b), most elements are lower than 2 and none is lower than 0. After applying formula (12) for ‘quasi-normalization’, most elements of {b(n)} lie in the interval [-1, 1] (Fig. 1c), but some elements are still higher than 2. After applying formula (13) for data truncation, the elements that are higher than 2 are reset to 2.
Figure 2. Quasi-normalization for PLVorg of EOG: (a) manual scoring from the first expert (blue line) and from the second expert (red line); (b) original index PLVorg of EOG, with different stages in different colors; (c) sequence {b(n)}; (d) sequence {c(n)}.
1.4 Classification model selection
Random Forest (RF) [13] offers several advantages, including strong generalization ability, strong resistance to over-fitting, rapid model training, and a simple structure that is easy to construct. It is therefore suitable for processing high-dimensional datasets without feature selection.
1.5 Comparison of classification results
A leave-one-record-out cross-validation (LOOCV) strategy was applied to the mixed group (10 healthy recordings and 114 sleep-disorder recordings). In each round, the training set contained 123 records and the one remaining record was used as the validation set. This step was repeated 124 times, until each record had been tested once. The results of all 124 rounds together formed the final results.
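The validation loop can be sketched with scikit-learn. This is an illustration, not the configuration used in the paper: the number of trees, the random seed, the function name and the synthetic data (6 records of 40 epochs each, 5 features, 3 stages) are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

def leave_one_record_out(X, y, record_ids, n_trees=50, seed=0):
    """Predict every epoch with a model trained on all other records."""
    y_pred = np.empty_like(y)
    splitter = LeaveOneGroupOut()
    for train_idx, test_idx in splitter.split(X, y, groups=record_ids):
        model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = model.predict(X[test_idx])  # held-out record only
    return y_pred

# Synthetic data: each epoch carries the id of the record it belongs to
rng = np.random.default_rng(0)
record_ids = np.repeat(np.arange(6), 40)
y = rng.integers(0, 3, size=record_ids.size)
X = rng.standard_normal((record_ids.size, 5)) + y[:, None]
y_pred = leave_one_record_out(X, y, record_ids)
```

Grouping by record id keeps all epochs of one recording on the same side of each split, so no epoch of the tested record is seen during training.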
Furthermore, the results derived from each signal type (EEG, EOG and EMG) were compared. The evaluation indices are accuracy, the multi-class weighted F1 score [14] and Cohen’s kappa coefficient.
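The three indices can be computed from pooled predictions with scikit-learn's metric functions, as in the sketch below. The wrapper name `evaluate` and the toy labels are hypothetical.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

def evaluate(y_true, y_pred):
    """Accuracy, multi-class weighted F1 score and Cohen's kappa."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "weighted_f1": f1_score(y_true, y_pred, average="weighted"),
        "kappa": cohen_kappa_score(y_true, y_pred),
    }

# Toy labels for illustration only
y_true = [0, 0, 1, 1]
y_hat = [0, 1, 1, 1]
scores = evaluate(y_true, y_hat)
```

The weighted F1 score averages the per-class F1 values with weights proportional to the number of true epochs in each class, which matters when the stages are unevenly represented.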