For the initial evaluation of the separability of the no-pain and pain classes, the data is processed in MATLAB R2016b through the steps shown in Fig. 3.
4.1 Preprocessing
EEG signals are contaminated by various noises and artifacts, including power-line noise (50 Hz) and EOG and EMG artifacts. To retain the EEG frequency bands of interest, we applied a third-order Butterworth bandpass filter with a 0.5-70 Hz passband to the data. This filter also removes most EMG artifacts, which lie in the 10-2500 Hz range [3]. A Butterworth bandstop filter (48-52 Hz) is used to remove the 50 Hz power-line contamination. Finally, the data is downsampled to 120 Hz to prepare it for wavelet-based feature extraction.
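A minimal MATLAB sketch of this preprocessing chain is given below. The original sampling rate is not stated in this section, so fs_orig = 1000 Hz is an assumption, as is the use of zero-phase filtering via filtfilt; x stands for a single-channel EEG segment.

```matlab
% Preprocessing sketch (Signal Processing Toolbox).
% fs_orig = 1000 Hz is an assumed original sampling rate.
fs_orig = 1000;
x = randn(1, 10*fs_orig);                 % placeholder single-channel segment

% Third-order Butterworth bandpass, 0.5-70 Hz passband
[bb, ab] = butter(3, [0.5 70]/(fs_orig/2), 'bandpass');
x = filtfilt(bb, ab, x);                  % zero-phase filtering (assumption)

% Butterworth bandstop, 48-52 Hz, removes the 50 Hz line component
[bs, as] = butter(3, [48 52]/(fs_orig/2), 'stop');
x = filtfilt(bs, as, x);

% Downsample to 120 Hz; resample applies its own anti-aliasing filter
fs = 120;
x = resample(x, fs, fs_orig);
```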
4.2 Feature extraction
Many biological signals, including EEG signals, are non-stationary: their properties change over time and frequency. Time-frequency analyses are therefore very useful for extracting features from these signals, and the wavelet transform is one of the most practical time-frequency methods for EEG feature extraction [20–22]. In this study, we applied the Discrete Wavelet Transform (DWT) to the EEG signals [23]. The number of decomposition levels and the choice of mother wavelet are critical parameters of the DWT. At each decomposition level, the DWT applies a high-pass and a low-pass filter to the signal, decomposing it into different frequency bands. The outputs of the low-pass and high-pass filters provide the approximation (A) and detail (D) information of the signal, respectively. In this study, the EEG signals were decomposed into five levels using the db6 mother wavelet. Since the data were downsampled to 120 Hz in the preprocessing step, a five-level decomposition maps the coefficients of each level onto the wavelet coefficients of the EEG frequency bands. Classification was performed with different mother wavelets, and given the acceptable performance of most classifiers, db6 was selected as the mother wavelet. The coefficients of the decomposition levels were taken as the features of the EEG signals. Figure 4 shows the decomposition levels of the DWT; HP and LP are the high-pass and low-pass filters applied to the EEG signals, respectively. The coefficients A5, D5, D4, D3, D2, and D1 represent the Delta (0.5-4 Hz), Theta (4-8 Hz), Alpha (8-12 Hz), Beta (12-30 Hz), low Gamma (30-60 Hz), and high Gamma (60-120 Hz) band coefficients, respectively. In total, 293 features related to the wavelet coefficients of the six frequency bands were extracted from the data of each channel, giving 293×18 features per subject, where 18 is the number of EEG channels.
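As a sketch, the five-level db6 decomposition described above can be computed with the Wavelet Toolbox; x is the 120 Hz preprocessed channel from the previous step, and concatenating the per-band coefficients into one vector is our illustration of how the per-channel features could be assembled.

```matlab
% Five-level DWT with the db6 mother wavelet (Wavelet Toolbox).
[c, l] = wavedec(x, 5, 'db6');

% Recover the per-band coefficients: A5 plus D1..D5
a5 = appcoef(c, l, 'db6', 5);             % approximation: Delta band
d  = detcoef(c, l, 1:5);                  % cell array {D1, ..., D5}

% One feature vector per channel: A5, D5, D4, D3, D2, D1 concatenated
features = [a5(:); d{5}(:); d{4}(:); d{3}(:); d{2}(:); d{1}(:)]';
```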
4.3 Feature selection
Feature selection is the process of finding an optimal feature set of reduced dimension from a high-dimensional feature set. It eliminates irrelevant data while retaining the data valuable for classification, and thereby improves classification performance [15].
In this study, we used the T-test as the first feature selection step. Among feature selection methods, the T-test is a filter method, which selects features independently of any classifier; filter methods rely on global statistical information. The T-test assumes normally distributed data samples, and it remains efficient when the distribution is only approximately normal. Before applying the T-test, we checked the distribution of each feature using a histogram. Here, the two-sample T-test is used to measure a feature's ability to distinguish the no-pain and pain classes. The two-sample (Student's) T-test is a parametric test that examines whether the means of two data sets differ significantly [24–25]. The value of α is set to 0.05. Note that the features rejected by the T-test do not contribute to classification, whereas the selected features are not guaranteed to be useful: they may or may not improve classification. Of the 293×18 features extracted in the previous step, 209 were selected by the T-test.
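A sketch of this screening step, assuming a feature matrix X (subjects × features) and a label vector yLab (0 = no-pain, 1 = pain); both names are ours, not the authors':

```matlab
% Two-sample T-test feature screening at alpha = 0.05.
alphaVal = 0.05;
keep = false(1, size(X, 2));
for j = 1:size(X, 2)
    % h = 1 when the class means of feature j differ significantly
    h = ttest2(X(yLab == 0, j), X(yLab == 1, j), 'Alpha', alphaVal);
    keep(j) = (h == 1);
end
Xsel = X(:, keep);                        % 209 features survived in this study
```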
4.4 Classification
Different types of classifiers have been used to classify pain-related EEG data [2, 4–6, 26–27]. In this study, we investigated the performance of the SVM, the NB, Discriminant Analysis (DA), comprising Linear Discriminant Analysis (LDA) and Diagonal-QDA, the K-Nearest Neighbor (KNN), and the Decision Tree (DT) in pain EEG signal classification.
4.4.1 SVM
The SVM is a supervised learning algorithm that has been widely used to classify EEG data. An SVM searches for a hyperplane that separates the classes while maximizing the margin, the region containing no observations, between them [27–28]. Computationally, finding the best location for the hyperplane is an optimization problem that uses a kernel function to create the boundaries separating the classes [28–30]. In this study, an SVM with a linear kernel has been used.
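In MATLAB this corresponds to fitcsvm; a minimal sketch on the selected features (standardization is our addition, not stated in the text):

```matlab
% Linear-kernel SVM (Statistics and Machine Learning Toolbox).
svmMdl = fitcsvm(Xsel, yLab, 'KernelFunction', 'linear', 'Standardize', true);
yPred  = predict(svmMdl, Xsel);           % use a held-out test set in practice
```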
4.4.2 NB
The NB classifies data based on probabilities, the assumption of feature independence, and Bayes' theorem. Under this assumption, the presence or absence of one feature does not influence the presence or absence of any other feature; consequently, adding redundant features that are not independent of each other negatively affects the learning process of this classifier. The input data is assigned to the class whose statistical distribution it matches most closely [26, 31–32]. Here, a normal distribution is assumed for the data, and Gaussian Naive Bayes (GNB) is used to classify the data.
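A minimal GNB sketch; the 'normal' per-feature densities reproduce the Gaussian assumption stated above:

```matlab
% Gaussian Naive Bayes: each feature gets a per-class normal density.
nbMdl = fitcnb(Xsel, yLab, 'DistributionNames', 'normal');
yPred = predict(nbMdl, Xsel);
```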
4.4.3 DA
The DA is a classifier that assumes a normal distribution for the data of each class. To determine the boundary between classes, DA estimates the normal distribution parameters for each class and projects the data into a new space in which the within-class scatter is minimized and the between-class scatter is maximized. Here, the performance of two types of DA classifiers, LDA and Diagonal-QDA, has been investigated for no-pain and pain EEG signal classification. LDA and QDA have linear and quadratic decision boundaries, respectively; LDA can only learn linear boundaries, whereas QDA can learn quadratic boundaries and is therefore more flexible. In LDA, the class models share the same covariance matrix and differ only in their means; in QDA, both the mean and the covariance of each class differ. Diagonal-QDA is similar to QDA but estimates a diagonal covariance matrix [33–36].
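Both DA variants are available through fitcdiscr; a sketch:

```matlab
% LDA: shared covariance matrix across classes, linear boundary.
ldaMdl  = fitcdiscr(Xsel, yLab, 'DiscrimType', 'linear');

% Diagonal-QDA: per-class diagonal covariance, quadratic boundary.
dqdaMdl = fitcdiscr(Xsel, yLab, 'DiscrimType', 'diagquadratic');
```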
4.4.4 KNN
The KNN is a local classifier suitable for multimodally distributed data. It is a simple statistical classification method whose main strategy is to identify the k samples nearest to an unknown input sample; the class of the input is then determined by the majority vote of those k samples. Here, the number of neighbors is set to k = 9 [5].
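A sketch with fitcknn; Euclidean distance is the MATLAB default and an assumption here, since the text does not specify the metric:

```matlab
% KNN with k = 9: class assigned by majority vote of the 9 nearest
% training samples (Euclidean distance by default).
knnMdl = fitcknn(Xsel, yLab, 'NumNeighbors', 9);
yPred  = predict(knnMdl, Xsel);
```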
4.4.5 DT
The DT is a supervised non-parametric learning method for classification. This classifier does not make any prior assumptions about the data distribution. In DT, the model is created using simple decision rules inferred from features. Creating a model in a DT is based on dividing a complex problem into several simple sub-problems, and this process is repeated recursively. DT consists of a root node, decision nodes, and a set of leaf nodes. Figure 5 shows an example of a DT.
Here, Gini's Diversity Index is used to select the features of the root node and decision nodes. The Gini index is one of the criteria used to build decision trees and can be viewed as a measure of node impurity. For two classes, the Gini index is a number between 0 and 0.5, and a split with a lower Gini index is preferable to one with a higher index. The Gini index of a data set D is calculated from Eq. (1):
$$Gini(D)=1 - \sum\limits_{i=1}^{y} {p^2(i)} \tag{1}$$
In Eq. (1), y is the number of classes, and p(i) is the fraction of observations of class i that reach the node. The minimum value of the Gini index occurs when the node contains only one class, and the maximum occurs when all classes in the data set are equally probable [37–39].
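A small sketch of Eq. (1) for the labels reaching a node:

```matlab
% Gini diversity index of Eq. (1): 1 - sum_i p(i)^2 over the class
% fractions p(i) of the observations at a node.
giniIndex = @(labels) 1 - sum((countcats(categorical(labels)) ./ numel(labels)).^2);

giniIndex([1 1 1 1])     % pure node            -> 0   (minimum)
giniIndex([1 1 2 2])     % equiprobable classes -> 0.5 (maximum, two classes)
```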
To reduce the risk of overfitting in the DT, a general strategy is pruning: removing some features of the training set or branches of the tree caused by noise. Pruning reduces a tree by converting some branch nodes into leaf nodes and removing the leaf nodes below them. When a validation set is available, the tree can be pruned according to the validation error. In this study we used post-pruning: after building the DT, a branch is removed if its removal reduces the validation error [39–40].
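A sketch of growing and post-pruning the tree. The paper prunes against a validation set; here cross-validated cost-complexity pruning (cvloss) stands in for that validation error, which is an assumption:

```matlab
% Decision tree grown with Gini's Diversity Index ('gdi').
dtMdl = fitctree(Xsel, yLab, 'SplitCriterion', 'gdi');

% Post-pruning: choose the pruning level with the lowest
% cross-validated error, then cut the tree back to that level.
[~, ~, ~, bestLevel] = cvloss(dtMdl, 'SubTrees', 'all', 'TreeSize', 'min');
dtPruned = prune(dtMdl, 'Level', bestLevel);
```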
4.5 Performance Evaluation
We calculated the classification performance measures, including accuracy, sensitivity, and specificity, from the confusion matrix. About 80% of the data was used for training and the rest for testing, with the split generated by k-fold cross-validation (K = 6); the training and testing procedure is thus repeated six times. The final performance measures were obtained by averaging the six confusion matrices from the six repetitions. The results of no-pain and pain classification using the six classifiers are shown in Table 1.
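A sketch of this evaluation loop for one classifier (the SVM here); treating pain as the positive class is our convention, not stated in the text:

```matlab
% 6-fold cross-validation with averaged confusion matrices.
K   = 6;
cvp = cvpartition(yLab, 'KFold', K);
cm  = zeros(2, 2);
for k = 1:K
    tr  = training(cvp, k);  te = test(cvp, k);
    mdl = fitcsvm(Xsel(tr, :), yLab(tr), 'KernelFunction', 'linear');
    cm  = cm + confusionmat(yLab(te), predict(mdl, Xsel(te, :)));
end
cm   = cm / K;                            % average over the six folds
acc  = 100 * trace(cm) / sum(cm(:));
sen  = 100 * cm(2, 2) / sum(cm(2, :));    % pain (label 1) as positive class
spec = 100 * cm(1, 1) / sum(cm(1, :));
```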
Table 1
The results of no-pain and pain EEG signal classification using six classifiers
Classifier | Accuracy | Sensitivity | Specificity |
SVM | 97.22 | 100 | 94.44 |
LDA | 94.44 | 100 | 88.89 |
QDA | 97.22 | 94.44 | 100 |
NB | 97.22 | 94.44 | 100 |
DT | 83.33 | 83.33 | 83.33 |
KNN | 69.44 | 100 | 38.89 |
The classification results show that, except for KNN, the classifiers have acceptable performance in separating the no-pain and pain EEG signals. This means the approach KNN takes, determining the class of an input from its closest neighboring samples, is not suitable for separating these features. Since SVM, NB, and DT apply entirely different approaches to data classification, we selected these three as the final classifiers in the proposed channel selection algorithm for identifying the brain areas active during pain.
4.6 Proposed channel selection approach
The method proposed in this study for selecting the electrodes active in separating the no-pain and pain classes of EEG signals is a vote over the outputs of the pseudo-SFFS algorithm applied to classifiers with completely different classification strategies. For the data of this study, the SVM, DT, and GNB classifiers were selected. Figure 6 shows the process flow of the proposed EEG channel selection algorithm for neonates' pain EEG data.
Classifiers follow different procedures for classification: it may be possible to separate features linearly but not probabilistically with a classifier such as NB, and vice versa. Since the goal is to select the brain's active area, the separability of the features must be checked from different aspects; therefore, classifiers with entirely different classification procedures were chosen for the proposed algorithm. SVM with a linear kernel and LDA both search for a linear separating boundary in the feature space, and the pseudo-SFFS feature selection algorithm shows a significant overlap between the features selected by these two classifiers; since the SVM accuracy is higher (Table 1), SVM was chosen over LDA. DT follows a completely different procedure from the other classifiers mentioned, while the accuracy obtained with it is acceptable, so DT is the second chosen classifier. The NB and QDA classifiers yielded similar accuracy (Table 1), and based on the pseudo-SFFS algorithm, all features selected by QDA were also selected by GNB. Hence, GNB was chosen as the third classifier for the proposed channel selection approach.
The T-test is applied before the subsequent pseudo-SFFS algorithm to reduce the number of iterations of the pseudo-SFFS loop and thus the computational cost. Since feature selection by pseudo-SFFS depends on the classifiers, to reduce this classifier effect only the features selected by at least two classifiers are considered the final effective features.
Pseudo-SFFS algorithm
SFFS is a feature selection method that determines a lower-dimensional set of features that are uncorrelated and most relevant to the problem [41]. It is an iterative greedy search algorithm whose fitness function is the classifier's performance, which makes it a wrapper method. The pseudocode of this method is described below [15, 42–43]:
1. Start with an empty set of features: \(Y_0 = \emptyset\), \(k = 0\)
2. Select the best feature: \(x^{+} = \arg\max_{x \notin Y_k} J(Y_k + x)\)
3. If \(J(Y_k + x^{+}) > J(Y_k)\), set \(Y_{k+1} = Y_k + x^{+}\) and \(k = k + 1\)
4. Go to step (2)
The flowchart of the pseudo-SFFS method that we introduce in this study is shown in Fig. 7. In this method, to reduce the computational cost, the following are considered:
In SFFS, one feature is added to the feature set in each iteration, and the accuracy obtained with the new feature set is compared with the previous one: if the classifier's performance improves, the feature remains in the set; otherwise, it is removed. Unlike SFFS, the proposed pseudo-SFFS algorithm compares the accuracy obtained in each iteration with a constant value; that is, a threshold is placed on the maximum accuracy. The accuracy threshold (\(acc_{th}\)) that ends the iterations is selected according to the classifier's performance without the second feature selection step: according to Table 1, the threshold is 97.22% for SVM and GNB and 83.33% for DT. Because of its greedy, iterative search, SFFS is an algorithm with a high computational cost for feature selection; thresholding the accuracy reduces the number of iterations and hence the cost.
In the feature selection process, feature sets may exist with equal accuracy but different sensitivity and specificity, so we also applied thresholds to sensitivity and specificity, using the same threshold value for both. In this way, a feature set with nearly the same performance in detecting the no-pain and pain classes is selected, which increases the probability of correctly detecting the pain class in cases where pain perception is weak. As shown in Fig. 7, the allowed range of sensitivity and specificity is determined by the value of Rs, which is calculated from the data in Table 1 using Eq. (2):
$$Rs = \left| acc_{th} - sen \right| = \left| acc_{th} - spec \right| = \frac{\left| sen - spec \right|}{2} \tag{2}$$
In Eq. (2), \(acc_{th}\), sen, and spec are the accuracy, sensitivity, and specificity of the classifiers, respectively, taken from Table 1. Rs is 2.78 for SVM and NB and 0 for DT.
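A sketch of the pseudo-SFFS loop for one classifier, using the SVM thresholds from Table 1. evalFeatures is a hypothetical helper that returns cross-validated accuracy, sensitivity, and specificity (in %) for a candidate feature set; the stopping rule below is our reading of Fig. 7 and Eq. (2), and the tie-breaking rule described below is omitted for brevity:

```matlab
% Pseudo-SFFS sketch: greedy forward selection that stops once the
% thresholded accuracy is reached and sen/spec lie within the Rs band.
accTh = 97.22;  Rs = 2.78;                % SVM thresholds from Table 1
Y = [];                                   % selected feature indices
remaining = 1:size(Xsel, 2);
done = false;
while ~done && ~isempty(remaining)
    bestAcc = -inf;  bestJ = remaining(1);
    for j = remaining                     % greedy forward step
        a = evalFeatures([Y j]);          % hypothetical evaluation helper
        if a > bestAcc, bestAcc = a; bestJ = j; end
    end
    Y = [Y bestJ];
    remaining(remaining == bestJ) = [];
    [a, sen, spec] = evalFeatures(Y);
    done = (a >= accTh) && (abs(sen - accTh) <= Rs) && (abs(spec - accTh) <= Rs);
end
```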
When adding a feature to the feature set, more than one candidate set may achieve the highest accuracy with the same sensitivity and specificity. In this case, the set whose newly added feature carries the most information on its own, i.e., yields the highest classifier performance when used alone, is selected.
Adding the second feature selection step would be expected to enhance the classifiers' performance. However, since the purpose is to find the brain areas active during pain while keeping the computational cost low, further performance enhancement was not pursued.
4.7 Channel evaluation
For further investigation, we performed a channel evaluation approach similar to [2] and [16]. The data of each channel was evaluated separately through the procedure shown in Fig. 3, so the process was repeated 18 times (once per channel). The channel evaluation results are shown in Table 2.
Table 2
Performance evaluation of each channel in no-pain and pain EEG signal separation. Acc, Sen, and Spec denote accuracy, sensitivity, and specificity, respectively.
Classifier | SVM | | | NB | | | DT | | | LDA | | | QDA | | | Average accuracy of the five classifiers |
Channel | Acc | Sen | Spec | Acc | Sen | Spec | Acc | Sen | Spec | Acc | Sen | Spec | Acc | Sen | Spec | |
Cz | 80.56 | 83.33 | 77.78 | 91.67 | 83.33 | 100 | 86.11 | 83.33 | 88.89 | 75 | 83.33 | 66.67 | 91.67 | 83.33 | 100 | 85.00 |
Cpz | 75 | 88.89 | 61.11 | 86.11 | 94.44 | 66.67 | 75 | 72.22 | 77.78 | 75 | 83.33 | 66.67 | 86.11 | 94.44 | 77.78 | 79.44 |
F8 | 75 | 83.33 | 66.67 | 80.56 | 83.33 | 77.78 | 58.33 | 66.67 | 50 | 66.67 | 83.33 | 50 | 80.56 | 83.33 | 77.78 | 72.22 |
T8 | 83.33 | 88.89 | 77.78 | 86.11 | 83.33 | 88.89 | 61.11 | 66.67 | 55.56 | 83.33 | 94.44 | 72.22 | 86.11 | 83.33 | 88.89 | 80.00 |
TP10 | 83.33 | 88.89 | 77.78 | 72.22 | 77.78 | 66.67 | 66.67 | 72.22 | 61.11 | 75 | 88.89 | 61.11 | 72.22 | 77.78 | 66.67 | 73.89 |
P8 | 88.89 | 94.44 | 83.33 | 80.56 | 77.78 | 83.33 | 55.56 | 61.11 | 50 | 72.22 | 88.89 | 55.56 | 80.56 | 77.78 | 83.33 | 75.56 |
O2 | 86.11 | 88.89 | 83.33 | 88.89 | 83.33 | 94.44 | 58.33 | 66.67 | 50 | 86.11 | 88.89 | 83.33 | 88.89 | 83.33 | 94.44 | 81.67 |
F4 | 83.33 | 88.89 | 77.78 | 69.44 | 83.33 | 55.56 | 66.67 | 66.67 | 66.67 | 69.44 | 77.78 | 61.11 | 69.44 | 83.33 | 55.56 | 71.66 |
C4 | 75 | 88.89 | 61.11 | 77.78 | 88.89 | 66.67 | 66.67 | 66.67 | 66.67 | 75 | 83.33 | 66.67 | 77.78 | 88.89 | 66.67 | 74.45 |
Cp4 | Removed by T-test |
F7 | 86.11 | 100 | 72.22 | 77.78 | 72.22 | 83.33 | 66.67 | 66.67 | 66.67 | 77.78 | 88.89 | 66.67 | 77.78 | 72.22 | 83.33 | 77.22 |
T7 | 83.33 | 94.44 | 72.22 | 75 | 72.22 | 77.78 | 58.33 | 55.56 | 61.11 | 69.44 | 83.33 | 55.56 | 75 | 72.22 | 77.78 | 72.22 |
TP9 | 63.89 | 83.33 | 44.44 | 69.44 | 55.56 | 83.33 | 61.11 | 94.44 | 27.78 | 75 | 77.78 | 72.22 | 69.44 | 55.56 | 83.33 | 67.78 |
P7 | 86.11 | 94.44 | 77.78 | 75 | 72.22 | 77.78 | 63.89 | 77.78 | 50 | 72.22 | 77.78 | 66.67 | 75 | 72.22 | 77.78 | 74.44 |
O1 | 83.33 | 83.33 | 83.33 | 91.67 | 94.44 | 88.89 | 66.67 | 66.67 | 66.67 | 83.33 | 77.78 | 88.89 | 91.67 | 94.44 | 88.89 | 83.33 |
F3 | 77.78 | 88.89 | 66.67 | 72.22 | 55.56 | 88.89 | 69.44 | 83.33 | 55.56 | 80.56 | 88.89 | 72.22 | 72.22 | 55.56 | 88.89 | 74.44 |
C3 | 86.11 | 100 | 72.22 | 86.11 | 88.89 | 83.33 | 66.67 | 72.22 | 61.11 | 61.11 | 61.11 | 61.11 | 86.11 | 88.89 | 83.33 | 77.22 |
Cp3 | Removed by T-test |