Theory
The primary focus is on eradicating fear conditioning associated with food, a mechanism regulated by the BLA and the CEA, using NF. To extinguish this fear conditioning, inhibitory neurons must suppress the activity of the CEA neurons responsible for initiating the fear response. This suppression is facilitated by neurons located in the Intercalated Cell Masses (ICMs), which are activated by neurons in the Prefrontal Cortex (PFC) [see Fig. 2] (Sah and Westbrook 2008). The PFC serves multiple roles, most notably the execution of higher-level cognitive functions (Menon and D’Esposito 2022). One such cognitive function is the evaluation and optimization of feedback, in which different subregions of the PFC serve distinct roles (Verharen et al. 2020). Utilizing NF, RelaxNeuron generates feedback based on a patient's emotional response to images of food on the screen. Initially, patients may receive negative feedback; however, any alleviation in their fear response is met with positive feedback. In real time, the PFC works to maximize this positive feedback, ultimately activating the ICMs and inhibiting the CEA neurons. The aim is to extinguish the conditioned fear response toward foods.
To evaluate the patient's emotional response, RelaxNeuron employs a combination of physiological metrics (as outlined in Section 2.2) and eye-tracking error calculations (covered in Section 2.3) as the patient focuses on moving images of food. This approach capitalizes on recent advancements in webcam-based eye-tracking technology and affective computational neuroscience.
Theory of Predicting Emotion
According to Verma and Tiwary, human emotion can be represented in a three-dimensional emotional space defined by three axes: valence (V), arousal (A), and dominance (D) [see Fig. 3] (Verma and Tiwary 2017). Typically, each axis ranges from 1 to 9. For instance, in this model, a state of happiness might be characterized by the coordinates (V, A, D) = (9, 3, 4), while sadness could be denoted as (1, 1, 2). This contrasts with Russell's more traditional two-dimensional model, which utilizes only the V and A axes (Russell 1980). In the 2D model, emotions such as fear and anger are indistinguishable due to their similar low-valence and high-arousal values. The 3D model overcomes this limitation by introducing the D axis, which differentiates between emotions like anger (high dominance) and fear (low dominance).
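The VAD space can be sketched as a simple lookup of prototype coordinates. Only the happiness and sadness points below come from the text; the anger and fear prototypes and the nearest-prototype rule are illustrative assumptions showing how the D axis separates the two.

```python
import math

# Prototype emotions as (valence, arousal, dominance) on a 1-9 scale.
# Happiness and sadness are from the text; anger and fear are assumed
# low-valence, high-arousal points that differ only in dominance.
PROTOTYPES = {
    "happiness": (9, 3, 4),
    "sadness":   (1, 1, 2),
    "anger":     (2, 8, 7),   # high dominance
    "fear":      (2, 8, 2),   # low dominance
}

def nearest_emotion(vad):
    """Label a predicted VAD point with the closest prototype emotion."""
    return min(PROTOTYPES, key=lambda name: math.dist(vad, PROTOTYPES[name]))

# Anger and fear share V and A but separate cleanly on the D axis:
print(nearest_emotion((2, 8, 1)))  # fear
print(nearest_emotion((2, 8, 8)))  # anger
```

In a 2D (V, A) model both test points would collapse onto the same location; the dominance coordinate is what disambiguates them.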
Recent advancements in affective neuroscience, coupled with the integration of various bio-sensors and machine learning techniques, have significantly improved the accuracy of emotion prediction. Traditional methods primarily relied on subjective measures like direct user questionnaires, facial expression analysis, and motion detection. Such measures are susceptible to cultural and individual variation and can be consciously manipulated. In contrast, bio-sensors such as EEG and electrocardiograms (ECG) offer objective metrics by capturing an individual's physiological state, which indirectly but accurately reflects their emotional state (discussed more in the following section). Importantly, these metrics are largely invariant across different individuals and are resistant to conscious manipulation. Advancements in machine learning algorithms enable robust training protocols for AI to categorize various physiological states with corresponding emotional states. Common bio-sensors employed in emotion prediction studies include EEG, ECG, EDA, PPG, respiration belts, fMRI, and PET (Houssein, Hammad, and Ali 2022; Egger, Ley, and Hanke 2019; Phan et al. 2002; Yu and Sun 2020). Due to considerations of cost, portability, and accessibility, this study employs EEG, EDA, and ECG.
EEG
EEG technology captures electrical activity in the human cerebral cortex and has proven valuable in diagnosing medical conditions such as seizures, depression, and Alzheimer's disease (de Aguiar Neto and Rosa 2019; Perez-Valero et al. 2021; Satapathy and Loganathan 2021; Alotaiby et al. 2014). Data are acquired via electrodes usually positioned on the scalp according to the international 10–20 system [see Fig. 4]. However, EEG mainly monitors cortical surface activity and cannot access deeper emotional processing areas such as the hippocampus, amygdala, and hypothalamus. Despite this limitation, EEG has shown reliability in assessing emotional states (Bota et al. 2019). For instance, alpha-wave asymmetry in the frontal lobe indicates emotional valence; stronger right frontal lobe (i.e., Fp2 electrode) activity suggests negative emotions and vice versa (Balconi and Mazza 2009; Jatupaiboon, Pan-ngum, and Israsena 2013). EEG signals are commonly divided into five frequency bands (delta, theta, alpha, beta, and gamma), each linked to specific neural states [see Fig. 5].
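As a rough illustration of how frontal alpha activity could be quantified, the sketch below computes alpha-band power via the FFT and a simple log-ratio asymmetry index. The 256 Hz sampling rate and the index definition are assumptions for illustration, not the software's actual implementation; note that alpha power is conventionally taken as inversely related to cortical activity.

```python
import numpy as np

FS = 256  # assumed sampling rate in Hz

def alpha_power(signal, fs=FS, band=(8.0, 12.0)):
    """Total power in the alpha band, via the FFT power spectrum."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return spectrum[mask].sum()

def frontal_asymmetry(fp1, fp2):
    """ln(right alpha) - ln(left alpha). Since alpha power is inversely
    related to activity, a lower value suggests relatively stronger
    right-frontal activity, i.e. lower valence."""
    return np.log(alpha_power(fp2)) - np.log(alpha_power(fp1))

# Synthetic 10 Hz alpha rhythms of different strengths:
t = np.arange(1024) / FS
strong = np.sin(2 * np.pi * 10 * t)
weak = 0.5 * strong
print(frontal_asymmetry(weak, strong) > 0)  # True
```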
Recent research challenges the notion that more electrodes yield better data. Studies have shown that even three or single-channel EEGs can be as effective as full-brain setups in diagnosing conditions like depression (Cai et al. 2016; Perreau-Linck et al. 2010). Another study found that a 2-channel EEG was as accurate as a traditional 32-channel setup in predicting emotional V values (Wu et al. 2017).
In the context of RelaxNeuron, the focus is on detecting low emotional valence, which makes detection of right-frontal activity crucial. Progress is indicated by rising valence levels, at which point use of the software can eventually be discontinued. To balance a simplified hardware setup, a user-friendly experience, and cost-effectiveness while still reliably capturing the crucial emotional data, a single-channel EEG configuration with an electrode placed at the Fp2 position is deemed adequate.
ECG
The brain and heart are connected via the autonomic nervous system (ANS), through which each indirectly influences the other's behavior. The sympathetic nervous system (SNS) and the parasympathetic nervous system (PNS) together form the ANS. The centers through which the ANS controls heart rhythm are located in the medulla oblongata. Absent external factors, both the SNS and PNS provide a minimal amount of stimulation to the cardiac muscle, giving it an autonomic tone. Upon excitation, however, the cardio-accelerator centers release the neurotransmitter norepinephrine and cause the HR to increase drastically. This process occurs throughout the SNS and is commonly known as the “fight or flight” response. To decrease the HR, the cardio-inhibitory centers release the neurotransmitter acetylcholine through the PNS; this activation is referred to as the “rest and digest” operation. SNS and PNS stimulation flows through the cardiac plexus, cervical ganglia, and superior thoracic ganglia to the sinoatrial (SA) and atrioventricular (AV) nodes, with the nerves’ fibers reaching the atria and ventricles. This dynamic, continuous, and bidirectional communication between the two organs affects one's perception, emotion, intuition, and general health. Hence, detecting the cardiac rhythm is necessary for robust and accurate emotion recognition.
EDA
Sudomotor activity, the subtype of sympathetic nervous system (SNS) activity responsible for sweat production, is closely tied to emotion. When emotional arousal triggers sweat production, the increased moisture elevates the skin's electrical conductance. This surge in conductance is captured as an EDA signal, providing a real-time physiological metric and making EDA a valuable tool for evaluating autonomic function and assessing cognitive arousal levels (Illigens and Gibbons 2009).
One of the most prominent features of EDA signals is the Skin Conductance Responses (SCRs), transient spikes resulting from SNS activation in response to stimuli [see Fig. 6]. The temporal and amplitude characteristics of SCRs can theoretically allow for estimations of stimulus timings and intensities originating from neural control centers.
The production of SCRs is influenced by at least three neural pathways: hypothalamic control, contralateral and basal ganglia influences, and the reticular formation within the brainstem (Sequeira and Roy 1993; Roy, Sequeira, and Delerm 1993). Among these, the pathway involving the frontal cortex is particularly significant, as it is associated with attention and emotional states of the individual (Hugdahl 1995).
Eye-Tracking Error While Following Moving Food Images
Rather than relying solely on traditional emotion predictors such as EEG, EDA, and ECG, the software aims to augment its prediction precision through a multifaceted approach, specifically employing eye-tracking technology to gauge patients' emotional states. Studies have demonstrated that patients with AN encounter difficulties in maintaining visual focus on food-related images (Puttevils et al. 2023; Meregalli et al. 2023; Giel et al. 2011). This phenomenon is hypothesized to be linked to fear-conditioning responses toward food: it stands to reason that patients may struggle to sustain attention on stimuli that induce fear or anxiety. The interconnection between eye-tracking performance and fear conditioning offers a compelling avenue for therapeutic intervention: by normalizing eye-tracking performance, there may be a collateral modulation of the fear-conditioning neural circuits. To this end, the software quantifies the patient's ability to track moving food images and aims to reduce this tracking error through targeted training.
EEG
This study employs the FocusCalm EEG system by Hacosco, based in Tokyo, Japan [Fig. 7]. This portable EEG device is not only cost-effective, priced at approximately $250, but also highly user-friendly. Setup is virtually effortless: users simply press a single button and place the device on their forehead. The device features a dry electrode, eliminating the need for conductive gel and thereby offering additional convenience. Aesthetically, FocusCalm is designed to be sleek and unobtrusive, ensuring it does not interfere with one's hairstyle, unlike conventional EEG, and thereby lowering the barrier to usage.
Most crucially, FocusCalm is a single-channel EEG system configured with its target electrode at the Fp2 location, fitting our need precisely. In this setup, the Fpz electrode serves as the ground electrode, stabilizing the electrical signal, while the Fp1 electrode functions as the reference electrode to minimize noise. The output signal therefore represents the value at Fp2 minus that at Fp1. These features align seamlessly with the research objectives, providing a cost-effective, straightforward, and reliable means of treatment, with an electrode configuration ideally suited to the study's aims.
ECG
This study employs an Arduino Mega board (Arduino, MA, US, $51.83) to interface with the ECG sensor sourced from DFRobot, US (SKU: SEN0213) [see Fig. 8]. For ECG data acquisition, the user affixes three adhesive pads to the chest. These pads are designed for convenience, requiring no preparation other than removing their protective seal before application. Notably, the ECG sensor itself is cost-effective, at a mere $19.90, and offers a high degree of customizability to meet the specific requirements of our research.
EDA
The EDA sensor is provided by Grove, US [see also Fig. 8]. It is similarly user-friendly, requiring the participant to simply insert two fingers into designated finger holders to enable measurement. It costs only $10.97 and is likewise connected to the Arduino Mega board.
Feedback Structure - the Processing Layer and Data Layer
Patients receive feedback at 6-second intervals while watching a moving food image on the screen. During this time, the software monitors various sensors, including EEG, EDA, and ECG, as well as the patient's eye movements relative to the image. Every 6 seconds, the software evaluates the user's eye-tracking error and emotional state using a pre-trained model. These metrics are compared to a predetermined "healthy" response (detailed in Section 4.3) to food stimuli to calculate an error score. The software then calculates the rate of change for each metric by comparing current values to those from 6 seconds prior. For instance, if the emotional error decreases by 30%, a score of -30 is assigned to that metric. If the eye-tracking error decreases by 8%, it receives a score of -8. These scores are then normalized to a 0-100 scale to avoid bias. The individual scores are summed to generate an overall score, which measures how much the patient's error from the target state changed during the last 6 s. A more negative score therefore indicates patient progress, since the error decreased; a less negative or positive score suggests that the patient's condition has worsened, since the error increased.
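The scoring scheme above can be sketched as follows. The helper names and the simple percentage-change formula are assumptions inferred from the worked examples in the text (a 30% drop scores -30, an 8% drop scores -8); the normalization step is omitted.

```python
def metric_score(previous_error, current_error):
    """Percentage change in one error metric over the last 6 s window.
    A 30% drop yields -30; an 8% drop yields -8."""
    return 100.0 * (current_error - previous_error) / previous_error

def overall_score(prev_errors, curr_errors):
    """Sum of per-metric change scores; negative values indicate progress."""
    return sum(metric_score(p, c) for p, c in zip(prev_errors, curr_errors))

# Emotional error falls 30% and eye-tracking error falls 8%:
print(round(overall_score([1.0, 1.0], [0.7, 0.92]), 1))  # -38.0
```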
Feedback Modality - the Surface Layer
Once the type of feedback to offer has been decided, patients receive auditory cues along with a performance ranking. The use of auditory feedback is informed by a meta-analysis I conducted, which showed its superior effectiveness in EEG-based Neurofeedback (NF) for treating PTSD (Matsuyanagi 2023). In practice, patients first hear a target tone of 250 Hz at the beginning of the session. Subsequent tones, tailored to individual performance, aim to close the gap between the feedback tone and this target tone. Positive feedback results in a tone closer to the target, while negative feedback yields a tone further away. A consistent score produces an unchanged tone. Additionally, a brief reward sound is played when the feedback tone closely matches the target. To enhance motivation, the patient's rank, displayed on-screen, rises or falls based on the quality of the feedback received.
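A minimal sketch of the tone-mapping logic might look like the following. Only the 250 Hz target comes from the text; the 5 Hz step size and 2 Hz reward tolerance are hypothetical tuning constants.

```python
TARGET_HZ = 250.0

def next_tone(current_tone, score, step_hz=5.0):
    """Move the feedback tone toward the target on progress (negative
    score), away from it on regression (positive score), and leave it
    unchanged for a zero score. step_hz is an assumed tuning constant."""
    gap = TARGET_HZ - current_tone
    direction = 1 if gap > 0 else -1
    if score < 0:                              # progress: close the gap
        return current_tone + min(abs(gap), step_hz) * direction
    if score > 0:                              # regression: widen the gap
        return current_tone - step_hz * direction
    return current_tone

def reward_due(current_tone, tolerance_hz=2.0):
    """Play the brief reward sound when the tone is near the target."""
    return abs(current_tone - TARGET_HZ) <= tolerance_hz

print(next_tone(230.0, -10))  # 235.0, moving toward 250 Hz
```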
Target Emotional Status
Deciding the target emotional status of the patient is a crucial aspect of this treatment. AN patients have a strong preoccupation with food, often to the extent of food addiction (Tran et al. 2020). A significant number of patients who recover from AN later develop Bulimia Nervosa (BN), a disorder characterized by binge-eating.
The goal of the treatment is to optimize outcomes by normalizing the patient's emotional statuses of valence, arousal, and dominance to match those of an average healthy individual in response to viewing highly palatable foods like cakes and pizzas. This would mean leading the patient to have high values of valence, arousal, and dominance in response to seeing those foods. However, this approach risks predisposing AN patients to developing BN, as studies indicate that BN patients exhibit abnormally high arousal and dominance levels when exposed to palatable foods (Becker et al. 2018; Thomas and Lovell 2015).
To mitigate this risk, the software instead aims to normalize the patient's arousal and dominance levels to those exhibited by healthy individuals when viewing neutral rather than highly palatable foods. Specifically, the target arousal and dominance levels are set at 3 and 5 (Miccoli et al. 2016), respectively [see Fig. 11].
Emotion Prediction
A deep-learning neural network model is trained to predict the user's VAD values from the sensor signals. After the data are retrieved and prepared, the EEG is denoised, features are extracted from each sensor, and these features are used to train a deep-learning model whose accuracy is then evaluated [see Fig. 12].
Data and Material
The model is trained on the DEAP database (Koelstra et al. 2012), a publicly available dataset for emotion analysis. It contains EEG, EOG (electrooculography), EMG (electromyography), EDA, respiration, plethysmograph (heart rate), and skin-temperature signals paired with valence, arousal, and dominance labels. The database covers 32 participants, each completing 40 one-minute trials. The signals were acquired while the participant watched a short one-minute video clip designed to evoke a particular emotion. The corresponding valence, arousal, and dominance labels come from each participant's subjective rating of each clip on a scale from 1 to 9. For arousal and dominance, a higher score indicates a higher level of that dimension; for valence, a higher score indicates a more positive emotion. EEG was recorded from 32 electrodes placed according to the international 10–20 system. Each electrode recorded a 63 s signal, including a 3 s baseline before the trial.
In this paper, we used the preprocessed EEG-Fp2, EDA, and plethysmograph data. To train our model, features are extracted and a classification model is trained using a feedforward neural network.
For training, the 3 s baseline is removed and each trial is epoched into 6 s segments to match our software implementation. In total, we thus acquire 12,800 labeled input vectors (i.e., 60/6 × 40 × 32 = 12,800).
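The epoching step can be sketched as follows, assuming the 128 Hz sampling rate of the preprocessed DEAP data; the function name is illustrative.

```python
import numpy as np

FS = 128                                # preprocessed DEAP sampling rate
TRIAL_S, BASELINE_S, EPOCH_S = 63, 3, 6

def epoch_trial(trial):
    """Drop the 3 s baseline and split one 63 s trial into 6 s epochs."""
    data = trial[BASELINE_S * FS:]                 # remove baseline
    n_epochs = len(data) // (EPOCH_S * FS)         # 60 / 6 = 10 epochs
    return data[: n_epochs * EPOCH_S * FS].reshape(n_epochs, EPOCH_S * FS)

trial = np.random.randn(TRIAL_S * FS)              # one synthetic trial
epochs = epoch_trial(trial)
print(epochs.shape)                                # (10, 768)

# 32 participants x 40 trials x 10 epochs = 12,800 labeled vectors
print(32 * 40 * (60 // EPOCH_S))                   # 12800
```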
Denoising EEG
Noise has always contaminated the EEG signal and continues to do so ("Methods of Denoising of Electroencephalogram Signal: A Review," International Journal of Biomedical Engineering and Technology, 2015, Vol. 18, No. 4, pp. 385–395). Power-line interference, eye movements including blinks, muscle activity, electrode noise, and even the heart can contaminate the raw EEG output [100], hiding the true neural activity underlying a response of interest. Denoising the EEG is therefore a crucial step. Common denoising methods take various approaches, including principal component analysis (PCA), independent component analysis (ICA), regression analysis, and, more recently, wavelet transforms. Although the choice among these currently relies on the researcher's judgment, studies have demonstrated the high efficacy of hybrid methods, which combine two of these approaches to denoise raw signals.
For single-channel EEG, Sheoran, Kumar, and Kumar (2014) proposed a hybrid denoising method combining the wavelet transform and ICA, and showed that it denoises single-channel EEG more effectively than traditional approaches using a single method.
Figure 13 summarizes the steps of the Wavelet-ICA denoising method utilized in this study. By its nature, ICA requires an input matrix rather than a single vector, making it incompatible with single-channel signals. To overcome this, Sheoran proposed applying a wavelet decomposition to the single-channel signal to generate a multi-dimensional input, specifically a Stationary Wavelet Transform (SWT) with a symlet mother wavelet at decomposition level 10. SWT decomposes a signal into two components at each level: an approximate component, which captures the low-frequency part, and a detail component, which captures the high-frequency part (typically noise). These components are obtained by convolving the signal with a low-pass and a high-pass filter, respectively, and at each level the approximate component is decomposed further into new approximate and detail components. After computing the SWT of the raw EEG signal, the approximate components from all levels serve as the input to FastICA, which computes the mixing and unmixing matrices as well as the matrix of independent components of the multi-dimensional input signal. In short, FastICA tries to decompose an input signal into a set of source signals by maximizing the negentropy of each source signal; in other words, it maximizes the non-Gaussianity among the source signals. After the sources of interest among the FastICA output are identified, they are multiplied by the mixing matrix A to obtain the wavelet components of the signal. Finally, the inverse SWT is used for wavelet reconstruction to recover the original signal form.
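A condensed sketch of this Wavelet-ICA pipeline, using PyWavelets and scikit-learn, is shown below. Decomposition level 3 (rather than the study's level 10), the sym4 wavelet, and the choice of which source to zero out are simplifications for illustration, not the study's actual parameters.

```python
import numpy as np
import pywt
from sklearn.decomposition import FastICA

LEVEL = 3
rng = np.random.default_rng(0)
t = np.arange(2048) / 256.0
eeg = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.standard_normal(t.size)

# 1. SWT turns the single channel into a multi-channel matrix:
coeffs = pywt.swt(eeg, "sym4", level=LEVEL)        # [(cA_n, cD_n), ...]
approximations = np.stack([cA for cA, _ in coeffs], axis=1)  # (2048, 3)

# 2. FastICA separates the stacked approximations into sources:
ica = FastICA(n_components=LEVEL, random_state=0, max_iter=1000)
sources = ica.fit_transform(approximations)        # (2048, 3)

# 3. Zero out the source judged to be artifact (here the last one,
#    purely for illustration) and mix back to wavelet components:
sources[:, -1] = 0.0
cleaned = ica.inverse_transform(sources)           # (2048, 3)

# 4. Inverse SWT recovers the time-domain signal:
new_coeffs = [(cleaned[:, i], cD) for i, (_, cD) in enumerate(coeffs)]
denoised = pywt.iswt(new_coeffs, "sym4")
print(denoised.shape)                              # (2048,)
```

In practice the artifactual sources would be selected by a criterion such as the alpha-power ranking described next, rather than by fixed index.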
In this study, the Fourier transform is used to select the components relevant to our aim. The Fourier transform converts a time-domain signal into a frequency-domain signal, yielding the spectral density at each frequency. To reduce computational cost, the fast Fourier transform (FFT) is applied to each component produced by FastICA, and the total power spectral density in the alpha band (8–12 Hz) is computed for each component. The components are then ranked by total alpha power, and every component after the top two is selected. This bias is set so that the system tracks changes in alpha activity well, enabling a robust prediction of the patient's V value.
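The alpha-power ranking of FastICA components might be sketched as follows; the 256 Hz sampling rate and function names are assumptions.

```python
import numpy as np

FS = 256  # assumed sampling rate in Hz

def alpha_band_power(component, fs=FS):
    """Total FFT power in the 8-12 Hz alpha band of one component."""
    psd = np.abs(np.fft.rfft(component)) ** 2
    freqs = np.fft.rfftfreq(len(component), 1.0 / fs)
    return psd[(freqs >= 8) & (freqs <= 12)].sum()

def rank_by_alpha(components):
    """Return component indices ordered by descending total alpha power."""
    powers = [alpha_band_power(c) for c in components]
    return sorted(range(len(components)), key=lambda i: -powers[i])

# Three synthetic components: strong alpha, weak alpha, and gamma:
t = np.arange(1024) / FS
comps = [2 * np.sin(2 * np.pi * 10 * t),
         np.sin(2 * np.pi * 10 * t),
         np.sin(2 * np.pi * 30 * t)]
print(rank_by_alpha(comps))   # [0, 1, 2]
```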
Feature Extraction
Feature extraction is vital in emotion recognition models, influencing their accuracy, interpretability, and adaptability. Features quantify emotional states through measurable attributes like EEG brain wave patterns or EDA skin conductance. Quality feature selection reduces noise, making it easier for the model to identify meaningful data patterns. Feature extraction methods fall into two broad categories: manual and deep. Manual methods offer full control over input features, enabling tailored selections based on existing domain knowledge and extracting physiologically meaningful features. This is crucial for interpretability and fine-tuning, particularly relevant in neurotechnology research with clinical applications. Deep methods automate this process using neural networks but lack the transparency and customization options provided by manual extraction. Our study opted for manual feature extraction over deep methods because it offers an extra layer of validation, often absent in deep learning's black-box approach, that is crucial for effective and safe neurotechnology applications.
Features can generally be divided into three categories: time-domain, frequency-domain, and time-frequency domain. Time-domain features, relevant to bio-sensors like EEG and EDA, include simple metrics such as average amplitude, standard deviation, and kurtosis without requiring transformations. Frequency-domain features result from Fourier Transformed signals and focus on frequency characteristics. Time-frequency domain features offer a hybrid, capturing both time and frequency information. Thus, the chosen method of feature extraction significantly impacts the model's effectiveness and applicability in emotion recognition research.
Feature extraction for EEG
EEG is characterized by non-stationarity: its statistical properties vary over time (i.e., the covariance between time points depends on the specific times involved). Frequency-domain features therefore cannot fully express the characteristics of an EEG signal, because common frequency-domain transforms such as the FT assume that the amplitude and frequency of a given component stay constant over the whole duration, which is not the case for most biological signals (Huang and Wu 2008). The denoised EEG is therefore processed with the Hilbert-Huang Transform, a mathematical transformation of a signal into the time-frequency domain [see Fig. 14].
Hilbert Huang Transform (HHT)
In HHT (Huang and Wu 2008), empirical mode decomposition (EMD) is applied, resulting in multiple intrinsic mode functions (IMFs) that represent the original signal according to the formula:
S(t) = ∑ IMFi(t) + RK(t), where the sum runs over i = 1, …, K,
and RK(t) is either a monotonic or constant residual. Then the Hilbert transform is applied to each IMF to obtain its analytic signal, characterized by its instantaneous amplitude Ai(t) and instantaneous phase Θi(t). The instantaneous frequency fi(t) is then derived from the derivative of Θi(t) using the equation fi(t) = (1/2π)(dΘi/dt). This provides a time-frequency representation of the amplitude Ai(t). Finally, the mean frequency and its amplitude, the peak frequency and its amplitude, the skewness, the spectral entropy, the kurtosis, the mean amplitude, and the spectral centroid of each analytic signal are derived to form the EEG feature vector.
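The Hilbert step of HHT can be illustrated on a synthetic narrow-band component standing in for one IMF; the EMD stage is omitted here, and the 256 Hz rate and the feature subset shown are assumptions for illustration.

```python
import numpy as np
from scipy.signal import hilbert

FS = 256
t = np.arange(1024) / FS
imf = np.sin(2 * np.pi * 10 * t)        # stand-in for one 10 Hz IMF

analytic = hilbert(imf)                  # analytic signal of the IMF
amplitude = np.abs(analytic)             # instantaneous amplitude A_i(t)
phase = np.unwrap(np.angle(analytic))    # instantaneous phase Theta_i(t)
inst_freq = np.diff(phase) * FS / (2 * np.pi)   # f_i(t) from dTheta/dt

# A small subset of the features listed above:
features = {
    "mean_freq": float(inst_freq.mean()),
    "mean_amp": float(amplitude.mean()),
}
print(round(features["mean_freq"], 1))   # close to 10.0
```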
Feature extraction for ECG
HRV features
A typical ECG signal has three segmented waves per cycle: the P wave from atrial depolarization, the QRS complex with the highest amplitude due to ventricular depolarization, and the T wave from ventricular repolarization [see Fig. 16]. QRS detection is crucial for heart activity analysis, including heart rate variability (HRV) and inter-beat intervals (IBI). Various algorithms exist for QRS detection, often complicated by noise (Hasnul et al. 2021). One effective real-time method uses stationary wavelet transform (SWT), developed by Kalidas and Tamil, evaluated positively on databases like MIT-BIH Arrhythmia and AHA (Kalidas and Tamil 2017).
In the SWT-based QRS detection method, the raw ECG signal is initially resampled to 80 Hz to reduce computational load. A 2-level SWT is then applied using the 'db3' mother wavelet. This operation generates level-2 detail coefficients corresponding to the [10, 20] Hz frequency band. These coefficients are squared and subjected to moving window averaging (MWA), a process that accentuates the QRS peaks. The MWA helps isolate the peaks associated with the R-waves from other signal components, making them more prominent for detection. A specific rule set is applied to the normalized MWA signal to identify these R-peaks. Once they are identified, the algorithm updates the locations of the R-peaks and calculates beat-to-beat intervals and other parameters. Dynamic thresholds are adjusted based on historical data, enabling the algorithm to recognize premature beats in subsequent three-second ECG segments.
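A simplified sketch of this detection chain is given below. The SWT detail band is approximated here with a 10 to 20 Hz Butterworth band-pass, and the window length, threshold, and refractory distance are illustrative assumptions rather than the published rule set.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

FS = 80  # the method resamples the ECG to 80 Hz

def detect_r_peaks(ecg, fs=FS):
    """Band-pass (approximating the 10-20 Hz detail band), square,
    moving-window average, then simple threshold-based peak picking."""
    b, a = butter(3, [10 / (fs / 2), 20 / (fs / 2)], btype="band")
    squared = filtfilt(b, a, ecg) ** 2
    window = int(0.15 * fs)                       # ~150 ms MWA window
    mwa = np.convolve(squared, np.ones(window) / window, mode="same")
    peaks, _ = find_peaks(mwa, height=0.5 * mwa.max(),
                          distance=int(0.3 * fs))  # >= 300 ms apart
    return peaks

# Synthetic ECG: a sharp spike every second (60 bpm) for 10 s.
ecg = np.zeros(10 * FS)
ecg[40::80] = 1.0            # 10 simulated beats, away from the edges
peaks = detect_r_peaks(ecg)
print(len(peaks))            # one detection per simulated beat
```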
Upon identifying QRS peaks, various HRV features like average heart rate, standard deviation of RR intervals, and the ratio of low-frequency to high-frequency components are extracted [see Fig. 15 red box]. This SWT-based method is computationally efficient and offers high sensitivity, making it a strong candidate for real-time QRS detection in noisy ECG signals.
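Given detected R-peak indices, basic time-domain HRV features can be sketched as follows; the feature subset and names are illustrative.

```python
import numpy as np

def hrv_features(r_peaks, fs=80):
    """Basic time-domain HRV features from R-peak sample indices."""
    rr = np.diff(r_peaks) / fs                  # RR intervals in seconds
    return {
        "mean_hr_bpm": 60.0 / rr.mean(),        # average heart rate
        "sdnn_ms": 1000.0 * rr.std(ddof=1),     # SD of RR intervals
        "mean_rr_s": float(rr.mean()),
    }

# R-peaks exactly one second apart -> 60 bpm, zero variability:
feats = hrv_features(np.arange(0, 800, 80))
print(feats)
```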
Spectral Features
It is commonly accepted that the high-frequency (HF, 0.15 to 0.4 Hz) components of the ECG signal are influenced solely by the parasympathetic system, while the low-frequency (LF, 0.045 to 0.15 Hz) components are influenced by both the sympathetic and parasympathetic nervous systems. In the presence of a stressor, the amplitudes of these frequency bands are known to increase significantly.
Given that ECG signals are non-stationary, time-variant spectral analysis is performed on the ECG signal to extract features of the HF and LF components in a time-dependent manner [see Fig. 15 blue box]. Thus HHT, as described above in the EEG section, is applied to the raw ECG signal to extract LF and HF spectral features. After HHT is applied, the mean band power of LF and HF, the band-power asymmetry, and the spectral entropy of each IMF are computed to form the ECG feature vector. Together with the HRV features, a total of 17 features are extracted from the ECG signal.
Feature Extraction for EDA
SCR features
Within the EDA signal, both SCRs and Skin Conductance Levels (SCLs), slow changes in the EDA that reflect an inherent biological drift, are present [see Fig. 17]. SCRs are crucial for assessing emotional states, and their accurate extraction is vital. Mathematical models automatically separate SCRs from SCLs: SCRs are modeled as a convolution between an infinite impulse response function, representing sweat diffusion, and a non-negative sparse driver, while SCLs are modeled linearly with respect to time.
Among various decomposition methods, such as thresholding and non-negative deconvolution, cubic-spline-based non-negative sparse deconvolution (cvxEDA) is the most effective (Posada-Quintero and Chon 2020; Greco et al. 2016). The cvxEDA method by Greco et al. [108] minimizes a convex objective with variables Mq for the SCRs and Bl + Cd for the SCLs, where y represents the recorded signal. Subject to non-negativity constraints, this problem is efficiently solved via sparse Quadratic Programming (QP). The model requires no pre- or post-processing and incorporates physiologically sound priors, making it computationally efficient enough for real-time processing. The regularization parameters alpha and gamma are set to 8e-4 and 1e-2, respectively, determined through system evaluations. The mean amplitude, standard deviation of SCR peaks, area under the curve, mean decay time, and mean peak interval are included in the EDA feature vector [see Fig. 18 purple box].
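Once a phasic (SCR) component is available, the listed summary features can be sketched as follows. The peak thresholds, 4 Hz sampling rate, and synthetic Gaussian SCR shapes are illustrative assumptions, not the cvxEDA implementation.

```python
import numpy as np
from scipy.signal import find_peaks

FS = 4  # EDA is typically sampled at a few Hz

def scr_features(phasic, fs=FS):
    """Summary features of the phasic (SCR) component."""
    peaks, props = find_peaks(phasic, height=0.01, prominence=0.01)
    amps = props["peak_heights"]
    return {
        "n_scrs": len(peaks),
        "mean_amplitude": float(amps.mean()) if len(peaks) else 0.0,
        # Rectangle-rule area under the non-negative phasic signal:
        "auc": float(np.clip(phasic, 0, None).sum() / fs),
        "mean_peak_interval_s": float(np.diff(peaks).mean() / fs)
                                if len(peaks) > 1 else 0.0,
    }

# Two synthetic SCR bumps, 10 s apart, with different amplitudes:
t = np.arange(0, 60, 1.0 / FS)
phasic = np.exp(-(t - 20) ** 2 / 2) + 0.5 * np.exp(-(t - 30) ** 2 / 2)
feats = scr_features(phasic)
print(feats["n_scrs"])            # 2
```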
Spectral Features
Motivated by the spectral analysis of heart rate variability (HRV), it has recently been proposed that, in the presence of several stressors, the spectral power of the EDA increases significantly in the same band as the low frequencies of HRV, which are known to be at least partly controlled by the sympathetic nervous system. The expanded frequency range of 0.08–0.24 Hz (accounting for an additional 5–10% of the spectral power of EDA) was proposed as an index of sympathetic control based on power spectral analysis of EDA, termed EDASymp; this index was sensitive to stress in a similar fashion to time-domain measures (i.e., SCRs). Thus, time-variant spectral analysis of the 0.08–0.24 Hz band is also applied to the raw EDA signal, and features are extracted in the same way as described above for the time-variant spectral analysis of ECG signals [see Fig. 18 orange box]. In total, 10 features are extracted from the EDA signal.
Classification
In the burgeoning field of emotion prediction, researchers have employed a panoply of machine learning architectures, including support vector machines, random forests, k-nearest neighbors, and naive Bayes (Houssein, Hammad, and Ali 2022). Within the last decade, the use of deep learning algorithms such as feedforward, recurrent, and convolutional neural networks has increased rapidly and shown superiority over more traditional machine learning algorithms in predicting emotion. A recurrent neural network takes a sequence of data as input and produces a sequence of states that captures relevant information from the sequence; it therefore works well with time-series data but not with a multi-source, non-temporal feature vector. Convolutional neural networks were built for image processing and consequently expect inputs with more than one dimension, making them unsuitable for this study. In our study, we employed a feedforward neural network architecture that leverages multi-input pathways (Havasi et al. 2020) to effectively process heterogeneous sensor data, specifically data from EEG, EDA, and ECG [see Fig. 19]. Given the distinct characteristics and importance of each sensor type in capturing unique physiological markers, our model was designed to treat these feature sets separately through distinct sub-networks before merging them for the final regression task. Each sub-network consisted of multiple hidden layers with varying numbers of neurons and used rectified linear unit (ReLU) activation functions. These specialized sub-networks served as feature extractors that learned unique representations from the EEG, EDA, and ECG data, respectively. After processing these feature sets individually, the outputs of the sub-networks were concatenated and passed through additional dense layers, culminating in a final layer responsible for multi-output regression.
This architecture not only accommodated the disparate nature of the features but also optimized the network's capacity to learn more robust and complementary representations from multi-source sensor data, thereby enhancing the model's predictive accuracy.
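The multi-input architecture can be sketched as a plain NumPy forward pass. All layer widths, the 27-dimensional EEG feature vector, and the random weights are illustrative assumptions (only the 17 ECG and 10 EDA feature counts come from the text), and training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, relu=True):
    """One fully connected layer with optional ReLU activation."""
    z = x @ w + b
    return np.maximum(z, 0.0) if relu else z

# Assumed feature sizes per modality and a common branch width of 16:
DIMS = {"eeg": 27, "ecg": 17, "eda": 10}
def branch_params(d_in, d_hidden=16):
    return (rng.standard_normal((d_in, d_hidden)) * 0.1, np.zeros(d_hidden))

branches = {k: branch_params(d) for k, d in DIMS.items()}
w_head, b_head = rng.standard_normal((48, 32)) * 0.1, np.zeros(32)
w_out, b_out = rng.standard_normal((32, 3)) * 0.1, np.zeros(3)   # V, A, D

def forward(eeg, ecg, eda):
    """Branch sub-networks -> concatenation -> dense head -> 3 outputs."""
    h = np.concatenate([dense(x, *branches[k])
                        for k, x in (("eeg", eeg), ("ecg", ecg), ("eda", eda))])
    h = dense(h, w_head, b_head)
    return dense(h, w_out, b_out, relu=False)      # (V, A, D) regression

vad = forward(rng.standard_normal(27),
              rng.standard_normal(17),
              rng.standard_normal(10))
print(vad.shape)   # (3,)
```

A trained version would learn the branch and head weights jointly by backpropagation against the DEAP VAD labels; the sketch only shows how heterogeneous feature vectors flow through separate sub-networks before concatenation.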