The ECG Heartbeat Categorization Dataset is widely used in academic research for detecting and classifying arrhythmic heartbeats. These signals are pre-processed and segmented, with each segment representing a single heartbeat.
In this paper, we present a convolutional neural network (CNN) for the binary classification of ECG signals. The primary advantage of using a CNN is its ability to automatically detect essential features without human intervention. This reduces the need for extensive pre-processing and minimizes manual effort while enhancing performance [17].
We employ two classification architectures, ResNet and WaveNet, to classify ECG signals. While CNN models in general are renowned for their strong performance, ResNet (Residual Network) and WaveNet are tailored to specific deep learning tasks. ResNet utilizes residual blocks, allowing the creation of very deep neural networks without encountering the vanishing gradient problem, thereby improving performance in image classification and recognition tasks. WaveNet, developed to generate temporal sequences such as speech or music, captures long-term dependencies with its dilated convolutional layers. Together, these architectures offer high-performance solutions for a variety of tasks, and we compare the results produced by each. The first stage involves data preprocessing, in which the data are divided into training and testing sets. The ECG signal is classified into multiple categories, including a binary classification of normal versus abnormal signals. The following figure illustrates the overall ECG signal classification system:
The aim of the proposed method is to classify ECG signals using a convolutional neural network (CNN). First, the patient's ECG signal is extracted and stored in a database specifically designed for heartbeat classification. These signals are then analyzed and categorized into two groups: normal and abnormal. During the training and testing phases, two deep learning models, ResNet and WaveNet, are compared. These models analyze the ECG signals through different layers, including convolutional and fully connected layers. The performance of the models is evaluated using the ROC curve and confusion matrix to assess classification accuracy. The ultimate goal is to determine the most effective model for ECG signal categorization (see Fig. 2).
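To make the workflow concrete, the following minimal sketch outlines the preprocessing, train/test split, training, and evaluation stages. It assumes the Kaggle CSV layout (each row holding a fixed-length heartbeat segment followed by its label) and uses a deliberately small placeholder model rather than the ResNet or WaveNet architectures described later.

```python
# Minimal end-to-end sketch (illustrative only): load segmented heartbeats,
# split into training/testing sets, fit a small 1D CNN, and evaluate it.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score
import tensorflow as tf

# Each CSV row is one fixed-length heartbeat segment; the last column is the label.
normal = pd.read_csv("ptbdb_normal.csv", header=None)
abnormal = pd.read_csv("ptbdb_abnormal.csv", header=None)
data = pd.concat([normal, abnormal], ignore_index=True)

X = data.iloc[:, :-1].to_numpy()[..., np.newaxis]   # shape: (samples, 187, 1)
y = data.iloc[:, -1].to_numpy()                     # 0 = normal, 1 = abnormal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Placeholder model; the ResNet and WaveNet variants are sketched later.
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 5, activation="relu", input_shape=X.shape[1:]),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, batch_size=128, validation_split=0.1)

y_prob = model.predict(X_test).ravel()
print(confusion_matrix(y_test, y_prob > 0.5))
print("AUC:", roc_auc_score(y_test, y_prob))
```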
3.1 Dataset
The ECG Heartbeat Categorization Dataset comprises two collections of heartbeat signals derived from two well-known classification datasets: the MIT-BIH Arrhythmia Database and the PTB Diagnostic ECG Database. Both datasets contain a sufficiently large number of samples to train deep neural networks. This dataset is used to explore heartbeat classification with deep neural network architectures and to investigate the capabilities of transfer learning. The signals in the dataset reflect the shape of the electrocardiogram (ECG) under both normal conditions and various arrhythmias and myocardial infarctions.
The database consists of four files:
- mitbih_train.csv
- mitbih_test.csv
- ptbdb_normal.csv
- ptbdb_abnormal.csv
and five classes:
- Normal (N)
- R-on-T premature ventricular contraction (R-on-T PVC)
- Premature ventricular contraction (PVC)
- Supraventricular premature or ectopic beat (SP or EB)
- Unclassified beat (UB)
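A minimal loading sketch for the four files listed above is given below. It assumes the standard Kaggle column layout, in which each row holds 187 signal samples followed by an integer class label; the mapping of the five MIT-BIH classes to a binary normal/abnormal label is an illustrative assumption.

```python
# Illustrative loading of the four CSV files (column layout assumed:
# 187 signal samples followed by an integer class label in the last column).
import pandas as pd

mitbih_train = pd.read_csv("mitbih_train.csv", header=None)
mitbih_test = pd.read_csv("mitbih_test.csv", header=None)
ptbdb_normal = pd.read_csv("ptbdb_normal.csv", header=None)
ptbdb_abnormal = pd.read_csv("ptbdb_abnormal.csv", header=None)

# MIT-BIH labels 0-4 correspond to the five classes listed above.
# For the binary task, class 0 (normal) is kept as normal and all other
# classes are grouped as abnormal (an assumption for illustration).
y_train_binary = (mitbih_train.iloc[:, -1] != 0).astype(int)
print(mitbih_train.iloc[:, -1].value_counts())  # class distribution
```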
3.2 The CNN model
The convolutional neural network (CNN) is a significant technology in the field of deep learning. It is a specialized type of neural network model designed to process multidimensional data, such as images and videos. CNNs leverage the structured properties of spatial, temporal, or multidimensional data by applying successive convolution and pooling operations (see Fig. 3).
CNNs are widely employed in machine learning and artificial intelligence for tasks such as object recognition, image segmentation, face detection, and image classification. They have made substantial advancements in these areas due to their ability to effectively extract relevant features from data [18].
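As a concrete illustration of the convolution-and-pooling pattern described above, the following minimal Keras sketch stacks Conv1D and MaxPooling1D layers followed by fully connected layers; the layer sizes are illustrative and not the tuned architecture used in our experiments.

```python
import tensorflow as tf

# Small 1D CNN for fixed-length heartbeat segments (187 samples, 1 channel).
# Successive convolution + pooling stages extract features; dense layers classify.
def build_simple_cnn(input_length=187):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_length, 1)),
        tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # binary normal/abnormal
    ])

model = build_simple_cnn()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```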
3.2.1 ResNet model
The ResNet (Residual Network) model is a type of convolutional neural network (CNN) introduced in 2015 to address the performance degradation associated with increasing network depth. It employs residual connections, or "shortcuts," that bypass certain layers, thereby facilitating gradient propagation during training and mitigating the problem of vanishing gradients (see Fig. 4).
- Residual blocks consist of several stacked convolutional layers with a residual connection that adds the block's input to its output. This design allows the gradient to flow more smoothly through the layers.
- Residual connections can skip one or more convolutional layers, creating "highways" for information to traverse the network more effectively.
- Residual blocks often incorporate a "bottleneck" design, using 1x1 convolutions to reduce dimensionality, followed by 3x3 convolutions, and then 1x1 convolutions to restore dimensionality. This approach reduces the number of parameters.
- Typical ResNet models, such as ResNet-50, ResNet-101, and ResNet-152, are 50, 101, and 152 layers deep, respectively. These deeper networks offer improved performance compared to shallower networks on complex tasks, such as image classification [19]. A minimal sketch of a 1D residual block is given after this list.
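To make the residual connection concrete, the following Keras sketch builds a 1D residual block suited to heartbeat segments; the layer widths, kernel sizes, and overall depth are illustrative assumptions rather than the exact architecture evaluated in this study.

```python
import tensorflow as tf

def residual_block(x, filters, kernel_size=5):
    """One 1D residual block: two convolutions plus an identity shortcut.

    A 1x1 convolution adjusts the shortcut's channel count when it does not
    match the block output, so the element-wise addition is well defined.
    """
    shortcut = x
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding="same", activation="relu")(x)
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding="same")(y)
    if shortcut.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv1D(filters, 1, padding="same")(shortcut)
    y = tf.keras.layers.Add()([y, shortcut])          # residual (shortcut) connection
    return tf.keras.layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(187, 1))
x = residual_block(inputs, 32)
x = tf.keras.layers.MaxPooling1D(2)(x)
x = residual_block(x, 64)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
resnet_like = tf.keras.Model(inputs, outputs)
```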
3.2.2 WaveNet model
WaveNet is an advanced deep learning-based speech generation model developed by Google DeepMind. This model is capable of directly modeling raw speech data and excels in tasks such as text-to-speech and speech synthesis. WaveNet employs dilated convolutions and causal convolutions. Dilated convolutions expand the receptive field as a function of network depth, enabling the model to effectively capture long-range dependencies in speech data [20].
The core component of WaveNet is the causal convolution, which preserves the correct temporal order of the data during processing. For images, the equivalent of causal convolution is masked convolution, implemented by constructing a mask tensor that is multiplied element-wise with the convolution kernel before it is applied. For one-dimensional data such as audio, causal convolution is easier to implement: the output of a standard convolution is simply shifted in time. In this work, a deep learning framework based on WaveNet is trained and applied to ECG heartbeat classification on this dataset (see Fig. 5).
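A simplified Keras sketch of the causal, dilated convolution stack is shown below; the gated activations and skip connections of the full WaveNet are omitted, and the filter counts and dilation rates are illustrative assumptions.

```python
import tensorflow as tf

def wavenet_like_block(x, filters=32, dilation_rates=(1, 2, 4, 8)):
    """Stack of causal, dilated 1D convolutions.

    padding="causal" ensures each output sample depends only on current and
    past inputs; growing dilation rates widen the receptive field exponentially.
    """
    for rate in dilation_rates:
        x = tf.keras.layers.Conv1D(filters, kernel_size=2, padding="causal",
                                   dilation_rate=rate, activation="relu")(x)
    return x

inputs = tf.keras.Input(shape=(187, 1))
x = wavenet_like_block(inputs)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
wavenet_like = tf.keras.Model(inputs, outputs)
```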
3.3 Performance measurement
The primary task of the CAD system is to classify ECG signals, which is challenging due to their complex morphology, noise, and the need for robust feature extraction. Accurate classification requires a thorough understanding of anatomy and of variations in the signals, making medical signal analysis quite complex. To address this issue, we implemented a simplified approach using deep convolutional neural networks (CNNs) to identify anomalies. CNNs are well known for their ability to extract significant features from raw signals. In this study, we developed a custom architecture and fine-tuned its parameters to create a comprehensive solution model.
• Confusion matrix: The confusion matrix, also known as the error matrix, is an N×N matrix used to evaluate the performance of a classification model, where N is the number of target classes. This matrix establishes a comparison between the actual values of the targets and the predictive values of the machine learning model, thus providing an in-depth understanding of the model’s performance and the types of errors it produces. It comprises four essential values [21].
• True Positive (TP): A true positive result occurs when a test correctly identifies an actual condition or result. Example: In a medical test for a disease, a true positive result means that the test indicates that a person has the disease and that person actually has the disease.
• True Negative (TN): A true negative result occurs when a test correctly identifies that a condition is not present. Example: For the same medical test, a true negative result means that the test indicates that a person does not have the disease and that person actually does not have the disease.
• False Positive (FP): A false positive result occurs when a test incorrectly indicates that a condition is present when it is not. Example: In a medical test, a false positive result means that the test indicates that a person has the disease, but that person actually does not have the disease [22].
• False Negative (FN): A false negative result occurs when a test incorrectly indicates that a condition is not present when it actually is. Example: In medical testing, a false negative result means that the test indicates that a person does not have the disease, but the person actually does have the disease.
• Balanced accuracy: Balanced accuracy is a measure used to assess the quality of a classifier, particularly when classes are imbalanced, i.e., when one class appears much more frequently than another, as is often the case when detecting anomalies or diseases. It normalizes the true positive and true negative rates by the number of positive and negative samples, respectively, and averages them by dividing their sum by two [23].
$$\text{Balanced accuracy} = \frac{\text{TPR} + \text{TNR}}{2} \quad (1.1)$$
• Precision: Precision corresponds to the proportion of correctly predicted positive observations in relation to the total number of predicted positive observations [23].
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \quad (1.2)$$
• Recall: Recall is the ratio of correctly predicted positive observations to all observations in the actual class.
$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \quad (1.3)$$
• F1-Score: The F1 score is the harmonic mean of precision and recall. This measure balances false positives and false negatives, offering a single score that reflects overall classification performance.
$$\text{F1-score} = \frac{2\,\text{TP}}{2\,\text{TP} + \text{FP} + \text{FN}} \quad (1.4)$$
• AUC: The area under the curve (AUC) is an indicator of the performance of a classification model, summarizing the trade-off between the true positive rate and the false positive rate.
$$\text{AUC} = \frac{1}{2}\left(\frac{\text{TP}}{\text{TP} + \text{FN}} + \frac{\text{TN}}{\text{TN} + \text{FP}}\right) \quad (1.5)$$
• The ROC curve: The ROC curve visually illustrates the trade-off between sensitivity and specificity, with the x-axis representing the false positive rate and the y-axis representing the true positive rate. The Area Under the Curve (AUC) assesses the ability of the ROC curve to distinguish between classes [24]. A minimal computation sketch covering these measures follows.
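These measures can be computed directly with scikit-learn, as in the following illustrative sketch; the toy label and probability arrays stand in for the held-out test labels and model outputs from the pipeline sketch above.

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score,
                             roc_auc_score, roc_curve)

# Toy values for illustration; in practice y_test and y_prob come from the
# held-out split and the trained model, as in the earlier pipeline sketch.
y_test = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6])
y_pred = (y_prob > 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print("Balanced accuracy:", balanced_accuracy_score(y_test, y_pred))  # (TPR + TNR) / 2
print("Precision:", precision_score(y_test, y_pred))                  # TP / (TP + FP)
print("Recall:", recall_score(y_test, y_pred))                        # TP / (TP + FN)
print("F1-score:", f1_score(y_test, y_pred))                          # 2TP / (2TP + FP + FN)
print("AUC:", roc_auc_score(y_test, y_prob))
fpr, tpr, thresholds = roc_curve(y_test, y_prob)  # points for plotting the ROC curve
```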