The ECG Heartbeat Categorization Dataset is widely used in academic research for detecting and classifying arrhythmic heartbeats. These signals are pre-processed and segmented, with each segment representing a single heartbeat.
In this paper, we present a convolutional neural network (CNN) for the binary classification of ECG signals. The primary advantage of using a CNN is its ability to automatically detect essential features without human intervention. This reduces the need for extensive pre-processing and minimizes manual effort while enhancing performance [17].
We employ two classification architectures, ResNet and WaveNet, to classify ECG signals. While CNN models in general are renowned for their strong performance, ResNet (Residual Network) and WaveNet are tailored to specific deep learning tasks. ResNet utilizes residual blocks, allowing the creation of very deep neural networks without encountering the vanishing gradient problem, thereby improving performance in image classification and recognition tasks. WaveNet, developed to generate temporal sequences such as speech or music, captures long-term dependencies with its dilated convolutional layers. Together, these architectures offer high-performance solutions for a variety of tasks, and we compare the results produced by each. The first stage involves data preprocessing, in which the data are divided into training and testing sets. The ECG signal is classified into multiple categories, including a binary classification of normal versus abnormal signals. The following figure illustrates the overall ECG signal classification system:
The aim of the proposed method is to classify ECG signals using a convolutional neural network (CNN). First, the patient's ECG signal is extracted and stored in a database specifically designed for heartbeat classification. These signals are then analyzed and categorized into two groups: normal and abnormal. During the training and testing phases, two deep learning models, ResNet and WaveNet, are compared. These models analyze the ECG signals through different layers, including convolutional and fully connected layers. The performance of the models is evaluated using the ROC curve and confusion matrix to assess classification accuracy. The ultimate goal is to determine the most effective model for ECG signal categorization (see Fig. 2).
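To make the workflow concrete, the following minimal sketch outlines the preprocessing, train/test split, training, and evaluation stages. It assumes the Kaggle CSV layout (each row holding a fixed-length heartbeat segment followed by its label) and uses a deliberately small placeholder model rather than the ResNet or WaveNet architectures described later.

```python
# Minimal end-to-end sketch (illustrative only): load segmented heartbeats,
# split into training/testing sets, fit a small 1D CNN, and evaluate it.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score
import tensorflow as tf

# Each CSV row is one fixed-length heartbeat segment; the last column is the label.
normal = pd.read_csv("ptbdb_normal.csv", header=None)
abnormal = pd.read_csv("ptbdb_abnormal.csv", header=None)
data = pd.concat([normal, abnormal], ignore_index=True)

X = data.iloc[:, :-1].to_numpy()[..., np.newaxis]   # shape: (samples, 187, 1)
y = data.iloc[:, -1].to_numpy()                     # 0 = normal, 1 = abnormal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Placeholder model; the ResNet and WaveNet variants are sketched later.
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 5, activation="relu", input_shape=X.shape[1:]),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, batch_size=128, validation_split=0.1)

y_prob = model.predict(X_test).ravel()
print(confusion_matrix(y_test, y_prob > 0.5))
print("AUC:", roc_auc_score(y_test, y_prob))
```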
3.1 Dataset
The ECG Heartbeat Categorization Dataset comprises two collections of heartbeat signals derived from two well-known classification datasets: the MIT-BIH Arrhythmia Database and the PTB Diagnostic ECG Database. Both datasets contain a sufficiently large number of samples to train deep neural networks. This dataset is used to explore heartbeat classification with deep neural network architectures and to investigate the capabilities of transfer learning. The signals in the dataset reflect the shape of the electrocardiogram (ECG) under both normal conditions and various arrhythmias and myocardial infarctions.
The database consists of four files:
- mitbih_train.csv
- mitbih_test.csv
- ptbdb_normal.csv
- ptbdb_abnormal.csv
and five classes:
- Normal (N)
- R-on-T premature ventricular contraction (R-on-T PVC)
- Premature ventricular contraction (PVC)
- Supraventricular premature or ectopic beat (SP or EB)
- Unclassified beat (UB)
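A minimal loading sketch for the four files listed above is given below. It assumes the standard Kaggle column layout, in which each row holds 187 signal samples followed by an integer class label; the mapping of the five MIT-BIH classes to a binary normal/abnormal label is an illustrative assumption.

```python
# Illustrative loading of the four CSV files (column layout assumed:
# 187 signal samples followed by an integer class label in the last column).
import pandas as pd

mitbih_train = pd.read_csv("mitbih_train.csv", header=None)
mitbih_test = pd.read_csv("mitbih_test.csv", header=None)
ptbdb_normal = pd.read_csv("ptbdb_normal.csv", header=None)
ptbdb_abnormal = pd.read_csv("ptbdb_abnormal.csv", header=None)

# MIT-BIH labels 0-4 correspond to the five classes listed above.
# For the binary task, class 0 (normal) is kept as normal and all other
# classes are grouped as abnormal (an assumption for illustration).
y_train_binary = (mitbih_train.iloc[:, -1] != 0).astype(int)
print(mitbih_train.iloc[:, -1].value_counts())  # class distribution
```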
3.2 The CNN model
The convolutional neural network (CNN) is a significant technology in the field of deep learning. It is a specialized type of neural network model designed to process multidimensional data, such as images and videos. CNNs leverage the structured properties of spatial, temporal, or multidimensional data by applying successive convolution and pooling operations (see Fig. 3).
CNNs are widely employed in machine learning and artificial intelligence for tasks such as object recognition, image segmentation, face detection, and image classification. They have made substantial advancements in these areas due to their ability to effectively extract relevant features from data [18].
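As a concrete illustration of the convolution-and-pooling pattern described above, the following minimal Keras sketch stacks Conv1D and MaxPooling1D layers followed by fully connected layers; the layer sizes are illustrative and not the tuned architecture used in our experiments.

```python
import tensorflow as tf

# Small 1D CNN for fixed-length heartbeat segments (187 samples, 1 channel).
# Successive convolution + pooling stages extract features; dense layers classify.
def build_simple_cnn(input_length=187):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_length, 1)),
        tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # binary normal/abnormal
    ])

model = build_simple_cnn()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```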
3.2.1 ResNet model
The ResNet (Residual Network) model is a type of convolutional neural network (CNN) introduced in 2015 to address the performance degradation associated with increasing network depth. It employs residual connections, or "shortcuts," that bypass certain layers, thereby facilitating gradient propagation during training and mitigating the problem of vanishing gradients (see Fig. 4).
- Residual blocks consist of several stacked convolutional layers with a residual connection that adds the block's input to its output. This design allows the gradient to flow more smoothly through the layers.
- Residual connections can skip one or more convolutional layers, creating "highways" for information to traverse the network more effectively.
- Residual blocks often incorporate a "bottleneck" design, using 1x1 convolutions to reduce dimensionality, followed by 3x3 convolutions, and then 1x1 convolutions to restore dimensionality. This approach reduces the number of parameters.
- Typical ResNet models, such as ResNet-50, ResNet-101, and ResNet-152, are 50, 101, and 152 layers deep, respectively. These deeper networks offer improved performance compared to shallower networks on complex tasks, such as image classification [19]. A minimal sketch of a 1D residual block is given after this list.
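To make the residual connection concrete, the following Keras sketch builds a 1D residual block suited to heartbeat segments; the layer widths, kernel sizes, and overall depth are illustrative assumptions rather than the exact architecture evaluated in this study.

```python
import tensorflow as tf

def residual_block(x, filters, kernel_size=5):
    """One 1D residual block: two convolutions plus an identity shortcut.

    A 1x1 convolution adjusts the shortcut's channel count when it does not
    match the block output, so the element-wise addition is well defined.
    """
    shortcut = x
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding="same", activation="relu")(x)
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding="same")(y)
    if shortcut.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv1D(filters, 1, padding="same")(shortcut)
    y = tf.keras.layers.Add()([y, shortcut])          # residual (shortcut) connection
    return tf.keras.layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(187, 1))
x = residual_block(inputs, 32)
x = tf.keras.layers.MaxPooling1D(2)(x)
x = residual_block(x, 64)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
resnet_like = tf.keras.Model(inputs, outputs)
```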
3.2.2 WaveNet model
WaveNet is an advanced deep learning-based speech generation model developed by Google DeepMind. This model is capable of directly modeling raw speech data and excels in tasks such as text-to-speech and speech synthesis. WaveNet employs dilated convolutions and causal convolutions. Dilated convolutions expand the receptive field as a function of network depth, enabling the model to effectively capture long-range dependencies in speech data [20].
The core component of WaveNet is the causal convolution, which preserves the correct temporal order of the data during processing. For images, the equivalent of causal convolution is masked convolution, implemented by constructing a mask tensor that is multiplied element-wise with the convolution kernel before it is applied. For one-dimensional data such as audio, causal convolution is easier to implement: the output of a standard convolution is simply shifted in time. In this work, a deep learning framework based on WaveNet is trained and applied to ECG heartbeat classification on this dataset (see Fig. 5).
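A simplified Keras sketch of the causal, dilated convolution stack is shown below; the gated activations and skip connections of the full WaveNet are omitted, and the filter counts and dilation rates are illustrative assumptions.

```python
import tensorflow as tf

def wavenet_like_block(x, filters=32, dilation_rates=(1, 2, 4, 8)):
    """Stack of causal, dilated 1D convolutions.

    padding="causal" ensures each output sample depends only on current and
    past inputs; growing dilation rates widen the receptive field exponentially.
    """
    for rate in dilation_rates:
        x = tf.keras.layers.Conv1D(filters, kernel_size=2, padding="causal",
                                   dilation_rate=rate, activation="relu")(x)
    return x

inputs = tf.keras.Input(shape=(187, 1))
x = wavenet_like_block(inputs)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
wavenet_like = tf.keras.Model(inputs, outputs)
```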
3.3 Performance measurement
The primary task of the CAD system is to classify ECG signals, which is challenging due to their complex morphology, noise, and the need for robust feature extraction. Accurate classification requires a thorough understanding of anatomy and of variations in the signals, making medical signal analysis quite complex. To address this issue, we implemented a simplified approach using deep convolutional neural networks (CNNs) to identify anomalies. CNNs are well known for their ability to extract significant features from raw signals. In this study, we developed a custom architecture and fine-tuned its parameters to create a comprehensive solution model.
• Confusion matrix: The confusion matrix, also known as the error matrix, is an N×N matrix used to evaluate the performance of a classification model, where N is the number of target classes. This matrix establishes a comparison between the actual values of the targets and the predictive values of the machine learning model, thus providing an in-depth understanding of the model’s performance and the types of errors it produces. It comprises four essential values [21].
• True Positive (TP): A true positive result occurs when a test correctly identifies an actual condition or result. Example: In a medical test for a disease, a true positive result means that the test indicates that a person has the disease and that person actually has the disease.
• True Negative (TN): A true negative result occurs when a test correctly identifies that a condition is not present. Example: For the same medical test, a true negative result means that the test indicates that a person does not have the disease and that person actually does not have the disease.
• False Positive (FP): A false positive result occurs when a test incorrectly indicates that a condition is present when it is not. Example: In a medical test, a false positive result means that the test indicates that a person has the disease, but that person actually does not have the disease [22].
• False Negative (FN): A false negative result occurs when a test incorrectly indicates that a condition is not present when it actually is. Example: In medical testing, a false negative result means that the test indicates that a person does not have the disease, but the person actually does have the disease.
• Balanced accuracy: Balanced accuracy is a measure used to assess the quality of a classifier, particularly when classes are imbalanced, i.e., when one class appears much more frequently than another, as is often the case when detecting anomalies or diseases. It normalizes the true positive and true negative rates by the number of positive and negative samples, respectively, and averages them by dividing their sum by two [23].
$$\text{Balanced accuracy} = \frac{\text{TPR} + \text{TNR}}{2} \quad (1.1)$$
• Precision: Precision corresponds to the proportion of correctly predicted positive observations in relation to the total number of predicted positive observations [23].
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \quad (1.2)$$
• Recall: Recall is the ratio of correctly predicted positive observations to all observations in the actual class.
$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \quad (1.3)$$
• F1-Score: The F1 score is the harmonic mean of precision and recall. This measure balances false positives and false negatives, offering a single score that reflects overall classification performance.
$$\text{F1-score} = \frac{2\,\text{TP}}{2\,\text{TP} + \text{FP} + \text{FN}} \quad (1.4)$$
• AUC: The area under the curve (AUC) is an indicator of the performance of a classification model, summarizing the trade-off between the true positive rate and the false positive rate.
$$\text{AUC} = \frac{1}{2}\left(\frac{\text{TP}}{\text{TP} + \text{FN}} + \frac{\text{TN}}{\text{TN} + \text{FP}}\right) \quad (1.5)$$
• The ROC curve: The ROC curve visually illustrates the trade-off between sensitivity and specificity, with the x-axis representing the false positive rate and the y-axis representing the true positive rate. The Area Under the Curve (AUC) assesses the ability of the ROC curve to distinguish between classes [24]. A minimal computation sketch covering these measures follows.
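These measures can be computed directly with scikit-learn, as in the following illustrative sketch; the toy label and probability arrays stand in for the held-out test labels and model outputs from the pipeline sketch above.

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score,
                             roc_auc_score, roc_curve)

# Toy values for illustration; in practice y_test and y_prob come from the
# held-out split and the trained model, as in the earlier pipeline sketch.
y_test = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6])
y_pred = (y_prob > 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print("Balanced accuracy:", balanced_accuracy_score(y_test, y_pred))  # (TPR + TNR) / 2
print("Precision:", precision_score(y_test, y_pred))                  # TP / (TP + FP)
print("Recall:", recall_score(y_test, y_pred))                        # TP / (TP + FN)
print("F1-score:", f1_score(y_test, y_pred))                          # 2TP / (2TP + FP + FN)
print("AUC:", roc_auc_score(y_test, y_prob))
fpr, tpr, thresholds = roc_curve(y_test, y_prob)  # points for plotting the ROC curve
```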