The human heart has an electrical conduction system that spontaneously generates regular electrical signals and transmits them throughout the heart. Heart disease takes many lives all over the world [1, 2, 3]. However, arrhythmia, i.e., an irregular heartbeat caused by changes in or dysfunction of this system, remains unfamiliar to the general public.
Arrhythmia can generally be diagnosed from an electrocardiogram (ECG), which is a record of the electrical activity of the heart obtained through electrodes placed on the skin of the chest and limbs. An ECG usually refers to a 12-lead ECG, which provides 12 different views of the heart's electrical activity. To classify a 12-lead ECG signal precisely, doctors examine the ECG data and diagnose specific arrhythmias based on their medical knowledge and extensive experience. Unfortunately, judgment errors are likely to occur during this process. Even an experienced specialist requires considerable time to analyze the signals, and the accuracy may not be high [4, 5]. In addition, in the case of a Holter monitor, the cardiologist cannot review the entire signal, which is usually recorded over several days. Thus, many scholars have attempted to classify 12-lead ECG signals automatically and accurately.
Thus far, rule-based algorithms for ECG signal classification have been unsuitable for use in practice, owing to their poor performance. The classification has also been approached using various machine-learning methods, e.g., logistic regression [6], support vector machines (SVMs) [7], random forests [8], and K-nearest neighbors [9, 10]. Deep-learning models are closer to being usable in real hospitals because they exhibit much better ECG signal-classification performance than conventional classification algorithms and rule-based algorithms.
The most natural deep-learning approach to ECG data is to build a model from the 12-lead ECG information measured in a hospital. Smith et al. [11] found that a new deep-learning network with 13 convolutional layers and 3 fully connected layers, trained on 12-lead ECG data, was more accurate than a conventional algorithm. In most cases, however, rather than using all the ECG information, scholars have approached ECG signal classification using the information from one specific lead, e.g., lead I or lead II. Lee et al. [12] used a residual network (ResNet) with six residual blocks and an AlexNet to classify atrial fibrillation (Normal/Atrial Fibrillation); these networks achieved accuracies of 99.9% and 99.7%, respectively.
Rajpurkar et al. [4] and Hannun et al. [5] showed that a 34-layer ResNet model outperformed average cardiologists in discriminating among 12 output rhythm classes (10 arrhythmias/Normal/Noise). Notably, their data were large-scale and obtained from patients in actual hospitals: the model used 91,232 modified lead-II ECG records from 53,549 patients, recorded using Zio cardiac monitors.
Recently, beyond simple convolutional neural network (CNN) structures, attempts have been made to find a better ECG signal-classification structure by adopting structures that produce good results for image classification. Kim et al. [13] used a 34-layer DenseNet architecture, taken from image classification, for binary classification (Normal/Abnormal) of lead-II ECG data measured in a hospital. This structure achieved an overall accuracy of 98.89% and an F1 score of 99.09%. Their results showed that a single-lead ECG, rather than the 12-lead ECG measured in a hospital, was sufficient to distinguish between normal and abnormal signals.
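Because the studies reviewed here report both accuracy and F1 score, it may help to recall how the two metrics are computed from the same confusion counts in the binary case. The sketch below is only illustrative: the function name and the counts are hypothetical and are not taken from any of the cited works.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Compute accuracy and F1 score from binary confusion counts.

    tp/fp/fn/tn are true-positive, false-positive, false-negative and
    true-negative counts, with 'Abnormal' taken as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts (not data from any cited study).
acc, f1 = accuracy_and_f1(tp=90, fp=5, fn=10, tn=895)
print(f"accuracy = {acc:.4f}, F1 = {f1:.4f}")
```

With the same number of errors, the F1 score can fall below or above the accuracy depending on whether the positive class is the minority or the majority, which is why the two figures are usually reported together.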
In contrast to the methods mentioned thus far, some scholars have used short-time Fourier and wavelet transforms to convert ECG data into two-dimensional (frequency, time) data and used them as input for a deep neural network. Salem et al. [14] converted one-dimensional (1D) ECG signals from the MIT-BIH dataset and the European ST-T dataset into 2D spectrogram images. They then used a 161-layer DenseNet, pre-trained on millions of images, to extract abstract features and applied an SVM for four-class classification (Normal Sinus/Atrial Fibrillation and Flutter/Ventricular Fibrillation/ST Segment Change). Their model’s accuracy and F1 score were 97.23% and 97.35%, respectively.
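To make the 1D-to-2D conversion concrete, the following minimal sketch computes a magnitude spectrogram with a short-time Fourier transform. It is not the procedure of any cited work: the frame length, hop size, Hann window, and the synthetic two-tone test signal (a stand-in for an ECG trace, not real data) are all assumptions chosen for illustration, and the naive DFT is written out with the standard library only for clarity.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a (time x frequency) grid of DFT magnitudes.

    Each row is one windowed frame; each column is a frequency bin
    from 0 to frame_len // 2. A naive O(n^2) DFT is used for clarity.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(
                chunk[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


def peak_bin(row):
    """Index of the frequency bin with the largest magnitude."""
    return max(range(len(row)), key=lambda k: row[k])


# Synthetic signal (assumption, not ECG data): 4 Hz for the first second,
# then 20 Hz for the next second, sampled at 128 Hz.
fs = 128
signal = [
    math.sin(2 * math.pi * (4 if t < fs else 20) * t / fs)
    for t in range(2 * fs)
]

spec = stft_magnitude(signal)
bin_hz = fs / 64  # frequency resolution with frame_len=64
print(len(spec), "frames x", len(spec[0]), "bins")
print("first frame peak:", peak_bin(spec[0]) * bin_hz, "Hz")
print("last frame peak:", peak_bin(spec[-1]) * bin_hz, "Hz")
```

The resulting grid is the kind of (frequency, time) array that can be rendered as an image and passed to a network designed for image classification; in practice an FFT-based library routine would replace the naive DFT.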
Rajput et al. [15] constructed an ECG-based heartbeat-classification model that consisted of preprocessing (filtering and segmentation), feature extraction (Morlet wavelet transform and short-time Fourier transform), and a densely connected network. Their model achieved an F1 score of 83.4% over ten classes (Normal Sinus/Atrial Fibrillation/Sinus Tachycardia/Sinus Bradycardia/Ventricular Bigeminy/Ventricular Trigeminy/Ventricular Tachycardia/Paroxysmal Supraventricular Tachycardia (PSVT)/Noise/Ventricular Ectopic Beats (VEB)).
Thus far, deep-learning approaches to ECG classification have followed those of image classification, moving toward deeper layers and more complex structures. In this study, we followed the same trend: among the deeper and more complex structures that have provided good results for image classification, we sought one suitable for multi-class ECG signal classification.