A. ECG databases
After reviewing several ECG databases, we selected the ten listed in Table 2, which gives each database's full name, its abbreviation, and the categories used in this work.
Table 2 – ECG databases and their categories

| Database | Category |
|---|---|
| MIT-BIH Supraventricular Arrhythmia Database (SVDB) [13, 14] | NB, V, S, \| |
| MIT-BIH Long Term ECG Database (LTDB) [13] | NB, S, V, F |
| PAF Prediction Challenge Database (AFPDB) [15] | NB, PAF |
| MIT-BIH Arrhythmia Database (MITDB) [13, 16] | NB, L, R, V, /, A, +, F, ~ |
| MIT-BIH ST Change Database (STDB) [13, 17] | NB, V, S, ~ |
| European ST-T Database (EDB) [18] | NB, ~, V, +, s, T, S, F |
| Sudden Cardiac Death Holter Database (SDDB) [19] | NB, r, S, V, s, +, E |
| MIT-BIH Normal Sinus Rhythm Database (NSRDB) [13] | NB |
| MIT-BIH Atrial Fibrillation Database (AFDB) [13, 20] | AF |
| AHA Database Sample Excluded Record (AHADB) [13] | NB, V |

Table 3 – Full name and quantity of each ECG category

| Category | TR | VA | TE |
|---|---|---|---|
| Normal beat (NB) | 172995 | 57666 | 57668 |
| Paroxysmal atrial fibrillation (PAF) | 82382 | 27461 | 27461 |
| Atrial fibrillation (AF) | 414000 | 138000 | 138000 |
| Premature ventricular contraction (V) | 27899 | 9301 | 9304 |
| Premature or ectopic supraventricular beat (S) | 58750 | 19582 | 19587 |
| MA (A, E, L, R, T, r, s, +, /, \|, ~) | 60449 | 20153 | 20157 |
| Atrial premature contraction (A) | 1510 | 504 | 504 |
| Ventricular escape beat (E) | 247 | 83 | 83 |
| Left bundle branch block beat (L) | 4813 | 1605 | 1605 |
| Right bundle branch block beat (R) | 4312 | 1438 | 1438 |
| T-wave change (T) | 793 | 265 | 265 |
| R-on-T premature ventricular contraction (r) | 35275 | 11759 | 11759 |
| ST change (s) | 2788 | 929 | 930 |
| Rhythm change (+) | 1729 | 576 | 577 |
| Paced beat (/) | 2150 | 717 | 718 |
| Isolated QRS-like artifact (\|) | 1315 | 438 | 439 |
| Signal quality change (~) | 5517 | 1839 | 1839 |

As shown in Table 2, NSRDB and AFDB each contribute only one category. Only a few disease categories were eliminated. Table 3 gives the full name and the number of samples of each ECG category.
Because the last 11 categories in Table 3 contain relatively few samples, we combine them into a single category, referred to in the text as multiple abnormality (MA). After this fusion, the ECG data comprise 6 categories. TR, VA and TE denote the numbers of samples in the training, validation and test sets, respectively. Apart from the learning rate, all hyperparameters used in training are kept at their standard values, which facilitates the subsequent comparison between different structure blocks. The training, validation and test sets are divided in a 3:1:1 ratio, and the PCG dataset in Table 4 and the synchronized ECG-PCG dataset in Table 6 follow the same ratio. Because noise and interference are introduced while each recording is collected, the signals must be filtered before segmentation.
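The fusion of the 11 minority categories into MA can be checked against the TR column of Table 3 with a few lines of Python. This is only a minimal sketch: the names `MINORITY_TRAIN`, `MAJOR` and `fuse_label` are hypothetical and do not come from the original work.

```python
# Training-set (TR) counts of the 11 minority categories, copied from Table 3.
MINORITY_TRAIN = {
    "A": 1510, "E": 247, "L": 4813, "R": 4312, "T": 793, "r": 35275,
    "s": 2788, "+": 1729, "/": 2150, "|": 1315, "~": 5517,
}
# Categories that are kept as they are.
MAJOR = ("NB", "PAF", "AF", "V", "S")


def fuse_label(label: str) -> str:
    """Map an original annotation to one of the 6 fused categories."""
    if label in MAJOR:
        return label
    if label in MINORITY_TRAIN:
        return "MA"
    raise ValueError(f"unexpected label: {label!r}")


ma_train_total = sum(MINORITY_TRAIN.values())
fused_classes = sorted({fuse_label(lab) for lab in (*MAJOR, *MINORITY_TRAIN)})

print(ma_train_total)      # 60449, the TR value of the MA row
print(len(fused_classes))  # 6
```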
For each ECG dataset, ECG segments are extracted according to the R-wave locations and the sampling rate. In this study the sampling rate is unified to 200 Hz, and the dataset obtained by fusing the ECG databases is called ECGF.
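The resampling and R-wave-based segmentation might look like the following NumPy sketch. The text does not specify the resampling method or the window length, so linear interpolation and a window of 100 samples on each side of the R-wave are assumptions, as are the function names and the 360 Hz example record.

```python
import numpy as np


def resample_linear(signal, fs_in, fs_out=200):
    """Resample a 1-D signal to fs_out by linear interpolation (assumed method)."""
    signal = np.asarray(signal, dtype=float)
    n_out = int(round(len(signal) * fs_out / fs_in))
    t_in = np.arange(len(signal)) / fs_in
    t_out = np.arange(n_out) / fs_out
    return np.interp(t_out, t_in, signal)


def segments_around_r_peaks(signal, r_peaks, pre=100, post=100):
    """Cut one window of pre + post samples around each R-wave index.

    R-waves too close to either end of the record are skipped.
    """
    windows = [signal[r - pre:r + post] for r in r_peaks
               if r - pre >= 0 and r + post <= len(signal)]
    return np.stack(windows) if windows else np.empty((0, pre + post))


# Hypothetical 10 s record sampled at 360 Hz with four annotated R-waves.
fs_in = 360
record = np.sin(2 * np.pi * 1.0 * np.arange(10 * fs_in) / fs_in)
r_peaks_in = np.array([90, 900, 1800, 3590])

x200 = resample_linear(record, fs_in)                       # now at 200 Hz
r_peaks_200 = np.round(r_peaks_in * 200 / fs_in).astype(int)
segs = segments_around_r_peaks(x200, r_peaks_200)

print(x200.shape)  # (2000,)
print(segs.shape)  # (2, 200): the first and last R-waves are too close to the edges
```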
B. PCG databases
As with the ECG datasets, NHS is present in multiple databases in Table 4, so after data fusion the remaining categories are merged into a single category. The full name and the number of samples of each category are shown in Table 5.
Table 4 – PCG databases and their categories

| Database | Category |
|---|---|
| Classification of Heart Sound Recordings: The PhysioNet/Computing in Cardiology Challenge 2016 (C2016) [13, 21] | NHS, AHS |
| Classification of Heart Sound Signal Using Multiple Features (Y2018) [22] | NHS, AS, MS, MR, MVP |
| A machine learning challenge to classify heart beat sounds (K2016) [23] | NHS, ECS, CM, AHS |
| Heart Sound & Murmur Library (Mfour) [24] | NHS, APA, AA, AMA |

Table 5 – Full name and quantity of each PCG category

| Category | TR | VA | TE |
|---|---|---|---|
| Normal heart sound (NHS) | 11585 | 3863 | 3864 |
| AHSS (AHS, AS, MR, MS, MVP, ECS, CM, FHS, APA, AA, AMA) | 5515 | 1840 | 1841 |
| Abnormal heart sound (AHS) | 3380 | 1127 | 1127 |
| Aortic stenosis (AS) | 120 | 40 | 40 |
| Mitral regurgitation (MR) | 120 | 40 | 40 |
| Mitral stenosis (MS) | 120 | 40 | 40 |
| Mitral valve prolapse (MVP) | 120 | 40 | 40 |
| Extra cardiac sound (ECS) | 217 | 73 | 73 |
| Cardiac murmur (CM) | 554 | 185 | 185 |
| Artificial heart sound (FHS) | 192 | 64 | 64 |
| Abnormal pulmonary artery (APA) | 162 | 54 | 54 |
| Abnormal aorta (AA) | 109 | 37 | 37 |
| Abnormal mitral valve (AMA) | 421 | 140 | 141 |

In Table 5, 11 types of PCG signals are merged into a single abnormal heart sound category (AHSS). To segment the PCG data, the sampling rate is first unified to 2000 Hz, then segments of length 6000 are extracted, and finally each segment is reduced to a length of 2000. The dataset resulting from the fusion of the PCG databases is called PCGF.
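The three PCG steps (unify the sampling rate, cut segments of 6000 samples, reduce each to 2000 samples) can be sketched as below. The text does not say how the rate is unified or how the length is reduced, so linear interpolation, non-overlapping segments and averaging over blocks of three consecutive samples are assumptions made purely for illustration; `pcg_segments` is a hypothetical helper.

```python
import numpy as np


def pcg_segments(signal, fs_in, fs_target=2000, seg_len=6000, out_len=2000):
    """Return an array of shape (n_segments, out_len) from one PCG recording."""
    signal = np.asarray(signal, dtype=float)

    # Step 1: unify the sampling rate (linear interpolation is an assumption).
    n = int(round(len(signal) * fs_target / fs_in))
    x = np.interp(np.arange(n) / fs_target,
                  np.arange(len(signal)) / fs_in, signal)

    # Step 2: cut non-overlapping segments of seg_len samples (3 s at 2000 Hz).
    n_seg = len(x) // seg_len
    segs = x[:n_seg * seg_len].reshape(n_seg, seg_len)

    # Step 3: reduce each segment to out_len samples by averaging blocks of
    # seg_len // out_len consecutive samples (assumed reduction method).
    step = seg_len // out_len
    return segs.reshape(n_seg, out_len, step).mean(axis=2)


# Hypothetical 10 s recording sampled at 4000 Hz.
recording = np.random.default_rng(0).standard_normal(40000)
pcg = pcg_segments(recording, fs_in=4000)
print(pcg.shape)  # (3, 2000)
```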
C. Synchronized ECG-PCG database
Table 6 – Synchronized ECG-PCG database and its categories

| Database | Category |
|---|---|
| EPHNOGRAM: A Simultaneous Electrocardiogram and Phonocardiogram Database (EP2021) [25] | RS, RL, EP, ESW, EBP, EW, EBS |

All synchronized ECG-PCG data are divided into 7 categories according to the labeled acquisition state. Database information is given in Table 6, and the full names of the categories and the data division are listed in Table 7. RS and RL were recorded under different rest states, whereas EP, ESW, EBP, EW and EBS were recorded under different exercise states.
Table 7 – Full name and quantity of each synchronized data category

| Category | TR | VA | TE |
|---|---|---|---|
| Rest: sitting on armchair (RS) | 1123 | 374 | 375 |
| Rest: laying on bed (RL) | 2280 | 960 | 960 |
| Exercise: pedaling a stationary bicycle (EP) | 4680 | 1560 | 1560 |
| Exercise: slow walk (7 min); fast walk (8 min); sit down and stand up (4 min); slow walk (6 min); rest (ESW) | 1080 | 360 | 360 |
| Exercise: Bruce protocol treadmill stress test (EBP) | 3960 | 1320 | 1320 |
| Exercise: walking at constant speed (3.7 km/h) (EW) | 4680 | 1560 | 1560 |
| Exercise: bicycle stress test (EBS) | 3600 | 1200 | 1200 |

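The 3:1:1 division into training, validation and test sets used for the ECG, PCG and synchronized datasets could be implemented as in the following sketch. Splitting within each class and sending any remainder to the test set are assumptions, since the text only states the ratio; `split_3_1_1` is a hypothetical helper.

```python
import random
from collections import defaultdict


def split_3_1_1(labels, seed=0):
    """Return (train, val, test) index lists, split 3:1:1 within each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = 3 * len(indices) // 5
        n_val = len(indices) // 5
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # any remainder goes to the test set
    return train, val, test


# Toy label list: 10 samples of one class and 25 of another.
labels = ["RS"] * 10 + ["EP"] * 25
train_idx, val_idx, test_idx = split_3_1_1(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 21 7 7
```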
D. Data preprocessing
ECG and PCG signals, as important biomedical signals, are receiving increasing attention from academia and industry [26]. The advantages of the stationary wavelet transform over other denoising methods have been demonstrated [27]. In this study, the stationary wavelet transform is used to denoise the ECG and PCG signals. First, the sym8 wavelet basis function is used to decompose the signal into 8 levels, and the decomposition coefficients are denoised. The signal is then reconstructed from the processed coefficients. Fig. 1 shows the ECG and PCG signals before and after denoising: the grey curve is the original unfiltered data, and the red curve is the waveform drawn from the filtered data. The comparison shows that baseline drift, power-line interference and similar problems are eliminated from the ECG waveform after filtering, and that the filtered PCG waveform contains less noise while better preserving the information in the original waveform.
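A minimal sketch of this denoising step with PyWavelets is given below. The sym8 wavelet and the 8 decomposition levels come from the text; the soft universal threshold, the noise estimate from the finest detail level and the reflection padding are assumptions, because the text does not state how the coefficients are denoised. The function name `swt_denoise` is hypothetical.

```python
import numpy as np
import pywt  # PyWavelets


def swt_denoise(signal, wavelet="sym8", level=8):
    """Soft-threshold the SWT detail coefficients and reconstruct the signal."""
    x = np.asarray(signal, dtype=float)
    n = len(x)

    # pywt.swt requires a length that is a multiple of 2**level,
    # so the signal is padded by reflection and trimmed afterwards.
    pad = (-n) % (2 ** level)
    xp = np.pad(x, (0, pad), mode="reflect")

    # coeffs is [(cA8, cD8), ..., (cA1, cD1)] for level=8.
    coeffs = pywt.swt(xp, wavelet, level=level)

    # Assumed rule: universal threshold with the noise level estimated
    # from the finest detail coefficients (median absolute deviation).
    sigma = np.median(np.abs(coeffs[-1][1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(xp)))

    denoised = [(cA, pywt.threshold(cD, thr, mode="soft")) for cA, cD in coeffs]
    return pywt.iswt(denoised, wavelet)[:n]


# Synthetic 10 s test signal at 200 Hz: a slow sine wave plus white noise.
rng = np.random.default_rng(0)
t = np.arange(2000) / 200
noisy = np.sin(2 * np.pi * 1.2 * t) + 0.2 * rng.standard_normal(t.size)
clean_estimate = swt_denoise(noisy)
print(clean_estimate.shape)  # (2000,)
```

This sketch only suppresses broadband noise. Removing baseline drift would additionally require handling the coarsest approximation coefficients, which the text does not describe.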
E. Our methods
The residual network is a milestone in deep learning and, owing to its good performance, has been widely used for the intelligent recognition of medical data. It adopts shortcut connections, which effectively mitigate network degradation [28]. Many models designed for classifying cardiovascular diseases from ECG and PCG signals have achieved great success. A pooling layer is generally used to down-sample its input [29]; a max-pooling layer extracts the dominant features of the data and, with suitable parameter settings, can also keep the input and output sizes the same. Compared with the convolutional layer, the max-pooling layer is used far less often in networks. Professionals often distinguish ECG and PCG signals by an abnormality in a particular segment of the data, and the max-pooling layer is well suited to capturing such abnormalities while ignoring less important information. For this reason, the MCM and REC structure blocks were constructed for comparative testing.
As shown in Fig. 2, the MCM structure block consists of max-pooling + convolution + max-pooling + batch normalization (BN) [30] + ReLU [31]. The REC structure block consists of convolution + BN + ReLU + convolution + BN + ReLU, with the result added to the original input before the second ReLU. The MCM and REC blocks change the number of channels and the number of features in the same way, which further improves their comparability. Five variants are derived from the MCM structure block: removing the first max-pooling layer gives MCM1; removing the second max-pooling layer gives MCM2; removing both max-pooling layers gives MCM3; adding a max-pooling layer before the first max-pooling layer gives MCM4; and adding a max-pooling layer after the last max-pooling layer of MCM4 gives MCM5. In this paper, the MCM1, MCM2, MCM3, MCM4 and MCM5 structure blocks are collectively referred to as variants of the MCM structure block. Their specific structures are shown in the Appendix.
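The layer order of the two blocks can be written down in PyTorch as in the sketch below. Only the layer sequence and the position of the residual addition come from the text; the kernel size, stride, padding, the bias-free convolutions and the 1x1 projection on the shortcut are assumptions, and the class names `MCM` and `REC` are used here purely for illustration.

```python
import torch
from torch import nn


class MCM(nn.Module):
    """max-pooling -> convolution -> max-pooling -> BN -> ReLU."""

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            # stride 1 with padding keeps the sequence length unchanged
            nn.MaxPool1d(kernel_size, stride=1, padding=pad),
            nn.Conv1d(in_ch, out_ch, kernel_size, stride=stride,
                      padding=pad, bias=False),
            nn.MaxPool1d(kernel_size, stride=1, padding=pad),
            nn.BatchNorm1d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


class REC(nn.Module):
    """conv -> BN -> ReLU -> conv -> BN, add the input, then the second ReLU."""

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        pad = kernel_size // 2
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size, stride=stride,
                               padding=pad, bias=False)
        self.bn1 = nn.BatchNorm1d(out_ch)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel_size,
                               padding=pad, bias=False)
        self.bn2 = nn.BatchNorm1d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Assumed 1x1 projection so the shortcut matches when the shape changes.
        self.shortcut = None
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv1d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm1d(out_ch),
            )

    def forward(self, x):
        identity = x if self.shortcut is None else self.shortcut(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)


x = torch.randn(2, 16, 1000)              # (batch, channels, length)
mcm, rec = MCM(16, 32, stride=2), REC(16, 32, stride=2)
y_mcm, y_rec = mcm(x), rec(x)
n_mcm = sum(p.numel() for p in mcm.parameters())
n_rec = sum(p.numel() for p in rec.parameters())
print(tuple(y_mcm.shape), tuple(y_rec.shape))  # (2, 32, 500) (2, 32, 500)
print(n_mcm, n_rec)                            # 1600 5312
```

Under these assumptions both blocks map the same input to the same output shape, so one can replace the other inside an otherwise identical network.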
Compared with REC, the proposed MCM block stacks max-pooling and convolution layers, which greatly reduces the number of parameters and the amount of computation. The overall network framework and the number of times the structure block is repeated are shown in Fig. 3.
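The source of the parameter saving can be seen from a simple per-block count: max-pooling layers have no learnable parameters, so an MCM block carries one convolution and one BN layer, whereas a REC block carries two of each. The channel width of 64 and kernel size of 3 below are hypothetical values chosen only for illustration, and the count ignores any shortcut projection and the rest of the network, so it is not meant to reproduce the totals reported for the full models.

```python
def conv1d_params(c_in, c_out, k, bias=True):
    """Learnable parameters of a 1-D convolution."""
    return c_in * c_out * k + (c_out if bias else 0)


def bn_params(channels):
    """Learnable scale and shift of a batch-normalization layer."""
    return 2 * channels


def mcm_params(c_in, c_out, k):
    # The two max-pooling layers contribute no learnable parameters.
    return conv1d_params(c_in, c_out, k) + bn_params(c_out)


def rec_params(c_in, c_out, k):
    return (conv1d_params(c_in, c_out, k) + bn_params(c_out)
            + conv1d_params(c_out, c_out, k) + bn_params(c_out))


print(mcm_params(64, 64, 3))  # 12480
print(rec_params(64, 64, 3))  # 24960
```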
In Fig. 3, the MCM and REC blocks are each repeated either 4 or 8 times, because the ECGF and PCGF datasets have different sampling rates. Repeating the two structure blocks four times gives the models MCM-4 and REC-4; repeating them eight times gives MCM-8 and REC-8. Apart from the structure block and the number of repetitions, the networks are identical, which further improves comparability. Replacing the MCM structure block with each of its variants yields ten further network models, such as MCM1-4 and MCM1-8.
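The naming scheme of the resulting models can be enumerated directly, which also confirms the count of ten variant models. The list names below are hypothetical.

```python
from itertools import product

BLOCKS = ["MCM", "MCM1", "MCM2", "MCM3", "MCM4", "MCM5", "REC"]
REPEATS = [4, 8]

# Every combination of structure block and repetition count.
models = [f"{block}-{n}" for block, n in product(BLOCKS, REPEATS)]

# Models built from one of the five MCM variants (MCM1 to MCM5).
variant_models = [m for m in models
                  if m.startswith("MCM") and not m.startswith("MCM-")]

print(len(models))          # 14
print(len(variant_models))  # 10
print(variant_models[:2])   # ['MCM1-4', 'MCM1-8']
```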
In this framework, the input first passes through a convolution layer, a BN layer, ReLU and a max-pooling layer, followed by the repeated structure block. The structure block is repeated 4 times for the ECGF dataset and 8 times for the PCGF and synchronized ECG-PCG datasets. Finally, the features pass through an adaptive average-pooling layer and a linear layer, which outputs the classification result.
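The overall framework (stem, repeated structure block, adaptive average pooling, linear classifier) might be assembled as in the following PyTorch sketch. The layer order follows the text; all kernel sizes, strides, channel widths, the channel-doubling schedule and the input lengths are hypothetical, and `mcm_block` and `Net` are illustrative names rather than the original implementation.

```python
import torch
from torch import nn


def mcm_block(in_ch, out_ch, stride):
    """MCM-style block; kernel size 3 and padding 1 are assumptions."""
    return nn.Sequential(
        nn.MaxPool1d(3, stride=1, padding=1),
        nn.Conv1d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.MaxPool1d(3, stride=1, padding=1),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(inplace=True),
    )


class Net(nn.Module):
    def __init__(self, n_classes, n_blocks, block=mcm_block, in_ch=1, width=16):
        super().__init__()
        # Stem: convolution -> BN -> ReLU -> max-pooling.
        self.stem = nn.Sequential(
            nn.Conv1d(in_ch, width, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm1d(width),
            nn.ReLU(inplace=True),
            nn.MaxPool1d(3, stride=2, padding=1),
        )
        # Repeated structure block; every second block halves the length and
        # doubles the channels (a hypothetical schedule).
        blocks, ch = [], width
        for i in range(n_blocks):
            widen = i % 2 == 1
            out_ch = ch * 2 if widen else ch
            blocks.append(block(ch, out_ch, stride=2 if widen else 1))
            ch = out_ch
        self.blocks = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(ch, n_classes)

    def forward(self, x):
        x = self.blocks(self.stem(x))
        return self.fc(self.pool(x).flatten(1))


ecg_net = Net(n_classes=6, n_blocks=4)   # ECGF: 6 categories, 4 repetitions
pcg_net = Net(n_classes=2, n_blocks=8)   # PCGF: NHS vs AHSS, 8 repetitions
with torch.no_grad():
    ecg_out = ecg_net(torch.randn(4, 1, 200))    # hypothetical ECG segment length
    pcg_out = pcg_net(torch.randn(4, 1, 2000))   # PCG segments of length 2000
print(tuple(ecg_out.shape), tuple(pcg_out.shape))  # (4, 6) (4, 2)
```

Passing a different `block` factory with the same `(in_ch, out_ch, stride)` signature swaps the structure block while leaving the rest of the network unchanged, which mirrors the comparison design described above.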
For the same input, the number of parameters and the amount of computation of MCM-4 and REC-4 are (534918, 1.36G) and (175840, 4.29G), respectively; for MCM-8 and REC-8 they are (1582791, 12.86G) and (4201799, 33.49G), respectively. The network model built from the MCM structure block thus has only about a third of the parameters and computation of its REC counterpart, which gives it a greater advantage in memory usage and computational speed.