Cardiovascular disease (CVD) involves the heart and blood vessels and can lead to premature mortality [1]. CVD includes coronary heart disease (CHD), cerebrovascular disease, rheumatic heart disease, and other heart conditions. Approximately 17.9 million people die annually from CVD, which accounts for 31% of total deaths worldwide [2]. In Malaysia, the incidence of ischemic heart disease increased substantially, by 54% within 10 years, and it remained the principal cause of death in 2017 [3]. CVD risk factors, namely, diabetes mellitus (DM), hyperlipidemia, obesity, hypertension, age, gender, smoking, and an inactive lifestyle, are important predictors of CVD [4–5]. The Malaysian Cohort (TMC) project, which was initiated in 2006 to address the rising trends in non-communicable diseases (NCD), is a large prospective study involving 106,527 multiethnic participants [6]. More than 2000 parameters, including lipid profile, fasting blood glucose (FBG), body composition, blood pressure, and electrocardiogram (ECG), were obtained or measured from each participant.
ECG measures the electrical activity of the heart and has been extensively used in detecting heart diseases because of its simplicity and noninvasiveness. Moreover, independent risk markers for cardiovascular death can be derived from ECG metrics [7], which provide comprehensive information on cardiac rhythms and conduction patterns. The standard ECG records 12 leads from 12 vantage points using 10 electrodes, six of which are placed on the chest wall and four on the limbs. Three of the limb electrodes are used to generate the recording, whereas the right leg electrode serves as an electrical ground [8]. Among the 12 leads, lead II, which measures the potential difference between the electrodes attached to the right arm and left leg, is commonly utilized for diagnosing heart diseases. Lead II readings highlight various segments within the heartbeat and display three of the most important waves: P, QRS, and T [9].
The R–R interval is the time between the R peaks of two consecutive heartbeats. Heart rate variability (HRV), which is abnormal in patients with coronary artery disease, DM, and coronary heart failure, is the variation in the intervals between consecutive normal heartbeats and reflects cardiac autonomic function [10–11]. Yadav [12] found a significant correlation among the indices of HRV by using the root mean square of successive differences (RMSSD, p = 0.018) and R–R intervals (p = 0.010). According to O’Neal [13], the standard deviations (SDs) of the R–R intervals and RMSSD are associated with an increased risk of CVD and all-cause mortality and vary by sex and race [14]. HRV can also serve as the main predictor of future vascular events [15].
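The two HRV indices named above can be made concrete with a short sketch. The Python code below is illustrative only and is not the processing pipeline of this study: it assumes that R-peak times have already been detected and are given in milliseconds, the function names (`rr_intervals`, `sdnn`, `rmssd`) are hypothetical, and the use of the sample (n − 1) divisor for the SD is an assumption.

```python
import math


def rr_intervals(r_peak_times_ms):
    """Return successive R-R intervals (ms) from ordered R-peak times (ms)."""
    return [t1 - t0 for t0, t1 in zip(r_peak_times_ms, r_peak_times_ms[1:])]


def sdnn(rr_ms):
    """Sample standard deviation of the R-R intervals (ms); n - 1 divisor assumed."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two R-R intervals")
    mean_rr = sum(rr_ms) / len(rr_ms)
    variance = sum((rr - mean_rr) ** 2 for rr in rr_ms) / (len(rr_ms) - 1)
    return math.sqrt(variance)


def rmssd(rr_ms):
    """Root mean square of successive differences between R-R intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two R-R intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical R-peak times in milliseconds (made-up values, not study data).
example_peaks_ms = [0, 800, 1620, 2400, 3220]
example_rr = rr_intervals(example_peaks_ms)
print(example_rr, round(sdnn(example_rr), 1), round(rmssd(example_rr), 1))
```

In practice, ectopic or artifact-corrupted beats would usually be excluded before computing either index, because both are defined on intervals between normal heartbeats.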
Breathing rate (BR) is a key physiological parameter used in a range of clinical settings. Among the vital signs measured in acutely ill hospital patients, BR provides a highly accurate prediction of deterioration [16]. Despite its diagnostic and prognostic value, BR is still widely measured by manually counting breaths. Many algorithms have been proposed to estimate BR from ECG and photoplethysmogram signals. These BR algorithms provide an opportunity for automated, electronic, and unobtrusive measurement of BR in healthcare and fitness monitoring [17].
Machine learning (ML)-based artificial intelligence differs from other methods, such as knowledge-based expert systems, and is extensively used in the classification and prediction of CVDs [18]. ML algorithms fall into four well-known types: supervised, unsupervised, semi-supervised, and reinforcement learning. The supervised learning methods, which include linear discriminant analysis (LDA) [19], support vector machine (SVM) [20–22], decision tree (DT) [23], k-nearest neighbor (kNN) [24], artificial neural network (ANN) [9, 25], logistic regression [26], and fuzzy logic [27], are widely used for group classification. ANN is widely applied in predicting CHD [28], whereas SVM is frequently adopted in classifying arrhythmia [29]. The capabilities of newer deep learning algorithms, such as the convolutional neural network (CNN), have recently been explored. Acharya [30] compared the accuracy, sensitivity, and specificity of a CNN on ECG signals with and without noise.
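The accuracy, sensitivity, and specificity used to compare classifiers can be computed from the counts of a binary confusion matrix. The following Python sketch is a minimal illustration under stated assumptions: labels are coded as 1 (CVD) and 0 (no CVD), the function name `classification_metrics` is hypothetical, and the example labels are made up rather than taken from any cited study.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Made-up labels for illustration: 1 = CVD, 0 = no CVD.
true_labels = [1, 1, 1, 0, 0, 0, 0, 1]
predicted_labels = [1, 0, 1, 0, 0, 1, 0, 1]
print(classification_metrics(true_labels, predicted_labels))
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, because a classifier can reach high accuracy while missing most positive cases.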
The present study aims to identify the most significant parameters extracted from ECG signals for CVD prediction by using six supervised ML techniques, namely, LDA, linear and quadratic SVMs, DT, kNN, and ANN. To the best of our knowledge, this study is the first to use the raw ECG waveform in predicting CVD among Malaysian subjects. A predictive model for CVD diagnosis at an early stage is crucial in reducing and preventing the morbidity and mortality due to CVD. Furthermore, a solution for this issue is timely because the Malaysian Ministry of Health has launched the National Strategic Plan for Non-Communicable Disease (NSP-NCD 2016–2025) in response to the global challenge of combating NCDs in general and CVD in particular.