CVD is a major global health challenge and the leading cause of death worldwide, responsible for more deaths than any other disease category [112]. It includes coronary artery disease (CAD), heart failure (HF), stroke, and peripheral vascular disease (PVD). The burden falls on both developed and developing countries, with prevalence rising in low- and middle-income nations because of lifestyle changes, urbanization, and population aging [113]. According to the World Health Organization (WHO), CVD causes over 17.9 million deaths each year, approximately 31% of all global deaths [114]. This number is projected to rise in the coming decades [115], straining healthcare systems and economies. Tackling this growing burden requires comprehensive prevention strategies, early detection, effective management, and continued research into the interplay of risk factors that drive CVD development. Figure 6 depicts the major risk components associated with CVD and the methods of data acquisition used in risk assessment. The risk factors shown include traditional lifestyle factors (smoking status, alcohol, and fast-food consumption) as well as more recently recognized factors such as genetic information and medical conditions. Some risk models also take environmental factors into consideration, broadening the scope of risk assessment. Integrating these diverse risk factors, gathered through EHRs, medical imaging, omics data, wearables, and genetic testing, is central to assessing an individual's CVD risk accurately and holistically.
Conventional risk calculators were the first line of defense against CVD. The Framingham Risk Score (FRS), the first CVD risk calculator, was formulated in 1967 and is pivotal in risk assessment [116]. Over the years, various risk calculators have evolved to address the limitations of earlier models, providing insights into specific CVD outcomes, such as coronary heart disease (CHD), stroke, and composite CVD events, in diverse target populations worldwide. Table Ⅲ summarizes 15 cardiovascular risk studies, each focusing on a different CVD risk calculator. These calculators incorporate a wide range of risk factors and biomarkers, grouped here as office-based (OBBM), laboratory-based (LBBM), radiomics-based (RBBM), and genomics-based (GBBM) biomarkers. OBBM often include age, sex, blood pressure, lipid levels, and BMI. LBBM include detailed lipid profiles and blood glucose levels. Some studies have explored the integration of RBBM, which use imaging data to extract features related to disease outcomes, to improve risk assessment accuracy [117, 118]. GBBM leverage genetic data to assess an individual's genetic predisposition to CVD.
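Several of the calculators in this family, including the 2008 Framingham general CVD model, are derived from a Cox proportional hazards fit, so their output can be written in one common form. The expression below is a hedged sketch of that general form only; the symbols are notation introduced here for illustration, and individual calculators differ in which risk factors enter, how they are transformed, and in the fitted values.

```latex
% General form of a Cox-based risk estimate over a horizon of t years.
% x_i       : value of the i-th risk factor for the individual (often log-transformed)
% \bar{x}_i : mean of that risk factor in the derivation cohort
% \beta_i   : fitted regression coefficient
% S_0(t)    : baseline survival at t years in the derivation cohort
\[
  \hat{p}(t) \;=\; 1 - S_0(t)^{\exp\left(\sum_i \beta_i x_i \;-\; \sum_i \beta_i \bar{x}_i\right)}
\]
```

In this form, an individual whose risk factors all equal the cohort means has an estimated risk of 1 − S_0(t), and each positive coefficient raises the estimate as its risk factor increases.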
Among the notable AI-driven advances is the AtheroEdge Composite Risk Score (AECRS; Table Ⅲ), a calculator that sets itself apart from conventional risk calculators by incorporating image-based phenotypes alongside traditional risk factors [119, 120]. Leveraging data from carotid ultrasound scans provides a more comprehensive and accurate risk assessment, identifying early signs of vascular changes and atherosclerotic plaque formation in the carotid arteries [121]. Incorporating the analysis of carotid vessel morphology further increases the potential of DL in PM, since it offers vital insights into vascular health and contributes to a thorough understanding of cardiovascular problems. As the global burden of CVD grew, a paradigm shift emerged in CVD risk assessment, embracing the integration of genomics and AI. Genomics offers new insights into the genetic basis of CVD risk, unveiling susceptibility and the molecular mechanisms underlying the disease [122, 123]. In parallel, AI harnesses big data analytics for the thorough analysis of diverse datasets [124].
Table Ⅳ provides an overview of the use of AI and genomics in the context of CVD and non-CVD disorders. The studies use ML algorithms and techniques to identify biomarkers, classify different types of cardiomyopathies and other diseases, develop prediction models for CVD risk, and explore novel genetic biomarkers associated with specific diseases. For example, among the CVD studies, Phan et al. [125] aimed to identify biomarkers for CVD using genomic datasets and applied various ML algorithms. Reported accuracies ranged from 55% to 97%. Biomarker detection using AI and genomics therefore has the potential to aid greatly in the early detection and management of CVD. Alimadadi et al. [126] used RNA-seq data from seven datasets to classify different forms of cardiomyopathies. The five algorithms employed reached average accuracies of 78–84%, and random forest (RF) was judged the best performer because it was the only one to show an increase in all metrics.
TABLE Ⅲ. Cardiovascular risk studies.

| SN | Author & Year | CVD Risk Calculator | Outcome | Target Population | Risk Period (years) | Applicable Age Range (years) | Gender | Biomarkers: OBBM | Biomarkers: LBBM | Biomarkers: RBBM | Biomarkers: GBBM | Risk Factors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Assmann et al. [112] (1988) | PROCAM | CHD | Germany | 10 | 40–65 | M/F | ✔ | ✔ | ✘ | ✘ | Sex, TC, Age, SBP, Smoking, HDL-C, LDL-C, DBP, Glucose, BMI, TG |
| 2 | Stevens et al. [113] (2001) | UKPDS56 | CHD | UK | 7.3 | 25–65 | M/F | ✔ | ✔ | ✘ | ✘ | Age, sex, ethnicity, smoking, T2D, BMI, SBP, lipid levels, glycemia |
| 3 | Kothari et al. [114] (2002) | UKPDS60 | Stroke | UK | 10 | 25–65 | M/F | ✔ | ✔ | ✘ | ✘ | Age, sex, smoking, diabetes, SBP, TC:HDL-C, AF |
| 4 | Conroy et al. [115] (2003) | SCORE | Fatal CVD events | EU | 5 | 19–80 | M/F | ✔ | ✔ | ✘ | ✘ | Sex, TC, age, SBP, smoking, LR |
| 5 | NIPPON DATA80 Research Group [116] (2006) | NIPPON | CVD, Stroke, CHD | Japan | 10 | > 30 | M/F | ✔ | ✔ | ✘ | ✘ | Age, sex, BMI, smoking, SBP, TC, GT |
| 6 | Ridker et al. [117] (2007) | RRS | MI, IS, Coronary revascularization, Cardiovascular death | USA | 10 | > 45 | F | ✔ | ✔ | ✘ | ✘ | Age, SBP, BMI, HbA1c, DM, TC, HDL, Smoking, hs-CRP, FH of MI, apolipoprotein B-100, apolipoprotein A-I, Medication use, Menopausal, History of HTN, Alcohol use, Exercise |
| 7 | Mendis et al. [118] (2007) | WHO/ISH | CVD | HCDR | 10 | 40–70 | M/F | ✔ | ✔ | ✘ | ✘ | Age, Sex, Smoking, DM, SBP, HDL, TC |
| 8 | Woodward et al. [119] (2007) | ASSIGN | Composite CVD | Scotland | 10 | 30–74 | M/F | ✔ | ✔ | ✘ | ✘ | Age, SIMD, FH, DM, Smoking, SBP, TC, HDL-C |
| 9 | Hippisley-Cox et al. [120] (2007) | QRISK1.0 | MI, CHD, Stroke, TIAs | UK | 10 | 35–74 | M/F | ✔ | ✔ | ✘ | ✘ | Age, Sex, Smoking, SBP, BMI, FH, TDS, TSC:HDL-C |
| 10 | Hippisley-Cox et al. [121] (2008) | QRISK2.0 | CHD, Stroke | UK | 10 | 35–74 | M/F | ✔ | ✔ | ✘ | ✘ | Age, Sex, Smoking, Ethnicity, SBP, BMI, FH, TDS, TSC:HDL-C, FH of CHD, T1D, T2D, RD, RA, AF |
| 11 | D’Agostino et al. [122] (2008) | FRS | CVD events, IHD, HF, Cerebrovascular events | USA | 20 | 40–65 | M/F | ✔ | ✔ | ✘ | ✘ | Sex, TC, Age, SBP, Smoking, HDL-C, LDL-C, DBP, Glucose, BMI, TG |
| 12 | Goff et al. [123] (2014) | ACC/AHA pooled cohort equation | Atherosclerotic CVD events | USA | 7.5 | 40–79 | M/F | ✔ | ✔ | ✘ | ✘ | Ethnicity, age, sex, SBP, DBP, TC, HDL, LDL, DM, smoking |
| 13 | Hippisley-Cox et al. [124] (2017) | QRISK3.0 | CVD | UK | 10 | 25–84 | M/F | ✔ | ✔ | ✘ | ✘ | Age, Sex, Smoking, Ethnicity, SBP, BMI, FH, TDS, TSC:HDL-C, FH of CHD, T1D, T2D, RD, RA, AF, CKD, |
| 14 | Khanna et al. [125] (2019) | AECRS1.0 | CVD, Stroke | Japan | 10 | 68.96 ± 10.98 | M/F | ✔ | ✔ | ✔ | ✘ | Age, Sex, Smoking, HTN, Dyslipidemia, FH, IMT, TPA, TC, LDL, HDL, TG, HbA1c, FBS |
| 15 | Viswanathan et al. [126] (2020) | AECRS2.0 | CVD, Stroke | South Asian-Indian | 10 | 14–85 | M/F | ✔ | ✔ | ✔ | ✘ | Age, Sex, Smoking, Ethnicity, SBP, DBP, FH, LDL, DM, TC, CKD, T2DM, HTN, Artery Type, CUSIP, eGFR, IMT, ESR, PA |

CVD: Cardiovascular disease, PROCAM: Prospective Cardiovascular Münster, UKPDS: United Kingdom Prospective Diabetes Study, SCORE: Systematic COronary Risk Evaluation, RRS: Reynolds Risk Score, WHO: World Health Organization, ISH: International Society of Hypertension, ASSIGN: ASsessing cardiovascular risk using SIGN guidelines, QRISK: QRESEARCH cardiovascular risk algorithm, FRS: Framingham Risk Score, ACC: American College of Cardiology, AHA: American Heart Association, AECRS: AtheroEdge Composite Risk Score, CHD: Coronary heart disease, MI: Myocardial infarction, IS: Ischemic stroke, TIAs: Transient ischemic attacks, IHD: Ischemic heart disease, HF: Heart failure, EU: European Union, HCDR: Hypothetical cohorts of different regions, M: Male, F: Female, OBBM: Office-based biomarkers, LBBM: Laboratory-based biomarkers, GBBM: Genomics-based biomarkers, RBBM: Radiomics-based biomarkers, TC: Total cholesterol, SBP: Systolic blood pressure, HDL-C: High-density lipoprotein cholesterol, LDL-C: Low-density lipoprotein cholesterol, DBP: Diastolic blood pressure, BMI: Body mass index, TG: Triglycerides, T1D: Type 1 diabetes, T2D: Type 2 diabetes, AF: Atrial fibrillation, GT: Glucose tolerance, HbA1c: Hemoglobin A1c, DM: Diabetes mellitus, hs-CRP: high-sensitivity C-reactive protein, FH: Family history, HTN: Hypertension, SIMD: Scottish index of multiple deprivation, TSC: Total serum cholesterol, TDS: Townsend deprivation score, RD: Renal disease, RA: Rheumatoid arthritis, CKD: Chronic kidney disease, IMT: Intima media thickness, TPA: Total plaque area, FBS: Fasting blood sugar, CUSIP: Carotid ultrasound image-based phenotypes, eGFR: estimated glomerular filtration rate, ESR: Erythrocyte sedimentation rate, PA: Plaque area.
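The biomarker columns of Table Ⅲ can be read as a set of yes/no flags per calculator. As a minimal sketch, assuming a hand-transcribed subset of the table's rows, the snippet below shows how such flags could be stored and queried. The `RiskCalculator` class and the `with_biomarker` helper are hypothetical names introduced here for illustration; they are not part of any of the cited calculators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskCalculator:
    """One row of Table III, reduced to its name, year, population, and biomarker flags."""
    name: str
    year: int
    population: str
    obbm: bool  # office-based biomarkers
    lbbm: bool  # laboratory-based biomarkers
    rbbm: bool  # radiomics-based biomarkers
    gbbm: bool  # genomics-based biomarkers


# Illustrative subset of Table III, transcribed by hand (not the full table).
CALCULATORS = [
    RiskCalculator("PROCAM", 1988, "Germany", True, True, False, False),
    RiskCalculator("NIPPON", 2006, "Japan", True, True, False, False),
    RiskCalculator("QRISK3.0", 2017, "UK", True, True, False, False),
    RiskCalculator("AECRS1.0", 2019, "Japan", True, True, True, False),
    RiskCalculator("AECRS2.0", 2020, "South Asian-Indian", True, True, True, False),
]


def with_biomarker(calculators, biomarker):
    """Return the names of calculators whose flag for `biomarker` is set.

    `biomarker` is one of "obbm", "lbbm", "rbbm", "gbbm".
    """
    return [c.name for c in calculators if getattr(c, biomarker)]


if __name__ == "__main__":
    print("Uses RBBM:", with_biomarker(CALCULATORS, "rbbm"))
    print("Uses GBBM:", with_biomarker(CALCULATORS, "gbbm"))
```

On this subset, the query for `"rbbm"` returns only the two AECRS calculators and the query for `"gbbm"` returns an empty list, mirroring the pattern of marks in the table.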
TABLE Ⅳ. AI-Genomics CVD/non-CVD.

| SN | Author & Year | Disease Type | Objective | Models | Dataset | Data Size | Input | Performance Metrics | Conclusion |
|---|---|---|---|---|---|---|---|---|---|
| CVD | | | | | | | | | |
| 1 | Phan et al. [127] (2012) | CVD | To describe the pipeline for biomarker identification for CVD and exemplify it by analyzing 4 genomic datasets. | KNN, linear SVM, LR, Bayesian | Blood Gene Exp. CAD, Baseline Macrophages Atherosclerosis, Monocytes FH, Monocytes Atherosclerosis. | 370 samples in 4 datasets | Gene expression data | ACC: 61%, 87%, 55%, 97% | A systematic pipeline for biomarker identification in CVD can be constructed using high-throughput genomic data and bioinformatics methods. |
| 2 | Alimadadi et al. [128] (2020) | CM | To classify different types of cardiomyopathies. | svmRadial, pcaNNet, DT, ENet, RF | 7 datasets in the GEO database | 137 samples in 7 datasets | RNA-seq data | *ACC: 80%, 83%, 78%, 84%, 82.66% | RF outperformed the others as it was the only one to show an increase in all the metrics. |
| 3 | Steinfeldt et al. [129] (2022) | CVD | To develop and validate the prediction model for the 10-year risk of MACE. | NeuralCVD-DSM | UK Biobank | 395,713 CVD-free participants | 29 clinical predictors and 6 PGS | ΔC-index: 0.006 (95% CI: 0.005–0.007); NRI: 0.0116 (95% CI: 0.0066–0.0159) | When additional high polygenic risk was present, those with low to moderate clinical risk and ages less than 50 years experienced a substantial rise in overall risk. |
| 4 | Kwon et al. [130] (2022) | AF | To classify AF vs. non-AF. | CNN-GWAS | Yonsei AF cohort, Korea AF Network, KoGES, Korean Genome Rural cohort, 3-independent ethnic-specific GWAS | 6358 subjects selected from 4 cohorts | SNPs | AUC: 0.74–0.82 | CNN-GWAS algorithms predict AF phenotype moderately accurately using only genetic information, capturing cumulative gene effects and interactions. |
| 5 | Venkat et al. [131] (2023) | HF, AF, other CVD disease | To identify genes associated with CVD diseases and predict the disease. | RF | Self-made dataset | 61 CVD patients | Gene expression data and clinical data | ACC: 90.9%, 95%, 95.9% | Predicted the association of highly significant HF, AF, and other CVD genes with demographic variables. |
| Non-CVD | | | | | | | | | |
| 6 | Khalifa et al. [132] (2020) | KIRC, BRCA, LUSC, LUAD, UCEC | To classify 5 different types of cancer. | BPSO-DT, CNN | Tumor gene expression dataset | 2086 samples | RNA-Seq gene expression data | ACC: 96.90% | The present work exceeds prior relevant work regarding testing accuracy for 5 tumor types. |
| 7 | Peng et al. [133] (2021) | AD, IBD, T2D, BRCA | To identify the high-risk individuals by calculating the PRS. | BiLSTM | UK Biobank | 351,022 participants | SNPs, clinical features | AUC: 0.8624, 0.6585, 0.7316, 0.6660 | DeepPRS outperforms traditional techniques in terms of performance. |
| 8 | Li et al. [134] (2021) | AD | To classify AD patients and HC. | GWAS, ResNet | ADNI database | 988 subjects | SNP genotype data | ACC: 71.38%, 92.65% | DLG model performs better than the traditional GWAS model. Also, they discovered novel genetic biomarkers of AD. |
| 9 | Zekavat et al. [135] (2022) | Multiple | To calculate FD & VD. | U-Net Ensemble | UK Biobank cohort | 54,813 participants | Retinal fundus photographs | ACC: 95.6% | GWAS identified 7 new loci related to FD and 13 with VD. PheWAS discovered systemic and ocular phenotypes associated with retinal microvasculature. |
| 10 | Hahn et al. [136] (2022) | T2D | To integrate genetic information and metabolite profiles to predict T2D risk. | LR, RF | KoGES Ansan Ansung cohort | 1425 participants | Demo. + gPRS + clinical features + metabolites | ACC: 81.2%, 85.4% | RF-based model using clinical factors, gPRS, and metabolites predicted T2D risk more accurately than the LR-based model. |

CVD: Cardiovascular disease, CM: Cardiomyopathy, AF: Atrial fibrillation, HF: Heart failure, MACE: Major adverse cardiac event, KNN: K-nearest neighbor, SVM: Support vector machine, LR: Logistic regression, svmRadial: Support vector machine with radial kernel, pcaNNet: Principal component analysis neural networks, DT: Decision tree, ENet: Elastic net, RF: Random forest, DSM: Deep survival machine, CNN: Convolutional neural network, GWAS: Genome-wide association studies, CAD: Coronary artery disease, FH: Familial hypercholesterolemia, GEO: Gene Expression Omnibus, KoGES: Korean Genome and Epidemiology Study, RNA: Ribonucleic acid, PGS: Polygenic score, ACC: Accuracy, CI: Confidence interval, C-index: Concordance index, NRI: Net reclassification index, AUC: Area under curve, KIRC: Kidney renal clear cell carcinoma, BRCA: Breast cancer, LUSC: Lung squamous cell carcinoma, LUAD: Lung adenocarcinoma, UCEC: Uterine corpus endometrial carcinoma, AD: Alzheimer’s disease, IBD: Inflammatory bowel disease, T2D: Type 2 diabetes, BPSO-DT: Binary particle swarm optimization with decision tree, BiLSTM: Bidirectional long short-term memory, ADNI: Alzheimer's Disease Neuroimaging Initiative, Demo.: Demographics, gPRS: genome-wide polygenic risk score, DLG: Deep learning genomics, FD: Fractal dimension, VD: Vascular density, PheWAS: Phenome-wide association study.
*ACC: These are the average accuracies for the 5 algorithms in classifying different types of cardiomyopathies.
Steinfeldt et al. [140] developed a prediction model for the 10-year risk of major adverse cardiac events (MACE) using NeuralCVD-DSM. The model improved the C-index and net reclassification index (NRI), providing better risk stratification for individuals with low to intermediate clinical risk. This demonstrates the potential of AI to enhance cardiovascular risk prediction and support preventive measures. Using CNN-GWAS, Kwon et al. [141] classified atrial fibrillation (AF) vs. non-AF. The model predicted AF phenotype from genetic information alone with moderate accuracy, with an area under the curve (AUC) ranging from 0.74 to 0.82, which highlights the potential of AI algorithms to leverage genetic data for disease classification. Venkat et al. [142] identified CVD-associated genes and predicted cardiovascular diseases using an RF approach. The model achieved accuracies of 90.9–95.9% in predicting HF, AF, and other CVDs. Such evidence demonstrates the promise of AI in gene identification and disease prediction, opening avenues for progress in PM.
Among the non-CVD studies, Khalifa et al. [143] classified five types of cancer across four organs using BPSO-DT and CNN. The study achieved an accuracy of 96.9%, surpassing that of related prior work, which underscores the value of AI in cancer classification and its potential role in precision oncology. Peng et al. [144] identified high-risk individuals for several diseases, including Alzheimer's disease (AD), inflammatory bowel disease (IBD), type 2 diabetes (T2D), and breast cancer (BRCA), using bidirectional long short-term memory (BiLSTM). The model achieved AUC values ranging from 0.6585 to 0.8624, outperforming traditional methods, which indicates the potential of AI in identifying individuals at risk for specific diseases. Li et al. [145] classified AD patients and healthy controls using GWAS and ResNet, achieving accuracies of 71.38% for AD classification and 92.65% for healthy control classification. The study also discovered novel genetic biomarkers for AD, showcasing the role of AI in genomics research and in uncovering new insights into complex diseases. Zekavat et al. [146] calculated fractal dimension (FD) and vascular density (VD) from retinal fundus photographs using a U-Net-based ensemble, identifying seven novel loci associated with FD and 13 with VD. The model achieved an accuracy of 95.6% in this task, demonstrating the potential of AI in analyzing complex patterns in medical imaging and uncovering novel disease-related markers. Hahn et al. [147] integrated genetic information and metabolite profiles to predict T2D risk using LR and RF. The RF-based model outperformed the LR-based model, achieving an accuracy of 85.4% (versus 81.2% for LR). This indicates the potential of AI in combining diverse data sources for improved disease risk prediction.
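The studies above report three different kinds of performance metric: accuracy, AUC, and the C-index. As a hedged illustration of what each one measures, the sketch below computes them from scratch on toy inputs. The function names are hypothetical, the C-index shown is the common pairwise (Harrell-style) definition, and nothing here reproduces the evaluation code or data of any cited study.

```python
from itertools import combinations


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def c_index(times, events, risks):
    """Pairwise concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter follow-up time
    actually had the event. It is concordant when that subject also has
    the higher predicted risk; tied risks count as half.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simple sketch
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.7, 0.4, 0.8, 0.6, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print("accuracy:", accuracy(y_true, y_pred))
    print("AUC:", auc(y_true, scores))
    print("C-index:", c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.5, 0.6, 0.2]))
```

Accuracy depends on a chosen decision threshold, whereas AUC and the C-index depend only on how the model ranks individuals, which is one reason results reported with different metrics in Table Ⅳ are not directly comparable.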
Taken together, the studies in Table Ⅳ underline the significant contributions of AI and genomics to modern medical research. They demonstrate the power of AI algorithms to extract meaningful patterns from complex genomic data, facilitating disease diagnosis, risk prediction, and the discovery of novel genetic biomarkers. This personalized approach to risk assessment is a promising step towards improving the precision of cardiovascular risk prediction and advancing precision medicine in the battle against CVD. Continued exploration of genomics and AI holds promise for a more proactive, personalized approach to CVD management and, ultimately, better patient outcomes worldwide.