Distribution of AAA SNPs in the genome and their gene regulation
The literature search identified a total of 86 SNPs related to AAA: 48 originated from recorded GWAS databases, and 38 came from case-control or cohort studies in which the SNP frequency in AAA patients was at least 3% higher than in the control group. Most of the identified SNPs were annotated based on their chromosomal position (n = 49), followed by GTEx tissue expression (n = 28) and direct gene coding (n = 9) (Fig. 2A-C). The SNPs are distributed randomly across chromosomes, although chromosomes 3 and 9 harbor the largest numbers of AAA SNPs (Fig. 2D). The minor allele frequency (MAF) of AAA SNPs also follows a random distribution. No SNPs were found on chromosomes 17, X, and Y, suggesting that AAA SNPs are not sex-linked (Fig. 2D). Several SNPs encode and/or affect only 1–3 genes, while a subcategory of SNPs affects up to 20 genes (rs1800629, 20 genes; rs361525, 11 genes; rs352140, 9 genes; rs58749629, 6 genes). On average, each chromosome contains 4 SNPs; chromosomes 3 and 9 are highlighted in Fig. 2E because they contain the most SNPs. On chromosome 3, we found 8 SNPs that affect a single gene (TGFBR2) and 1 SNP that affects 9 different genes (ITIH4, DNAH1, SFMBT1, GNL3, GLYCTK, NT5DC2, STIMATE, GLYCTK-AS1, MUSTN1). The complete SNP chromosome mapping, at a scale of 1 pixel to 250,000 base pairs, can be seen in Supplementary Fig. 1.
A single SNP can affect multiple genes, depending on the genomic region in which it resides. Most of the SNPs associated with AAA reside in the intronic, regulatory, upstream, downstream, or intergenic regions of their respective genes (Fig. 2F). In particular, SNPs with high CADD pathogenicity scores (> 10) result in non-synonymous, noncoding change, and stop-gained mutations. A high CADD pathogenicity score indicates that the SNP is predicted to be deleterious relative to other possible mutations within the human genome 28.
SNPs and the associated genes are linked with high pathogenicity in AAA
We identified 15 SNPs affecting 20 genes with a CADD (Combined Annotation Dependent Depletion) pathogenicity score above 10 (Table 1). CADD scores correlate with pathogenicity, disease severity, regulatory effects, and complex trait associations. A score greater than 10 indicates that the nucleotide substitution is predicted to be among the 10% most deleterious substitutions within the human genome; a score of 20 or greater indicates the 1% most deleterious; a score of 30 or greater indicates the 0.1% most deleterious, and so on. We found 1 SNP with a CADD score above 30 (rs5516, KLK1), 1 SNP with a score between 20 and 30 (rs1801133, MTHFR), and 13 SNPs with a score between 10 and 20.
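The 10/20/30 thresholds above follow from a logarithmic (PHRED-like) scaling of the score. The short Python sketch below illustrates that relationship, assuming the CADD scores in Table 1 are PHRED-scaled; the function name `cadd_top_fraction` is introduced here for illustration only and is not part of any CADD tooling.

```python
def cadd_top_fraction(phred_score: float) -> float:
    """Return the fraction of all possible substitutions predicted to be
    at least as deleterious, assuming a PHRED-scaled score:
    score = -10 * log10(fraction)."""
    return 10 ** (-phred_score / 10)


# Scores taken from Table 1 (highest, second highest, and lowest entries).
examples = [("rs5516", 34.0), ("rs1801133", 25.6), ("rs10757278", 10.42)]

for snp_id, score in examples:
    fraction = cadd_top_fraction(score)
    print(f"{snp_id}: CADD {score} -> top {fraction:.4%} of substitutions")
```

Under this assumption, a score of 34 (rs5516) corresponds to roughly the top 0.04% of substitutions, whereas a score just above 10 sits close to the 10% boundary.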
Table 1
The list of SNPs with high CADD (Combined Annotation Dependent Depletion) pathogenicity scores. Higher CADD scores indicate a more deleterious nucleotide substitution relative to all possible substitutions within the human genome.
No | SNP ID | Affected Gene(s) | SNP consequence | MAF | CADD pathogenicity score* | Protein-Coding SNP (yes/no) |
1 | rs5516 | KLK1 | STOP_GAINED | 0.3073 | 34 | yes |
2 | rs1801133 | MTHFR | NON_SYNONYMOUS | 0.2454 | 25.6 | yes |
3 | rs429358 | APOE | NON_SYNONYMOUS | 0.1506 | 17.4 | yes |
4 | rs2276109 | MMP12 | UPSTREAM | 0.05551 | 17.25 | no |
5 | rs11591147 | PCSK9 | NON_SYNONYMOUS | 0.00639 | 17.03 | yes |
6 | rs1800795 | STEAP1B, IL6-AS1, IL6 | REGULATORY, INTRONIC | 0.1412 | 16.22 | no |
7 | rs7255 | LDAH, GDF7, C2orf43 | UPSTREAM, NONCODING_CHANGE, DOWNSTREAM | 0.4139 | 14.92 | no |
8 | rs2836411 | ERG | REGULATORY, INTRONIC | 0.3269 | 14.67 | no |
9 | rs2230806 | ABCA1 | NON_SYNONYMOUS | 0.4397 | 14.3 | yes |
10 | rs2071307 | ELN | NON_SYNONYMOUS | 0.2204 | 13.86 | yes |
11 | rs243865 | MMP2-AS1, MMP2 | UPSTREAM | 0.1366 | 11.61 | no |
12 | rs2285053 | MMP2 | UPSTREAM | 0.1512 | 10.81 | no |
13 | rs1571590 | TGFBR1 | INTRONIC | 0.1216 | 10.78 | no |
14 | rs1799983 | NOS3 | NON_SYNONYMOUS | 0.1763 | 10.55 | yes |
15 | rs10757278 | CDKN2B, CDKN2B-AS1 | REGULATORY, DOWNSTREAM | 0.4081 | 10.42 | no |
*A significant CADD pathogenicity score is defined as a score greater than 10 |
Abbreviations: KLK1 = Kallikrein 1, MTHFR = Methylenetetrahydrofolate Reductase, APOE = Apolipoprotein E, MMP12 = Matrix Metallopeptidase 12, PCSK9 = Proprotein Convertase Subtilisin/Kexin Type 9, STEAP1B = STEAP Family Member 1B, IL6-AS1 = IL6 Antisense RNA 1, IL6 = Interleukin 6, LDAH = Lipid Droplet Associated Hydrolase, GDF7 = Growth Differentiation Factor 7, ERG = ETS Transcription Factor ERG, ABCA1 = ATP Binding Cassette Subfamily A Member 1, ELN = Elastin, MMP2-AS1 = MMP2 Antisense RNA 1, MMP2 = Matrix Metallopeptidase 2, TGFBR1 = Transforming Growth Factor Beta Receptor 1, NOS3 = Nitric Oxide Synthase 3, CDKN2B = Cyclin Dependent Kinase Inhibitor 2B, CDKN2B-AS1 = CDKN2B Antisense RNA 1 |
A high CADD score does not necessarily correlate with frequency in AAA (Table 2). The SNP rs5516, located in KLK1, is a stop-gained mutation; its CG genotype is 17.8% more frequent in Australian AAA patients than in controls 29. The SNP rs1801133, located in MTHFR, is a non-synonymous mutation, yet its CT/CC genotype is only 3% more frequent in Greek AAA patients than in controls 30. On the other hand, rs1800629, a non-coding SNP related to TNF-α, is 16.6% more frequent in AAA patients but has a CADD score of only 4.365 31.
Table 2
Top 20 SNPs that are more frequent in AAA patients than in controls. Frequency data are available only for SNPs reported in case-control studies. Highlighted SNPs have both a high CADD score and a high frequency, suggesting an association with pathogenicity, disease severity, regulatory effects, and complex traits in AAA.
No | SNP ID | Affected Gene(s) | Allele/Genotype | AAA frequency, % (Δ frequency) | CADD score | Reference |
1 | rs5516 | KLK1 | CG | 57.9 (17.8) | 34 | Biros et al., 2011 41 |
2 | rs1800795 | IL-6 | C | 51.9 (10.8) | 16.22 | Jabłońska et al., 2021 31 |
3 | rs2230806 | ABCA1 | KK | 43.6 (13.3) | 14.3 | Zhao et al., 2016 42 |
4 | rs243865 | MMP-2 | CC | 64.9 (8.6) | 11.61 | Saracini et al., 2012 43 |
5 | rs1800629 | TNF-α | GA | 45.2 (16.6) | 4.365 | Jabłońska et al., 2021 31 |
6 | rs2071307 | ELN | GG | 39.9 (8.6) | 13.86 | Saracini et al., 2012 43 |
7 | rs3091244 | CRP | CT, CA | 47 (14) | 0.004 | Saratzis et al., 2014 44 |
8 | rs1800469 | TGFB1 | TT | 31.2 (11.5) | 5.903 | Zuo et al., 2015 45 |
9 | rs7635818 | CNTN3 | CC | 27.1 (11.2) | 1.023 | Rašiová et al., 2021 46 |
10 | rs3091244 | CRP | TT, AA, TA | 20 (11) | 0.004 | Saratzis et al., 2014 44 |
11 | rs3775290 | TLR3 | C | 68.3 (10.7) | 9.846 | Jabłońska et al., 2020 47 |
12 | rs1333049 | CDKN2B, CDKN2B-AS1 | CC | 31.6 (10.6) | 1.579 | Wei et al., 2014 38 |
13 | rs10757278 | CDKN2B, CDKN2B-AS1 | G | 56.5 (8.8) | 10.42 | Wei et al., 2014 38 |
14 | rs5516 | KLK1 | GG | 15 (9) | 34 | Biros et al., 2011 41 |
15 | rs352140 | TLR9 | T | 52.4 (8.6) | 0.066 | Jabłońska et al., 2020 47 |
16 | rs1466535 | LRP1 | CT | 43 (8) | 6.903 | Galora et al., 2015 48 |
17 | rs3918242 | MMP-9 | CT | 29.9 (7.7) | 0.056 | Crkvenac Gregorek et al., 2016 49 |
18 | rs3019885 | SLC30A8 | TT | 39.7 (7.5) | 6.938 | Galora et al., 2015 48 |
19 | rs2252070 | MMP-13 | GG | 21.4 (7.3) | 9.775 | Saracini et al., 2012 43 |
20 | rs5182 | AGTR1 | CT | 46.5 (6.5) | 0.64 | Zuo et al., 2015 45 |
Abbreviations: KLK1 = Kallikrein 1, TNF-α = Tumor Necrosis Factor Alpha, CRP = C-reactive Protein, ABCA1 = ATP Binding Cassette Subfamily A Member 1, TGFB1 = Transforming Growth Factor Beta-1, CNTN3 = Contactin-3, IL6 = Interleukin 6, TLR3 = Toll-like receptor 3, CDKN2B = Cyclin Dependent Kinase Inhibitor 2B, CDKN2B-AS1 = CDKN2B Antisense RNA 1, MMP2 = Matrix Metallopeptidase 2, TLR9 = Toll-like receptor 9, ELN = Elastin, LRP1 = Low Density Lipoprotein Receptor-related Protein 1, MMP9 = Matrix Metallopeptidase 9, SLC30A8 = Solute Carrier Family 30 Member 8, MMP13 = Matrix Metallopeptidase 13, AGTR1 = Angiotensin II Receptor Type 1 |
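As an illustration of how the rows of Table 2 that combine a high CADD score with a higher frequency in AAA can be picked out, the following Python sketch filters the table by the CADD > 10 threshold used in Table 1. The data are copied from Table 2; the tuple layout and the helper name `high_cadd_rows` are assumptions made for this example, not the procedure used to build the table.

```python
# (SNP ID, allele/genotype, AAA frequency, delta frequency, CADD score),
# copied from Table 2 in the order shown there.
TABLE2 = [
    ("rs5516", "CG", 57.9, 17.8, 34.0),
    ("rs1800795", "C", 51.9, 10.8, 16.22),
    ("rs2230806", "KK", 43.6, 13.3, 14.3),
    ("rs243865", "CC", 64.9, 8.6, 11.61),
    ("rs1800629", "GA", 45.2, 16.6, 4.365),
    ("rs2071307", "GG", 39.9, 8.6, 13.86),
    ("rs3091244", "CT, CA", 47.0, 14.0, 0.004),
    ("rs1800469", "TT", 31.2, 11.5, 5.903),
    ("rs7635818", "CC", 27.1, 11.2, 1.023),
    ("rs3091244", "TT, AA, TA", 20.0, 11.0, 0.004),
    ("rs3775290", "C", 68.3, 10.7, 9.846),
    ("rs1333049", "CC", 31.6, 10.6, 1.579),
    ("rs10757278", "G", 56.5, 8.8, 10.42),
    ("rs5516", "GG", 15.0, 9.0, 34.0),
    ("rs352140", "T", 52.4, 8.6, 0.066),
    ("rs1466535", "CT", 43.0, 8.0, 6.903),
    ("rs3918242", "CT", 29.9, 7.7, 0.056),
    ("rs3019885", "TT", 39.7, 7.5, 6.938),
    ("rs2252070", "GG", 21.4, 7.3, 9.775),
    ("rs5182", "CT", 46.5, 6.5, 0.64),
]

CADD_THRESHOLD = 10.0  # same cut-off as the Table 1 footnote


def high_cadd_rows(rows, threshold=CADD_THRESHOLD):
    """Keep only the rows whose CADD score exceeds the threshold."""
    return [row for row in rows if row[4] > threshold]


selected = high_cadd_rows(TABLE2)
# Sort by delta frequency so the largest case-control differences come first.
for snp_id, genotype, aaa_freq, delta, cadd in sorted(
    selected, key=lambda row: row[3], reverse=True
):
    print(f"{snp_id} ({genotype}): delta = {delta}, CADD = {cadd}")
```

The filter makes the contrast in the text concrete: rs1800629 has the second-largest frequency difference in the table but is excluded because its CADD score is below the threshold.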
Biological traits associated with AAA SNPs
Using the snpXplorer AnnotateMe platform, gene enrichment analysis of the GWAS-catalog associations of the AAA SNPs showed strong correlations with various lipid measurements, such as LDL-cholesterol measurement (52%), total cholesterol measurement (34%), triglyceride measurement (26%), and HDL-cholesterol measurement (23%), as well as with CRP measurement (14%) (Fig. 3A). Only 49% of all tested SNPs were directly labeled with the AAA trait, which may suggest that the remaining 51% are indirectly associated with AAA occurrence through various lipid metabolism pathways and genes. As for associations with other cardiovascular diseases, AAA and coronary artery disease (CAD) share 25% of the SNPs, while AAA and myocardial infarction (MI) share only 10%.
In addition to the SNPs, we also present the GWAS-catalog associations of the genes associated with AAA SNPs. These genes correlate with various diseases, such as coronary artery disease (21%), myocardial infarction (12%), and type II diabetes (10%). They were also associated with total cholesterol measurement (17%), LDL-cholesterol measurement (15%), CRP measurement (12%), HDL-cholesterol measurement (10%), and triglyceride measurement (9%) (Fig. 3B).
Gene Ontology analysis of AAA SNPs
Gene-set enrichment analysis was performed on the genes associated with AAA SNPs, using Gene Ontology (GO) as the gene-set source. The gene-set enrichment was then visualized using REVIGO 24. Annotated GO terms were selected using a 2-step process as described by Tesi et al. (2021) in the snpXplorer web server 23. The resulting GO term clustering and annotation in Fig. 4 depict the most prominent and significant biological processes and associated genes in AAA based on the snpXplorer annotation algorithm.
The AAA-associated genes were most prominently involved in cell population proliferation (p = 2.33 × 10⁻⁸), followed by regulation of cell population proliferation (p = 3.19 × 10⁻⁸), muscle cell proliferation (p = 3.08 × 10⁻⁶), regulation of smooth muscle cell proliferation (p = 4.97 × 10⁻⁶), regulation of plasma lipoprotein particle levels (p = 1.58 × 10⁻⁵), regulation of protein kinase activity (p = 2.15 × 10⁻⁵), regulation of phosphorus metabolic process (p = 2.77 × 10⁻⁵), and blood circulation (p = 4.63 × 10⁻⁵). These annotated terms were found to be the most significant based on semantic similarity (Supplementary Fig. 2) and a dynamic cut tree algorithm for term-based clustering and p values (Supplementary Fig. 3). The most significant GO terms associated with the SNP-associated genes are summarized in Table 3.
Table 3
The top 20 Gene Ontology terms associated with all SNP-associated genes in AAA. Bolded terms are annotated in the REVIGO clustering in Fig. 4.
Term ID | Term Description | P value* |
GO:0008283 | cell population proliferation | 2.33 × 10⁻⁸ |
GO:0042127 | regulation of cell population proliferation | 3.19 × 10⁻⁸ |
GO:0033002 | muscle cell proliferation | 3.08 × 10⁻⁶ |
GO:0048660 | regulation of smooth muscle cell proliferation | 4.97 × 10⁻⁶ |
GO:0048659 | smooth muscle cell proliferation | 5.06 × 10⁻⁶ |
GO:0008285 | negative regulation of cell population proliferation | 1.37 × 10⁻⁵ |
GO:0097006 | regulation of plasma lipoprotein particle levels | 1.58 × 10⁻⁵ |
GO:0045859 | regulation of protein kinase activity | 2.15 × 10⁻⁵ |
GO:0019220 | regulation of phosphate metabolic process | 2.77 × 10⁻⁵ |
GO:0051174 | regulation of phosphorus metabolic process | 2.77 × 10⁻⁵ |
GO:1901700 | response to oxygen-containing compound | 3.06 × 10⁻⁵ |
GO:0001932 | regulation of protein phosphorylation | 3.73 × 10⁻⁵ |
GO:0010033 | response to organic substance | 3.83 × 10⁻⁵ |
GO:0008015 | blood circulation | 4.63 × 10⁻⁵ |
GO:0050673 | epithelial cell proliferation | 5.39 × 10⁻⁵ |
GO:0043549 | regulation of kinase activity | 6.82 × 10⁻⁵ |
GO:0032270 | positive regulation of cellular protein metabolic process | 7.19 × 10⁻⁵ |
GO:1901698 | response to nitrogen compound | 7.37 × 10⁻⁵ |
GO:1905952 | regulation of lipid localization | 8.05 × 10⁻⁵ |
GO:0003013 | circulatory system process | 8.80 × 10⁻⁵ |
*P values were calculated using the gost function in the gprofiler2 R package (cumulative hypergeometric p-value) |
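For readers unfamiliar with the cumulative hypergeometric p-value named in the footnote, the following Python sketch shows the underlying calculation in its plainest form. It is not the gprofiler2 implementation and does not reproduce any multiple-testing correction the tool may apply; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background: int, annotated: int,
                           query: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    background: number of genes in the background set
    annotated:  background genes annotated with the GO term
    query:      genes in the query list
    overlap:    query genes annotated with the GO term
    """
    max_overlap = min(annotated, query)
    total = comb(background, query)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, query - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not taken from this study):
# 20,000 background genes, 200 annotated with a term, 100 query genes,
# 8 of which carry the annotation.
p_value = hypergeom_enrichment_p(20_000, 200, 100, 8)
print(f"enrichment p-value = {p_value:.3g}")
```

The p-value is the probability of seeing at least the observed number of annotated genes if the query list had been drawn at random from the background, so small values indicate over-representation of the term.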
Significant signaling networks associated with AAA SNPs
To assess the interactions between SNP-associated genes related to AAA, gene-gene interactomes were constructed and plotted using the GeneMANIA plugin in Cytoscape (Fig. 5). The most significant genes associated with the top 25 GO-annotated functions were grouped and visualized in network modules to show their associations according to shared pathways, co-localization, co-expression, and physical interaction (Fig. 5A, Table 4). This analysis also includes several additional genes predicted to interact strongly with some of our candidate genes. Fourteen genes were associated with lipid metabolism pathways: regulation of plasma lipoprotein level, regulation of lipid localization, regulation of lipid transport, and regulation of cholesterol transport (Fig. 5B). Ten genes were associated with extracellular matrix (ECM) organization (Fig. 5C), 11 with smooth muscle cell proliferation pathways (Fig. 5D), and 10 with reactive oxygen species metabolism (Fig. 5E). IL-6 was shared by three modules, the most of any gene: lipid metabolism, ECM organization, and smooth muscle cell proliferation.
Table 4
Enriched Gene Ontology (GO) terms and pathways from the SNP-associated gene queries, according to the GeneMANIA plugin of Cytoscape.
Pathways | ID | Term | Genes* | P-value** |
Lipid metabolism | GO:0097006 | Regulation of plasma lipoprotein level | PCSK9, LIPA, PLTP, LRPAP1, LPA, LDLR | 2.1E-06 |
Lipid metabolism | GO:1905952 | Regulation of lipid localization | SPP1, IL6, PCSK9, PLTP, ANXA2, AGTR1, CRP, LDLR, LRP1, ABCA1, APOE | 4.5E-06 |
Lipid metabolism | GO:0032368 | Regulation of lipid transport | PCSK9, LIPA, PLTP, LRPAP1, LPA, LDLR | 0.00036 |
Lipid metabolism | GO:0032374 | Regulation of cholesterol transport | PCSK9, PLTP, ANXA2, LRP1, ABCA1, APOE | 0.0010 |
Extracellular matrix organization | GO:0030198 | Extracellular matrix organization | IL6, LRP1, MMP13, MMP9, MMP2, MMP12, FLOT1, TGFB1, TGFBR1, TNFRSF1A | 0.00036 |
Smooth muscle cell proliferation | GO:0033002 | Muscle cell proliferation | TGFBR1, TGFBR3, MMP2, MMP9, IL6, IL6R, TNF, TRIB1, CDKN1A, AIF1, ELN | 2.1E-06 |
Smooth muscle cell proliferation | GO:0048659 | Smooth muscle cell proliferation | MMP2, MMP9, IL6, IL6R, TNF, TRIB1, CDKN1A, AIF1, ELN | 1.6E-05 |
Reactive oxygen species (ROS) metabolic process | GO:0072593 | Reactive oxygen species metabolic process | AGTR1, F2, CRP, NOS3, JAK2, TGFBR2, TGFB1, CDKN1A, PKD2, TNF | 0.00093 |
*Abbreviations: PCSK9 = Proprotein Convertase Subtilisin/Kexin Type 9, LIPA = Lipase A, PLTP = Phospholipid Transfer Protein, LRPAP1 = Low Density Lipoprotein Receptor Related Protein Associated Protein 1, LPA = Lipoprotein A, LDLR = Low Density Lipoprotein Receptor, SPP1 = Secreted Phosphoprotein 1, IL6 = Interleukin 6, ANXA2 = Annexin A2, AGTR1 = Angiotensin II Receptor Type 1, CRP = C Reactive Protein, ABCA1 = ATP Binding Cassette Subfamily A Member 1, APOE = Apolipoprotein E, MMP2 = Matrix Metallopeptidase 2, MMP3 = Matrix Metallopeptidase 3, MMP9 = Matrix Metallopeptidase 9, MMP13 = Matrix Metallopeptidase 13, LRP1 = Low Density Lipoprotein Receptor-related Protein 1, FLOT1 = Flotillin 1, TGFBR1 = Transforming Growth Factor Beta Receptor 1, TGFB1 = Transforming Growth Factor Beta 1, TGFBR3 = Transforming Growth Factor Beta Receptor 3, TNF = Tumor Necrosis Factor, TNFRSF1A = Tumor Necrosis Factor Receptor Superfamily Member 1A, CDKN1A = Cyclin Dependent Kinase Inhibitor 1A, AIF1 = Allograft Inflammatory Factor 1, ELN = Elastin, TRIB1 = Tribbles Pseudokinase 1, F2 = Thrombin, NOS3 = Nitric Oxide Synthase 3, JAK2 = Janus Kinase 2, PKD2 = Polycystin 2, Transient Receptor Potential Cation Channel |
**The enrichment P value was calculated with a hypergeometric test using the GeneMANIA in-platform algorithm |
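The module sizes and the gene shared by the most modules can be recomputed directly from the gene lists in Table 4. The Python sketch below does so by taking the union of the gene lists within each pathway group. It assumes that "TGBR3" in the table denotes TGFBR3 (the symbol defined in the abbreviations); the variable names are introduced here for illustration.

```python
from collections import Counter

# Gene lists copied from Table 4, grouped by pathway.
# Assumption: "TGBR3" in the table is read as TGFBR3.
PATHWAY_TERMS = {
    "lipid metabolism": [
        ["PCSK9", "LIPA", "PLTP", "LRPAP1", "LPA", "LDLR"],
        ["SPP1", "IL6", "PCSK9", "PLTP", "ANXA2", "AGTR1", "CRP",
         "LDLR", "LRP1", "ABCA1", "APOE"],
        ["PCSK9", "LIPA", "PLTP", "LRPAP1", "LPA", "LDLR"],
        ["PCSK9", "PLTP", "ANXA2", "LRP1", "ABCA1", "APOE"],
    ],
    "ECM organization": [
        ["IL6", "LRP1", "MMP13", "MMP9", "MMP2", "MMP12", "FLOT1",
         "TGFB1", "TGFBR1", "TNFRSF1A"],
    ],
    "smooth muscle cell proliferation": [
        ["TGFBR1", "TGFBR3", "MMP2", "MMP9", "IL6", "IL6R", "TNF",
         "TRIB1", "CDKN1A", "AIF1", "ELN"],
        ["MMP2", "MMP9", "IL6", "IL6R", "TNF", "TRIB1", "CDKN1A",
         "AIF1", "ELN"],
    ],
    "ROS metabolism": [
        ["AGTR1", "F2", "CRP", "NOS3", "JAK2", "TGFBR2", "TGFB1",
         "CDKN1A", "PKD2", "TNF"],
    ],
}

# One gene set per module: the union of all GO-term gene lists in that pathway.
modules = {
    pathway: set().union(*gene_lists)
    for pathway, gene_lists in PATHWAY_TERMS.items()
}

# Count in how many modules each gene appears.
membership = Counter(gene for genes in modules.values() for gene in genes)
top_gene, top_count = membership.most_common(1)[0]

for pathway, genes in modules.items():
    print(f"{pathway}: {len(genes)} genes")
print(f"most shared gene: {top_gene} ({top_count} modules)")
```

Counting unique genes per pathway in this way gives 14, 10, 11, and 10 genes for the four modules, and IL6 is the only gene present in three of them.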
Clinical characteristics of AAA patients
We found 15 case-control studies that reported detailed patient characteristics, with a total sample size of 10,956 (5,676 controls and 5,280 cases). The available clinical data were age (mean), men (n), aortic diameter (mm, mean), smoking (current and past), hypertension, diabetes, coronary artery disease (CAD), peripheral artery disease (PAD), and dyslipidemia. The amount of data available for each clinical parameter varied, depending on what the individual studies reported. A detailed summary of each study can be found in Supplementary Table 1.
The control and case groups had similar mean ages (69.1 and 70.9 years, respectively). Several clinical parameters were positively associated with AAA: aortic diameter (54 mm in AAA and 20.95 mm in control, p = 2.003 × 10⁻⁵), history of smoking (past or current, p = 0.037), hypertension (20.3% difference between case and control, p = 0.013), and dyslipidemia (p = 0.042). Conversely, there were no significant differences in the presence of diabetes, CAD, or PAD between the two populations. These associations are summarized in Fig. 6.