A summary of our analytical approach is provided in Fig. 1. Using our electronic phenotyping algorithm, we defined a total of 10,081 CKD cases and 266,724 controls in the UKBB and 11,820 CKD cases and 22,763 controls in the AoU dataset among participants with both high-quality sequencing and SNP microarray data available.
Autosomal Dominant Polycystic Kidney Disease (ADPKD)
We first identified all PKD1 and PKD2 variants that were either pLoF or reported as pathogenic (‘P’) by at least two ClinVar submitters without conflicting interpretations (model M1). We found 172 carriers of such variants in the UKBB and 34 in AoU, corresponding to overall prevalences of approximately 0.036% and 0.034%, respectively. We performed a Meta-PheWAS of the UKBB and AoU datasets to assess phenome-wide associations of M1 variants (Fig. 2a). As expected, the top associated phecode was “Cystic Kidney Disease” (OR = 295.7, 95%CI: 214.3–408.0, P = 9.0E-263). We also detected significant associations with a variety of CKD-related phecodes, including “End-stage renal disease” (OR = 52.8, 95%CI: 31.2–89.3, P = 2.1E-49) and “Kidney replaced by transplant” (OR = 112.1, 95%CI: 71.5–175.7, P = 4.9E-94), as well as multiple other renal and extra-renal complications of ADPKD (Supplemental Data 3), confirming that the M1 variant definition has a robust phenotypic signature across both biobanks. We also tested the effects of these variants on the risk of CKD, as defined by our phenotyping algorithm, after adjustment for age, sex, diabetes, batch, and ancestry (Supplementary Table 6). In the meta-analysis of both cohorts, ADPKD M1 variant carriers had 17-fold higher odds of CKD than non-carriers (OR = 17.1, 95%CI: 11.1–26.4, P = 1.8E-37).
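This section does not state how the UKBB and AoU estimates were combined. A common choice is inverse-variance fixed-effect meta-analysis on the log-OR scale; the minimal sketch below assumes that approach and assumes each cohort reports an OR with a 95% CI that is symmetric on the log scale. The function names are hypothetical.

```python
import math

Z975 = 1.959963984540054  # two-sided 95% standard-normal quantile

def log_or_and_se(odds_ratio, lower, upper):
    """Recover log(OR) and its standard error from a 95% CI symmetric on the log scale."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z975)

def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of (log_or, se) pairs.

    Returns (pooled OR, lower 95% bound, upper 95% bound, two-sided P)."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return (math.exp(pooled),
            math.exp(pooled - Z975 * pooled_se),
            math.exp(pooled + Z975 * pooled_se),
            p)
```

For example, `fixed_effect_meta([log_or_and_se(2.1, 1.4, 3.2), log_or_and_se(1.8, 1.1, 2.9)])` would pool two cohort-level estimates; those numbers are illustrative, not study results. A random-effects model would be the alternative if between-cohort heterogeneity were a concern.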
We next investigated the effect of polygenic background on the risk of CKD by computing our previously validated GPS for CKD [30] in all UKBB and AoU participants. After APOL1 and ancestry adjustments, the GPS followed a standard normal distribution across ancestries in both the UKBB and AoU datasets (Supplementary Fig. 2). Because this score had not previously been tested in AoU participants, we first confirmed that the GPS was associated with increased risk of CKD in this dataset (OR per SD = 1.39, 95%CI: 1.36–1.43, P = 5.9E-125, adjusted for age, sex, diabetes, batch, and genetic ancestry). We then stratified all participants by ADPKD QV carrier status and re-examined the effects of the GPS within each stratum in the combined UKBB and AoU datasets. In the meta-analysis, the OR per SD of the GPS was 2.28 (95%CI: 1.55–3.37, P = 2.7E-05) in M1 QV carriers and 1.72 (95%CI: 1.69–1.76, P < 1.0E-300) in non-carriers (Table 1). Despite a trend toward a greater GPS effect among carriers, the GPS-by-carrier interaction was not statistically significant in either cohort or in the combined meta-analysis.
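The text states that the GPS was standard normal across ancestries after APOL1 and ancestry adjustments but does not give the procedure here. The minimal sketch below assumes a linear residualization of the raw score on adjustment covariates (for example, genetic PCs and possibly an APOL1 risk-genotype indicator), followed by z-scoring. The function names and covariate choice are hypothetical. This step only removes covariate-dependent shifts in the mean; equalizing the variance across ancestries would need an additional step.

```python
import statistics

def solve(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def standardize_gps(raw_scores, covariates):
    """Regress raw scores on covariates (plus intercept) by OLS, then z-score the residuals.

    covariates: one list of covariate values per participant (hypothetical layout,
    e.g. ancestry PCs and an APOL1 indicator)."""
    x = [[1.0] + list(row) for row in covariates]
    k = len(x[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, raw_scores)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [y - sum(bj * v for bj, v in zip(beta, r)) for r, y in zip(x, raw_scores)]
    mu, sd = statistics.fmean(resid), statistics.pstdev(resid)
    return [(e - mu) / sd for e in resid]
```

With real data, a linear-algebra library would replace the hand-written solver; it is written out here only to keep the sketch dependency-free.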
Table 1
Performance metrics for the GPS in ADPKD and COL4A-AN M1, M2, and M3 carriers and non-carriers in the meta-analysis of the UKBB and AoU cohorts. ORs are per SD of the GPS, adjusted for age, sex, diabetes, PCs of ancestry, and genotyping array or batch. AUC was calculated for the full model (GPS and covariates) and for the GPS alone without covariates (crude). Variance explained by the GPS alone was estimated as the variance explained by the full model (GPS and covariates) minus that explained by the covariates-only model. P-values are two-sided and not corrected for multiple testing. CI: confidence interval.
Model | Cases/controls | OR (95% CI), P-value | AUC, full model (95% CI) | AUC, crude (95% CI) | Variance explained
--- | --- | --- | --- | --- | ---
ADPKD variants | | | | |
Non-carrier | 21,901/275,638 | 1.72 (1.69–1.76), P < 1.0E-300 | 0.78 (0.78–0.78) | 0.62 (0.62–0.62) | 0.039
M1 | 41/81 | 2.28 (1.55–3.37), P = 2.6E-05 | 0.96 (0.92–1.00) | 0.69 (0.59–0.79) | 0.128
M2 | 44/95 | 2.21 (1.37–3.58), P = 3.3E-05 | 0.97 (0.93–1.00) | 0.70 (0.60–0.80) | 0.103
M3 | 52/215 | 5.25 (2.31–11.9), P = 7.4E-05 | 0.97 (0.94–1.00) | 0.69 (0.60–0.78) | 0.076
COL4A-AN variants | | | | |
Non-carrier | 21,901/275,638 | 1.70 (1.68–1.73), P < 1.0E-300 | 0.77 (0.77–0.77) | 0.63 (0.63–0.63) | 0.038
M1 | 99/1,193 | 1.78 (1.22–2.58), P = 2.4E-03 | 0.94 (0.91–0.97) | 0.59 (0.52–0.65) | 0.019
M2 | 112/1,344 | 2.47 (1.56–3.94), P = 1.3E-04 | 0.93 (0.90–0.96) | 0.62 (0.56–0.68) | 0.014
M3 | 172/1,884 | 1.93 (1.26–2.95), P = 2.3E-03 | 0.89 (0.86–0.92) | 0.60 (0.55–0.65) | 0.019
|
We next estimated the CKD risk for each tertile of the GPS distribution among M1 variant carriers, relative to the middle GPS tertile of non-carriers (i.e., reflecting average population risk), across both AoU and UKBB (Fig. 3 and Supplementary Table 7). Among QV carriers, we observed a clear gradient of CKD risk as a function of the GPS, ranging from OR = 3.03 (95%CI: 1.03–8.95, P = 4.4E-02) in the lowest tertile to OR = 54.4 (95%CI: 26.1–113.0, P = 9.6E-27) in the highest tertile of polygenic risk. These results demonstrate that the GPS significantly modifies the penetrance of ADPKD M1 qualifying variants.
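The tertile comparison can be sketched as follows. This is a minimal illustration under two assumptions not stated in the text: the tertile cutpoints come from the whole cohort's GPS distribution (so carriers and non-carriers share boundaries), and the OR shown is a crude 2x2 estimate, whereas the reported ORs are covariate-adjusted. The helper names are hypothetical.

```python
import statistics

# Reference group used in the text: non-carriers in the middle GPS tertile.
REFERENCE = ("non-carrier", 2)

def tertile_cutpoints(gps_values):
    """Two cutpoints splitting the GPS distribution into thirds (assumed: whole cohort)."""
    return statistics.quantiles(gps_values, n=3)

def assign_groups(gps, is_carrier, cuts):
    """Label each participant as (carrier status, GPS tertile), tertile 1 = lowest."""
    low_cut, high_cut = cuts
    groups = []
    for score, carrier in zip(gps, is_carrier):
        tertile = 1 if score <= low_cut else 2 if score <= high_cut else 3
        groups.append(("carrier" if carrier else "non-carrier", tertile))
    return groups

def crude_or(groups, is_case, target, reference=REFERENCE):
    """Unadjusted OR of CKD in `target` vs. `reference` from a 2x2 table.

    Assumes no empty cells; the study's ORs come from covariate-adjusted models."""
    def count(group, case):
        return sum(1 for g, y in zip(groups, is_case) if g == group and bool(y) == case)
    a, b = count(target, True), count(target, False)
    c, d = count(reference, True), count(reference, False)
    return (a * d) / (b * c)
```

In an adjusted analysis, the same group labels would enter a logistic regression as indicator variables, with `REFERENCE` as the omitted category.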
In subgroup analyses, we examined QVs in PKD1 and PKD2 separately and observed similar patterns of GPS effects within each gene-defined subgroup (Supplementary Fig. 3). Likewise, when we examined QVs by variant type (truncating vs. missense), the pattern of GPS effects was consistent across both subgroups (Supplementary Fig. 4). Lastly, we investigated the effect of the GPS on the risk of CKD among ADPKD carriers defined under the alternative QV models (M2 and M3; Supplementary Table 7). We observed similar GPS-dependent gradients in CKD penetrance, indicating that our findings were robust to less stringent QV definitions.
Collagen IV Alpha Associated Nephropathy (COL4A-AN)
We next examined the effect of the GPS on the risk of CKD in carriers of COL4A-AN variants relative to the average risk of non-carriers. For this analysis, we used a less stringent MAF threshold (MAF < 0.001) for variant filtering, because the most severe COL4A-AN phenotype is observed under a recessive model. Under M1, we identified 1,435 carriers in the UKBB and 310 in AoU, corresponding to overall prevalences of approximately 0.31% and 0.32%, respectively.
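The M1 qualifying-variant rule (pLoF, or pathogenic in ClinVar by at least two submitters without conflicts) combined with a MAF filter can be sketched as below. The record fields, the set of consequence terms treated as pLoF, and the function names are hypothetical; the source does not specify its annotation pipeline. The MAF threshold is a parameter because this section gives only the COL4A-AN value (< 0.001).

```python
from dataclasses import dataclass

# Hypothetical set of Sequence Ontology consequence terms treated as pLoF.
PLOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}

@dataclass
class Variant:
    gene: str
    consequence: str
    clinvar_p_submitters: int  # number of ClinVar submitters reporting 'P'
    clinvar_conflict: bool     # True if any submitter's interpretation conflicts
    maf: float                 # minor allele frequency

def is_m1_qv(variant, genes, max_maf):
    """True if the variant passes the gene and MAF filters and is pLoF or ClinVar 'P' (M1)."""
    if variant.gene not in genes or variant.maf >= max_maf:
        return False
    is_plof = variant.consequence in PLOF_CONSEQUENCES
    is_clinvar_p = variant.clinvar_p_submitters >= 2 and not variant.clinvar_conflict
    return is_plof or is_clinvar_p
```

Usage for COL4A-AN might look like `is_m1_qv(v, {"COL4A3", "COL4A4", "COL4A5"}, 1e-3)`.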
In the Meta-PheWAS of M1 carriers across the UKBB and AoU datasets (Fig. 2b), the top associated phecode was “Hematuria” (OR = 2.3, 95%CI: 2.0–9.6, P = 4.8E-48). Other phenome-wide significant associations included “Kidney replaced by transplant” (OR = 3.1, 95%CI: 2.0–23.8, P = 3.8E-07), “Nephritis, nephrosis, renal sclerosis” (OR = 2.34, 95%CI: 1.81–10.39, P = 4.1E-11), “Proteinuria” (OR = 3.94, 95%CI: 2.77–51.6, P = 1.6E-14), and “Chronic glomerulonephritis, NOS” (OR = 2.98, 95%CI: 1.92–19.7, P = 9.0E-07). The complete list of phenotypic associations is provided in Supplemental Data 4. In the combined meta-analysis under a dominant model, compared with non-carriers, M1 QV carriers had 37% higher odds of CKD as defined by our e-phenotype (OR = 1.37, 95%CI: 1.13–1.64, P = 8.5E-04), M2 carriers had 25% higher odds (OR = 1.25, 95%CI: 1.00–1.56, P = 4.9E-02), and M3 carriers had 48% higher odds (OR = 1.48, 95%CI: 1.23–1.77, P = 2.6E-05) (Supplementary Table 8). By comparison, the M3 recessive genotype was associated with 3.38-fold higher odds of CKD (OR = 3.38, 95%CI: 1.88–6.08, P = 4.7E-05).
We next investigated the effect of polygenic background on the risk of CKD among M1 QV carriers and non-carriers. As with ADPKD, the GPS had a significant effect on the risk of CKD among both COL4A-AN carriers (OR per SD of GPS = 1.78, 95%CI: 1.22–2.58, P = 2.4E-03) and non-carriers (OR per SD of GPS = 1.70, 95%CI: 1.68–1.73, P < 1.0E-300) in the meta-analysis (Table 1). There was no significant GPS-by-carrier interaction (P = 8.1E-01). We also observed a gradient of CKD risk as a function of the GPS among M1 carriers, from no increase in risk (OR = 1.01, 95%CI: 0.63–1.86, P = 7.8E-01) in the lowest GPS tertile to 2.5-fold higher odds (OR = 2.53, 95%CI: 1.66–3.85, P = 1.4E-05) in the top GPS tertile, relative to the middle tertile of non-carriers (Fig. 4).
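The interaction test in the text was presumably fitted within the regression models (for example, as a GPS-by-carrier product term); the source does not say. A quick approximation from published summary statistics is a Wald test on the difference between the two stratum-specific per-SD log-ORs, treating the strata as independent. A minimal sketch, with hypothetical function names:

```python
import math

Z975 = 1.959963984540054  # two-sided 95% standard-normal quantile

def log_or_se(odds_ratio, lower, upper):
    """log(OR) and its SE, recovered from a 95% CI symmetric on the log scale."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z975)

def stratum_difference_test(carrier_ci, noncarrier_ci):
    """Two-sided Wald test that the per-SD log-OR differs between two independent strata.

    Each argument is (OR, lower 95% bound, upper 95% bound)."""
    b1, s1 = log_or_se(*carrier_ci)
    b2, s2 = log_or_se(*noncarrier_ci)
    z = (b1 - b2) / math.hypot(s1, s2)
    return z, math.erfc(abs(z) / math.sqrt(2))

# COL4A-AN M1 carriers vs. non-carriers, per-SD estimates from Table 1.
z, p = stratum_difference_test((1.78, 1.22, 2.58), (1.70, 1.68, 1.73))
```

Applied to the rounded Table 1 estimates, this approximation gives P ≈ 0.81, in line with the interaction P-value reported above; it ignores covariate adjustment inside the interaction model, so it is a check rather than a replacement for the fitted test.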
We also explored the recessive model by testing for GPS effects among individuals with the M3 risk genotype (QV homozygotes, compound heterozygotes, or COL4A5 hemizygous males). Among individuals with the risk genotype, the top GPS tertile conferred 6.73-fold higher odds of CKD (OR = 6.73, 95%CI: 2.59–17.5, P = 8.8E-05), whereas the bottom tertile conferred a non-significant 2.29-fold increase (OR = 2.29, 95%CI: 0.64–8.12, P = 2.0E-01), relative to the middle tertile of individuals without the risk genotype (Supplementary Fig. 5).
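The recessive risk-genotype definition above (QV homozygotes, compound heterozygotes, or COL4A5 hemizygous males) can be sketched as a per-participant classifier. The input format is hypothetical, and without phasing information the sketch assumes that two distinct QVs in the same gene lie on different haplotypes, which may overcount compound heterozygotes when variants are actually in cis.

```python
from collections import Counter

X_LINKED_GENES = {"COL4A5"}

def has_recessive_risk_genotype(qv_calls, sex):
    """Flag a recessive-model COL4A risk genotype for one participant.

    qv_calls: iterable of (gene, variant_id, allele_count) for the participant's
    qualifying variants (hypothetical format); sex: "M" or "F"."""
    alleles_per_gene = Counter()
    for gene, _variant_id, allele_count in qv_calls:
        if allele_count <= 0:
            continue
        if gene in X_LINKED_GENES and sex == "M":
            return True  # hemizygous male: one X-linked QV allele suffices
        alleles_per_gene[gene] += allele_count
    # Two or more QV alleles in one gene: a homozygote, or an (assumed trans)
    # compound heterozygote. Alleles in different genes are not combined.
    return any(count >= 2 for count in alleles_per_gene.values())
```

For instance, a female carrying one COL4A3 and one COL4A4 QV would not be flagged, because the definition refers to two alleles within a single gene.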
Our sensitivity analyses included alternative variant models (Supplementary Table 9) and separate analyses of the autosomal (COL4A3 and COL4A4) and X-linked (COL4A5) genes (Supplementary Table 10). These analyses showed directionally consistent effects of the GPS across all subgroups. Recessive analyses under the M1 and M2 models were underpowered because of the low overall frequency of recessive genotypes defined under these models.