Study subjects and grouping
This study enrolled 1489 patients diagnosed with ESRD. For the control group, we included 1161 subjects who had blood samples available for genotyping and exhibited no substantial evidence of CKD. This study was approved by the ethics committee of Guangdong Provincial People's Hospital in accordance with the Declaration of Helsinki. Written informed consent was obtained from all participants.
End-stage renal disease
Study participants were individuals diagnosed with ESRD on the basis of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code 585 or International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) code N18.5 or N18.6, as well as those with a kidney transplant. Briefly, ESRD is characterized by an irreversible decline in kidney function, typically indicated by a glomerular filtration rate (GFR) below 15 mL/min/1.73 m², the presence of uremic toxicity, and the need for renal replacement therapy. This includes (1) patients receiving renal replacement therapy, such as dialysis, or those who have undergone a kidney transplant; and (2) individuals with stage 5 CKD who may not yet be on dialysis but exhibit significant uremic toxicity and for whom dialysis or transplantation is imminent or being considered.
Genotyping, quality control and imputation
Genomic DNA was extracted from blood samples using the AxyPrep™ Blood Genomic DNA Purification Miniprep Kit (Axygen, AP-MN-BL-GDNA-250) and quantified using a microplate reader (SpectraMax Plus 384, Molecular Devices). The qualified DNA sample of each subject was genotyped on the Asian Screening Array BeadChip (Illumina ASAMD-24v1-0) following the standard protocol for the Illumina Infinium™ assay. GenomeStudio software and the Illumina calling algorithm were used to analyze the normalized intensity data.
The genotypes of 25606 single nucleotide polymorphisms (SNPs) on the X chromosome were extracted for subsequent analysis. Prior to genotype imputation, we applied quality control (QC) procedures to the genotype data using the following criteria: (i) SNPs with a call rate lower than 95% were excluded; (ii) SNPs violating the Hardy-Weinberg equilibrium (HWE) assumption, defined as P < 1e-6, were excluded; and (iii) low-frequency loci with a minor allele frequency (MAF) less than 1% were removed. Samples with more than 5% missing SNP genotypes were to be removed; no samples were filtered out in our study.
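The three SNP-level filters above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `snp_passes_qc` and its parameters are hypothetical, the HWE test is a simple 1-degree-of-freedom chi-square approximation rather than an exact test, and the input is assumed to be diploid genotypes only (for X-chromosome SNPs this typically means female samples, since hemizygous males coded 0/2 carry no heterozygotes).

```python
import math


def snp_passes_qc(genotypes, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Apply call-rate, HWE, and MAF filters to one SNP (illustrative only).

    genotypes: allele counts (0, 1, or 2) for diploid samples; None marks a
    missing call. Returns (passes, stats).
    """
    if not genotypes:
        raise ValueError("genotypes must not be empty")

    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return False, {"call_rate": call_rate, "maf": 0.0, "hwe_p": 1.0}

    n = len(called)
    observed = [called.count(0), called.count(1), called.count(2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    maf = min(p, 1 - p)

    # Expected genotype counts under HWE: n(1-p)^2, 2np(1-p), np^2.
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2))

    passes = call_rate >= min_call_rate and maf >= min_maf and hwe_p >= min_hwe_p
    return passes, {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p}
```

A SNP is retained only if all three conditions hold; the default thresholds mirror the values stated above.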
We first pre-phased the genotypes using Eagle (version 2.4.1). Subsequently, we imputed genotype dosages using Beagle (version 5) with the 1000 Genomes Project phase 3 (version 3) reference haplotypes. The reference panel consists of 3202 unrelated samples from 26 populations. Only non-pseudoautosomal regions were phased and imputed. Pre-phasing and imputation were conducted separately for females and males. Imputed SNPs were excluded based on the following criteria: (i) HWE P value < 10e-6; (ii) low imputation quality (imputation quality score DR2 < 0.3 and MAF < 0.01). The genetic analyses in this study were conducted using PLINK (version 1.9). The X-chromosome genotype matrix was constructed by coding females as diploid (0, 1, or 2) and males as homozygous diploid (either 0 or 2).
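The sex-specific coding described above can be sketched in a few lines of Python. This only illustrates the coding rule; the function name `encode_x_dosage` and the `"F"`/`"M"` sex labels are hypothetical and are not taken from PLINK or from the study's own scripts.

```python
def encode_x_dosage(sex, alt_allele_count):
    """Code a non-pseudoautosomal X-chromosome genotype as a 0-2 dosage.

    sex: "F" or "M" (hypothetical labels).
    alt_allele_count: number of alternate alleles carried
        (0, 1, or 2 for females; 0 or 1 for hemizygous males).
    """
    if sex == "F":
        if alt_allele_count not in (0, 1, 2):
            raise ValueError("female allele count must be 0, 1, or 2")
        return alt_allele_count  # ordinary diploid coding
    if sex == "M":
        if alt_allele_count not in (0, 1):
            raise ValueError("male allele count must be 0 or 1")
        return 2 * alt_allele_count  # hemizygous males treated as homozygous
    raise ValueError("sex must be 'F' or 'M'")
```

Under this coding a male carrier contributes the same dosage as a homozygous female, so males never take the value 1.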
X chromosome-wide association analysis on ESRD
We first conducted a sex-stratified X chromosome-wide association study (XWAS) of ESRD risk in females and males separately, using an age-adjusted logistic regression model under the assumption of additive allelic effects of the SNP dosages. The significance threshold for X chromosome-wide association was set at P = 10e-5 based on the number of independent linkage disequilibrium (LD) blocks. We also used non-additive XWAS models to assess recessive and dominant effects and identify variants associated with ESRD, applying a more stringent threshold of P < 10e-6. This threshold was chosen because, compared with the additive model, dominant and recessive disease models generate weaker LD levels in patients[18].
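To make the difference between the additive, dominant, and recessive models concrete, the sketch below recodes a 0/1/2 genotype into the predictor that would enter the regression under each model. This is one common convention and is assumed here for illustration; the function name `recode_dosage` is hypothetical, and hard-called genotypes (not fractional imputed dosages) are assumed.

```python
def recode_dosage(dosage, model):
    """Recode a hard-called 0/1/2 genotype for a given genetic model.

    additive:  predictor equals the allele count (0, 1, 2)
    dominant:  1 if at least one copy of the counted allele, else 0
    recessive: 1 only if two copies of the counted allele, else 0
    """
    if dosage not in (0, 1, 2):
        raise ValueError("dosage must be 0, 1, or 2")
    if model == "additive":
        return dosage
    if model == "dominant":
        return 1 if dosage >= 1 else 0
    if model == "recessive":
        return 1 if dosage == 2 else 0
    raise ValueError("model must be 'additive', 'dominant', or 'recessive'")


# Predictor values for each genotype under the three models.
for model in ("additive", "dominant", "recessive"):
    print(model, [recode_dosage(g, model) for g in (0, 1, 2)])
```

With males coded as 0 or 2, the dominant and recessive recodings coincide in males, so the two models differ only through heterozygous females.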
To identify cross-sex ESRD variants, a fixed-effect inverse-variance weighted meta-analysis was performed by combining female and male summary statistics using the ‘metafor’ R package (version 3.4.0). To identify sex-shared variants, we retained only the summary statistics of SNPs with low heterogeneity (I² < 20%) between females and males for the final meta-analysis. Additionally, a Z score was computed to quantify sex differences in effect sizes, aiding the identification of sex-specific variants.
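The arithmetic behind these quantities can be sketched as follows. This is a plain-Python illustration of a two-stratum fixed-effect meta-analysis, not the ‘metafor’ code used in the study; the function name `meta_two_sexes` is hypothetical. The sex-difference Z score is assumed to take the common form (β_F − β_M) / sqrt(SE_F² + SE_M²), which treats the female and male estimates as independent; the source does not state the exact formula.

```python
import math


def meta_two_sexes(beta_f, se_f, beta_m, se_m):
    """Fixed-effect inverse-variance meta-analysis of female and male estimates."""
    w_f, w_m = 1 / se_f ** 2, 1 / se_m ** 2  # inverse-variance weights
    beta = (w_f * beta_f + w_m * beta_m) / (w_f + w_m)
    se = math.sqrt(1 / (w_f + w_m))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value

    # Cochran's Q with df = 1 (two strata), and I^2 = max(0, (Q - df) / Q) in %.
    q = w_f * (beta_f - beta) ** 2 + w_m * (beta_m - beta) ** 2
    i2 = max(0.0, (q - 1) / q) * 100 if q > 0 else 0.0

    # Assumed form of the sex-difference Z score (independent strata).
    z_diff = (beta_f - beta_m) / math.sqrt(se_f ** 2 + se_m ** 2)

    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "i2": i2,
            "z_diff": z_diff}
```

A SNP with I² below 20% would be carried forward as sex-shared, whereas a large absolute `z_diff` points to a sex-specific effect.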
Fine mapping and functional annotations on ESRD loci
Lead index variants were defined as the most significant variant within ± 5 kb windows. We defined independent associated loci as those whose genomic positions were at least 1 Mb apart from each other. Conditional and joint analysis (COJO) and stepwise conditional and joint analysis (SLCT) using the GCTA tool[19] were further conducted to identify independent loci in a stepwise forward selection process.
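A distance-based definition of independent loci can be sketched as a greedy selection: variants are visited from most to least significant, and a variant becomes a new lead only if it lies at least a minimum distance from every lead already chosen. This is an assumed simplification for illustration, not the GCTA-COJO procedure; the function name `select_lead_variants` and the SNP identifiers in the example are hypothetical.

```python
def select_lead_variants(variants, min_distance_bp=1_000_000):
    """Greedy distance-based selection of lead variants on one chromosome.

    variants: iterable of (snp_id, position_bp, p_value) tuples.
    Returns the selected leads sorted by position.
    """
    leads = []
    # Visit variants from the smallest P value to the largest.
    for snp_id, pos, p in sorted(variants, key=lambda v: v[2]):
        if all(abs(pos - lead_pos) >= min_distance_bp for _, lead_pos, _ in leads):
            leads.append((snp_id, pos, p))
    return sorted(leads, key=lambda v: v[1])


# Hypothetical example: rsB lies within 1 Mb of the stronger rsA and is absorbed.
example = [("rsA", 1_000_000, 1e-8), ("rsB", 1_200_000, 1e-6), ("rsC", 3_000_000, 1e-7)]
print(select_lead_variants(example))
```

Calling the same function with `min_distance_bp=5_000` would give a window-based choice of index variants instead.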
Independent ESRD loci were mapped to target genes based on the index SNP by (i) identifying the nearest gene; (ii) associating with a reported eQTL gene from the GTEx database; and (iii) identifying colocalized genes. The colocalization analysis of each locus, including variants within the ± 1 Mb region surrounding the index SNP, was conducted using the R package ‘coloc’. Within this region, the presence of at least one genome-wide significant eQTL variant (P < 10e-5) was required before testing for colocalization. A posterior probability (PP) > 70% for a shared variant between the eQTL and the ESRD-associated locus was defined as colocalization.
Association between ESRD loci and kidney traits
A cross-sectional lookup analysis was conducted by referring to two published GWASs in European and Japanese populations, which encompassed the associations between candidate loci and kidney traits, including creatinine, estimated glomerular filtration rate (eGFR), uric acid, urea nitrogen, and albuminuria. ESRD loci identified in our analysis were mapped to the full summary statistics of these two published GWASs to identify replicated loci. Overlapping loci with a P-value < 0.05 in the previous studies were considered replicated.