Study workflow
The study design and workflow are shown in Fig. 1. Briefly, the strong control of XCI over epigenetic modifications in Xi implies widespread methylation of CpGs over the inactivated region of Xi13,14. Therefore, we hypothesized that 2-allele demethylation in a CpG site expected to be under XCI would provide evidence for its defective control or maintenance. Consequently, we reasoned that in a bulk biological sample, the global frequency of defective XCI could be measured by the percentage of infrequent 2-allele demethylation events in CpGs under XCI. After showing robust evidence that this is the case, we then aimed to show that the percentage of defective XCI in bulk data could be robustly detected in different tissues and correlated with relevant clinical cancer outcomes, using multiple lines of evidence from independent studies (Tables S1-S2).
X-Ra definition
We first selected CpGs that were mapped to X-linked genes with strong previously published evidence of consistent inactivation5,6. We then estimated the frequency of defective XCI in a female biological sample by surveying the number of CpGs that were observed with 2-allele demethylation, i.e. methylation beta values < 0.2. We defined X-Ra (X-Reactivation) as the percentage of these events, to measure the quantitative level of partial activation of Xi supported by methylation removal. Therefore, under a strong XCI, X-Ra is expected to be low, and when XCI is weakened, X-Ra should increase. A list of 4,862 CpGs was compiled from harmonized CpGs mapped in TCGA workflow to genes reportedly under XCI5,6. For each study, we selected CpGs in the reference panel and extracted the beta methylation levels. Finally, for each individual, X-Ra was the percentage of CpGs under XCI with beta values lower than 0.2 (See Methods).
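The definition above reduces to a simple per-sample count. The following is a minimal sketch of that calculation, assuming beta values are available as a mapping from probe ID to beta; the function name `x_ra`, the probe IDs, and the handling of missing probes are illustrative assumptions, not the authors' pipeline (see Methods for the actual procedure).

```python
from typing import Iterable, Mapping, Optional

# Beta cutoff for 2-allele demethylation, as stated in the text.
DEMETHYLATION_THRESHOLD = 0.2


def x_ra(betas: Mapping[str, Optional[float]],
         xci_cpgs: Iterable[str],
         threshold: float = DEMETHYLATION_THRESHOLD) -> float:
    """Percentage of XCI CpGs with a beta value below the threshold.

    CpGs that are absent from `betas` or have a None value are skipped.
    This missing-data rule is an assumption made for the sketch.
    """
    observed = [betas[c] for c in xci_cpgs if betas.get(c) is not None]
    if not observed:
        raise ValueError("no XCI CpGs with observed beta values")
    demethylated = sum(1 for b in observed if b < threshold)
    return 100.0 * demethylated / len(observed)


# Hypothetical sample: five probes, one of them missing a value.
sample = {"cg01": 0.05, "cg02": 0.48, "cg03": 0.55, "cg04": 0.15, "cg05": None}
# Hypothetical reference panel; "cg99" is not measured in this sample.
panel = ["cg01", "cg02", "cg03", "cg04", "cg05", "cg99"]

print(x_ra(sample, panel))  # 2 of the 4 observed CpGs are below 0.2
```

In this toy sample, two of the four observed panel CpGs fall below the cutoff, so the sketch returns an X-Ra of 50%.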
X-Ra as a measure of a patient’s defective maintenance of XCI
Using methylation data of 446 constitutional biological samples from 12 different tissues from TCGA (Table S1), we established X-Ra as a measure of defective XCI frequency in a biological sample. We first observed that CpGs under XCI with 2-allele demethylation were infrequent and varied among women. Across all 12 healthy tissues, we found that 96% of the 2-allele demethylation calls were in 20% of women and 64% in only 5% of women (Fig. 2A).
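The concentration of calls in a small fraction of women can be summarized as the share of all calls contributed by the top-ranked individuals. The sketch below illustrates that summary with hypothetical per-woman counts; the function name and the ranking rule (women sorted by their number of calls) are assumptions for illustration.

```python
import math
from typing import Sequence


def share_in_top_fraction(calls_per_woman: Sequence[int], fraction: float) -> float:
    """Percentage of all calls found in the top `fraction` of women.

    Women are ranked by their number of 2-allele demethylation calls,
    highest first (an assumption about how the concentration is measured).
    """
    ranked = sorted(calls_per_woman, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no calls to summarize")
    # Small tolerance guards against floating-point overshoot in the product.
    k = max(1, math.ceil(fraction * len(ranked) - 1e-9))
    return 100.0 * sum(ranked[:k]) / total


# Hypothetical counts for ten women (100 calls in total).
counts = [60, 20, 10, 5, 3, 1, 1, 0, 0, 0]
print(share_in_top_fraction(counts, 0.2))  # share held by the top 2 women
```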
We then annotated XCI CpGs with 15-chromatin states in blood and DNase hypersensitivity regions in six constitutional tissues15 released from the ROADMAP Epigenomics Mapping Consortium (ChromHMM v1.10)16. We observed that while XCI CpGs with 2-allele demethylation spread through the chromosome (Fig. S1), they were significantly enriched in transcription start sites (TSS) and DNase hypersensitivity regions, in relation to all the CpGs under XCI. They were also significantly depleted in heterochromatin, ZNF genes, and repeats, weakly repressed PolyComb, and strong transcription states (Fig. 2B). These results were in line with previous work showing that the CpGs in the island-containing promoters of genes under XCI are consistently methylated in Xi3,13, and with the notion that the more accessible the CpGs are, the more likely they are to undergo 2-allele demethylation.
We subsequently confirmed typically low percentages of X-Ra in females, i.e. low frequency of 2-allele demethylation in XCI CpGs (Fig. 2C), suggesting high XCI maintenance in women, but not in men (Fig. S2). We validated that X-Ra was a measure of XCI status, as random permutations of the XCI labels of the CpGs in X robustly showed higher mean values of X-Ra than the correct labeling in 12 different tissues (P < 0.001 in all cases, Fig. 2D). We also observed that 67% (P = 7.12×10⁻²⁸) of the X-Ra variance was explained by X-Ra computed with only the 2-allele demethylation calls that were present in less than 5% of the women, demonstrating that the infrequent demethylation of CpGs in Xi is the main contributor to the measure (Fig. 2E). Overall, our analyses showed that X-Ra can be regarded as a suitable and robust quantitative measure of global defective maintenance of XCI in women.
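The label-permutation check can be sketched as follows. This is an illustrative version only: it assumes that permuting XCI labels amounts to drawing random CpG sets of the same size from all X-linked CpGs, and it reports the fraction of permutations whose mean X-Ra is at or below the observed value. The function names, toy data, and this particular permutation scheme are assumptions, not the authors' exact procedure.

```python
import random
from typing import Dict, List, Sequence, Tuple


def mean_x_ra(samples: Sequence[Dict[str, float]], cpgs: Sequence[str],
              threshold: float = 0.2) -> float:
    """Mean across samples of the percentage of `cpgs` with beta < threshold."""
    per_sample = [
        100.0 * sum(s[c] < threshold for c in cpgs) / len(cpgs)
        for s in samples
    ]
    return sum(per_sample) / len(per_sample)


def permutation_p_value(samples: Sequence[Dict[str, float]],
                        all_x_cpgs: List[str], xci_cpgs: List[str],
                        n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Empirical P value that random labels give a mean X-Ra as low as observed."""
    rng = random.Random(seed)
    observed = mean_x_ra(samples, xci_cpgs)
    k = len(xci_cpgs)
    hits = sum(
        mean_x_ra(samples, rng.sample(all_x_cpgs, k)) <= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the empirical P value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 8 methylated XCI CpGs and 12 unmethylated other CpGs.
xci = [f"cg{i:02d}" for i in range(8)]
others = [f"cg{i:02d}" for i in range(8, 20)]
toy_samples = [{**{c: 0.5 for c in xci}, **{c: 0.1 for c in others}}
               for _ in range(3)]

observed, p_value = permutation_p_value(toy_samples, xci + others, xci)
print(observed, p_value)
```

With the correct labels the toy X-Ra is zero, whereas nearly every random label set includes unmethylated CpGs and therefore yields a higher value, which mirrors the direction of the result reported above.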
X-Ra as a marker of female cancer
We examined the association of X-Ra with the tumor status of samples from the 12 previously studied tissues (Table S3). We found that tumors featured a substantial increase in the percentage of XCI CpGs with 2-allele demethylation (Fig. 3A). To ascertain the molecular mechanisms associated with X-Ra in cancer, we conducted a differential expression analysis for each tumor (Fig. S3). We found several significant genes at the genome-wide level (P < 3.79×10⁻⁶). Importantly, we observed a strong downregulation of XIST with X-Ra (log2FC = -0.067, P = 1.32×10⁻¹¹, Fig. 3B, Table S2), further demonstrating X-Ra as a measure of defective XCI maintenance. Meta-analyses of 13,192 genes from cancer studies identified 537 genome-wide significant genes (Table S4). Thirty-two of these genes have been reported to be under XCI (enrichment OR = 2.87, P = 6.5×10⁻⁷), underscoring the expected consequence of partial activation of Xi by downregulation of XIST expression. The top genome-wide significant genes included LRCH1 (Leucine Rich Repeats and Calponin Homology Domain Containing 1) and PHF11 (Plant Homeodomain Finger Protein 11), which play important roles in the CD8+ T cell response against tumors and pathogens17, as well as T-cell activation in autoimmune disease18. Enrichment analyses of these significant genes revealed several pathways relevant to cancer, such as signal transduction, cell proliferation, apoptosis, and viral response (Fig. S4). Overall, these results demonstrated that X-Ra was strongly correlated with global transcriptomic levels, which was consistent with the loss of XCI control in cancer, also suggesting associations with the immune response.
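An enrichment odds ratio of this kind is typically derived from a 2×2 table of genes (significant or not, under XCI or not). The sketch below shows the arithmetic with a one-sided Fisher's exact test; the choice of test and the toy counts are assumptions for illustration, since the underlying table is not given in this section.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    a: significant and under XCI      b: significant, not under XCI
    c: not significant, under XCI     d: not significant, not under XCI
    """
    return (a * d) / (b * c)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) under the hypergeometric null (test for enrichment)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


# Hypothetical counts chosen only to illustrate the calculation.
table = (8, 2, 2, 8)
print(odds_ratio(*table), fisher_one_sided(*table))
```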
We observed that the cancer status of the biological samples was associated with X-Ra levels, adjusted for age (OR = 1.19, P = 6.05×10⁻⁵, Fig. S5). Heterogeneity across tumors suggested stratification according to X-Ra. While demethylation across the entire X chromosome is a general feature of cancer (Fig. S6), the risk estimates of X-Ra across tumors could still be explained substantially by the reduction of X-Ra as a measure of XCI status in cancer (Pearson’s R = 0.66, P = 0.018). Furthermore, the percentage of 2-allele demethylated CpGs of escapees, i.e. not under XCI (grey area under the blue curve, Fig. 3A), was not significantly associated with cancer status (OR = 0.97, P = 0.46) (Fig. S7), suggesting a limited role of overall demethylation of the X chromosome in the association between X-Ra and cancer.
Importantly, the percentage of CpGs in escapees with methylation values in the 1-allele methylation zone (0.2 < beta < 0.6, blue area under the blue curve, Fig. 3A) increased in tumors, suggesting defective escape of these genes in cancer. The percentages of 2-allele demethylation in XCI CpGs (X-Ra) and 1-allele methylation in escapees were strong additive factors for cancer risk adjusted for age (XCI CpGs OR = 1.45, P = 4.09×10⁻⁶; CpGs in escapees OR = 1.90, P = 3.22×10⁻⁹) (Fig. 3C). Furthermore, X-Ra was significantly associated with a lower probability of LOF somatic mutations in genes under XCI (OR = 0.98, P = 9.71×10⁻⁵, Fig. 3D), adjusting for cancer type. In contrast, the percentage of 1-allele methylation in escapees was associated with a higher probability of LOF somatic mutations (OR = 1.04, P = 1.82×10⁻⁴, Fig. 3D), suggesting that the risk of cancer increases when both methylation and mutations affect both alleles. The strong negative correlation between the percentages (Pearson’s R = -0.39, P = 9.72×10⁻¹³⁸, Fig. S8) indicated underlying tumor stratification.
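The two methylation zones used in this section can be expressed as a small classifier over beta values. This is a sketch under the thresholds stated in the text; the label "other" for values outside both zones and the toy beta values are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def methylation_zone(beta: float) -> str:
    """Assign a beta value to one of the zones named in the text."""
    if beta < 0.2:
        return "2-allele demethylation"
    if 0.2 < beta < 0.6:
        return "1-allele methylation"
    # Values of exactly 0.2 or at least 0.6 are not named in the text;
    # "other" is a placeholder label for this sketch.
    return "other"


def zone_percentages(betas: Sequence[float]) -> Dict[str, float]:
    """Percentage of CpGs falling in each zone for one sample."""
    if not betas:
        raise ValueError("no beta values supplied")
    counts = Counter(methylation_zone(b) for b in betas)
    return {zone: 100.0 * n / len(betas) for zone, n in counts.items()}


# Hypothetical beta values for CpGs in escapee genes of one sample.
escapee_betas = [0.05, 0.10, 0.35, 0.45, 0.55, 0.70, 0.85, 0.30]
print(zone_percentages(escapee_betas))
```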
To assess the attributable risk of defective XCI maintenance in any type of cancer given by high X-Ra, we performed a ROC analysis and obtained the optimal Youden value at X-Ra = 10%, which had 85.6% specificity. The attributable risk of dichotomized X-Ra for cancer status was 40% (95%CI = 23–56%). The proportion of high X-Ra (> 10%) was 43% across cancer types, with the highest rates in the thyroid (75%), liver (56%), and breast (50%), and the lowest in lung adenocarcinoma (13%) and pancreas (17%), in line with the association with cancer status (Fig. 3C).
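The cutoff selection can be illustrated with the Youden index, J = sensitivity + specificity - 1, evaluated at each candidate threshold. The sketch below is an illustration with hypothetical X-Ra values; it treats values strictly above the cutoff as "high X-Ra", which matches the "> 10%" convention in the text but is otherwise an assumption, and it does not reproduce the attributable-risk calculation.

```python
from typing import Dict, Optional, Sequence


def youden_optimal_cutoff(values: Sequence[float],
                          is_case: Sequence[bool]) -> Dict[str, float]:
    """Return the cutoff that maximizes Youden's J, with its operating point."""
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    best: Optional[Dict[str, float]] = None
    for cut in sorted(set(values)):
        # A sample is called "high" when its value is strictly above the cutoff.
        tp = sum(1 for v, y in zip(values, is_case) if y and v > cut)
        tn = sum(1 for v, y in zip(values, is_case) if not y and v <= cut)
        sensitivity = tp / n_cases
        specificity = tn / n_controls
        j = sensitivity + specificity - 1.0
        if best is None or j > best["youden_j"]:
            best = {"cutoff": cut, "sensitivity": sensitivity,
                    "specificity": specificity, "youden_j": j}
    return best


# Hypothetical X-Ra percentages: five normal samples, then five tumors.
x_ra_values = [2, 4, 6, 8, 9, 5, 11, 15, 20, 30]
is_tumor = [False] * 5 + [True] * 5
print(youden_optimal_cutoff(x_ra_values, is_tumor))
```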
X-Ra as a marker of breast cancer
We investigated the association between X-Ra and breast cancer (BRCA) subtypes, survival, chemotherapy treatment and somatic mutations. A 1%-point increase in X-Ra in tumor biopsy increased the hazard ratio of death from BRCA by 1.7%, adjusting for age (HR = 1.017, P = 0.025) (Fig. 4A). We validated this finding in an independent study (GSE78754) of 70 patients with TNBC (HR = 1.033, P = 0.0081), adjusting for age. Consistent with these findings, we found in GSE141441 that 95 TNBC patients who did not undergo surgery had a higher probability of recurrence due to a high frequency of X-Ra, after adjusting for chemotherapy treatment (OR = 1.08, P = 0.04) (Fig. S9).
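A hazard ratio reported per 1%-point of X-Ra can be rescaled to larger differences. The sketch below assumes a Cox-type model in which X-Ra enters linearly on the log-hazard scale, so that the hazard ratio for a difference of several points is the per-point hazard ratio raised to that power; the function name is illustrative.

```python
import math


def hazard_ratio_for_change(hr_per_point: float, delta_points: float) -> float:
    """Hazard ratio implied by a change of `delta_points` percentage points.

    Assumes the covariate acts linearly on the log-hazard scale.
    """
    return math.exp(delta_points * math.log(hr_per_point))


# Using the per-point estimate reported in the text (HR = 1.017).
print(round(hazard_ratio_for_change(1.017, 10), 3))
```

Under this assumption, a 10-point difference in X-Ra corresponds to a hazard ratio of roughly 1.18, rather than the 1.17 that simple addition would suggest.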
With respect to breast cancer subtypes, we found that X-Ra was associated with a significantly lower probability of estrogen receptor-negative (ER-) status (OR = 0.982, P = 0.004) and progesterone receptor-negative (PR-) status (OR = 0.984, P = 0.008), and with a lower, although not significant, probability of human epidermal growth factor receptor 2-negative (HER2-) status (OR = 0.988, P = 0.15). Consistent with these results, X-Ra was significantly associated with TNBC (OR = 1.029, P = 3.1×10⁻⁴). This finding was validated in 224 patients (GSE225845), where X-Ra was associated with TNBC versus HER2+/HER+ status (OR = 1.07, 95%CI = 0.0015) (Fig. 4B). In summary, we observed that X-Ra stratified patients with breast cancer, particularly TNBC, and was associated with important clinical outcomes.
We analyzed longitudinal studies to determine whether X-Ra could be used as a monitoring biomarker of neoadjuvant chemotherapy for breast cancer. In 290 patients with three visits during chemotherapy treatment (GSE207460), we found a significant reduction of 1.78% in X-Ra for every 12 weeks of treatment (β = -1.78, P = 1.78×10⁻¹³) (Fig. 4C), adjusting for whether patients were additionally treated with bevacizumab. Interestingly, we noted that patients treated with bevacizumab further reduced X-Ra at each 12-week visit (interaction β = -1.30, P = 0.013). We validated the reduction of X-Ra with neoadjuvant chemotherapy in 22 TNBC patients (GSE184159) with four cycles of chemotherapy over 12 weeks (β = -3.56, P = 0.014), suggesting a stronger modifiable nature of X-Ra by chemotherapy for this subtype.
X-Ra associations with somatic mutations in breast cancer
We then investigated the potential molecular mechanisms underlying X-Ra activity in breast cancer tumors. We considered chromosomal copy number gains in 0.5 Mb windows. We discovered 492 windows with significant increments in log(X-Ra) associated with the frequency of gains (Fig. 4C, Table S5). Remarkably, most of the top associations on each chromosome included notable oncogenes (Table S6). In particular, we observed gains in the MYC locus19, which is known to induce pluripotent stem cell reprogramming with reactivated X-chromosomes in females20. The most significant association was found in a window that included TNFRSF6B, which prevents apoptosis21, and RTEL1, which encodes a regulator of telomere length22. X-Ra in BRCA was significantly associated with longer telomere length (β = 6.6, P = 0.003)23. These data underscore the potential mechanism of X-Ra in BRCA, consistent with the association between telomere length and RTEL1 gains (TL difference = 1.72, P = 6.47×10⁻⁵). Additionally, 22 significantly deleted regions were associated with X-Ra, all located in the short arm of chromosome 8, with the highest peak at CSMD1 (Fig. 4C), which encodes a complement inhibitor known to act as a tumor suppressor in breast cancer24. Finally, we did not find any association with somatic copy number loss, gain, or copy neutral loss of heterozygosity in any region or in the entire X chromosome, suggesting that X-Ra increments are not due to the loss of X or copy neutral variants of X (LOX/GOX/UPD) or parts of it.
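The window-based summary of copy number gains can be sketched as a binning step followed by a per-window frequency across patients. The example below is illustrative only: it assumes gain segments are given as 0-based half-open intervals, and the sample names, coordinates, and function names are hypothetical rather than taken from the study.

```python
from collections import Counter
from typing import Dict, List, Tuple

WINDOW = 500_000  # 0.5 Mb window size, as stated in the text

Segment = Tuple[str, int, int]  # (chromosome, start, end), half-open interval


def windows_overlapped(start: int, end: int, window: int = WINDOW) -> range:
    """Indices of the fixed-size windows overlapped by [start, end)."""
    return range(start // window, (end - 1) // window + 1)


def gain_frequency(gains_by_sample: Dict[str, List[Segment]],
                   window: int = WINDOW) -> Dict[Tuple[str, int], float]:
    """Fraction of samples carrying a gain in each (chromosome, window)."""
    counts: Counter = Counter()
    for segments in gains_by_sample.values():
        # A set ensures each sample counts at most once per window.
        hit = {(chrom, w)
               for chrom, start, end in segments
               for w in windows_overlapped(start, end, window)}
        counts.update(hit)
    n_samples = len(gains_by_sample)
    return {key: c / n_samples for key, c in counts.items()}


# Hypothetical gain segments for three patients.
gains = {
    "patient_1": [("chr8", 127_600_000, 128_200_000)],
    "patient_2": [("chr8", 127_900_000, 127_950_000)],
    "patient_3": [],
}
freq = gain_frequency(gains)
print(freq)
```

The resulting per-window frequencies are the quantities that would then be tested for association with log(X-Ra).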
The association analysis of X-Ra with 18 frequent somatic mutations (> 5%) revealed that X-Ra increased by 3.42 percentage points for each TP53 mutation (P = 5.42×10⁻⁵) and decreased by 5.42 percentage points for each CDH1 mutation (P = 4.93×10⁻⁶). Consequently, X-Ra is likely associated with the basal-like 1 subtype of TNBC25.
X-Ra in blood is associated with cancer and aging
We analyzed methylation data from three different studies to test the association of X-Ra with age > 65 years (median age of cancer diagnosis), adjusting for immune cell count in the blood. We excluded patients with a cancer diagnosis in TruDiagnostic; cancer diagnosis was an exclusion criterion in the other two studies. We observed significant increases in X-Ra with age in all studies (β = 0.44, P = 3.3×10⁻³, Fig. 5A-B), with significant heterogeneity given by GENOA, a pedigree study on hypertension. Associations with continuous age were significant in TruDiagnostic (β = 0.014, P = 0.0038) and MESA (β = 0.029, P = 3.81×10⁻⁵), but not in GENOA (β = 0.0018, P = 0.65). In the TruDiagnostic data, the association between X-Ra and continuous age increased 3-fold in menopausal women (β = 0.044, P = 6.64×10⁻⁵). In addition, X-Ra was associated with shorter telomere length (β = -1.008, P = 1.19×10⁻⁵, Fig. 5C) and with Horvath’s (β = 0.029, P = 4.85×10⁻⁶) and Levine’s methylation ages (β = 0.016, P = 0.006), which do not include CpGs in the X chromosome. These results show that X-Ra in the blood is an indicator of female aging.
Differential expression analysis of X-Ra in monocytes from the MESA study revealed significant genes relevant to breast cancer and histone modifications (Fig. S10), such as upregulation of ABTB2 (ref. 26), downregulation of SOGA1 (refs. 27,28), and upregulation of RBBP4 and RSPO2 (ref. 29). We also observed significant enrichment of inactivated genes in the differential expression analysis of X-Ra (OR = 3.00, P = 0.002, Table S7) and nominal but expected downregulation of XIST (log2FC = -0.024, P = 0.02). Therefore, high X-Ra values in the blood were associated with breast cancer-related genes and with the expected transcriptomic signatures of XCI maintenance.
We found a significant association between X-Ra in the blood and cancer diagnosis in the TruDiagnostic cohort (167 were diagnosed with any type of cancer) and observed an increase of 7.9% in cancer diagnosis for a 1%-point increase in X-Ra (OR = 1.077, P = 0.034, Fig. 6A). We did not find significant associations between cancer and Horvath’s or Levine’s methylation ages. The association between X-Ra in the blood and cancer in GSE237036, adjusted for age and immune cell abundance, was not significant (OR = 1.54, P = 0.25). However, the combination of both studies showed a significant association (OR = 1.08, P = 0.02), with no apparent heterogeneity (P = 0.34). The association between X-Ra in the blood and cancer was independently validated in a longitudinal study (GSE142536) of 17 elderly women, where cancer diagnosis increased the frequency of X-Ra by 2.2% (β = 2.2, P = 0.006) (Fig. 6B). No association was observed between Levine’s methylation age and cancer (Fig. S11). Taken as a test-retest study, X-Ra showed high statistical reliability (ICC = 0.95, 95%CI = 0.91–0.98).
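The combination of the two blood studies can be illustrated with a fixed-effect, inverse-variance meta-analysis of the log odds ratios. The sketch below is not the authors' stated method: it assumes a fixed-effect model and back-calculates each standard error from the reported OR and two-sided P value with a Wald approximation, since standard errors are not given in this section. All function names are illustrative.

```python
import math
from statistics import NormalDist
from typing import List, Tuple

_NORMAL = NormalDist()


def se_from_or_and_p(odds_ratio: float, p_two_sided: float) -> float:
    """Approximate SE of log(OR) from a two-sided Wald P value (assumption)."""
    z = _NORMAL.inv_cdf(1.0 - p_two_sided / 2.0)
    return abs(math.log(odds_ratio)) / z


def fixed_effect_pool(studies: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Pool (OR, SE of log OR) pairs; return pooled OR, P value, Cochran's Q."""
    logs = [math.log(or_) for or_, _ in studies]
    weights = [1.0 / se ** 2 for _, se in studies]
    pooled_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    p_value = 2.0 * (1.0 - _NORMAL.cdf(abs(pooled_log / pooled_se)))
    q = sum(w * (l - pooled_log) ** 2 for w, l in zip(weights, logs))
    return math.exp(pooled_log), p_value, q


def heterogeneity_p_two_studies(q: float) -> float:
    """P value for Cochran's Q with 1 degree of freedom (two studies only)."""
    return 2.0 * (1.0 - _NORMAL.cdf(math.sqrt(q)))


# Reported estimates: TruDiagnostic (OR = 1.077, P = 0.034) and
# GSE237036 (OR = 1.54, P = 0.25); the standard errors are back-calculated.
studies = [(1.077, se_from_or_and_p(1.077, 0.034)),
           (1.54, se_from_or_and_p(1.54, 0.25))]
pooled_or, pooled_p, q = fixed_effect_pool(studies)
het_p = heterogeneity_p_two_studies(q)
print(round(pooled_or, 2), round(pooled_p, 3), round(het_p, 2))
```

Under these assumptions, the pooled estimate is dominated by the more precise TruDiagnostic study and falls close to the combined odds ratio and heterogeneity P value reported above.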