Participants were selected based on pubertal stage (i.e., self-reported Tanner staging), matching males (n=47) and females (n=71) on Tanner stage at T1 using the average Tanner scores for pubic hair and breast/testes development. Participants returned for the second time point (T2) an average of 1.97 years later (sd=0.33, range 1.29-3.37 years; Supplementary Figure 1A). Participants provided saliva samples for DNAm quantification via the Illumina EPIC array and additional samples for assay of salivary testosterone (males and females; Supplementary Figure 1B). We used the following strategy in analyzing our data: 1) conduct a differential methylation analysis to separately identify sites that were associated significantly with sex (at T1 and T2) and time (i.e., that shifted between time points); 2) assess the independence or overlap between sex- and time-related sites, and characterize effect sizes, direction, and genomic context; 3) carry forward significant sex- and time- sites to explore co-methylated gene networks, summarized by network ‘modules’ and driving network hubs; 4) further probe co-methylated network modules and hub CpG sites for biological significance by testing for associations with Tanner stage and salivary testosterone, and enrichment for androgen response elements.
Figure S1: A) Distributions of age, Tanner stage, and B) testosterone in males and females
Step 1: Differential methylation analysis to identify Sex- and Time-related DNAm sites
First, to identify individual DNAm sites that differ between males and females, we conducted statistical models regressing DNAm at each site (794,811 sites) onto sex, controlling for age, ethnicity, and cell type proportions (bioinformatically computed using Hierarchical EpiDISH; see methods) separately for each time point. At the first time point, 5,273 DNAm sites were significantly related to sex (adjusted p<0.05); at the second time point, 5,917 sites were significantly related to sex (adjusted p<0.05) after multiple test correction (for all multiple tests we used the false discovery rate Benjamini-Hochberg procedure; Figure 1). Across both time points, there were 3,174 overlapping sex-related sites. This significant overlap may suggest sex-related sites are common across stages of puberty.
Figure 1: DNA methylation sites identified by statistical models and carried forward to network analysis. WGCNA = weighted correlational network analysis.
Next, we conducted models to identify the sites that were shifting within individuals over the pubertal transition using a mixed model, consistent with prior pubertal DNAm studies that assessed DNAm at early and late pubertal stages (i.e., 18,30). We conducted mixed models to calculate the effect of time point across samples controlling for the interval (in years) between T1 and T2, ethnicity, and cell type proportions, with individual subject ID added as a random effect. In addition, we compared these results to the effect of time in separate models conducted for males and females and followed up on significant sites to assess what variables corresponded to changes in DNAm at time-related sites (see methods). In a full model testing for the effect of time across males and females and controlling for covariates, time had a significant effect for 2,602 probes after multiple test correction. In females, 364 sites shifted significantly with time (91.45% overlap with the full model) and, in males, 64 sites shifted significantly (90.63% overlap with the full model; Figure 1). Models split by sex are substantially less powered and, given the large degree of overlap in sites, we collapsed significant sites across the full, male, and female time models for network analyses (see below). We conducted follow-up models on these sites in order to assess what variables corresponded with shifting DNAm patterns: we found that sex predicted shifts only at one site, whereas the age intercval between time points predicted shifts at 20 sites. Smoking and changes in Tanner Stage did not predict DNAm change. Thus, overall, time sites that were moved forward for network analysis for comparison with sex-related sites were mostly independent of sex and age, and entirely independent of pubertal stage (see Methods). All significant model results are presented in Supplementary Table 1. In our next set of analyses, we compared these time- and sex-related sites to assess their independence and characterize effect sizes, directions, and genomic location in the context of puberty.
Step 2: Assess independence of sex- and time-related DNAm sites and characterize effect size, direction, and genomic context
Co-methylation network analysis is driven by patterns of connections among the nodes identified at the differential methylation analysis stage described above. Thus, to gain a better understanding of the patterns of DNAm in sites related to sex and to time, we assessed the overlap of sex- and time-related sites, the size and direction of effects, and their location in relation to genes (i.e., genomic context). To assess the independence of sex- and time-related sites and effect sizes, we assessed overlap both for multiple test correction significant hits (Figure 2A) and for sites that surpassed a biological threshold (Figure 2B). Overall, the majority of significant sites are specific to sex or time; only 46 probes overlapped between sex and time models (1% of unique sex sites and 2% of unique time sites). To further probe effect size, we applied a standard biological threshold of an absolute delta beta greater than 0.0531. A total of 723 unique sex-related sites (14%) exceeded this threshold. In contrast, only four sites (0.2%) survived a delta beta >0.05 threshold in the time models (none of which overlapped with sex-related sites), demonstrating that, overall, time-related effect sizes are smaller than are sex-related effect sizes. When applying a biological threshold to time effect models, more sites were found for female and male specific time effect models relative to the p value threshold, suggesting that some sites from across sex time effect models were larger in either males or females. Figure 2C and 2D show the effect sizes from different models relative to significance: the signal of sex highly surpasses that of time shifts. This is consistent with the epigenetic aging literature, in which average shifts in DNAm of aging-related sites are reported to be 3.2% across a span of 20 years32. Due to the overall smaller effects of time, we set the biological threshold for time-shifting probes to an absolute delta beta greater than 0.02 for further analysis, which increased the time-related probes to 566 (19% of significant sites). With respect to overlap of significant sites with larger effects, sex-specific sites were largely independent of sites that shifted significantly from T1 to T2: in fact, there were only three CpGs that showed differential methylation between both sex and time points that met their respective biological thresholds. Overall, these comparisons indicate that the effects of sex are largely distinct from the effects of time on DNAm in saliva during this phase of puberty.
Figure 2: Overview of significant and biologically thresholded hits across sex and time models. A) Venn diagram showing overlap of significant results (adjusted p value<0.05) from models testing the effect of sex at T1, sex at T2, time across sexes, time in females, and time in males. B) Venn diagram showing how overlapping results shift when applying a biological threshold of delta beta >0.05 for sex, and >0.02 for time (i.e., reducing the model results to those meeting both significance and effect size thresholds). C) Volcano plot for sex models at T1 and T2, with effect size on the x axis and -log10 of the p value on the y axis. Significant sites with larger effects are visisble in the top left and right quadrants. D) Volcano plot for time models across sex, only females, and only males. Note. Different x axis scale for Figure 2C and 2D. E) Count of adjusted p value significant probes found across models for sites that are higher in females versus higher in males. F) Count of adjusted p value significant probes meeting biological thresholds across models that are increasing or decreasing with time. Note. Different y axis scale for Figure 2E and 2F.
Next, we examined sex- and time-related sites for trends in effect direction and genomic context of CpG sites. The trends in direction of sex-related sites were largely similar at T1 and T2 (Figure 2E and 2F): more sites had higher DNAm in females than in males. Similarly, across both sexes and within females and males separately, time shifts were largely due to sites that decreased from T1 to T2. All significant effects of sex at T1 and T2, and significant effects of time point, were dispersed across the autosomes (i.e., significant DNAm sites were identified across chromosome 1-22).
To further characterize the genomic context of sex- and time-related effects, we collapsed significant sites that met the respective biological threshold (sex 0.05, time 0.02) identified across sex models (T1 and T2: 723 sites) and time models (all, females, and males: 566 sites) to compare to the full background of tested sites on the EPIC array for enrichment for genomic locations (i.e., promoter, gene body, intergenic, etc.) as well as mQTLs (see Methods). Sex-related probes were enriched in gene bodies (adjusted p value = 2.73e-06), intergenic spaces (adjusted p value = 4.01e-05), and transcription start sites (within 1,500 bp; adjusted p value = 0.04), and time-related probes were enriched for gene bodies (adjusted p value = 0.01) and transcription start sites (within 200 bp; adjusted p value = 0.01; Supplementary Table 2). Both sex- and time-related sites were enriched for mQTLs identified in buccal cells (see Methods). Next, we were interested in the distinct network structures that might correspond to these separate sets of sites and whether and how they reflect processes of sexual differentiation during puberty.
Step 3: Network analysis to identify co-methylated gene networks
Sex- (723) and time- (566) related sites meeting respective biological thresholds were next carried forward to explore correlated network structure. We applied weighted correlation network analysis (WGCNA) to identify possible co-methylated gene networks from sex- and time-related sites. WGCNA assesses patterns of correlated DNAm levels across many probes by estimating gene clusters, or ‘modules’ summarized by a first principal component or eigengene. These modules represent a correlated network of DNAm sites in terms of their interrelations across samples. In two WGCNA analyses, we tested DNAm sites that met the above set biological thresholds (723 sites >0.05 for sex and 566 sites >0.02 for time). We repeated analyses for all adjusted p value significant probes to confirm the robustness of network results (5273 sites for sex and 2639 sites for time).
Sex network analysis. We first conducted WGCNA on all sex-related probes with absolute delta betas >0.05 (723) across both time points on beta values that were corrected for cell type proportions. This identified two modules, the ‘blue’ and the ‘turquoise’ modules, representing co-methylated gene networks (Figure 3A), summarized in the following by module eigengenes, the first principal component of which captures how modules relate to one another and across individuals. We explored the top ten ‘hub’ CpGs sites (i.e., those with the highest eigen-based connectivity) between the two modules. From the turquoise module three CpGs were identified as hubs within the gene body of RFTN1 (Raftlin, Lipid Raft Linker 1) (Figure 3B). This is a Protein Coding gene related to double-stranded RNA binding. All three CpG sites had higher DNAm levels in females than in males. In the blue module, two CpGs were identified as hubs from the gene body of NAB1 (NGFI-A Binding Protein 1), a protein coding gene involved in transcription factor binding and implicated in Breast Cancer. These CpGs were more highly methylated in males than in females (Figure 3C). Supplementary Table 3 shows the patterns of all high-connectivity probes, with kMeans <0.7 from WGCNA analyses. This table shows that there are also highly connected probes that show opposite trends from the top hub CpG examples in each module. Repeating the analysis with the full set of adjusted p value significant probes confirmed the same correlational structure and common hub CpGs for both modules. Each module can be summarized by a principal component (PC), and then the similarly of modules between different network analyses can be assessed by correlating these top PCsfrom each module. The correlations between top module PCs produced by the p value significant sites in a network analysis versus module PCs produced by a network analysis on the reduced set of biological thresholded sites is shown in Supplementary Figure 2A.
Figure 3: Network structure and example hub CpG patterns from sex- and time-related CpG sites. A) Network structure of sex-related probes identified by WGCNA. The hierarchical cluster tree shows co-methylation modules, with each leaf in the tree representing an individual CpG. B) Example boxplots of beta values of three hub CpGs from the turquoise module annotated to RFTN1. C) Example boxplots of beta values of two hub CpGs from the blue module annotated to NAB1. D) Network structure of time-related probes identified by WGCNA. Note gray color means no cluster identified. E) Example boxplots of beta values of two hub CpGs from the blue module annotated to SHMT2 and ABCC3. F) Example boxplots of beta values of four hub CpGs from the turquoise module annotated to SLC12A9;TRIP6.
Figure S2: A) Correlations between sex DNAm modules composed of all p value significant probes and DNAm modules composed of all biologically thresholded probes. B) Correlations between time DNAm modules composed of all p value significant probes and modules composed of all biologically thresholded probes
A functional pathway analysis of top kME genes from each module (77 sites in blue; sites 126 in turquoise) indicated that sites composing the blue module were enriched for mitotic and cell cycle functions, and sites composing the turquoise module were enriched for protein transport and localization to membrane raft and T cell antigen processing (Supplementary Table 4).
Time network analysis. We conducted WGCNA on all time-related CpGs that met the biological threshold from the general, female, and male models (>0.02, 566); this yielded three modules (Figure 3D). Although we combined sites from the male and female models with the general model, none of the three modules yielded significant sex differences (Brown, t=0.83, p=0.41; Yellow, t=0.22, p=0.83; Magenta, t=-0.55, p=0.58), suggesting that the major variability in DNAm shifts across time are consistent between males and females. The brown module had only one DNAm site that met a kME threshold >.7, but all top-10 hub CpGs increased over time. Three DNAm sites were associated with the meiotic recombination protein REC8 from the kleisin family of structural maintenance of chromosome protein partners; however, such processes would only take effect in germ cells. Of the yellow module driving hub CpG sites, seven are located within seven different genes and three are intergenic. Example DNAm patterns at T1 and T2 are plotted in Figure 3E: a site within SHMT2, a gene encoding a mitochondrial enzyme responsible for glycine synthesis, and a site associated with ABCC3, which encodes an ATP-binding transporter. All of the most highly interconnected DNAm sites within the yellow module decreased in DNAm from T1 to T2. In the magenta module (examples plotted in Figure 3F), five different highly connected CpGs located in the SLC12A9-TRIP6 gene region were identified as hubs, consistent with a previous report19. SLC12A9 (Solute Carrier Family 12 Member 9) is a protein coding gene implicated in cation and chloride symporter activity, and TRIP6 (Thyroid Hormone Receptor Interactor 6) encodes a protein recruited to the plasma membrane to regulate lysophosphatidic acid-induced cell migration. All show decreasing DNAm patterns from T1 to T2. Supplementary Table 3 shows the patterns of all high-connectivity probes, with kMeans <0.7 from the time modules. All highly connected CpGs from the yellow and magenta modules decreased in DNAm over time, and the single CpG from the brown module increased over time.
Repeating the analysis with the full set of adjusted p value significant probes confirmed the same correlational structure and hub CpGs for the yellow and magenta modules; however, the brown module was not well represented in the second set of probes (Supplementary Figure 2B). In combination with the low number of probes (1 site) that met a kME interconnectivity cut-off, these results suggest that the brown module is not as robust in its correlational structure. Moreover, the most highly connected DNAm sites of the brown module fall within a gene in which the protein is only necessary in germ cells, which is inconsistent with the adolescent stage; thus, we focused follow up investigations on the yellow and magenta modules.
Functional pathway analysis of top kME genes from each robust module (25 sites in yellow; 23 sites in magenta; 1 site in brown) showed that the yellow module was enriched for interleukin-1 receptor complex, the magenta module was enriched for glycine biosynthetic process, and glycine hydroxymethyltransferase activity, and the brown module for protein localization to M-band (Supplementary Table 4).
Now with correlated network modules to move forward, our next set of analyses explored the functional links of sex- and time-related sites to additional measures of pubertal development and hormones.
Step 4: Explore functional links of sex- and time-related co-methylation network modules
To explore the biological relevance of sex and time co-methylated network modules, we explored the associations between 1) sex- and time-related modules; and 2) pubertal stage and salivary testosterone. Following up on links with testosterone levels, we further examined enrichment of sex- and time-related sites for androgen response elements.
Module correlations with pubertal stage. We correlated module eigengenes from the sex- and time-correlated networks to assess if and which sets of correlated CpG sites were informative of pubertal stage. For the sex-correlated network at T1, the turquoise module was significantly correlated with Tanner stage (r=0.19, p=0.03); the blue module was not correlated with Tanner stage (r=-0.17, p=0.06). Neither of the sex modules was significantly related to Tanner stage at T2 (blue module, r=-0.11, p=0.29; turquoise module, r=0.12, p=0.27). In contrast, both time-related network modules were strongly correlated with Tanner stage (yellow module, r=-0.50, p=5.49e-16; magenta module, r=-0.37, p=4.38e-09). Figure 4 presents Tanner stage for each time point plotted against time module scores. Interestingly, although there is an overall negative association of each network module with Tanner stage, the network structure accounts for Tanner staging differently at each time point, such that there is an increasing association at Time point 1 and a decreasing association at Time point 2, these trends, however, are not statistically significant (Supplementary Table 5).
Figure 4. Boxplots displaying time-network module scores across Tanner stages for T1 and T2. Module scores decrease over time, but show nonsignificant, opposite trends for each time point (increasing at T1 and decreasing at T2).
Module correlations with testosterone. Testosterone is a driving hormone in sexual differentiation over the pubertal transition. To explore potential links of sex- and time-related network modules, we correlated network eigengene module scores with testosterone levels at each time point in a subsample of participants with salivary testosterone data (n=106 at T1 and n=111 at T2). These correlations are presented in Figure 5. Neither module from the sex-related DNAm network was significantly correlated with testosterone at T1, but was significantly correlated at T2. Moreover, whereas males and females demonstrate highly distinct module membership at T1, they are less distinguishable at T2, as modules shift into tight correlations with testosterone, especially for the turquoise module. This suggests that the correlated networks represented by the modules capture a shift in DNAm levels from T1 to T2 that defines shifting biological status in terms of circulating available testosterone beyond biological sex.
Figure 5. Relation between salivary testosterone and sex module scores. Correlations between 1) blue and turquoise modules estimated by WGCNA on sex-related probes and 2) testosterone at T1 and T2. Module scores show clear distinctions between males and females at T1 and do not correlate with testosterone. At T2 these scores no longer cleanly distinguish males and females and correlate significantly with testosterone.
Given these tight correlations of modules with testosterone, we next assessed whether our sex-related CpGs that reached a biological threshold of >0.05 (723 probes) were individually related to testosterone at T2 but not at T1 after correction for multiple comparisons (Supplementary Table 1). After covarying for the effects of age, ethnicity, and cell type proportions (but not for sex, which is colinear with hormone levels), no probes were significantly related to testosterone at T1. At T2, 334 probes (46%) were significantly predicted by testosterone. All ten top hub CpGs from both modules and 183/203 of the probes from the modules with a kMeans <0.7 were overlapping with probes that were significantly related to testosterone. Correlations between the top 10 hub CpGs from both modules and testosterone at T1 and T2 are presented in Supplementary Figure 3.
Figure S3 Correlations of hub probes for blue and turquoise modules from WGCNA analysis of sex-related probes with testosterone at T1 and T2
Next, we assessed correlations between time modules and testosterone. The yellow (r=.18, p=0.009) and magenta (r=-0.20, p=0.003) modules both were significantly correlated with testosterone levels. Variability in module scores was much greater at T2 relative to T1 (Figure 6), with T1 male and female module scores clustering close together for both the yellow and magenta modules. Module scores are more spread apart and differentiate males and females at T2 in both modules. Similar to sex-related CpGs above, we tested whether DNAm levels at individual time-related CpG sites significantly correlated with testosterone at time 2 but not at time 1. No individual CpG sites were significantly related to testosterone levels at time 1, and only one CpG site out of 566 significantly related to testosterone at time 2 (Supplementary Table 1). Thus, in contrast to sex-related CpG sites, the associations of time-related sites with testosterone levels are not as strong.
Figure 6. Relation between salivary testosterone and time module scores. Points distinguished by time point, with greater variability in module scores at T1 relative to T2. With increases in each module score, testosterone levels increase. The relation is stronger for males than for females, shown by separate regression lines for males and females.
Separate regression lines for males and females show that module score associations with testosterone levels are stronger in males, supported for both the yellow and magenta modules (Figure 6). Regressing testosterone levels on the interaction between module score and sex was statistically significant for both modules (Supplementary Table 5). Thus, although time network modules were largely independent of sex, these correlated networks of DNAm captured variability in testosterone levels to a greater extent for males.
Finally, to assess if there was evidence for epigenetic regulation of gene networks due to testosterone effects, we tested for enrichment of sex- and time-related sites for androgen response elements (AREs) using data from Remap (see methods). 85% of sex-related sites and 87% of time-related sites were found within androgen response elements, both of which were greater than expected by chance (sex-related sites: c2 = 238.57, p < 2.2e-16; time-related sites: c2 = 232.77; p < 2.2e-16) by comparing to a baseline of the average overlap for 10 random draws of sites from quality controlled probes. Taken together, it appears that DNAm networks of sexually differentiating sites and time-shifting sites at the pubertal junction are both linked to fluctuating testosterone levels and are found within androgen response elements.