Raw sequencing data were downloaded from the SRA/ENA for 29 datasets, covering 25 unique species (Table S1). These data were either individual whole-genome sequencing (WGS), capture-based sequencing (CAPTURE), or pool-sequencing (POOL) data, with a minimum of five sampling locations (see methods for full selection criteria). Data were processed using a common SNP-calling pipeline (Fig S2; see methods), to minimise the influence of bioinformatic technical artefacts. The number of SNPs and individuals sampled per dataset ranged between 173,119 to 23,406,976 and 46 to ~ 1300 respectively (Table S1). To enable comparisons of multiple species, we reconstructed orthology relationships using OrthoFinder227 and classified genes into orthologous sets across species (orthogroups), which could include multiple copies of a gene in a given species due to gene duplication. In total, 44,861 orthogroups were classified, 14,328 were species-specific, and 92.8% of all genes from all genomes were assigned to an orthogroup. Importantly, orthogroups were predominantly characterised by low rates of paralogy (Fig. 1E) and high occupancy (Fig. 1F), with rates of single-copy genes ranging from 39.8%-62.4% (Table S2).
To identify genes with signatures of local adaptation within species, we performed genotype-environment association (GEA) scans by testing the association among per-site allele frequencies and per-site climatic variation (BioClim variables 1–19) taken from worldclim (v2.1, 2.5 minutes)28 using non-parametric Kendall’s Tau correlations (see methods for full details). We also defined and quantified two variables that capture change in local climate (max temperature and precipitation) over the last 50 years. Briefly, these represent the effect size per sampling site of the difference between climate quantified in either the 1960s or 2010s (example in Fig. 1C, see methods). We then combined per-SNP p-values to calculate per-gene p-values using the weighted-Z analysis (WZA)29 correcting for associations between SNP count and WZA variance. This WZA GEA method exhibits increased power and reduced error for identifying adaptive genes across realistic and extreme spatially-correlated climatic variation compared with other commonly-used methods29. To identify orthogroups with repeated signatures of association across multiple species, we applied PicMin30 (see methods supp) across a focal 8,470 orthogroups with at least 20 species represented for each of the 21 climate variables. These focal orthogroups were slightly more likely to include genes with stronger associations with climate relative to untested orthogroups (see supplementary text). For orthogroups with multiple paralogs within a given species, we include the paralog with the strongest evidence of association to test for repeatability after correction for multiple testing. This approach, therefore, tests for repeated adaptation driven by any member of a gene family.
CLIMATE ADAPTATION INVOLVES REPEATED GENE RE-USE
To cast a broad net and capture a large number of genes driving adaptation to climate, we adopt a False Discovery Rate of 50%, and identify a set of 108 unique orthogroups with evidence of repeated association to climate. While this lenient FDR threshold will include a substantial number of false positives, it is chosen specifically to capture as many true-positives as possible to enable downstream analyses of gene properties (as false positives will simply add noise to any analysis of enrichment). This set is the union from tests across all climate variables, many of which are correlated, representing a collection of 141 significant results at FDR < 0.5. Tests of our method on simulated data yield a median of 36 significant ‘repeatable’ orthogroups, representing 36 unique orthogroups at FDR < 0.5 (see Supplementary Materials). Indeed, 108 unique orthogroups was greater than any simulated outcome across 1,000 permutations (max = 102). Our results therefore constitute a three-fold enrichment of orthogroups than would be expected by chance (Fig. 2A; permuted p < 0.001) for causal loci driving adaptation in species spanning both gymnosperms and angiosperms (Fig. 2E), despite the ~ 300 million years separating these clades.
These Repeatedly Associated Orthogroups (hereafter, RAOs) were observed across most climate variables, although variation in the number and strength of statistical support varied. Generally, temperature variables exhibited a greater number of RAOs with stronger evidence of repeatability, in particular mean temperature in the warmest month (Fig. 2B). This suggests that the adaptive molecular response to temperature variation across plants may be more repeatable at the level of individual genes, compared with precipitation, which might reflect adaptive constraint or the added complexity of how precipitation interacts with soil to modulate drought effects. Variability in the number of RAOs identified across climate variables for a given species was not linked to inferred GEA power or estimated niche breadth (see supplementary results, Figs S3-S5). For downstream analyses, we focus on two sets of orthogroups: those with FDR < 0.5 to explore general trends in the distribution of RAOs among species and climate variables and to discuss specific candidate genes; and those with a PicMin p-value < 0.005 (based on the top decile cut-off of strongest evidence of repeatability across all climate variables per-orthogroup) within each climate variable to explore general properties of RAOs.
For RAOs (FDR < 0.5), we identified species contributing towards the signature of repeatability on the basis of their per-orthogroup p-value (see methods). Visualising the abundance of these contributions by species and climate variables (Fig. 2C) demonstrates importantly that no specific species, or cluster of species, contributes excessively to the repeatability signatures that we observe. In agreement, we failed to observe any evidence of phylogenetic signal of GEA results in our RAOs with respect to random orthogroups (see supplementary results; Fig S6). This shows that our identification of significant orthogroups is not driven by groups of closely-related species, but rather a signature observed broadly across the phylogenetic tree. This is reinforced by visualising the contribution to repeatability signatures for pairwise contrasts among species (Fig. 2D), where if groups of closely-related species were driving effects, we would expect heat signatures to cluster near the diagonal, which is not observed.
Sixteen orthogroups were associated with repeatability across multiple climate variables (Fig S3). Most strikingly, the orthogroup including the A. thaliana genes PRR3 and PRR7, with instrumental functional roles in circadian rhythm31 and flowering time32, was repeatedly adaptive across 10 of 21 variables (most strongly with mean temperature in the dry quarter, FDR = 0.102) and across multiple variables within the same species (difference between red and blue bars in Fig. 2E; Fig S7). The role of this gene family in circadian rhythm may contribute towards its repeated association with multiple climate variables if these also vary with latitude. Another notable RAO includes a family of four ubiquitin-related proteins (RUB1/RUB3/UBQ7/AT1G11970), which was associated with six climate variables and most strongly with minimum temperature in the coldest month (FDR = 0.051). The RUB-conjugation pathway has been implicated in the auxin response, embryo development and growth33. A summary of genes in RAOs that are associated with phenotypes known to be adaptive under climatic stress34 is shown in Table 1 (full details of RAO gene contents in Table S3-S5)
Table 1
Summary of known climatic adaptive phenotypes associated with RAOs. Each line represents the A. thaliana paralogs within a given RAO.
Adaptive Phenotype | Genes (A. thaliana paralogs) |
Flowering time, development, and photoperiodism | ATC/TSF/TFL1/BFT/FT |
| AT4G04260/EBS/SHL1 |
| PDP5/PDP2 |
| AHBP-1B/BF5/TGA6/PAN |
| ELF9 |
| GIGANTEA |
| BRD1/BRD2 |
| RVE1/RVE2/AT3G10113/EPR1 |
| GPRI1/GLK2 |
| CUL4 |
Circadian rhythm | PRR7/PRR3 |
| ATH13 |
| RVE1/RVE2/AT3G10113/EPR1 |
| GIGANTEA |
Auxin signaling | GH3.1/GH3.3/GH3.9/WES1/BRU6/DFL1/GH3.17/GH3.4 |
| UBQ7/AT1G11970/RUB3/RUB1 |
| IAA33 |
| PEX5 |
| ABCG33/ABCG41/ABCG30/ABCG42/ABCG43/ABCG37 |
| RVE1/RVE2/AT3G10113/EPR1 |
Salicylic Acid (SA) signaling | AHBP-1B/OBF5/TGA6/PAN |
Abscisic Acid (ABA) signaling | CIPK3/CIPK8/CIPK26/SOS2/CIPK9/CIPK23, SPP1 |
Seed dormancy and vegetative timing | DRG |
| RPN8A/MEE34 |
| GPS1 |
| AT4G04260/EBS/SHL1 |
| POD1 |
| CPN60A/Cpn60alpha2 |
Root growth and development | ZP1 |
| ATSAC1B/RHD4/SAC8 |
| ABCB24/ABCB23/ABCB25 |
| ABCG33/ABCG41/ABCG30/ABCG42/ABCG43/ABCG37 |
| BCHA2/SPI |
| WAV2 |
| GH3.1/GH3.3/GH3.9/WES1/BRU6/DFL1/GH3.17/GH3.4 |
| HRD/AT5G52020/AT1G12630, ANL2/HDG1 |
Cold- and thermo-tolerance | CIPK3/CIPK8/CIPK26/SOS2/CIPK9/CIPK23 |
| HCF106 |
| GIGANTEA |
| HSF4 |
Salt stress | SOS1/NHX8, ANL2/HDG1 |
| CIPK3/CIPK8/CIPK26/SOS2/CIPK9/CIPK23 |
| ATSAC1B/RHD4/SAC8 |
The functions of these genes reflect the expected responses to climatic variation, such as phenological avoidance of drought or frost through changes to flowering time or seed dormancy4, or modification to root hair number and structure in response to temperature changes35. Changes to growth or stomatal function through hormone signaling36 meanwhile may facilitate tolerance to drought by reducing water loss, and genes associated with salt stress may be involved in surviving salt accumulation in soils due to aridity.
We found only a single RAO associated with our two climate change variables at FDR < 0.5; harbouring the A. thaliana genes ATKPNB1, AT3G08943 and AT3G08947. ATKPNB1 is sensitive to abscisic acid and is involved in drought tolerance through stomatal closure37. The limited number of RAOs here likely reflects the relatively short amount of time that our climate change variables are calculated over (~ 50 years), and the limited time to respond to selection subsequently, particularly in longer-lived species.
REPEATED ADAPTATION ACROSS ORTHOGROUPS WITH SIMILAR FUNCTIONS
Repeated adaptation may also happen beyond the gene level, where multiple genes from within the same molecular pathway are used for adaptation across multiple species. Examples of this ‘functional repeatability’ have been documented in adaptation to whole-genome duplication in Arabidopsis38 and highland adaptation in maize39. To explore this phenomenon, we used the STRING database40 (v11.5) to provide a network-based representation of protein-protein interactions (direct and indirect associations compiled from genomic context, experimental evidence such as co-expression, and text-mining of literature), and tested whether RAOs formed networks with more interactions than expected (see methods). We grouped RAOs all together and into temperature- and precipitation-related groups, and tested each group to see whether RAOs as a group contained genes that were more likely to interact with one another than random orthogroups.
Each group of RAOs tended to include more protein-protein interactions than random orthogroups, and this was particularly clear for RAOs identified through precipitation variables (permuted p-value = 0.015) (Fig. 3A). Gene-ontology (GO) enrichment over the same groups of RAOs reflected this, as precipitation-related RAOs were highly enriched for biological processes with clear adaptive roles (Fig. 3B-C; Table S6). orthogroups associated with ‘root hair tip’ were enriched across all RAOs, and temperature-RAOs were enriched for several processes, notably ‘regulation of photoperiodism, flowering’ and ‘brassinosteroid biosynthetic process’ which are known to be involved in thermotolerance4,41. These results suggest that independently identified RAOs contain genes involved in similar functional processes, which implies that repeated adaptation is occurring beyond the level of the gene. This may occur because a given adaptive response requires the coordinated modification of multiple functionally-related genes in all species involved. Alternatively, this signal could also be driven by different subsets of functionally-related genes contributing to adaptation in different subsets of species (e.g. if genes A, B, C, and D are functionally related, species 1, 2, and 3 adapt via genes A and D while species 4, 5, and 6 adapt via genes B and D). In either case, our results show that certain pathways or functional groups of genes are particularly important for adaptation to these climatic stressors, particularly with regards to adaptation to precipitation variation.
REPEATABILITY IS ASSOCIATED WITH INCREASED PLEIOTROPY
We next ask whether the genes we identified that contribute to repeated adaptation tend to share particular characteristics with respect to their degree of pleiotropy. Pleiotropy is a fundamental attribute of a gene describing the number of traits it affects. Based on Fisher’s model of universal pleiotropy42, the ‘Cost of Complexity’ hypothesis3 posits a reduced adaptive potential for genes with greater pleiotropy, as constraint increases with organismal ‘complexity’. In keeping with this, greater fitness consequences are predicted by the degree of pleiotropy in yeast43. However, empirical evidence from mice, nematodes and yeast suggests this cost may be counteracted by a greater mutational effect size per-trait observed for genes with greater pleiotropy44,45. In line with this, Rennison and Peichel46 found genes repeatedly involved in stickleback adaptation exhibited elevated levels of pleiotropy. To test the importance of pleiotropy by multiple definitions in our dataset, we used public databases of gene expression for A. thaliana and Medicago truncatula genes extracted from Expression Atlas47 and ATTED-II48. We explored pleiotropy by two definitions, tissue specificity49 and condition-independent co-expression with other genes50. Tissue specificity of gene expression is inversely associated with pleiotropy, has previously been linked to increased rates of evolution51, and was estimated here according to the τ metric52 (Fig. 4A). τ describes tissue specificity, the inverse of pleiotropy, so to avoid confusion we will describe changes to the breadth of expression, which we define as lower τ. Contrary to the ‘Cost of Complexity’ prediction, we found that RAOs with the strongest evidence of repeatability were strongly associated with increased expression breadth (p = 5.44e− 4), and expression breadth tended to decrease in subsets of orthogroups with increasingly weaker evidence of repeatability, such that orthogroups with the weakest evidence of repeatability were enriched for genes with high specificity (p = 4.74e− 6; Fig. 4C).
Alternatively, pleiotropic constraint can be considered as various node centrality statistics within a gene coexpression network (Fig. 4B), where the ‘distance’ between two gene nodes is lower when co-expression of those genes increases (e.g. across experimental treatments or tissue types). Node degree and strength describe the number and summed weights of interactions for a given node, betweenness describes a gene’s role in bridging subnetworks, while closeness represents nodes with the shortest distance (edge length) to all other nodes. All of these centrality measures have been inversely linked to rate of evolution in several eukaryotic protein-protein networks, and changes to genes with high centrality are more likely lethal53,54, indicative of evolutionary constraint. However, in contrast to these negative associations with evolvability, we observed clear positive associations between evidence of repeatability and co-expression centrality for node closeness, degree, and strength across both co-expression networks (Fig. 4C; node betweenness was not significantly associated with evidence of repeatability in the A. thaliana network). Similar to results for specificity of expression, centrality was significantly greater (p < 0.05) in orthogroups with the strongest evidence of repeatability and significantly lower (p < 0.05) in orthogroups with the weakest evidence. These clear trends highlight a robust association between increased pleiotropy and evidence of adaptive repeatability across all tested orthogroups.
The association we find between repeatability of local adaptation and increased pleiotropy stands in apparent contrast to previous findings of increased contributions to adaptation by genes with reduced pleiotropy51,53,54. This difference may arise because of the scale at which adaptation is occurring; here we have focused on local adaptation, which involves a tension between migration and spatially divergent selection that tends to favour the contribution of alleles of large effect, as they can overcome migration swamping23,55. As the phenotypic effect size of mutations tends to increase with pleiotropy 44,45, increased pleiotropy may therefore be favoured in local adaptation. By contrast, when a species adapts to a temporal change in environment across its whole range, there is no tension between migration and selection and no additional advantage for alleles of larger effect23. This may explain the reduced pleiotropy previously observed in rapidly evolving genes51,53,54. Our results suggest a robust impact of pleiotropy on local adaptation across multiple plant species, consistent with similar observations in stickleback46, ragweed56, and A. thaliana57. It is unknown whether the association between pleiotropy and repeatability is monotonic, or if intermediate pleiotropy promotes repeatability and extreme pleiotropy remains constraining, as suggested by46,57. In the A. thaliana tissue specificity and node degree data, RAO sets were diminished for the least pleiotropic genes, but also enriched for the most pleiotropic (Fig S8). It is important to make clear that we cannot rule out an association between repeatability and pleiotropy due to an increased likelihood of detecting large-effect alleles with GEA methods. A comparable analysis centred on global adaptation through selective sweeps could distinguish these, as selective sweep methods are similarly biassed towards large-effect alleles but there is no assumption of a biological role for increased pleiotropy facilitating global adaptation at the species-level3,42.
We were also interested in whether pleiotropy enrichment was isolated to specific climate variables. We therefore repeated enrichment analyses for orthogroups exhibiting the strongest evidence of repeatability within climate variables (PicMin p-value < 0.005) for tissue expression specificity and A. thaliana node degree (Fig. 4D). The majority of these sets of orthogroups exhibited elevated pleiotropy. Strikingly, orthogroups with the strongest evidence of repeatability associated with our climate change variables were highly enriched (p < 0.05) for pleiotropic genes by both measures. Given these variables only capture environmental change over ~ 50 years, the genes with the strongest evidence of repeatability associated with these variables may be highly pleiotropic due to genes with greater effect sizes facilitating rapid adaptation to shifting fitness optima58.