Genetic affinities and population structure
Principal component analysis (PCA) in the global context revealed that M.AR was located in the East Asian cluster (Fig. S1B). When only considering East Asians, M.AR was found to cluster with northern East Asian populations (EAS.North), particularly close with the Mongol, Hezhen, and Xibo populations, who also speak Altaic languages (Figs. S1C and 1B). ORQ and EVK were highly variable, whereas DAU showed lower genetic variability and was closer to the other three non-M.AR populations (Fig. S2). To further examine the genetic relationship between M.AR and other EAS.North, we calculated the unbiased fixation index (FST) and the outgroup f3 statistics. The results showed that ORQ presented close genetic relationships with EVK (FST = 0.0038) and DAU (FST = 0.0050), and similarly, EVK was genetically close to DAU (FST = 0.0031) and ORQ (FST = 0.0038), while DAU was genetically closer to the Mongol, Hezhen, and Xibo populations (FST = 0.0016–0.0029) than to EVK and ORQ (Fig S3A). Additionally, among the six populations, EVK and ORQ showed closer genetic affinities with YAK (FST = 0.0126–0.0153) than with the other four populations (FST = 0.0159–0.0190) (Fig. S3B). The phylogenetic tree based on the pairwise FST also supported the aforementioned population relationships (Fig.1C). Moreover, the outgroup f3 statistical analysis also showed consistent results (Fig. S4).
We also performed mtDNA and NRY haplogroup assignments for M.AR to investigate the genetic structure at the maternal and paternal levels, revealing that M.AR harbored high proportions of the mtDNA haplogroups widely distributed in North Asia and Siberia, including D (30.65%), C (12.9%), G (11.29%), A (6.45%), Z (4.84%), M11 (3.23%), and Y (1.61%), although the haplogroups with the East Asian (B, F, and M7, 20.97% in total) and European origins (H and J, 6.45% in total) were also observed (Table S3). All the NRY haplogroups of M.AR were prevalent in East Asia and particularly, C2b1a accounted for nearly half of the proportion (48.39%) (Table S3). To investigate the genetic affinity of M.AR at the maternal and paternal levels, we also collected the mtDNA and NRY haplogroup frequencies of EAS.North and Siberian populations (SIB) from published research. PCA results showed that ORQ was closer to SIB, while DAU was closer to EAS.North (Figs. S5 and S6). Taken together, these findings reveal that M.AR show different genetic affinities with SIB and EAS.North, suggesting their differentiated ancestral makeup and population history.
Genetic ancestry and population history
Using ADMIXTURE [20], we inferred the ancestral makeup of M.AR in the context of northern East Asia and Siberia, showing that EVK and ORQ presented a greater proportion of YAK-enriched ancestry (66.73 ± 17.51% in EVK and 63.06 ± 10.22% in ORQ) than DAU (45.22 ± 9.93%) (Figs. 2A and S7). This finding is consistent with the results of D statistics, suggesting that DAU had closer genetic connections with non-M.AR populations including Han Chinese (HAN) and other EAS.North compared with EVK and ORQ (Fig. 2B). Additionally, EVK and ORQ showed closer genetic relationships with YAK than with DAU, and only one instance of evident gene flow was observed in EVK (Fig. 2B). The estimation of runs of homozygosity (ROH) also showed that EVK had larger medium and long ROHs than DAU and ORQ (Fig. S8), suggesting a lower admixture level in EVK. Similar results were observed using the f3 statistic (Fig. S9).
To infer the demographic history of M.AR, we applied RELATE [21] to estimate the historical effective population size (Ne) using HAN and YAK as references. The Ne values of M.AR were consistently greater than those of YAK and lower than those of HAN from ~14,000 to ~2,000 ya (Fig. 2C). Additionally, M.AR and YAK experienced continuous bottlenecks from ~10,000 to 3,000 ya, while HAN underwent continuous Ne growth (Fig. 2C). The mtDNA-based Bayesian skyline plots also showed a similar pattern to the autosomal data (Fig. S10). We further estimated the pairwise divergence of the five populations and found that HAN diverged from DAU later than the other four populations, while YAK diverged from DAU earlier than EVK and ORQ (Fig. S11). Using IBDNe [22], we inferred their recent demographic history and found that all populations except DAU showed an obvious bottleneck, and HAN had a greater Ne than the others (Fig. S12).
Testing natural selection on CAR genes
Previous studies have identified several candidate genes related to cold adaptation in Arctic populations. To test and compare these cold adaptation related genes (hereafter referred to as CAR genes) in our studied populations, we first collected CAR genes from SIB, Inuit, and Alaskan populations (Table S4) and used several statistical methods to detect significant selection signals in M.AR using YAK as a reference. To explore both shared and differentiated selection history of M.AR, we focused on CAR genes with significant selection signals in at least one population of M.AR and YAK in each method (see Methods) and observed that five CAR genes met these criteria: PRKG1, OCA2, TBX15, WARS2, and HS3ST4 (Figs. 3A and S13). Since population-specific factors like gene flow and genetic drift can lead to allele frequency changes in certain CAR genes derived from common ancestors, we also applied RELATE to examine these genes putatively under positive selection and CLUES [23] to infer the selection history based on the allele frequency changes.
Among these CAR genes, the strongest signals considering both M.AR and YAK, were present in PRKG1,which is involved in the regulation processes of smooth muscle contraction and plays a central role in increasing cardiac pressure during cold exposure [24, 25]. Of eight overlapping highly differentiated SNPs in PRKG1, the derived allele frequency (DAF) of the variant (rs75411997) with the top selection signal was 45–60% in M.AR and YAK, higher than that in the 1000 Genomes Project (KGP) (8%) and KGP East Asians (28%) (Fig. S14). Similarly, two extremely highly differentiated SNPs (rs74653330 and rs76582321) within OCA2 were detected in all four populations (Fig. S15). Notably, rs74653330, with the highest LSBL values, is a missense variant conserved in evolution (CADD = 23.8) and is associated with skin color in East Asians [26]. The DAF of rs74653330 peaked at 25-40% in M.AR and YAK, compared to 1% in KGP and 3% in KGP East Asians; likewise, rs76582321 exhibited the greatest (70–80%) DAF in M.AR and YAK, contrasting with 10% in KGP and 34% in KGP East Asians. Additionally, inferences of allele frequency growth suggested that selection on these adaptive variants was shared among the four populations and may have occurred before population differentiation (Figs. S14 and S15). Interestingly, the DAF of rs74653330 increased rapidly after ~5,000 ya in the four populations (Fig. S15), suggesting recent shared selection on OCA2.
In addition to the aforementioned CAR genes detected across the four populations, we noticed an alternative selection pattern in several reported CAR genes, with stronger signals in ORQ and YAK than in DAU and EVK (Fig. 3C). For example, within TBX15, a gene involved in adipocyte differentiation and skeletal development in mice [27-29], we identified rs190281538, with the highest LSBL values, as a significant selection signal in ORQ (DAF = 17%) and YAK (DAF = 12%), but less so in DAU (6%), and absent in EVK and KGP East Asians (Fig. S16). Similarly, for WARS2, another reported gene adjacent to TBX15, rs148015490 and rs148437530 under the strongest selection showed greater DAFs in YAK (28%) and ORQ (26%) than in DAU (13%), EVK (13%), and KGP East Asians (1%) (Fig. S16). Moreover, within HS3ST4, seven extremely highly differentiated SNPs with complete linkage were detected in ORQ and YAK (Fig. S17). The variant (rs60949977), with the top selection signal, exhibited a DAF of 35% in ORQ and 34% in YAK, much greater than in DAU (19%), EVK (13%), and KGP East Asians (11%) (Fig. S17). Given the changes in the DAFs of these variants, ORQ and YAK, with higher DAFs, showed rapid increases that were markedly different from the pattern of DAU and EVK (Figs. S16 and S17).
Identification of novel CAR genes
In addition to the reported CAR genes, we hypothesized that adaptation to different cold environments could lead to divergent biological adaptations detectable as population-specific selection signatures. Using the WGS data, we focused on M.AR-specific selection signals related to cold adaptation that have not been previously reported in human populations. We applied the locus-specific branch length (LSBL) to perform genome-wide scans for each M.AR population to detect selection signatures, using YAK as a control comparison, and validated findings with the integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH) analyses (see Methods).
We identified four novel CAR genes involved in energy regulation and metabolism, FGGY, ADH1A, ADH1B, and NELL1, which reached the top 1% genome-wide significance threshold in all LSBL tests (empirical P value < 0.01) (Fig. 4A). FGGY, which encodes a protein that phosphorylates carbohydrates, is involved in muscle growth and fat deposition in pigs [30]. The neighboring ADH1A and ADH1B, well-known for encoding alcohol dehydrogenase, are involved in lipid metabolism (GO:0006629) and the proper development and metabolic activity of adipose tissues [31]. Additionally, previous research on Arabidopsis has demonstrated that ADH genes play an important role in the cold stress response of plants [32]. NELL1, encoding a cytoplasmic protein that contains epidermal growth factor (EGF)-like repeats, is crucial for fat metabolism in adipose tissues based on a previous study of sheep [33]. We confirmed that these candidate adaptive genes can be identified as selection signals by the iHS and/or XP-EHH methods (Fig. S18).
The genomic region of FGGY had four extremely highly differentiated SNPs in EVK, ORQ, and YAK, of which rs150686724 exhibited the highest LSBL values (Fig. S19). The derived A allele of rs150686724 was most prevalent in YAK (34%), followed by ORQ (24%) and EVK (21%), but was rare in KGP (0.5%) and KGP East Asians (2.4%). We also observed a correlation between the DAF of rs150686724 and the minimum temperature (R2= 0.38, P = 6.49×10–9) (Fig. S20). However, these four SNPs were not identified as candidate adaptive variants in DAU, which showed a different haplotype pattern from EVK, ORQ, and YAK. Moreover, the allele frequency trajectory of the variant (rs76988708) with the strongest selection signal in DAU showed a unique pattern distinguishing it from the other three populations (Fig. 4B).
Within the adjacent genomic regions of ADH1A and ADH1B, nine SNPs showed extremely high differentiation and complete linkage in both M.AR and YAK, of which rs12505816 in the intergenic region exhibited the highest LSBL values (Fig. S21). The derived A allele of rs12505816 was correlated with increased ADH1A expression in adipose-subcutaneous tissue (P = 1.5×10−4, normalized effect size = 0.24). Compared to the low frequencies in KGP East Asians (9%), the DAFs were relatively high in M.AR (48% in ORQ, 45% in EVK, and 38% in DAU) and in SIB (63% in Chukchi, 58% in Eskimo, 47% in Yukagir, 42% in YAK, and 38% in Ulchi) (Fig. S22). Additionally, M.AR and YAK showed a similar pattern of allele frequency trajectories (Fig. 4C), suggesting shared selection history.
Since ADH1B has been reported to be under selection in East Asia, we also examined rs1229984, a well-known missense variant associated with reduced alcoholism risk and indicative of strong positive selection in East Asians due to agriculture-related adaptations ~10,000 ya [34-40]. However, rs1229984 showed no selection signal in M.AR or YAK, with a DAF of 30–40% in M.AR and 14% in YAK, lower than the 69% observed in HAN (Fig. S23). Moreover, the inferred DAF of rs12505816 in YAK and M.AR had already reached 20–40% before 10,000 ya (Fig. 4C), earlier than agricultural emergence in East Asia. Therefore, we speculated that the selection signals on ADH1A and ADH1B observed in M.AR and YAK were derived from another independent event, distinct from the reported adaptations associated with agricultural advances in East Asia.
Regarding NELL1, we observed population-specific extremely highly differentiated SNPs in each population, suggesting the high genetic diversity of this gene (Fig. S24). Particularly, EVK showed the lowest genetic diversity in NELL1 among M.AR and YAK populations (Fig. S25). A comparable pattern of allele frequency trajectories was observed in DAU, ORQ, and YAK, which was different from EVK (Fig. S24), despite similar haplotype patterns in the four populations (Fig S26). Intriguingly, the putative adaptive haplotype composed of these adaptive variants was not only prevalent in our studied high-latitude populations (20–30%), but also reached high frequencies among Tibeto-Burman speakers in Southwest China’s high-altitude regions, such as the Drung (50%), Khatso (50%), and Jingpo (41%) (Fig. S26).
Adaptive scenario in the AR
Combining the aforementioned results, we proposed a simplified model to describe the selection history of CAR genes in the AR (Fig. 5). Some adaptive variants were shared among the four populations, implying a common ancestral origin (e.g., PRKG1, OCA2, ADH1A, and ADH1B). However, some genes showed significant selection signals in all four populations, but an alternative haplotype pattern in DAU (e.g., FGGY). Given the high admixture level of DAU in M.AR, we hypothesized that this effect would decrease the DAFs of some adaptive variants and result in distinct haplotype patterns. Compared to DAU, EVK and ORQ showed closer genetic affinities and more ancestry sharing with YAK. Consequently, we hypothesized that they preserved the derived adaptive variants from the ancestor’s local adaptation. However, EVK is the most isolated population in M.AR, which may have resulted in the loss of adaptive alleles and fixation on ancestral haplotypes, especially for genes with high genetic diversity (e.g., NELL1). Particularly, the admixture in DAU and genetic drift in EVK would both lead to the loss of adaptive alleles and could be reflected in adaptive variants significant only in ORQ and YAK (e.g., TBX15, WARS2,and HS3ST4).
Considering the existing archeological evidence suggesting that most Neanderthal and some Denisovan populations living in ultracold areas have undergone intense adaptation to cold climates [41-44], we thus hypothesized that archaic introgression may contribute to cold adaptation in modern humans. To explore this, we analyzed candidate adaptive genomic regions to detect archaic introgression and assessed whether the adaptive haplotypes originated from archaic hominins in our studied populations. We first focused on reported CAR genes in M.AR and YAK with selection signatures, mapping their archaic ancestry informative markers (aAIMs) to high-coverage archaic genomes and then compared the proportion of aAIMs in highly differentiated SNPs across the whole genome to evaluate significance (see Methods). Consistent with a previous study [45], archaic sequences were detected within the adjacent genomic regions of TBX15 and WARS2, with much longer archaic segments in YAK than M.AR (Fig. S27). Notably, nine aAIMs of YAK matched Neanderthal and Altai Denisovan genomes, and eight were highly differentiated SNPs. Additionally, 90% of YAK's highly differentiated SNPs in this region were aAIMs, significantly greater than the whole-genome level (chi-square test, P < 2.2×10−16) (Fig. S16). We then investigated the archaic introgression of newly discovered CAR genes, and revealed only archaeal segments within NELL1 (Fig. S27), with over 10 aAIMs of both M.AR and YAK matching Neanderthal genomes. However, we only observed that in DAU and EVK, the proportions of aAIMs among highly differentiated SNPs were significantly greater than the whole-genome level (chi-square test, P < 2.2×10−16) (Fig. S24). These results suggested that the putative adaptive haplotypes of NELL1 in DAU and EVK possibly originated from Neanderthal ancestors, an effect not observed in ORQ or YAK.