Genome assembly and annotation
The 10x sequencing Genomics Chromium run generated a total of 623 million barcoded reads. The final O. edulis draft assembly has a size 967 Mbp, with 14,796 scaffolds larger than 10 Kbp and a scaffold N50 of 77.76 Kbp. The final annotation comprises a total of 16,615 gene models and 45,064 predicted transcripts. A search using BUSCO v5 36 (metazoan_odb10) indicated that this gene set contains 91.2% single-copy universal metazoan genes (78.5% complete and 12.7% fragmented).
Whole genome shotgun sequencing, quality control and mapping
The two sequencing efforts for each of the 26 oysters collected between 1868 and 1888 across Europe generated 6,289,842 to 75,541,394 reads per run (Supplementary Table S1 for sampling information). Between 9.357% and 83.352% (median = 36.488%) of the reads mapped to the draft nuclear genome of O. edulis. After combining the runs for each sample, median genome coverage was 0.4478x, and ranged from 0.1008 to 2.5337x (Supplementary Table S2). A total of 243,005 sites passed our genotype filters and were used in the subsequent analyses. The historical mitochondrial genomes had a median coverage of 24.24x (range: 0.86–549.38), and 19 samples were covered with at least 4x across more than 90% of the genome.
Neutral genetic structure
Omitting samples with low coverage or the genomic scaffolds with adaptive signatures did not change results of the principal component analysis (PCA) qualitatively (Supplementary Fig. S1), and we thus present the results for all 26 samples and all genomic regions. Low and ultra-low coverage genomes have previously emerged as a resource for trait association mapping, population genomics and the identification of adaptation 37–40. In the PCA, genomes of the now-extinct Wadden Sea population clustered tightly with each other, while the British and French oysters were less differentiated (Fig. 2A). The two Dutch oysters clustered with the Wadden Sea and French-British populations, respectively (Fig. 2A). The present-day oyster from the Limfjord collected in 2018 had an intermediate position between the Wadden Sea oysters and the French-British oysters. For the admixture analysis, K = 3 had the highest delta K value (1135.6372), followed by K = 5 (927.8341), K = 2 (912.6502) and K = 4 (895.1018). The admixture analysis were concordant with the PCA results: the Wadden Sea oysters belonged predominantly to the first ancestral population, the British oysters to the second and third ancestral population, and the French oysters to the third ancestral population (Fig. 2B,C). We tested for HWE and inbreeding in the Wadden Sea and French-British population separately. Of 634,136 SNPs, 29,481 (4.65%) had a significant inbreeding coefficient F and departure from HWE in the French-British population, and 18,875 (2.97%) in the Wadden Sea population. Only four of these SNPs deviated from HWE in both populations.
The phylogenetic tree reconstructed for the 19 mitochondrial genomes that passed our quality checks was congruent with the reconstruction of Hayer et al. 22. Most of the mitochondrial genomes fell into one of three distinct clades: one clade contained only Wadden Sea oysters (the “WS” clade of Hayer et al. 2021), one only French-British oysters (“NEA” clade), and the third clade contained British, Wadden Sea and Dutch oysters (“NS” clade) (Fig. 3A). The mitochondrial nucleotide diversity of the Wadden Sea population (0.010500, n = 7) was not significantly different from the British population (0.010613, n = 6), or the combined French-British population (0.007619, n = 10) (Fig. 3B). The French population alone had a lower nucleotide diversity (0.003459, n = 4), as it did not contain any haplotypes of the most divergent NS clade (Fig. 3A). The Dutch population had the highest nucleotide diversity (0.022170, n = 2) but the confidence in this estimate is low given the small sample size (Fig. 3B).
Genomic signatures of selection
For all genome scans, we compared the Wadden Sea population to the combined French-British population based on the results of the admixture analysis. The genome-wide median of Tajima’s D for the Wadden Sea population was − 0.425, and − 1.013 for the French-British population (Fig. 4A). Plotting Tajima’s D values for each population against each other revealed a cluster of 818 values that were either relatively low in the Wadden Sea population, or relatively high in the French-British population (Fig. 4A). These outliers clustered on 46 genomic scaffolds. Genome-wide Watterson’s theta was 324.74 for the French-British population, and 209.03 for the Wadden Sea population. Plotting Watterson’s theta values between populations revealed a similar cluster of outliers as for Tajima’s D, but with a less clear separation to the overall distribution (Fig. 4B).
The genome-wide unweighted Fst was 0.030766, and the weighted Fst was 0.052661. The distribution of sliding window Fst values was bimodal, with the majority of values centering around the genome-wide median (Fig. 4E). However, of the 47703 sliding windows, 1197 had Fst values that were significant outliers, centering around 0.22 (Fig. 4E). Of the 14,796 genomic fragments that make up the draft genome, the high Fst values clustered on 49 of these fragments. The outliers were exclusively found on fragments of intermediate genetic diversity in the Wadden Sea population (Fig. 4C) and relatively high genetic diversity in the French-British populations (Fig. 4D). This means that the genetic diversity surrounding these highly differentiated regions was lower in the Wadden Sea population than in the French-British population, indicative of selective sweeps in the Wadden Sea population. Outliers from Fst, Tajima’s D and theta occurred concomitantly on 40 genomic scaffolds that were between 70,975 and 1,041,195bp long. These scaffolds contain the strongest candidates for local adaptation.
A total of 524 mRNAs were functionally annotated within the 40 outlier regions. These annotations belonged to 297 different genes. Summarizing these genes by molecular function revealed that the majority had catalytic or binding activity (Fig. 5A). The most commonly discerned biological processes were cellular and metabolic but the outlier regions also contained genes responsible for immune system processes, biological adhesion, interspecies interaction and reproduction, all of which could be relevant for local adaptation (Fig. 5B). Lastly, the outlier regions contained genes for a large number of pathways, including stress and immune response pathways, and pathways related to disease in humans (Alzheimer, Parkinson, Huntington) (Fig. 5C).