The chromosome-level genome assembly
The orchid Ophrys sphegodes Mill. has a karyotype of 2n = 2x = 36 10,14,15. The haploid genome size was estimated to be 4.83 Gb by flow cytometry, and heterozygosity was estimated at 1.28% via k-mer analysis (Supplementary Figs. 1 and 2, Supplementary Table 1). To assemble the genome, we generated a total of 409 Gb data on the Nanopore PromethION platform (Supplementary Table 2, Supplementary Fig. 3) and sequenced Hi-C chromatin conformation capture libraries (Supplementary Table 3, Supplementary Fig. 4). We used Miniasm assembly16 to generate a total of 11 148 contigs in 6.4 Gb, with an N50 value of 754 kb (Supplementary Table 4). We removed under-collapsed heterozygous contigs (1.2 Gb) and used Hi-C data to anchor the scaffolds into 18 pseudomolecules corresponding to the 18 chromosomes expected for the haploid genome (Fig. 1c, Supplementary Fig. 5a; Supplementary Table 5). The final assembled genome size was 5.2 Gb, in line with the estimated genome size, with a scaffold N50 of 218 Mb, L50 of 10 and L75 of 17 (Supplementary Table 4). Overall, 97.8% of raw Illumina sequence reads could be mapped to the assembly, suggesting that our assembly contains the complete genetic information (Supplementary Table 6). Gene region completeness was assessed using BUSCO (Benchmarking Universal Single-Copy Orthologs)17: 1453 out of 1612 (91.1%) conserved core land plant genes were found in our assembly, of which 1361 (84.9%) were complete (Supplementary Fig. 6a). We confidently annotated a total of 42 549 protein-coding genes, of which 90.01% had functional annotation or RNA-seq support (see Supplementary Tables 7–10 for genes of interest). Cytosine methylation was inferred from Nanopore sequencing data (Fig. 1c, Supplementary Fig. 7).
A burst in transposon activity preceded adaptive radiation
We manually characterised transposable elements (TEs) in the genome following a protocol18 for TE classification that characterises TEs through a combination of homology-based prediction and manual inspection to find structural motifs and define sequence boundaries. To this end, we created the first species-specific database of TEs of an orchid genome, containing a total of 436 novel sequences specific to O. sphegodes. Using this database, we identified a total of 4.05 Gb repetitive elements, occupying 78.06% of the O. sphegodes genome (Fig. 1c, Supplementary Table 11). Overall, O. sphegodes represents the largest orchid genome assembled to date and exhibits the highest abundance of long terminal repeat (LTR) elements (74.57% of the genome), more than Vanilla planifolia (10.00%)19, Apostasia shenzenica (16.81%)20, Dendrobium catenatum (39.88%)20, Phalaenopsis equestris (44.19%)20, Gastrodia elata (54.75%)21, Cymbidium sinense (54.76%)22, and similar to Platanthera guangdongensis and P. zijinensis (73.23% and 71.78%23; Supplementary Table 12). Transposon activity is known to influence genome size variation in eukaryotes24, and particularly LTR/Gypsy and LTR/Copia have previously been shown to correlate with genome size expansion in orchids25,26. In O. sphegodes, analysis of recent LTR insertions shows that LTR activity had an initial increase at around 3 Ma, to reach its maximum at around 1.3 to 0.8 Ma (Fig. 1d). During this period, the Mediterranean Basin experienced climatic oscillations with glacial/interglacial periods27,28 that may have acted as environmental disturbances29,30 and/or facilitated hybridisation31 after distribution range shifts. Any such event might have triggered bursts of TE proliferation in O. sphegodes, thus inflating its genome size. Interestingly, the peak of LTR element insertions precedes or overlaps with the radiation of the most species-rich Ophrys lineage, including the O. sphegodes species group, less than 1.0 Ma12. The ability to cope with genome size changes has allowed angiosperms to successfully diversify32 and TEs have played an important role in enhancing angiosperm evolution33 through their effect upon gene expression30, as well as gene duplication and genomic rearrangements34,35. By contributing to the generation of intraspecific genetic diversity36, TE bursts may have provided the O. sphegodes lineage with the genetic capacity to adapt to a new pollinator niche (i.e., Andrena bees), thereby facilitating the adaptive radiation of the O. sphegodes group.
Genome evolution through chromosome fusions in the Ophrys lineage
To understand the evolutionary history of the Ophrys lineage, we constructed a phylogenomic tree and estimated divergence times across O. sphegodes and 20 other plant species with fully sequenced genomes, based on single-copy orthologues. Ophrys diverged from Platanthera (both from Orchidoideae subtribe Orchidinae), the most closely related orchid with a fully sequenced genome23,37,38, approximately 22.42 Ma ago with a 95% confidence interval (CI) of 20.77–23.95 Ma, and Orchidoideae separated from other orchids around 54.49 Ma (CI 53.00–56.15 Ma; Fig. 2a, Supplementary Fig. 8). Our analysis further suggests that Orchidaceae separated from the common ancestor of Asparagales approximately 99.96 Ma ago (CI 98.66–105.66 Ma) and places monocot/eudicot divergence around 147.51 Ma (CI 139.87-154.45 Ma), in line with previous studies39,40.
To track chromosome evolution of O. sphegodes, we compared it with the most closely related sequenced orchid genomes23, focusing on the comparison with Platanthera zijinensis (Fig. 2b). The two genera differ in their chromosome numbers, the karyotype organisation of Platanthera (n = 21)41 reflecting the ancestral and Ophrys (n = 18) the derived state37,42. Overall, most chromosomes maintained their structure between Platanthera and Ophrys, but some notable major rearrangements are apparent, particularly with regard to chromosome fusions. Chromosome (chr) 4 in O. sphegodes appears to be the product of a fusion between chr 7 and part of chr 4 in Platanthera, and Ophrys chr 10 derives from a fusion between Platanthera chr 15 and part of chr 14. Moreover, Platanthera chr 8 has no homologous chromosome in Ophrys, and significant parts of Platanthera chr 3, 4, 11, 14, 20 and 21 lack syntenic regions in O. sphegodes (Fig. 2b and Supplementary Fig. 9). Taken together, these findings are consistent with a reduction in chromosome number via fusions in the Ophrys lineage.
Expanded gene families may be involved in flower development
To gain insights into the genomic basis of the unique pollination system, we first identified orthologous gene families using OrthoFinder43. We identified a total of 495 819 genes in 26 709 orthogroups among the 21 species, of which 3 054 are shared among all species. A total of 1 351 families containing 4 537 genes were unique to O. sphegodes. We then identified expanded and contracted gene families using CAFE44 (Fig. 2a). In O. sphegodes, 3 712 gene families underwent an expansion, whereas 756 underwent a contraction. This is the highest level of gene family expansion reported among orchids so far, followed by V. planifolia (+ 3 248) and D. catenatum (+ 1 817). Of those gene families, 291 and 59 exhibited significant (p-value ≤ 0.01) expansions and contractions, respectively. Among the significantly expanded gene families were genes encoding transcription factors (TFs) involved in plant reproduction and flower development, such as MADS-domain TFs (55 genes encoding type I and 6 genes of type II MADS; Supplementary Fig. 10a), MYB TFs (73 genes, of which 32 MYB-P; Supplementary Fig. 10b), LATERAL ORGAN BOUNDARIES (LOB) domain TFs (10 genes), C2C2-GATA TFs (12 genes), WRKY TFs (11 genes) involved in trichome development45, SERINE CARBOXYPEPTIDASE-LIKE-1 (SCPL-1) proteins (15 genes) controlling anthocyanin acylation46, TCP TFs (14 genes) regulating flavonoid biosynthesis and floral symmetry47,48, and YABBY TFs (6 genes) involved in establishment of adaxial-abaxial polarity49. Secondly, disease resistance-related and stress response genes were found, such as NAC TFs (30 genes), glutathione S-transferases (28 genes) and plant PLEIOTROPIC DRUG RESISTANCE (PDR) proteins (10 genes). Finally, two putative candidate genes for floral odour production and anthocyanin biosynthesis50 showed significant expansion, too, namely fatty acyl-CoA reductases (FARs) and chalcone synthases, also involved in defence response, (29 and 14 genes, respectively).
Plant adaptation to pollinators in Ophrys is associated with local duplication of candidate genes
The key component for pollinator attraction in Ophrys species is the chemical mimicry of the pollinator female’s sex pheromone, the composition of which has previously been characterised for O. sphegodes s.l.51,52. Here, alkene hydrocarbons are especially important, as different proportions of (Z)-12-alkenes, (Z)-9-alkenes and (Z)-7-alkenes are the major odour differences among O. sphegodes and the closely related O. exaltata, responsible for attracting different pollinators53,54. Thus, genes involved in hydrocarbon biosynthesis are likely important for pollinator attraction50. We annotated previously identified candidate genes50 in the genome of O. sphegodes (Supplementary Fig. 11, Supplementary Table 7). Among these, structural annotation of stearoyl-ACP desaturases (SADs) shows that the key genes, SAD1 and SAD2 (SAD2-type), are duplicated in tandem in a single cluster on chromosome 4 (283.17–283.30 Mb; containing four copies; Fig. 3d), whereas the house-keeping desaturase encoding SAD3 55 resides in one copy in chromosome 5 (280.45–280.47 Mb) and SAD4 (SAD5-type) is also present as a single copy (scaffold 75: 1.04–1.11 Mb; Supplementary Fig. 11; Supplementary Table 7). For other O. exaltata SAD5-type genes, we identified four full-length copies in the O. sphegodes genome (at least two of them on the same scaffold 210), although none of these appeared to be a functional copy (Supplementary Table 7). This is in line with previous findings that O. sphegodes only expresses functional SAD2-type alleles, while functional SAD5-type alleles are expressed in O. exaltata54,55. Phylogenetic analysis confirms the presence of separate SAD gene lineages corresponding to SAD3, SAD2-type (SAD1/2/7/8), and SAD5-type (SAD4/5/6/9/10, an incomplete SAD11) genes (Fig. 3a). Both SAD2- and SAD5-type lineages appear as single copies in Platanthera, suggesting they underwent recent gene duplication events. Of note, SAD2 homologues were only present in orchid genomes from the subfamily Orchidoideae.
Like SADs, membrane-bound fatty acid desaturases (FADs)56, which contribute to fatty acid desaturation and, potentially, alkene production57, were also found duplicated in the O. sphegodes genome. We found a phylogenetic lineage containing four FAD gene copies, of which at least three reside on chromosome 1 (368.58–368.63 Mb; Fig. 3e), a second lineage of two copies clustered on scaffold 33, while one gene, FAD2, was not duplicated. Both lineages showing gene duplications are present as single copies in the Platanthera genome (Fig. 3b).
Fatty acyl-CoA reductase (FAR) homologues show gene duplication in Ophrys. FARs likely catalyse the conversion of fatty acyl-CoA to primary alcohols and different FARs produce fatty alcohols with different acyl chain lengths58. In the O. sphegodes genome, 16 FAR homologues were found, of which 15 gene models form a phylogenetic Orchidoideae lineage together with a single Platanthera sequence. Of these, 14 genes are distributed over only three scaffolds (Fig. 3c; chr 5: 1; scaffold 279: 4; scaffold 133: 4; scaffold 578: 6). Six of these clustered FAR genes reside in one region of 109 kb on scaffold 578 (Fig. 3f). The close vicinity of these copies in a unique clade shows a recent duplication event in the O. sphegodes lineage (Fig. 3c). Gene duplication plays a crucial role in shaping the evolutionary landscape of genomes, as they provide the main raw material for the evolution of new genes59. Single or tandem gene duplications are also involved in the origin of many plant genes60. Often, retention of duplicate genes occurs non-randomly, as changes in the concentration of gene products can have a selective advantage for the organism61. These novel genes may provide Ophrys with an opportunity to respond to selection by pollinators, e.g., through positive dosage effects or neofunctionalisation and pseudogenisation of the less effective variants62.
Population genomic analyses reveal putative barrier loci under pollinator-driven selection
Closely related Ophrys species provide plausible examples of pollinator driven-speciation11,63. We therefore investigated the genetic differentiation between O. sphegodes and three other closely related, sympatric co-flowering species, O. exaltata, O. garganica and O. incubacea in Gargano, South Italy10,11. These four species are pollinated by sexual deception of different solitary bee males: Andrena nigroaenea, Colletes cunicularius, A. pilipes, A. morio, respectively64. These Ophrys species show variation in floral traits, ranging from labellum coloration (markedly blacker in O. garganica and O. incubacea11), to different floral odours mimicking their pollinators’ sex pheromones, which majorly contribute to reproductive isolation10,11,52,54,65. We used Genotype-By-Sequencing (GBS) data to investigate the genetic differentiation between species and scan the genome for signatures of selection. Chord between-population genetic distance66 was calculated per 1 Mb window to identify the most similar/dissimilar species at a given chromosomal region in the O. sphegodes genome. This analysis revealed that segregating polymorphisms between species are distributed across the genome, and that overall genetic (dis)similarity between O. sphegodes and the three other species is roughly equal (Fig. 4a, Supplementary Fig. 12), as may be expected in a species radiation. Yet cumulatively, the four species were clearly separable in a principal coordinate analysis (PCoA; Fig. 4b). These findings are in line with previous population analyses suggesting that many polymorphisms in the genome are shared among all species11, whilst few barrier loci may separate them. Furthermore, global FST outlier analysis (FDR < 0.01) of a GBS dataset from 126 individuals11 revealed highly differentiated regions in chr 2 (327–346 Mb; Fig. 1c and 4c). Interestingly, in contrast to the genome-wide pattern, especially O. sphegodes and O. exaltata are clearly separated in the highly differentiated c. 20 Mb region on chr 2, as seen by PCoA of this region (Fig. 4d). Whether this pattern is due to divergent selection and suppression of effective recombination via e.g., divergence hitchhiking or an inversion, is currently unknown. Overall, using the metaphor of genomic islands of speciation67, these results suggest that adaptation involving individual loci (or small genomic segments) rather than entire genomes, characterises these reproductively isolated sympatric species. The overall high level of allele sharing and the genomic mosaic pattern of species relationships (without long contiguous stretches of similar relationships) cannot easily be explained by gene flow after a secondary contact of species that are separated by strong floral isolation10. An alternative explanation could be that the high level of segregating polymorphisms is due to shared ancestry and large effective population sizes, suggesting that the species are in an early stage of genomic divergence10,53,67,68. Our population genomic analyses identified candidate regions potentially under pollinator-driven selection, thus calling for future research into the roles and functions of the genes in these regions.