Annelids exhibit a conserved histone repertoire
To investigate the histone gene repertoire of spiralians and annelids, we first reannotated the H2A, H2B, H3 and H4 histone gene models by combining a protein-to-genome alignment-based approach, transcriptomic developmental time series data, a careful manual curation, and an orthology assignment based on maximum likelihood and Bayesian reconstructions (Fig. 2A; Supplementary Fig. 1, 2). We recovered 81, 90, and 17 core histone genes for O. fusiformis, C. teleta, and D. gyrociliatus, respectively (Table 1). In these species, only six, three, and four genes, respectively, displayed an atypical divergent amino acid sequence, which we regard as unknown histone variants. Out of these genes with unknown orthology, all six in O. fusiformis and one in C. teleta were deemed pseudogenic due to having either null or negligible low expression levels throughout development (Supplementary Fig. 7E–G, 8C). Unlike in other protostomes, where certain histone variants have been lost, like H2A.X in Nematoda (Lemmens and Tijsterman 2011), or where features and roles from multiple variants get merged into a single gene, like H2A.X and H2A.Z in the H2A.v variant in Drosophila (Baldi and Becker 2013), all three analysed annelid taxa have conserved at least one ortholog of each of the core histone proteins—canonical and variant—presumed to be ancestral to Bilateria (Henikoff and Smith 2015; Grau-Bové et al. 2022), i.e., the canonical H2A, H2B, H3, and H4 proteins, and the H2A.X, H2A.Z (or H2Av), macroH2A, H3.3, and cenH3 histone variants, suggesting a potential functional conservation of the histone complement. Owenia fusiformis and C. teleta possess between 16 and 23 copies per canonical histone (Table 1). However, the compact genome of D. gyrociliatus only encodes two genes per canonical histone (Table 1), thus making this fully conserved histone complement one with the lowest (if not the lowest) described copy numbers for canonical histones in a metazoan lineage.
Table 1
Histone gene repertoire in Annelida.
Protein | O. fusiformis | C. teleta | D. gyrociliatus |
H2A family | 24 (21) | 23 | 6 |
H2A canonical | 17 | 20 | 2 |
H2A.X | 2 | 1 | 1 |
H2A.Z | 1 | 1 | 1 |
macro-H2A | 1 | 1 | 1 |
H2A unknown variants | 3 (0) | 0 | 1 |
H2B family | 21 (18) | 19 | 3 |
H2B canonical | 18 | 18 | 2 |
H2B unknown variants | 3 (0) | 1 | 1 |
H3 family | 20 | 25 (24) | 6 |
H3 canonical | 17 | 22 | 2 |
H3.3 | 1 | 1 | 1 |
cenH3 | 1 | 1 | 1 |
H3 unknown variants | 1 | 2 (1) | 2 |
H4 family | 16 | 23 | 2 |
H4 canonical | 16 | 23 | 2 |
Total | 81 (75) | 90 (89) | 17 |
Total number of canonical and variant histones encoded in the genomes of O. fusiformis, C. teleta, and D. gyrociliatus, classified by family (H2A, H2B, H3, and H4). For each family, the different canonical and variant histones are shown. Numbers in parentheses denote the number of non-pseudogenic genes. |
Genomic organisation and gene structure of histones
Canonical histones tend to appear in tandemly repeated organised clusters, sometimes even associated with H1 linker histones (Nagel and Grossbach 2000; Buschbeck and Hake 2017; Amatori et al. 2021). To assess whether this also occurs in annelids, we characterised the genomic organisation of the core histone genes in O. fusiformis and D. gyrociliatus, the two focal species with more contiguous reference genomes (Fig. 2B; Supplementary Fig. 3A). In O. fusiformis, canonical core histones are arranged into two large clusters containing 35 and 29 canonical histones in chromosomes 5 and 8, respectively, and a smaller one containing four genes only in chromosome 7. With the exceptions of the chromosome 7 cluster and a single occurrence in the chromosome 5 cluster, core canonical histones are tandemly arrayed as a H4–H3–H2A–H2B repeating unit. In this unit, H4, H3, and H2B are transcribed from one strand, and H2A is transcribed from the opposite one (Fig. 2B, Supplementary Fig. 3A). Concomitant to the extremely reduced histone copy number, the genomic clusters in D. gyrociliatus are minimal in size, consisting of a single H3–H4–H2B–H2A repeating unit in scaffolds 2 and 29 (Fig. 2B). This may result from a genomic translocation of the H2A–H2B and H3–H4 units, with the transcripts still being synthesised from the same strands as in O. fusiformis. The presence of these H2A–H2B and H3–H4 units makes it plausible that previously described head-to-head gene pairs in C. teleta and Helobdella robusta (Grau-Bové et al. 2022) are also found in O. fusiformis and D. gyrociliatus and potentially in other annelids like Chaetopterus variopedatus (del Gaudio et al. 1998). Interestingly, Owenia fusiformis and D. gyrociliatus display different genomic organisations than those reported for the annelids Platynereis dumerilii (Sellos et al. 1990; Krawetz et al. 1993) and Urechis capo (Davis et al. 1992). Although these are not based on high-quality reference genome analyses and could be reconciled with a single translocation event from the putative O. fusiformis state, they might also indicate that different genomic organisations of the histone complement occurred in annelids.
All clustered histone genes lack introns in both species, whereas histone variants are often multi-exonic (Fig. 2B, Supplementary Fig. 3A). Interestingly, differences in the overall gene length between species are not statistically significant (Fig. 2C; Supplementary Fig. 3B, C). However, the overall intron length of histones, as well as the total and average intron length per gene of intron-containing genes, but not the number of introns per gene, are significantly lower in both C. teleta and D. gyrociliatus compared to O. fusiformis (Fig. 2D–G; Supplementary Fig. 3D–F). This reduction of average intron length can even be traced gene by gene in the single-copy histone variant genes h2ax, h2az, macroh2a, cenh3, and h3–3 (Supplementary Fig. 3G–I). Therefore, annelids organise their histone genes in tandemly repeated organised clusters. Still, significant changes have occurred in both the cluster structure and number, as well as the gene structure, throughout annelid diversification.
Histone genes are in dynamic hyperaccessible chromatin regions
To better understand chromatin regulation around histone genes and their genomic clusters, we leveraged our publicly available ATAC-seq developmental time series for O. fusiformis and C. teleta (Martín-Zamora et al. 2023b), as well as data for D. gyrociliatus female adults (Martín-Durán et al. 2021). Histone genes are amongst those with the highest ATAC-seq enrichment at every developmental point in O. fusiformis and C. teleta, albeit more starkly in the former (Fig. 3A, B, Supplementary Fig. 4A–D). In these species, open chromatin is most prevalent at the transcription start sites and promoter regions, progressively decreasing along the gene body towards the transcription end sites. During the embryogenesis of O. fusiformis and C. teleta, chromatin around histone genes becomes more accessible after the blastula stage during gastrulation, reaching its peak openness and remaining at very elevated levels during larval development (Fig. 3A, B, Supplementary Fig. 4A–D). At the end of the developmental time course, however, chromatin around the histone clusters becomes more compact. This above-average enrichment is present in the clusters that contain canonical histones—especially in the large clusters of O. fusiformis, where it is very significant—and less so in the loci of the variant histone genes (Fig. 3C, D, Supplementary Fig. 4C, D, 5A–C, 6A, B). In D. gyrociliatus, histone genes are also in loci of highly open chromatin (Supplementary Fig. 4E, F). However, their chromatin accessibility landscape differs from those of O. fusiformis and C. teleta, with chromatin being open almost exclusively at the TSS in defined peaks (Supplementary Fig. 5D, 6C). Therefore, histones are in dynamically regulated, hyperaccessible chromatin regions in annelids, suggesting their expression might be more variable than expected for genes essential for DNA compaction and regulation.
Histone gene expression dynamics in Annelida
To investigate histone gene expression dynamics across annelid development, we mapped the developmental RNA-seq time courses for all three species (Burns and Pechenik 2017; Martín-Durán et al. 2021; Martín-Zamora et al. 2023b) against the new gene models containing the curated histone genes. Canonical histone gene expression levels were calculated as the sum of the expression of all the genes encoding the same protein (e.g., all 17 genes for caH2A in O. fusiformis). All four canonical histones (caH2A, caH2B, caH3, and caH4) present similar expression patterns within species (Supplementary Fig. 7A–D, 8A). Moreover, in both species with a larval stage, i.e., O. fusiformis and C. teleta, histones are expressed in the zygote and cleavage stages—particularly in O. fusiformis—but reach maximum expression levels after the zygotic genome activation, at the gastrula stage and immediate post-gastrula stages. Thereafter, they progressively decrease until the competent larva and juvenile stages (Supplementary Fig. 7A–C, 8A). This mirrors and correlates with the developmental dynamics of the accessible chromatin regions in which they are located. In the direct developer D. gyrociliatus, however, maximum expression of canonical histones occurs at the early embryogenesis time point and plummets immediately after (Supplementary Fig. 7A, D, 8A). More interesting are the expression patterns of histone variants, whose expression dynamics across the development of all three studied annelids are conserved, thereby hinting possible conservation of their roles in development (Supplementary Fig. 7A, E–G, Supplementary Fig. 8B). The h2az histone, which is known to maintain the pluripotency state of stem cells in development (Giaimo et al. 2019; Dijkwel and Tremethick 2022), is expressed at high levels in cleavage and early embryogenesis, and it decreases after gastrulation when cell fates are acquired and terminal differentiation into multiple cell lineages starts. Very similar is the pattern of h3–3, which incorporates into chromatin to sustain several differentiation programmes across metazoans (Sakai et al. 2009; Bush et al. 2013; Jang et al. 2015). The centromeric cenh3, which is critical in kinetochore positioning and assembly during cell division (Quénet and Dalal 2012; Kixmoeller et al. 2020), is also highly expressed at cleavage stages when cells are dividing quickly, but its levels dilute earlier and decrease even before the blastula forms, unlike h3–3 and h2az. The only case in which expression increases dramatically throughout development to reach a maximum in the adult is the macroh2a gene, which might be biologically relevant given its known role in transcriptional repression in maintaining the identity of terminally differentiated cells (Buschbeck et al. 2009; Gaspar-Maia et al. 2013). We also found distinct expression patterns of the unknown/unidentified histone variants across all species (Supplementary Fig. 7A, H–J, 8C), particularly that of h3uv1 in O. fusiformis and h2buv1 in C. teleta, which appear to be extremely abundant genes. We systematically explored the conservation of known hPTMs sites in these genes and all other mined histone genes (see Data Availability section), but we still we have no hints of their functional relevance.
Histone H2A.X variants have evolved in parallel in Metazoa
The histone variant H2A.X, critical in DNA repair (Rogakou et al. 1999; Shroff et al. 2004) and embryonic stem cell development (Andäng et al. 2008; Wu et al. 2014), among other non-canonical functions (Turinetto and Giachino 2015), is strikingly divergent across the studied annelids. While other variants have a single ortholog in all three species, C. teleta and D. gyrociliatus display a single h2ax gene, but O. fusiformis encodes two different H2A.X proteins that share an 88.1% sequence identity (Fig. 2A, B; Supplementary Fig. 1, 2, 3A; Table 1). Both paralogs are maternally deposited in the egg and highly abundant in early embryonic stages, and progressively lose expression after the blastula and gastrula stages up to their minima in the juvenile adult stage (Fig. 4A; Supplementary Fig. 8B). The same expression dynamics are evident for C. teleta and D. gyrociliatus (Fig. 4A, Supplementary Fig. 8B). Nevertheless, while developmental dynamics of both paralogs in O. fusiformis are similar, the expression of h2ax2 is up to 150-fold greater than that of h2ax1, confirming that h2ax2 is the dominant H2A.X gene (Fig. 4A, Supplementary Fig. 7E). Yet, despite the lower levels of expression, we could confidently prove h2ax1 is expressed throughout the life cycle of O. fusiformis by specifically amplifying both genes from a cDNA pool of mRNA coming from all developmental time points of O. fusiformis (Supplementary Fig. 9). Remarkably, some of the differences in protein sequence between h2ax1 and h2ax2 lie in critical regions, most strikingly in the C-terminal motif of the protein (Fig. 4B, C; Supplementary Fig. 10A). In mammals and most chordates, Tyr142 is a conserved position that undergoes biologically relevant post-translational modifications (Cook et al. 2009; Xiao et al. 2009). Where h2ax1 displays this phosphorylatable Tyr142 (from here on also H2A.X-Y), h2ax2 has a sterically homologous yet unmodifiable Phe142 (thus being called H2A.X-F) (Fig. 4B). Indeed, this residue change has been observed before in the H2A.X-F gene of Xenopus, where it is expressed in eggs and early embryos and where it has been hypothesised to facilitate rapid early-embryo cell divisions (Shechter et al. 2009; Wang et al. 2014), but also in some fish (Shechter et al. 2009), a mollusc (González-Romero et al. 2012), and in plants (Shechter et al. 2009; Lorković et al. 2017; Lei and Berger 2020). Indeed, we found that some of the few curated spiralian H2A.X sequences—and also some arthropod and other eukaryote ones—display this change (Fig. 4D; Supplementary Fig. 10B–F). Therefore, the distribution of H2A.X-Y and H2A.X-F variants is more widespread than previously foreseen. Yet, it is unknown whether these originated in the last common eukaryotic ancestor or are the result of parallel evolution.
To disentangle the evolutionary history of H2A.X, we performed PHI-BLAST searches for H2A.X-Y and H2A.X-F orthologs across Eukarya combined with phylogenetic reconstruction. We recovered 1,656 H2A.X sequences (660 H2A.X-Y and 996 H2A.X-F), mostly belonging to the highly sequenced Metazoa and Plantae clades, across 29 different eukaryotic phyla (Supplementary Fig. 11B–G). Few clear trends could be elucidated beyond almost all plants displaying an H2A.X-F variant, with the few plant H2A.X-Y variants (nine out of 635) being almost entirely restricted to chlorophytes. In arthropods, the ancestral and most common scenario seems to be an H2A.X-Y from which divergent H2A.X-F proteins evolved. Within Chordata, on the other hand, two distinct clades correspond to each of the two H2A.X variants. In each clade, some conversions to the opposite scenario (Y-to-F and F-to-Y transitions) occur, yet generally, there is strong conservation of the amino acid in position 142. This robustly suggests that in chordates these are different genes with distinct evolutionary origins (Fig. 4E; Supplementary Fig. 11A) but that other evolutionary dynamics are also present in animal and other eukaryote lineages (Fig. 4F). Previous research suggested possessing both a H2A.X-Y and a H2A.X-F variant could be an adaptation for quickly developing aquatic species, yet 36.5% of the species with both variants do not display aquatic lifestyles (Fig. 4G). Here we show, however, that species with this two-H2A.X variant setting are almost exclusively found within Metazoa and are more common than previously thought, as shown in species analysed from the Chordata (23.40%), Cnidaria (16.67%), and Mollusca (30%) (Fig. 4H), likely reflecting H2A.X-Y and H2A.X-F animal variants to be two distinct genes. Altogether, these results demonstrate that histone variants can display intricate phylogenetic histories, highlighting the need for further functional comparative analyses of the genome regulatory implications of the two H2A.X variants in eukaryotes and during animal embryogenesis.
Histone modifiers show a complex evolution throughout annelid diversification
Besides the histone complement, diversity in the repertoire of histone-modifying enzymes (HME) is the other major potential source of evolutionary variation within histone-based regulation. To address this, we investigated the writers and erasers of histone acetylation and methylation: histone deacetylases (HDAC), histone demethylases (HDM), histone methyltransferases (HMT) of both the lysine-specific (KMT) and arginine-specific (PRMT) types, and histone acetyltransferases (HAT), of both type A and type B. We used a mutual best BLAST hit approach to find candidate sequences in the annelid genomes, which were then subjected to orthology assignment using both maximum likelihood and Bayesian reconstructions in six different phylogenetic analyses (Fig. 5; Supplementary Fig. 12–23). We identified a total of 78 clades corresponding to 77 different genes, of which only three (3.9%) were not supported by phylogenetic methods, and only five (6.5%) had bootstrap values or posterior probabilities below 70 or 0.7, respectively, thus making our gene identification highly robust and reliable.
Histone deacetylases
Annelids display at least a hdac1/2, hdac3, hdac4/5/7/9, hdac8, and hdac11 ortholog, as well as one ortholog of each of the known NAD-dependent sirtuins, that is, sirt1, sirt2, sirt3, sirt4, sirt5, sirt6, and sirt7 (Fig. 5A, B; Supplementary Fig. 12, 13). There is an additional well-assigned sirtuin clade, which we have here termed sirtuin novel (sirtn), with a single ortholog in all three species, as well as a clade with high similarity to both sirt6 and sirt7 genes (sirt6/7-like, or sirt6/7l), with two copies present in both O. fusiformis and C. teleta, but not in D. gyrociliatus. The latter presents a duplication of the hdac1/2 gene, whereas the genome of C. teleta encodes for two different hdac6/10 genes. Intriguingly, C. teleta also has an additional HDAC gene that clusters independently, and for which the most similar gene is the poorly characterised hdac12 gene in zebrafish (Fig. 5A, B; Supplementary Fig. 12, 13).
Histone demethylases
In terms of histone demethylases, annelids encode orthologs of kdm1a (lsd1), kdm1b (lsd2), kdm2a/b (fbxl10/11), kdm3 (jmjd1x), kdm4 (jmjd2x), kdm5 (jarid1x), kdm6 (utx/uty), kdm7 (phf2/8), kdm8 (jmjd5), riox1/2 (no66/mina53), jmjd6, and jmjd7 (Fig. 5A, C; Supplementary Fig. 14, 15). All three species encode a single ortholog of these genes, except for kdm5, which is only present in C. teleta, the secondary losses of kdm1b (lsd2) and kdm7 (phf2/8) in D. gyrociliatus, and a lineage-specific duplication of kdm2a/b (fbxl10/11), also in the direct developer D. gyrociliatus. We identified a clade with high similarity to jmjd6, which we have denoted as jmjd6-like (jmjd6l), that is nevertheless distinct from jmjd6 and is present as a single gene in each of the annelids’ genomes. Two other clades phylogenetically closer to jmjd7 yet automatically assigned as kdm8 genes were identified. The analysed genomes contain one copy of each, except in C. teleta, which harbours two genes within the kdm8-like 1 (kdm8l1) clade. Similarly to all other invertebrates, the protein arginine deiminase padi4 is absent in our annelids (Fig. 5A, C; Supplementary Fig. 14, 15).
Histone acetyltransferases
A complete repertoire of nuclear/type A and cytoplasmic/type B occurs in all three annelids (Fig. 5A, D; Supplementary Fig. 16–19). The only exception is kat12 (gtf3c4), which despite being part of a general transcription factor (TFIIIC90) of RNA Pol III (Male et al. 2015; Seifert-Davila et al. 2023), seems to be absent in D. gyrociliatus. At least one ortholog was found for all three lineages for kat2a/b (gcn5/pcaf), kat3a/b (crebbp/ep300), kat4 (taf1), kat5 (tip60), kat6a/b (myst3/4, moz/morf), kat7 (myst2, hbo1), kat8 (myst1, mof), kat9 (elp3), kat13a/b/c (ncoa1/3/2), kat13d (clock), kat14 (csrp2bp), hat1, naa60, and naa40. Unexpectedly, the compact genome of D. gyrociliatus showed the largest expansions, with kat4 (taf1), kat5 (tip60), and kat8 (myst1, mof) duplicated in a lineage-specific manner. But we could also find a likely ancestral duplication common to O. fusiformis and C. teleta of the kat13d (clock) gene (Fig. 5A, D; Supplementary Fig. 16–19).
Lysine-specific histone methyltransferases
The KMT family showed the largest variations between annelids and model organisms. Annelids do not have an ortholog of the chordate-specific kmt3c (smyd2) and kmt3d (smyd1) genes (Calpena et al. 2015), or kmt7 (setd7). They do display orthologs for kmt1c/d (ehmt2/1), kmt1e/f (setdb1/2), kmt2a/b (mll1a/b), kmt2c/d (mll3/2), kmt2e (mll5), kmt2f/g (setd1a/b), kmt2h (ash1l), kmt3a (setd2), kmt3b/f/g (nsd1/3/2), kmt3e (smyd3), kmt4 (dot1l), kmt5a (setd8), kmt5b/c (suv420h1/2), kmt6a/b (ezh2/1), setd6, setmar, and smyd5 (Fig. 5A, E; Supplementary Fig. 20, 21). Strikingly, we could only assign a kmt1a/b (suv39h1/2) ortholog in the case of C. teleta. Dimorphilus gyrociliatus has suffered losses of kmt2h (ash1l), kmt5b/c (suv420h1/2), and setd6, but carries two different genes of the kmt1c/d (ehmt2/1) and kmt2f/g (setd1a/b) classes. Owenia fusiformis has a duplication of the kmt2h (ash1l) gene, whereas C. teleta has one for kmt3e (smyd3). Moreover, we uncovered an expansion of kmt5a (setd8) orthologs exclusive of C. teleta, which has three different paralogs of this gene. Two sequences of the SET and MYND domain-containing family (SMYD)—one belonging to O. fusiformis and another one to D. gyrociliatus—did not cluster with any identified KMT clade and were classified broadly as smyd-like (smydl) genes. Within the kmt8 (prdmx/mecom) clade of genes, both D. gyrociliatus and C. teleta contained three different genes that we could not subclassify but that may represent different genes. Surprisingly, O. fusiformis displayed a large expansion in this family, with up to six different genes that could not be resolved further (Fig. 5A, E; Supplementary Fig. 20, 21).
Arginine-specific histone methyltransferases
Lastly, we also studied the complement of PRMT enzymes (Fig. 5A, F; Supplementary Fig. 22, 23). Excluding prmt2, which could not be found in any polychaete, all others are present, namely, prmt1/8, prmt3, prmt4, prmt5, prmt6, prmt7, and prmt9. A further PRMT gene was identified in all three species that only clustered with the fruit fly Art8 gene and that we deemed art8-like (art8l). The expansions of prmt1/8 and prmt6 in the genome of D. gyrociliatus are thus the only changes within the PRMT family in the studied annelids.
PRMT6 expansions led to domain fusions and likely catalytically dead enzymes
Considering that D. gyrociliatus is a case of extreme genome compaction, frequently associated with gene losses, it is striking that it appeared to possess four different prmt6 genes (hereon referred to as prmt6-a through prmt6-d) that subcluster independently of the rest of the sequences of the other species within the prmt6 clade (Fig. 5A, F; Supplementary Fig. 22, 23). PRMT6 enzymes catalyse type II arginine methylation reactions and yield monomethyl arginine (Rme1/MMA) and asymmetric dimethyl arginine (Rme2a/ADMA) (Fig. 6A) in both histones and non-histone proteins and have been shown to regulate cell proliferation and senescence (Blanc and Richard 2017; Lorton and Shechter 2019). Both the PRMT6-A and PRMT6-C proteins are significantly longer—with a full length of 871 and 1090 amino acids long, respectively—than what is expected of a PRMT6 ortholog (340–380 residues) (Fig. 6B). We analysed their domain and region composition and predicted their protein structure (Fig. 6B–D, Supplementary Fig. 24, 25) and uncovered that while O. fusiformis and C. teleta have a very conserved protein structure and domain length, this is only true for prmt6-b in D. gyrociliatus. Meanwhile, both prmt6-c and prmt6-d have a shorter S-adenosyl methionine (SAM)-dependent methyltransferase class I domain, and prmt6-a and prmt6-c contain additional regions and domains in the N-terminal region of the protein, namely three consecutive galactose/rhamnose-binding lectin domains and a progesterone-induced blocking factor 1 family region, respectively (Fig. 6B, D, Supplementary Fig. 25A). Transcriptomic validation of the fusion genes showed that the one present in prmt6-c is, however, likely a false positive and a result of a wrongful annotation. Unlike in prmt6-a, where both presupposed fused parts of the gene show continuous transcription at similar levels, the prm6-c gene showed large discrepancies in read density across the model and displayed a large number of unexplained antisense reads (Supplementary Fig. 25B, C). No PRMT6 ortholog contains the functionally critical tyrosine dyad (Y47 and Y51) in the PRMT characteristic motif (Fig. 6C, Supplementary Fig. 24). There are also a very high number of amino acid changes in highly conserved positions known to interact with either the SAM cofactor (e.g., R66, E141, and S169) and/or with the arginine residue (e.g., E164 and E317) (Fig. 6C, Supplementary Fig. 24). In fact, structural changes affecting the characteristic PRMT domain are evident, most notably in the lack of tertiary structures of the alpha helices where the PRMT characteristic motif and the tyrosine dyad are located (Fig. 6D, Supplementary Fig. 25A). These sequence and structure alterations are likely disrupting the potentially PRMT6 enzymatic function, thus perhaps relaxing selection pressures and allowing for various gene expansions and divergence events to occur. Altogether, our data indicate that in an annelid with a highly compact genome and a reduced and simplified histone complement, like D. gyrociliatus, the plasticity in histone-based regulation required during embryogenesis most likely resides in the conservation and, in some cases, diversification of the HME repertoire.
Life cycle correlates with heterochronies of histone modifiers
Histone-modifying enzymes (HMEs) constitute a highly plastic source of regulative variation that underpins changes in gene expression patterns (Bannister and Kouzarides 2011; Zhou et al. 2011; Butler et al. 2012; Zhao and Shilatifard 2019; Morgan and Shilatifard 2020). To investigate how temporal shifts in the deployment of orthologous developmental programmes could correlate with large changes in HME expression, we leveraged the entire developmental RNA-seq time series, from the zygote to competent larva/adult stages, for all three annelid species. All expressed transcripts were clustered using an unbiased soft k-means approach into an optimal number of 15 clusters (O. fusiformis and C. teleta) or eight clusters (D. gyrociliatus) (see Methods, Supplementary Fig. 26A–C). Next, we assigned the clusters to their corresponding developmental stages according to their maxima and classified them as cleavage or post-cleavage clusters (O. fusiformis and C. teleta) or early development and late development clusters (D. gyrociliatus). We further subdivided post-cleavage clusters in the indirect developers into pre-larval and post-larval clusters (Supplementary Fig. 26D–F). We profiled the expression dynamics of all five superfamilies of histone modifiers described above across the development of the three species. Given the broad array of biological readouts different modifiers from the same superfamily can have, as expected no clear superfamily-specific trends were common to all three taxa (Supplementary Fig. 27, 28). We then decided to focus on gene-wise expression dynamics. To do this, we compared the timing of deployment of every histone modifier with single-copy orthology either between both indirect-developing species or common to all three species (Fig. 7; Supplementary Fig. 29, 30). When comparing O. fusiformis and C. teleta, we unravelled how there are more histone modifiers under heterochronic shifts between cleavage and post-cleavage clusters (29 genes, 44.6% of the total) than sharing the same expression pattern (27 genes, 41.5%). This is mostly due to 21 enzymes (32.3%) being expressed early during cleavage stages in O. fusiformis but late in post-cleavage time points in C. teleta; yet the opposite is also true for eight of the enzymes (12.3%) (Fig. 7A, top). Heterochronic genes belong to all five superfamilies of modifiers. In the first group of genes (group I), or those delayed in the lecithotrophic larva, we can find, among others, the H3K27ac acetyltransferase kat3a/b (crebbp/ep300) and one of the circadian clock gene orthologs kat13d-a (clock-a); four different sirtuin genes, namely sirt3, sirt4, sirt5, and sirt6; both lysine-specific demethylases kdm1a (lsd1) and kdm1b (lsd2); the H3K79-specific kmt4 (dot1l) and H3K27-specific kat6a/b (ezh2/1); as well as prmt4 and prmt5. Displaced towards a late expression in the planktotrophic larva (group II), examples include the ortholog to the steroid receptor co-activators kat13a/b/c (ncoa1/3/2), hdac11 and sirt1, the bifunctional histone demethylase/ribosome hydroxylase riox1/2 (no66/mina53), the inactive kmt2e (mll5), and the previously discussed prmt6 gene (Fig. 7A, top, D; Supplementary Fig. 29). Contrary to our null expectation of finding heterochronic shifts between pre-larval and post-larval expression, all genes with a post-cleavage expression were consistently deployed at the same life cycle time point—either before or after larval formation—between C. teleta and O. fusiformis (Fig. 7A, bottom), suggesting that histone-based regulatory mechanisms in later developmental stages might be broadly conserved between indirect developing annelids.
We thus focused on the comparison between direct and indirect developers, i.e., between D. gyrociliatus and either C. teleta or O. fusiformis. As observed with transcription factors (Martín-Zamora et al. 2023b), a significant number of HMEs are pre-displaced from late/post-cleavage expression in C. teleta (group III, 14 genes, 29.8%) and O. fusiformis (group IV, 11 genes, 23.4%) to early development in D. gyrociliatus (Fig. 7B, C). Genes involved in these shifts are varied, with all superfamilies except the PRMT enzymes being represented in both groups of heterochronic genes (Fig. 7E, F; Supplementary Fig. 30). A critical advantage of these pair-wise comparisons is that we can intersect them and find those changes that are consistent and specific to direct and indirect developers (Fig. 7G). By comparing groups I. and III., those where genes are shifted from early/cleavage expression in either O. fusiformis or D. gyrociliatus to late/post-cleavage expression in C. teleta, we can robustly describe modifiers specific to the development of the lecithotrophic larva of C. teleta (Fig. 7D, top, E, G, left). These are kat3a/b (crebbp/ep300), kat14 (csrp2bp), hdac8, jmjd6l, kdm1a (lsd1), kdm3 (jmjd1x), kmt4 (dot1l), kmt6a/b (ezh2/1), and smyd5. The equivalent can be produced for O. fusiformis by comparing groups II and IV, which yields the modifiers specific to the development of the planktotrophic larva of O. fusiformis (Fig. 7D, bottom, F, G, centre), namely kat6a/b (myst3/4, moz/morf), kat7 (myst2, hbo1), hdac11, sirt1, and kmt2e (mll5). Lastly, we compared groups III and IV, i.e., those genes that are consistently pre-displaced to early expression in D. gyrociliatus from late/post-cleavage expression in both indirect developers (Fig. 7E, F, G, right), which included kat2a/b (gcn5/pcaf), three different demethylases, jmjd6, kdm4 (jmjd2x), and kdm6 (utx/uty), and kmt2c/d (ehmt2/1). Therefore, a few orthologous HMEs exhibit different temporal expression dynamics between the studied annelids. These might represent key epigenetic regulators of specific developmental programmes that differ between direct and indirect annelids, thus emerging as a tractable gene set for functional manipulations in the future.
Adult annelids harbour distinct histone modification enrichments
Finally, given the differences in the HME repertoires between species and their expression timing across development, we investigated whether adults may consequently display different levels of key histone modifications. To assess this, we used quantitative mass spectrometry-based detection of hPTMs in adult specimens of O. fusiformis and C. teleta and focused on the methylation and acetylation of residues in H3 and H4 histones (Fig. 8A, B; Supplementary Fig. 31A–C). The three biological replicates with two technical replicates strongly clustered by species (Fig. 8C and Supplementary Fig. 31D). Significant differences in key hPTMs were immediately obvious in several peptides. Most notable were the large differences in H3K56 methylation and H3K79 methylation. H3K56 is mostly unmethylated (H3K56me0) or di- and trimethylated (H3K56me2 and H3K56me3) in C. teleta but more abundant in its monomethylated form (H3K56me1) in O. fusiformis (Fig. 8D, left). Notably, these differences are consistent with the differences in the presence/absence of HMEs between species and their developmental heterochronies. H3K56me3 is known to be introduced in humans by the KMT1A (SUV39H1) and KMT1B (SUV39H2) enzymes. The almost exclusive presence of H3K56me3 in C. teleta matches our predictions, given that the annelid KMT1A/B (SUV39H1/2) ortholog is present in C. teleta but absent in O. fusiformis. On the contrary, the ortholog to the human H3K56me1-depositing enzyme KMT1D (EHMT2), which is KMT1C/D (EHMT2/1) in annelids, is present in O. fusiformis, likely explaining the accumulation of H3K56me1 in this species. These different hPTM abundances may have deep biological consequences, as it has been described that while H3K56me1 is involved in regulating DNA replication (Yu et al. 2012), H3K56me3 is a heterochromatic mark that largely overlaps with the constitutive heterochromatin mark H3K9me3 (Jack et al. 2013; Colmenares et al. 2017). We also found large differences in unmodified and monomethylated H3K79, with the former being the dominant form in C. teleta and the latter in O. fusiformis (Fig. 8D, right). Some well-studied and conserved methylation hPTMs like H3K4me2 and H3K4me3—among many others—appeared to be more enriched in one species from EpiProfile analyses, though statistical quantifications showed no significant differences between species (Fig. 8; Supplementary Fig. 31E). Other methyl marks like H3K36me1/2/3 and H3K27me2/3 were significantly more abundant in one of the annelid lineages. We also found this to be true for two acetylations, namely H3K23ac and H3K27ac, both significantly more abundant in C. teleta (Fig. 8F; Supplementary Fig. 31F–H), which could be the result of the heterochronic shift between early/cleavage expression in O. fusiformis and late/post-cleavage expression in C. teleta of five different HAT genes: kat5 (tip60), kat9 (elp3), kat13d-a (clock-a), kat14 (csrp2bp), but most interestingly, kat3a/b (crebbp/ep300). H3K9ac, however, escapes this pattern and appears more abundant in O. fusiformis (Supplementary Fig. 31F). Likewise, this may be associated with chromatin and transcription differences in the annelids, as histone acetylations like H3K23ac and H3K27ac are known to be tightly associated with increased transcription across species (Creyghton et al. 2010; Martin et al. 2021; Zhang et al. 2023). Altogether, our data describe a core repertoire of annelid histone methylation and acetylation of histones H3 and H4, revealing some key differences in the adult levels in the abundance of key hPTM, likely to be the result of differences in the repertoire and expression dynamics of HMEs.