Identification of mobile genetic elements of methanogenic archaea
To identify MGEs associated with archaeal methanogens, we assembled a database (“methanogens genome database”) consisting of 3436 genomes and metagenome-assembled genomes (MAGs) from GenBank63, the Unified Human Gastrointestinal Genome catalog (UHGG) database64, the IMG database65, and MAGs reconstructed from animal gut samples available in our group (Supplementary Table S1). Complete genomes and MAGs are a rich source of proviral sequences and occasional extrachromosomal elements (circular or linear contigs with terminal repeats) that are included in the genome assembly during binning66. Our pipeline for the identification of MGEs of methanogens consisted of four steps: CRISPR prediction, MGE prediction, host assignment and vOTU definition (Fig. 1). First, we identified all CRISPR spacers and CRISPR repeats in the “methanogens genome database” using the CRISPR-Cas-Finder program67 (Step 1 in Fig. 1). The resulting data were assembled into a “methanogens CRISPR database”, which includes ~60,000 spacer sequences, the most extensive collection to date (Supplementary Table S2, Supplementary Data). Next, we predicted MGE sequences in the “methanogens genome database” using a decision-tree approach (Step 2 in Fig. 1). The “methanogens genome database” was first screened for the presence of major capsid proteins (MCPs), a hallmark of viruses, using sensitive homology searches based on profiles obtained from the PFAM and phrogs databases68. Based on the presence and type of the MCPs identified, potential MGEs were classified into three groups: i) putative head-tailed (pro)viruses with the HK97-fold MCP (1404 sequences), ii) (pro)viruses with other MCP types (271 sequences), iii) extrachromosomal elements without known MCP (1661 sequences).
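The first branching of this decision tree can be sketched as follows. This is only a minimal illustration, assuming that the homology searches have already produced a set of hit labels for each contig; the labels `hk97_mcp` and `other_mcp`, the function name and the toy contigs are hypothetical and are not part of the original pipeline.

```python
# Hypothetical hit labels; in practice these would come from profile searches
# against PFAM and PHROGs models.
HK97_MCP = "hk97_mcp"
OTHER_MCP = "other_mcp"


def classify_by_mcp(profile_hits):
    """Place a contig into one of the three MCP-based groups.

    profile_hits: set of hit labels detected on the contig.
    Assumption: an HK97-fold hit takes precedence if both MCP types are present.
    """
    if HK97_MCP in profile_hits:
        return "i"    # putative head-tailed (pro)virus with HK97-fold MCP
    if OTHER_MCP in profile_hits:
        return "ii"   # (pro)virus with another MCP type
    return "iii"      # no known MCP detected


# Toy input: contig name -> labels of the profiles that matched (made-up data).
contigs = {
    "contig_1": {"hk97_mcp", "portal"},
    "contig_2": {"other_mcp"},
    "contig_3": {"integrase"},
}
groups = {name: classify_by_mcp(hits) for name, hits in contigs.items()}
```

The later filters described in the text (hallmark genes, completeness, circularity) would then be applied separately within each group.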
The HK97 fold is a conserved structural fold found in capsids of head-tailed (pro)viruses (class Caudoviricetes)69 and in capsid-like nanocompartments of archaea and bacteria called encapsulins70. To distinguish (pro)viruses from encapsulins, we required that potential head-tailed (pro)virus sequences contain at least one of two other hallmark genes of head-tailed viruses12, namely the portal protein or the terminase large subunit (TerL) (Fig. 1). To remove low-quality contigs with incomplete (pro)viruses, we overlapped the ends of the contigs to predict complete sequences of viruses in the extrachromosomal form, that is, circular or linear contigs with direct or inverted terminal repeats (TR), respectively. Then, we applied CheckV71 to estimate the completeness of the remaining sequences and to extract proviruses from the surrounding host sequences. The CheckV estimate of viral completeness is based on the similarity between the query viral sequence and the “CheckV complete viral genome database”, which consists mostly of viruses of bacteria. As a result, CheckV underestimates the completeness of archaeal head-tailed (pro)viruses (Supplementary Table S3). Therefore, we selected contigs with > 90% CheckV-estimated completeness and refined the integration sites for the rest of the contigs manually. Among the 1404 sequences with an HK97-fold MCP, 370 fulfilled all the above-mentioned conditions and were thus classified as complete head-tailed (pro)viruses.
We then proceeded to analyze the second group of potential (pro)viruses with other types of MCPs (Fig. 1). Nine contigs appeared to be complete virus genomes in extrachromosomal form and we identified several additional potential proviruses. Given the genetic divergence of these (pro)viruses from the reference viruses present in the CheckV database, we did not rely on automatic tools to predict their completeness but rather manually identified the provirus integration sites. After manual completeness filtering, we obtained 48 sequences of complete proviruses that encode MCPs unrelated to the HK97-fold family, leading to a total of 57 potential non-head-tailed (pro)viruses.
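The end-overlap check for terminal repeats can be illustrated with a short sketch. It assumes exact matching only and uses hypothetical length bounds (`min_len`, `max_len`); the tool and parameters actually used are not specified here, so this shows the idea rather than the original implementation.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def terminal_repeat(seq, min_len=20, max_len=200):
    """Return (kind, length) of the longest exact terminal repeat, or None.

    kind is "direct" (start equals end, as expected when a circular genome
    is assembled into a linear contig) or "inverted" (start equals the
    reverse complement of the end, as in some linear genomes).
    The length bounds are illustrative assumptions.
    """
    seq = seq.upper()
    max_len = min(max_len, len(seq) // 2)
    for k in range(max_len, min_len - 1, -1):
        head, tail = seq[:k], seq[-k:]
        if head == tail:
            return ("direct", k)
        if head == reverse_complement(tail):
            return ("inverted", k)
    return None


# Toy contigs built around a made-up 20-nt repeat.
repeat = "ATGCGTACGTTAGCCTAGGA"
circular_like = repeat + "T" * 50 + repeat
linear_like = repeat + "T" * 50 + reverse_complement(repeat)
```

A production version would normally tolerate mismatches in the repeats, which this exact-match sketch does not.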
Finally, we moved on to analyze the rest of the contigs lacking known MCPs (Fig. 1). From this group, we identified 1661 sequences that appeared in circular form. To confirm the plasmid or viral nature of these sequences, we identified 36 that matched spacers from the “methanogens CRISPR database” with > 90% identity. For the remaining contigs, protein annotation using the MGE-specific database of hmm profiles68 revealed 260 contigs with MGE-specific genes. Together, this step led to a classification of 296 sequences as potential extrachromosomal MGEs of methanogens.
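The spacer-matching criterion can be sketched as an ungapped scan of a spacer along both strands of a contig. This is a simplified stand-in written for illustration: the function names are hypothetical, gaps are not considered, and the search tool actually used for the matches is not assumed here. Only the > 90% identity threshold comes from the text.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_spacer_identity(spacer, contig):
    """Highest ungapped identity of the spacer over all windows, both strands."""
    spacer, contig = spacer.upper(), contig.upper()
    n = len(spacer)
    best = 0.0
    for target in (contig, reverse_complement(contig)):
        for i in range(len(target) - n + 1):
            window = target[i:i + n]
            matches = sum(a == b for a, b in zip(spacer, window))
            best = max(best, matches / n)
    return best


def is_targeted(spacer, contig, min_identity=0.90):
    """True if the contig carries a protospacer with > min_identity identity."""
    return best_spacer_identity(spacer, contig) > min_identity


# Made-up 30-nt spacer and a protospacer carrying two substitutions (28/30 identity).
spacer = "ACGTACGTAGCTAGCTAGGATCCGATCGAT"
protospacer = "C" + spacer[1:5] + "A" + spacer[6:]
contig = "G" * 20 + protospacer + "G" * 20
```

Scanning both strands matters because a spacer can match either orientation of the target contig.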
The final dataset of 723 MGEs predicted from the “methanogens genome database” was supplemented with 73 putative viral sequences from the human gut previously identified through metaviromic studies72–76 (“gut virome database”, Fig. 1). Finally, we added 44 MGE sequences assigned to methanogenic hosts in the IMG/VR database65 and 38 MGEs of methanogenic archaea from the NCBI database identified in previous studies29,31,33,37,43,54,77–79 (“known MGEs of methanogens”, Fig. 1).
We then proceeded to infer the hosts for the 723 predicted MGEs (Step 3 in Fig. 1). An MGE was assigned to a methanogenic host if: (i) it is a provirus integrated into the genome of a methanogenic archaeon, (ii) it is matched by a CRISPR spacer from the “methanogens CRISPR database” with > 90% identity or (iii) the best BLASTp hit of one of its proteins with > 70% identity corresponds to a protein of a methanogenic archaeon from the RefSeq database. Based on these criteria, 421 MGEs (58%) could be assigned to methanogenic hosts. To assign hosts to the remaining MGEs, we clustered them based on their encoded proteins using vConTACT280 and transferred the host assignment to closely related MGEs (MGEs that shared at least three protein clusters). Collectively, this allowed us to assign methanogenic hosts to an additional 196 MGEs (27%), leading to a total of 617 assigned MGEs (85%). Finally (Step 4 in Fig. 1), we proceeded to the vOTU definition for each predicted MGE by calculating their average nucleotide identities (ANI), which yielded 248 (pro)viral and 63 plasmid OTUs with a threshold of 95% ANI (the list of all MGEs is presented in Supplementary Table S4).
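The transfer of host assignments between related MGEs can be sketched as follows. This is a minimal illustration under stated assumptions: protein clusters are given as plain sets of cluster identifiers, the transfer is done in a single pass from directly assigned MGEs only, and ties are resolved in favor of the MGE sharing the most clusters. The function name and toy data are hypothetical; only the threshold of at least three shared protein clusters comes from the text.

```python
def propagate_hosts(cluster_sets, direct_hosts, min_shared=3):
    """Transfer host labels to unassigned MGEs via shared protein clusters.

    cluster_sets: MGE id -> set of protein-cluster ids encoded by that MGE.
    direct_hosts: MGE id -> host assigned by criteria (i)-(iii).
    An unassigned MGE inherits the host of the directly assigned MGE with
    which it shares the most clusters, provided it shares >= min_shared.
    """
    assigned = dict(direct_hosts)
    for mge, clusters in cluster_sets.items():
        if mge in assigned:
            continue
        best_host, best_shared = None, min_shared - 1
        for other, host in direct_hosts.items():
            shared = len(clusters & cluster_sets.get(other, set()))
            if shared > best_shared:
                best_host, best_shared = host, shared
        if best_host is not None:
            assigned[mge] = best_host
    return assigned


# Toy data (made up): one provirus with a known host and two unassigned MGEs.
cluster_sets = {
    "mge_a": {1, 2, 3, 4},
    "mge_b": {2, 3, 4, 9},   # shares three clusters with mge_a
    "mge_c": {7, 8},         # shares none
}
direct_hosts = {"mge_a": "Methanobrevibacter"}
hosts = propagate_hosts(cluster_sets, direct_hosts)
```

With this toy input, `mge_b` inherits the host of `mge_a`, while `mge_c` remains unassigned.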
A rich repertoire of MGEs associated with all major lineages of methanogenic archaea
The 248 identified (pro)viruses infect hosts belonging to the main known lineages of methanogenic archaea (Fig. 2A; Supplementary Table S4). The vast majority (79%) are Caudoviricetes (pro)viruses associated with Methanobacteriales (n = 92), Methanosarcinales (50), Methanomicrobiales (25), Methanomassiliicoccales (23), Methanococcales (4) and Methanoliparales (1) hosts. Seventeen viruses were predicted to infect acetoclastic Methanothrix/Methanosaeta, among the main methane producers on Earth81. We also identified seven viruses associated with methanotrophic archaea belonging to the Methanosarcinales (ANME-2, ANME-3). Phylogenetic analysis of the portal protein (Supplementary Figure S1) shows that these head-tailed viruses are very diverse, forming multiple families encompassing the few currently characterized viruses. For example, head-tailed viruses of Methanosarcinales fall into six independent clades: three include the previously described (pro)viruses MetMV, ANMV-1 and Mace-Pro1, while the rest represent novel viral families (Supplementary Figure S1).
For the remaining (pro)viruses, we identified icosahedral tailless (pro)viruses with the double jelly-roll (DJR) MCP associated with Methanococcales (3), Methanosarcinales (1), and Methanomicrobiales (13). We also found archaea-specific viral morphologies82. These are represented by pleomorphic proviruses associated with Methanonatronarchaeales (6) and, for the first time, with gut-associated Methanomassiliicoccales (27), extending the distribution of pleolipo-like viruses beyond saline environments. We also identified the previously described spherical virus MetSV of Methanosarcina mazei54 (1) and the nearly complete spindle-shaped virus previously predicted to infect Methanosarcina37 (1) (Fig. 2A).
The 63 plasmids of methanogenic archaea (of which 39 are newly predicted) are circular and have an average size of 28 kbp. They are associated with eight orders of methanogens: Methanosarcinales (n = 34), Methanobacteriales (11), Methanococcales (10), Methanomicrobiales (4), Methanomassiliicoccales (1), Methanoliparales (1), Korarchaeota (1), Methanocellales (1). Finally, we could not identify any MGEs in Methanopyrales, Methanofastidiosales, NM3, Methanoflorentales, Methanohydrogenales and Methanomethyliales, probably due to the scarcity of available genomic data for these lineages (Fig. 2A).
To study the relationships between the newly identified MGEs of methanogenic archaea and to assess their connections to MGEs of other archaea and bacteria, we used network analysis by vConTACT280 and the NCBI Bacterial and Archaeal Viral RefSeq V85 database. We constructed a gene-sharing network where each node represents an MGE, and two nodes are connected if the respective MGEs share more than three protein clusters (Fig. 2B). MGEs are largely clustered by host taxonomy, suggesting that host taxonomy is the primary factor underlying gene content similarity. Plasmids and viruses of the same host share proteins involved in replication, integration, and host interactions. Head-tailed viruses associated with different orders of methanogenic archaea (Methanomicrobiales, Methanosarcinales, Methanoliparales, Methanobacteriales, and Methanomassiliicoccales) form an interconnected component in the network by sharing similar sets of structural proteins (Fig. 2B). Interestingly, the head-tailed virus component is also connected to the bacterial part of the network through the head-tailed virus vir120 of Methanoculleus (Methanomicrobiales). Structural proteins of this virus are in fact ~30–40% identical to those of a tailed bacteriophage from Geobacillus (Firmicutes) (Supplementary Figure S2), suggesting the possibility of past coinfection and/or extensive gene exchange between these archaeal and bacterial head-tailed viruses. In contrast, head-tailed viruses of Halobacteria, Methanococcales, and most of the Methanomassiliicoccales form discrete components (Fig. 3A), likely reflecting high sequence divergence of their structural proteins.
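The edge rule of the gene-sharing network, and the notion of interconnected versus discrete components, can be illustrated with a small sketch. It applies only the raw criterion stated above (more than three shared protein clusters) to plain sets of cluster identifiers and does not reproduce the scoring or clustering performed by vConTACT2 itself; the function names and toy data are hypothetical.

```python
from itertools import combinations


def gene_sharing_network(cluster_sets, more_than=3):
    """Return an adjacency map linking MGEs that share > `more_than` clusters."""
    adjacency = {mge: set() for mge in cluster_sets}
    for a, b in combinations(cluster_sets, 2):
        if len(cluster_sets[a] & cluster_sets[b]) > more_than:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Group nodes into connected components with an iterative depth-first search."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return sorted(components)


# Toy data (made up): MGE name -> protein-cluster ids.
toy = {
    "virus_A": {1, 2, 3, 4, 5},
    "virus_B": {1, 2, 3, 4, 6},   # shares four clusters with virus_A
    "virus_C": {1, 2, 3, 7},      # shares only three, so no edge
    "plasmid_D": {10, 11},
}
network = gene_sharing_network(toy)
components = connected_components(network)
```

In this toy network, `virus_A` and `virus_B` form one component, whereas `virus_C` and `plasmid_D` remain discrete.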
It has been recently proposed that small ssDNA viruses of the Smacoviridae family59 infect Methanomassiliicoccales archaea, based on eight spacers from the type I-B CRISPR array of Methanomassiliicoccus intestinalis Issoire-Mx1 targeting smacovirus genomes with 90–100% identity83. Using the same identity threshold, we could match 16 additional spacers from M. intestinalis type I-B CRISPR arrays and 10 spacers from Methanomethylophilus CRISPR arrays of types I-B and V-A. Notably, in addition to smacoviruses from the human gut, these new spacers matched chicken, porcine and lynx-associated smacoviruses (Supplementary Table S5). These results further extend and support a possible interaction between smacoviruses and gut-associated Methanomassiliicoccales. Interestingly, no spacers targeting smacoviruses were found in other gut methanogens, highlighting a specific interaction between these viruses and Methanomassiliicoccales. However, this conclusion should be treated with caution. In fact, we note that smacoviruses are phylogenetically nested among viruses with experimentally confirmed eukaryotic hosts62,84 and do not show affinity to prokaryotic plasmids encoding the same type of replication proteins85. If smacoviruses indeed infect Methanomassiliicoccales, this relationship could represent a rare case of inter-domain virus transfer from eukaryotes to archaea. Alternatively, we cannot exclude the possibility that exogenous smacovirus DNA is non-specifically internalized into Methanomassiliicoccales cells and occasionally gets incorporated into the CRISPR arrays.
Together, these results uncover a rich mobilome including viruses and plasmids associated with all major lineages of methanogenic archaea. This atlas of MGEs derived from diverse environments and methanogen lineages makes it possible to gain insight into the factors underlying niche adaptation and evolutionary patterns in the global archaeal mobilome.
Functional annotation of methanogen MGEs
To characterize the functional potential of the identified MGEs and their interactions with methanogenic hosts, we proceeded with their annotation. Proteins encoded in MGEs were clustered using vConTACT280 and annotated using the arCOGs database86, the NCBI conserved domains database87 and the phrogs database68. Overall, most proteins from (pro)viruses and plasmids were assigned to the “function unknown” and “mobilome” arCOG categories (Supplementary Figure S3). We found a significant difference in arCOG classification between the two types of MGEs, with plasmids encoding a larger fraction of proteins involved in replication, cell wall biogenesis, and protein modification and turnover than (pro)viruses (p-value < 0.001, chi-square test). Also, when compared to the recently identified Borgs, the gene content of the newly predicted plasmids is less enriched in metabolism-related proteins and contains more mobilome-specific arCOGs (Supplementary Figure S3), suggesting fundamentally different modes of propagation for Borgs as compared to plasmids with much smaller genomes. Notably, unlike in Borgs, we could not identify any genes related to methanogenesis in the predicted MGEs, indicating that manipulation of host metabolism is not a general strategy of these elements.
Fifty-four viruses were found as circular contigs with sequence lengths ranging from 10 kbp (spherical virus of Methanosarcina, MetSV)54 to 228 kbp (head-tailed virus of Methanobrevibacter) (Supplementary Table S4). Ninety vOTUs were found in the integrated form, as proviruses, with sequence length in the range of 6–10 kbp (Pleolipoviridae-like integrated elements of Methanonatronarchaeales and Methanomassiliicoccales) to 82 kbp (head-tailed provirus of Methanobrevibacter) (Supplementary Table S4).
Identified (pro)viruses of methanogens display a vast diversity in terms of gene content (Fig. 3, Supplementary Data, Supplementary Table S6). The majority of head-tailed vOTUs display modular genomic organization similar to that of other bacterial and archaeal Caudoviricetes viruses, with clearly delineated modules for virus-host interaction, genome replication and virion assembly (Fig. 3). Proteins potentially involved in viral genome replication were identified in 40% of the vOTUs and include primases of the AEP superfamily88, B-family DNA polymerases, clamp loaders and PCNA-like processivity factors as well as minichromosome maintenance (MCM)-like replicative DNA helicases, DNA ligases and ssDNA-binding proteins. Like haloarchaeal head-tailed viruses12, viruses of methanogens encode components of the purine and pyrimidine synthesis pathways and of the nucleotide sugar biosynthesis pathway, which are thought to accelerate viral genome replication. One example is thymidylate synthase (ThyA), which is found in 22 vOTUs infecting members of Methanobacteriales, Methanosarcinales and Methanoliparales (Supplementary Table S6).
Viruses of methanogens encode diverse DNA modification systems for protection against restriction-modification-like systems of their hosts (Fig. 3 and Supplementary Table S6). The most prevalent ones include adenine- and cytosine-specific methyltransferases, which are found in 54% of the vOTUs associated with all major lineages of methanogenic archaea. In 32 MGEs, we found methyltransferases encoded next to predicted restriction endonucleases, representing full restriction-modification systems. In addition, we identified systems involved in DNA sulfur modification and several queuosine synthases and FolE GTP cyclohydrolase potentially involved in DNA modification with queuosine89, as previously reported in haloarchaeal viruses and bacteriophages12.
Some viral genomes encode proteins involved in the shutdown of host translation and transcription and in countering the host defense systems (Fig. 3 and Supplementary Table S6). For instance, multiple viruses harbor Lar-like90 or ArdC-like anti-restriction proteins91, involved in the alleviation of the host restriction-modification systems. Additionally, we identified several peptidyl-tRNA hydrolases, which potentially participate in the inhibition of translation of the host proteins.
Interference modules of the type IV-B CRISPR-Cas system were found in six viruses associated with Methanobacteriales and Methanosarcinales archaea (Supplementary Table S7). Other potential defense systems include viperin92 (plasmid pMETHO01 of Methanomethylovorans), BREX93 and AVAST94 systems in plasmids of Methanobrevibacter, a PD-Lambda-195 protein in four viruses of Methanobacteriales, an hma system96 encoded in a virus of Methanoperedens, and the serine/threonine kinase system stk297 in a provirus of Methanobacterium (Supplementary Table S7). This rich repertoire of defense and counter-defense systems suggests a highly dynamic and complex network of interactions between methanogens and their MGEs, likely extending beyond the classical predator-prey relationships98.
In Crenarchaeota6 and Thaumarchaeota99, type IV secretion system proteins constitute the conjugation machinery involved in DNA transfer, but no such system has been reported thus far in Euryarchaeota. Interestingly, we found potential conjugation systems including virB4, virB5 and virB6 genes, encoded not only by plasmids but also by head-tailed (pro)viruses associated with methanotrophic archaea of the ANME-3 lineage (Methanosarcinales, Supplementary Figure S4). These head-tailed viruses might thus be potentially transmitted between host cells both through infection and conjugation. Experimental validation of these conjugation systems may open new possibilities for genetic manipulation in methanotrophic archaea.
Viruses of Methanobacteriales contain proteins to cross the pseudo-peptidoglycan cell wall
Methanobacteriales are widespread in the environment and represent the main archaeal component of the animal and human gut microbiomes45,48,100. All five previously characterized viruses of Methanobacteriales (phiF1, phiF334, psiM2101, psiM10077, Drs333 and C15837) infect environmental hosts, while viruses associated with Methanobacteriales from the gut environment have been reported but not comprehensively described37,48,55. Our analysis shows that head-tailed viruses of Methanobacteriales are very diverse and can be divided into 18 families, including the 3 previously known Speroviridae92, Leisingerviridae52, and Anaeroviridae33, as well as 15 new ones. These include all complete (pro)viral sequences, corresponding to 7 vOTUs recently identified in human metagenomes by Li et al.57 (Fig. 4, marked with asterisks). Many of these new families are currently represented by a single virus genome, underscoring a considerable and still largely unexplored genetic diversity. We provide detailed characterization for the four most abundant viral families (provisionally denoted as ‘Families 1-2-3-4’) associated with both isolated and uncultured Methanobrevibacter species found in intestinal ecosystems (Fig. 4, Supplementary Table S4). Family 1 viruses (~53–63 kbp) are predicted to have siphovirus-like morphology and are targeted by CRISPR spacers from up to 156 different M. smithii and M. intestini genomes48. Family 2 viruses (~50–76 kbp) are predicted to have myovirus-like morphology and to infect Methanobrevibacter members from bovine and goat intestinal metagenomes. Family 3 viruses (~42–228 kbp) are also predicted to have myovirus-like morphology and include proviruses from M. oralis and from multiple human and animal gut metagenome-assembled genomes of Methanobrevibacter. Family 4 is the most widespread in human gut metagenomes and metaviromes; it is predicted to have sipho-like morphology and contains the previously described sipho-like provirus from the M. smithii strain ATCC 35061 (Msmi-Pro155,102). For a more detailed description of these families, see the Supplementary text.
The Methanobacteriales represent one of the two orders of archaea with pseudo-peptidoglycan (pPG) cell walls, an evolutionary convergence with bacterial PG103. Like head-tailed viruses of bacteria, the head-tailed viruses of Methanobacteriales must therefore overcome the pPG barrier of the host twice in their lifecycle: during virus entry upon delivery of genomic DNA into the host cytoplasm, and during virion egress through the lysis process104. During the entry process, bacteriophages carry out local degradation of the cell wall by exolysins, which are PG-cleaving domains typically fused to the structural tail proteins that recognize and attach to the cell surface (fibers, spikes, baseplate, tail tape measure proteins). At the end of the infection cycle, the PG layer is destroyed by the endolysins, small soluble proteins with a PG-cleaving domain and optional PG-recognition domains. We found that an analogous paradigm applies to Methanobacteriales viruses, which display several cell wall degradation enzymes (exolysins and endolysins) (Fig. 4).
Exolysins and endolysins of Methanobacteriales viruses include different types of enzymatic domains: peptidase_C71 (PeiW-like)105, peptidase_C39106,107 (PeiR-like), and glycosyl hydrolase (Fig. 4, Supplementary Table S8). Identified exolysins mostly contain one or two peptidase_C71 domains. For myoviruses with predicted contractile tails, the peptidase_C71 domain is typically fused to baseplate wedge proteins. Viruses with sipho-like morphology (i.e., long, non-contractile tails) instead encode the peptidase_C71 domain fused to a tail tape measure protein (TMP) and/or a spike protein. The spike protein carries an additional Ig-like domain with a potential pPG-binding function. Interestingly, Leisingerviridae viruses do not encode any identifiable exolysins (Fig. 4), suggesting they are either able to pierce the cell wall of the host without degrading the pPG-layer during viral entry or, more likely, encode novel pPG-degrading enzymes.
While exolysins mostly contain the peptidase_C71 domain, the predicted endolysins of viruses of Methanobacteriales exhibit higher diversity (Fig. 4). In addition to the previously characterized pPG-cleaving peptidase_C71 (PeiW-like)77 and peptidase_C39 (PeiR-like)107, viruses of Methanobacteriales also encode glycosyl hydrolases and transpeptidases. The pPG-cleaving domains are encoded in different combinations with putative pPG-recognition modules (PMBR, pseudomurein binding repeat; PGRP, peptidoglycan recognition protein; Ig-like, immunoglobulin-like domains) (Supplementary Figure S5). Notably, the distribution of pPG-cleaving and pPG-recognition domains in viruses of gut-associated Methanobacteriales is different from that in viruses of environmental Methanobacteriales (Supplementary Figure S5), suggesting that the diversification of these domains might be linked to adaptation to the gut environment.
Our results show that Methanobacteriales viruses encode a diverse repertoire of pPG-cleaving and pPG-recognition domains, indicating that bacterial and archaeal viruses have converged independently on similar solutions to cross the cell wall.
Enrichment in Ig-like domains and diversity-generating retroelements (DGR) is an adaptation of methanogen viruses to the gut environment
Ig-like domains (belonging to the beta-sandwich domain family) encoded by phages infecting human gut bacteria are thought to be involved in attachment to the surface of the host or to the mucosal layer of the gut epithelial cells, thereby providing immunity by controlling bacteria through predation108–111. To explore whether Ig-like domains are also involved in the adaptation of methanogen viruses to the gut environment, we screened the viral genomes using Pfam profiles of beta-sandwich domains.
We found 289 proteins with Ig-like domains encoded by viruses of Methanobacteriales, Methanosarcinales, Methanomicrobiales, Methanomassiliicoccales and Methanococcales (Supplementary Table S9). In addition to the exolysins and endolysins described above, Ig-like domains were also found in the major capsid proteins (MCPs), major tail proteins (MTPs), tail and head fibers, and spikes. Notably, some Ig-like domains associated with MCP or MTP are presumably translated through a programmed −1 frameshift mechanism, as previously shown for Ig-like domains in some bacteriophages112. The frameshift mechanism is thought to produce a limited number (3–5%) of MCP or MTP molecules with Ig-like domains, which minimizes the negative impact of the additional domain on virion assembly. Ig-like domains were found in viruses of both gut-associated and environmental methanogens and are likely primarily involved in interaction with the carbohydrate moieties present on the surface of the archaeal hosts. However, we observed that gut-specific Methanobacteriales viruses (Families 1, 2 and 3) encode significantly more Ig-like domains than environmental viruses (p-value = 0.003, t-test) (Fig. 4, Fig. 5A).
The Flg_new (/List_Bact; PF09479) domain is a potential adhesin domain structurally akin (beta-sandwich) to the Ig-like domain and has been shown to be specifically enriched in some host-associated bacteria113 as well as gut-associated Methanomassiliicoccales113 and Methanosarcinales114. Four Flg_new repeats are present in the Pil3A protein of Streptococcus gallolyticus, playing an essential role in the adhesion of this bacterial pathogen to human gut mucosa115. We found 85 viral proteins which contain the Flg_new domain (Supplementary Table S10). Interestingly, the large majority (83) belong to head-tailed and pleolipo-like viruses of gut-associated Methanomassiliicoccales (Supplementary Table S10) and are therefore also enriched with respect to environmental viruses (Fig. 5B). The majority of the viral Flg_new-containing proteins have a transmembrane domain (86%) and a signal peptide (73%). This domain organization (signal peptide - Flg_new repeats - transmembrane domain) is similar to previously described adhesins of Methanomassiliicoccales hosts113. Altogether, our results suggest that both Ig-like and Flg_new domains play a role in the adaptation of Methanobacteriales and Methanomassiliicoccales viruses to the gut environment.
Genes at the interface of virus-host interactions are among the fastest evolving components of both cellular and viral genomes116–118. A prime example is represented by genes encoding cellular receptors and viral receptor-binding proteins. One of the dedicated mechanisms ensuring a fast mutational pace involves diversity-generating retroelements (DGRs)119–123, a group of genetic elements that utilize error-prone reverse transcription to introduce variations in the target gene (and the corresponding protein) sequence. Remarkably, DGRs have been coopted by both prokaryotic viruses and their hosts to quickly diversify the receptor-binding proteins and receptors, respectively118,124. In the case of head-tailed viruses, the target sequences are usually located in genes encoding tail proteins, specifically within receptor-binding domains119,120. Using the MyDGR server125, we predicted 25 DGRs in tailed, tailless icosahedral, and pleolipo-like viruses of Methanosarcinales, Methanomicrobiales, and Methanomassiliicoccales (Supplementary Table S4). The DGR system of viruses of methanogens has a similar configuration to DGRs described in bacteriophages (Fig. 5C): it consists of a reverse transcriptase (RT) gene and two imperfect repeat sequences: a template repeat (TR) used for cDNA synthesis and a variable repeat (VR) in the target gene, which is replaced by a TR-cDNA during the retrohoming process. All target proteins have the same domain architecture: 1 to 5 beta-sandwich Ig-like domains or Flg_new domains followed by a C-type lectin fold (CLec-fold) ligand-binding domain. Depending on the DGR, 11–20 diversified amino acids are located on the surface of the CLec-fold domain, accommodating up to ~10^16 protein sequence variants, consistent with previous estimates for archaeal viruses (10^18)78.
According to the domain architecture and genomic location, we annotated the target proteins of these DGRs as tail fibers for head-tailed viruses, and capsid spike proteins for icosahedral tailless viruses (Fig. 5C). Again, we found that DGRs are more prevalent in viruses of gut-associated methanogens than in viruses of environmental methanogenic archaea (Fig. 5D). Interestingly, clustering of the RT proteins based on sequence similarity using the CLANS program126 shows that DGR systems of viruses of methanogenic archaea are not related to the previously reported archaeal viral DGR systems127. Notably, the DGRs of head-tailed and tailless viruses of Methanomicrobiales are closely related (40–60% identical), suggesting horizontal gene transfer between the two groups of viruses.
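The order of magnitude of the variant numbers quoted above for DGR targets can be illustrated with a short calculation. The sketch below assumes the commonly described adenine-specific mode of DGR mutagenesis, in which each adenine of a TR codon may be replaced by any nucleotide in the VR, and the standard genetic code. The codon list is a hypothetical input and the function names are illustrative; this is not necessarily how the estimates in the text were derived.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reachable_amino_acids(tr_codon):
    """Residues encodable in the VR when every adenine of the TR codon can mutate.

    Stop codons, if reachable, appear as "*" in the returned set.
    """
    options = [BASES if base == "A" else base for base in tr_codon.upper()]
    return {CODON_TABLE["".join(codon)] for codon in product(*options)}


def max_protein_variants(tr_codons):
    """Upper bound on protein variants: product of per-codon residue counts."""
    total = 1
    for codon in tr_codons:
        total *= len(reachable_amino_acids(codon))
    return total


# Hypothetical template repeat with 14 AAC codons at the variable positions.
example_variants = max_protein_variants(["AAC"] * 14)
```

Under these assumptions an AAC codon can yield 15 different amino acids and no stop codon, so 14 such codons give 15^14, or about 2.9 × 10^16, possible protein sequences.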
These results suggest that enrichment in Ig-like and Flg_new domains is a potential strategy involved in the adaptation of methanogen viruses to the gut environment, similar to bacteriophages of the human gut108–111. They also extend recent studies of DGRs in genomic and metagenomic sequences, demonstrating that DGRs are particularly abundant in gut microbiomes121,120.