Argonaute (AGO) proteins are key effectors of eukaryotic RNA silencing and RNA interference (RNAi). They act by forming an RNA-induced silencing complex (RISC) with small interfering RNA (siRNA). RISC uses short 5’-phosphorylated RNAs to recognize complementary RNA transcripts through base-pairing, and silences target genes or transposable elements at the transcriptional or post-transcriptional level. The eukaryotic RNAi pathway is relatively conserved because eukaryotic AGO proteins share common structural properties and functions. Plant AGO proteins, however, have evolved specialized and diverse functions [1, 2]. Some prokaryotic AGO proteins have also been recognized to have genome-editing potential comparable to that of CRISPR-Cas9 and to play an important role in biological defense mechanisms [3]. This review summarizes the general characteristics of AGO proteins, the assembly and mechanism of action of RISC, and recently described functions of AGO proteins, to provide a reference for further study and application.
1.1 AGO protein family structural features
Eukaryotic AGO proteins are structurally conserved and typically contain four domains: the N-terminal domain, the PIWI-Argonaute-Zwille (PAZ) domain, the middle (MID) domain, and the P-element induced wimpy testis (PIWI) domain (Figure 1). Crystal structures of AGO proteins show that the MID and PIWI domains form one lobe and the N-terminal and PAZ domains form the other, giving a bilobed architecture [4, 5]. The N-terminal, MID, and PIWI domains form a crescent-shaped base, and the PAZ domain sits above the groove of this base. PAZ is joined to the crescent by two linker domains: linker 1 (L1) connects it to the N-terminal domain and linker 2 (L2) connects it to the MID domain. Nucleic acids bind in the channel between the N-PAZ and MID-PIWI lobes (Figure 2). AGO proteins undergo structural rearrangements when they bind and are loaded with small RNA (sRNA) molecules [6].
The N-terminal domain is variable in sequence and can be involved in unwinding RNA duplexes, assisting RNA cleavage, and assembling mature RISCs. The PAZ domain has approximately 130 amino acids and is a structural component shared by AGO proteins and Dicer enzymes. It recognizes and binds sRNAs, including siRNAs. The 3’ end of an siRNA or microRNA (miRNA), or the 3’-methylated end of a PIWI-interacting RNA (piRNA), inserts into the PAZ domain, which protects the sRNA from degradation. The PAZ domain has several typical characteristics: it binds to RISC; it binds RNA in preference to DNA; it cannot bind siRNA phosphorylated at the 5’ end; and it binds single-stranded RNA (ssRNA), which it recognizes through the two nucleotides at the 3’ end. The MID domain forms a “pocket” (Figure 2) that binds the 5’-terminal nucleotide, enabling AGO proteins to anchor sRNAs. The PIWI domain has ribonuclease H (RNase H)-like activity, and the Asp-Glu-Asp-His/Asp (DEDH/D) tetrad in its catalytic center coordinates metal ions. The PIWI domain therefore houses the catalytic center of AGO proteins and is required for RNA cleavage. However, not all AGOs are catalytically active [7-10].
Prokaryotic Argonaute proteins (pAGOs) can be divided into two branches, long and short pAGOs, according to their position on the phylogenetic tree. Most long pAGOs have the same domains as eukaryotic AGOs (eAGOs). The AGO protein of the thermophilic archaeon Archaeoglobus fulgidus (AfAGO) is an exception because it lacks the N-terminal and PAZ domains. Short pAGOs have only the MID and PIWI domains. Nevertheless, eAGOs and pAGOs have similar catalytic functions, which cover guide siRNA binding, target RNA recognition, cleavage, and release [11, 12]. The PAZ domain and the C-terminal PIWI domain were first identified by phylogenetic sequence analysis and later characterized by three-dimensional structure analysis. Folding of the PAZ domain, Sm-fold domain, and oligosaccharide/oligonucleotide-binding-fold domain in Drosophila melanogaster AGO proteins involves connecting the PAZ and Sm domains through a conserved site in the central part of the masked C-terminus [13-15].
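The domain compositions described above can be sketched as a small lookup structure. This is only an illustrative encoding of the text: the dictionary layout, the class labels, and the helper names `missing_domains` and `shared_domains` are assumptions introduced here, and the linker regions are omitted for simplicity.

```python
# Illustrative encoding of the domain compositions described in the text.
# The four canonical domains are listed from N- to C-terminus; linkers are omitted.
DOMAINS = ("N", "PAZ", "MID", "PIWI")

# Hypothetical lookup table: protein class -> domains present.
ARCHITECTURES = {
    "eAGO": ("N", "PAZ", "MID", "PIWI"),
    "long pAGO": ("N", "PAZ", "MID", "PIWI"),  # most long pAGOs match eAGOs
    "AfAGO": ("MID", "PIWI"),                  # exception: N and PAZ are missing
    "short pAGO": ("MID", "PIWI"),
}


def missing_domains(protein):
    """Return the canonical domains absent from the given protein class."""
    present = ARCHITECTURES[protein]
    return tuple(d for d in DOMAINS if d not in present)


def shared_domains():
    """Return the domains present in every protein class in the table."""
    return tuple(
        d for d in DOMAINS
        if all(d in arch for arch in ARCHITECTURES.values())
    )


if __name__ == "__main__":
    for name in ARCHITECTURES:
        print(name, "lacks", missing_domains(name) or "nothing")
    print("shared by all classes:", shared_domains())
```

Under this encoding, MID and PIWI are the only domains common to every class, which is consistent with their roles in 5’-end binding and catalysis.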
1.2 Classification of AGO protein families
eAGOs can be divided into four types according to their domains: AGO-like, PIWI-like, worm-specific Argonaute (WAGO), and Trypanosoma AGOs. Prokaryotic AGOs fall into three classes: long pAGOs, short pAGOs, and PIWI-RE proteins. Based on their structural characteristics and mechanism of action, all eAGOs can also be grouped into three main clades: AGO, PIWI, and WAGO [1, 12]. AGO clade proteins bind miRNA or siRNA and induce sequence-specific RNAi, as Arabidopsis AGO1 does; PIWI clade proteins are found only in animals and bind piRNA to regulate transposon activity; WAGO clade proteins are specific to nematodes such as Caenorhabditis elegans [12]. AGO-like and PIWI-like proteins are found in bacteria, archaea, and eukaryotes, indicating that both types have an ancient origin. However, the number of AGO genes varies between species [15, 16]. For example, there are 8 AGO genes (4 AGO-like and 4 PIWI-like) in humans (Homo sapiens), 5 (2 AGO-like and 3 PIWI-like) in D. melanogaster, 1 AGO-like gene in Schizosaccharomyces pombe, and at least 26 AGO genes in C. elegans (5 AGO-like, 3 PIWI-like, and 18 group 3). Plants have only AGO-like proteins, and the size and functional range of the plant AGO family have expanded during evolution. The green alga Chlamydomonas reinhardtii has 3 AGO proteins, while the moss Physcomitrella patens has 6. The number is higher in angiosperms: 10, 15, 15, 17, and 19 in Arabidopsis thaliana, poplar (Populus L.), tomato (Solanum lycopersicum), maize (Zea mays), and rice (Oryza sativa), respectively. Plant AGO proteins can be divided into three main clades according to their phylogenetic relationships: AGO1/5/10, AGO2/3/7, and AGO4/6/8/9, named after the Arabidopsis AGOs. Rice and maize have additionally evolved the AGO18 subclade, which belongs to the AGO1/5/10 clade.
In contrast, the budding yeast Saccharomyces cerevisiae, although a eukaryote, has no AGO proteins, which may be due to secondary loss of the RNAi system [1, 6, 17].
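As a quick consistency check on the gene counts quoted in this section, the per-type numbers can be tabulated and summed against the stated totals. The sketch below uses only figures given in the text; the dictionary layout and the helper name `breakdown_matches_total` are assumptions made for illustration, and the Caenorhabditis total is a lower bound ("at least 26").

```python
# AGO gene counts as quoted in the text; the data layout is illustrative.
AGO_GENE_COUNTS = {
    "Homo sapiens": {"total": 8, "by_type": {"AGO-like": 4, "PIWI-like": 4}},
    "Drosophila melanogaster": {"total": 5, "by_type": {"AGO-like": 2, "PIWI-like": 3}},
    "Schizosaccharomyces pombe": {"total": 1, "by_type": {"AGO-like": 1}},
    # "at least 26" in the text, so this total is a lower bound.
    "Caenorhabditis": {"total": 26, "by_type": {"AGO-like": 5, "PIWI-like": 3, "group 3": 18}},
}

# Plants have only AGO-like proteins, so only totals are listed.
PLANT_AGO_COUNTS = {
    "Chlamydomonas reinhardtii": 3,
    "Physcomitrella patens": 6,
    "Arabidopsis thaliana": 10,
    "Populus": 15,
    "Solanum lycopersicum": 15,
    "Zea mays": 17,
    "Oryza sativa": 19,
}


def breakdown_matches_total(entry):
    """Check that the per-type counts add up to the stated total."""
    return sum(entry["by_type"].values()) == entry["total"]


if __name__ == "__main__":
    for species, entry in AGO_GENE_COUNTS.items():
        status = "consistent" if breakdown_matches_total(entry) else "MISMATCH"
        print(f"{species}: {entry['total']} genes ({status})")
    # List plant species from fewest to most AGO proteins.
    for species, n in sorted(PLANT_AGO_COUNTS.items(), key=lambda kv: kv[1]):
        print(f"{species}: {n}")
```

Sorting the plant entries by count makes the expansion from algae and mosses to angiosperms easy to see at a glance.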