1. Transcriptome of G. japonicum teliospores
G. japonicum teliospores were collected from infected juniper branches in the Jiangsu province (China). Telia were visible in the form of a dense fungal mass of dark orange colour breaking through the bark of the juniper branch (Figure 1). Teliospores were recovered by gently scratching the telia with a needle and inspection by light microscopy revealed typical two-cells spores attached to their pedicels and no contamination by plant material. Three biological replicates of G. japonicum teliospores were collected for total RNA extraction and were subjected to paired-end sequencing on Illumina Hiseq 2000 platform. In total, 49,245,136; 60,133,718; and 52,751,538 reads were obtained, respectively for the three replicates. After filtering low-quality reads and adapter sequences, we obtained a total of approximately 160 million clean reads (Additional file 1: Figure S1), which served as the input for an assembly with the Trinity program [23]. The clean reads of each replicate library were then mapped against this reference assembly and read count for each unigene was obtained using the RNA-Seq by Expectation Maximization (RSEM) software [24]. The mapping rates for each library were 82.9%, 84.8% and 84.7%, respectively (Additional file 1: Figure S1). The generated assembly was composed of 40,583 transcripts, with a mean length of 1,059-bp and 30,243 unigenes remained after elimination of transcripts redundancy (Additional file 1: Figure S1). A total of 15,722 CDS were determined from the unigenes using ESTscan and Blast against nr and swissprot as previously described [22]. Unigenes and transcript sequences are provided in the Additional files 2 and 3: supplementary data 1 and 2. To further clarify the gene expression abundance in each library, FPKM values (fragments per Kb per million mapped reads) were calculated for each unigene (Additional file 4: Table S1). The FPKM density distribution profiles reflected consistent expression pattern across replicates (Additional file 5: Figure S2). Average FPKM value of the three replicates was further considered for all unigenes as a representative transcriptome of G. japonicum teliospores (Additional file 4: Table S1). In order to perform comparison across the Gymnosporangium lineage, we used transcriptome data previously published for G. yamadae and G. asiaticum teliospores [22]. The collection time (stage and time of the year in the life cycle) and the stage of maturation of teliospores on the same host tree, J. chinensis, were identical for the three Gymnosporangium species. The treatment of sequencing data and the assembly of unigenes performed in the transcriptome study by Tao and collaborators in 2017 [22] and in the present study were strictly identical. The Figure S1 (Additional file 1) presents the sequencing metrics, and transcripts and unigenes assemblies which are all in very close range in term of numbers and unigenes length distribution. In some of the following sections, data from the three Gymnosporangium species were used for cross-comparison. In the G. japonicum teliospores transcriptome, 30,243 unigenes were identified, from which 15,722 CDS could be defined. These data served as a basis for annotation and comparison to transcriptomes from other Gymnosporangium spp.
2. Functional overview of genes expressed in G. japonicum teliospores.
To obtain a detailed description and a functional annotation of the G. japonicum teliospores transcriptome, six public databases including the NCBI non-redundant (nr) and nucleotide (nt) databases, protein family (Pfam), SwissProt, eukaryotic orthologous groups (KOG), gene ontology (GO), and Kyoto encyclopaedia of genes and genomes (KEGG) databases were selected for similarity searches using Basic Local Alignment Search Tools (blastx and tblastx; e-value < 1e-5). A total of 18,872 unigenes showed similarity to at least one database and 2,405 unigenes were annotated in all selected databases. The Table 1 gives the number of unigenes showing similarity in the seven databases and the details for each unigene can be retrieved in the Table S1 (Additional file 4). The nr database showed the largest number of homologs (13,165; 43.5%) for the G. japonicum unigenes set, which is detailed hereafter. Among these 13,165 unigenes, 41% and 25.1 % had a length ranging between 200 and 500bp, or larger than 2 Kbp, respectively (Figure 2A). Besides, 39% and 56% of the unigenes showed similarity levels higher than 80%, and between 50 and 80%, respectively (Figure 2B). Although about 45% of the 13,165 unigenes showed high homology e-values (ranging between 1e-50 to 1e-200), 45.2% of the unigenes with homology in nr had an e-value ranging between 1e-50 and 1e-10 (Figure 2C). More than half of the 13,165 unigenes (56%) showed homology to Pucciniales sequences, notably 5,521 (41.9%) and 1,770 (13.4%) unigenes matched with Puccinia graminis f. sp. tritici and M. larici-populina, respectively (Figure 2D). We also determined whether transposons were abundant in the dataset by comparing unigenes to Repbase (https://girinst.org/repbase/; blastx, e-value ≤ 10-5) and searching for transposon-related annotations in the results of the similarity search against nr. Only 75 and 138 unigenes presented similarity to repbase and transposon-related sequences in nr, respectively, indicating that transposons were marginal in the dataset.
Table 1. Annotation of Gymnosporangium japonicum unigenes in selected public databases.
|
Number of Unigenes
|
Percentage (%)
|
Annotated in NR
|
13165
|
43.5
|
Annotated in NT
|
8421
|
27.8
|
Annotated in KO (KEGG Ortholog)
|
5444
|
18
|
Annotated in SwissProt
|
11510
|
38
|
Annotated in PFAM
|
11624
|
38.4
|
Annotated in GO
|
12309
|
40.7
|
Annotated in KOG
|
7937
|
26.2
|
Annotated in all Databases
|
2405
|
8
|
Annotated in at least one Database
|
18872
|
62.4
|
Total Unigenes
|
30243
|
100
|
Homology with KOG and KEGG databases were scrutinized to determine gene functions and metabolic pathways expressed in G. japonicum teliospores (Additional file 6: Table S2). A total of 7,937 unigenes (26.3% of all unigenes) were assigned to 25 KOG functional categories. Unigenes falling in the “posttranslational modification, protein turnover, chaperones” category constituted the largest proportion (3.7%), followed by the category “translation, ribosomal structure and biogenesis” (3.4%). After genes placed in the category “general function prediction only”, a total of 2 and 1.9% of the expressed genes were classified in the “energy production and conversion” and “signal transduction mechanisms” categories, respectively. Of the 5,444 unigenes assigned to 32 KEGG pathways, the three most important pathways were “translation” (913 unigenes), “carbohydrate metabolism” (669 unigenes) and “signal transduction” (605 unigenes) (Additional file 6: Table S2).
The distribution of the mean expression levels of unigenes across the G. japonicum teliospores replicates revealed a large proportion of unigenes with a low expression value (e.g. 12,480 unigenes with a Log10FPKM value below 0; Additional file 7: Figure S3), which is similar to observations made in transcriptomes of other Gymnosporangium spp. [22]. On the other hand, 17,763 and 6,511 unigenes showed a Log10FPKM value above 0 and 1, respectively. A total of 977 unigenes showed a high expression level above 2 Log10FPKM, among which 65 showed a very high expression level above 3 Log10FPKM (Additional file 7: Figure S3). Among these, 44 (68%) unigenes had a homolog in the nr database (including 25 and 15 with the rust fungi P. graminis f. sp. tritici and M. larici-populina, respectively), and more than half of these unigenes encode hypothetical proteins with unknown function (Table 2). Two histones (GYJ|c28242_g1, GYJ|c12350_g1) and one glycoside hydrolase 26 (GH26, GYJ|c8249_g1) were among the most highly expressed unigenes. We searched all unigenes that belonged to teliospore-associated KOG categories based on previous transcriptomes for rust fungal teliospores [11, 21]. We found 26 unigenes encoding aquaporins, 3 unigenes encoding calcium transporting ATPase, 24 unigenes coding for pleiotropic drug resistance and 17 unigenes encoding multicopper oxidase laccase like proteins expressed in G. japonicum teliospores. Also, a large number of meiosis-related genes were found at this stage, including Dmc1, Spo11, Rec8 encoding meiotic recombination proteins; Mre1, Rad50, Rad51 coding for double-strand break repair proteins; the meiotic nuclear division protein Mnd1 and the meiotic cell division protein Dom34; and the meiotic checkpoint regulator Tsg24 (Additional file 4: Table S1). A total of 18,872 G. japonicum unigenes from the teliospores transcriptome were annotated according to diverse databases, allowing to derive subsets of highly expressed unigenes in functional categories.
3. Comparison of Gymnosporangium teliospore transcriptomes highlights commonly and specifically expressed genes.
To get a better understanding of the functions expressed in the transcriptomes of Gymnosporangium spp. teliospores, we compared the transcriptomes of G. japonicum, G. yamadae and G. asiaticum teliospores which correspond to the same life cycle stage set on a same tree, J. chinensis, but in different organs [2, 22]. Distributions of unigenes in KOG categories were very similar in the surveyed Gymnosporangium spore stage (Figure 3). In the three species, the number of unigenes without KOG functional annotation was predominant. Then, “posttranslational modification, protein turnover, chaperones” and “translation, ribosomal structure and biogenesis” categories were the most important annotated categories in the three species, followed by the categories “general function prediction only”; “energy production and conversion”; and “signal transduction mechanisms”. Since, G. japonicum teliospores are formed on woody tissues of the host tree J. chinensis, we gave a particular attention to CAZymes annotation, which are important enzymes involved in the plant cell wall decomposition [25]. For G. japonicum, G. yamadae and G. asiaticum, a total of 281, 284 and 297 unigenes were assigned to CAZymes families in the dbCAN database, respectively (Additional files 8 and 9: Table S3, Figure S4). Most CAZymes belonged to glycosyl hydrolases (GHs) families and only a few were grouped into carbohydrate binding modules families. Overall, the composition in each CAZyme type was relatively similar in G. japonicum compared with the two others (Additional file 9: Figure S4). Several unigenes belonging to the GH26 family, one from G. japonicum (GYJ|c4903_g1), two from G. yamadae (GYY|c14266_g2; GYY|c13067_g1) and four from G. asiaticum (GYA|c34482_g1; GYA|c1703_g1; GYA|c2116_g1; GYA|c20188_g1), were retrieved and in particular, one unigene from each species (GYJ|c4903_g1; GYY|c13067_g1 and GYA|c20188_g1) showed a higher expression level in teliospores. Other GHs from families 16, 17 and 18 showed high expression levels in the three Gymnosporangium spp. (Log10FPKM > 2; Additional file 8: Table S3), with different expression levels noticeable between members of these GH families in these three rust species. This result suggests that Gymnosporangium spp. may possess a common set of enzymes for plant cell wall decomposition and that expression reflects specific activities in infected host tissues. Overall, comparison of annotations of G. japonicum, G. yamadae and G. asiaticum unigenes indicates that the most highly represented functional KOG categories were similar in teliospores transcriptomes of the three rust species. Moreover, unigenes encoding CAZymes were similarly represented between Gymnosporangium spp. with particularly high expression profiles for specific GHs.
4. Comparison of Gymnosporangium teliospores predicted proteomes and secretomes identifies commonly and specifically expressed secreted proteins.
CDS were determined in G. japonicum unigenes based on similarity in databases, which cannot identify new specific proteins. TransDecoder predicted 15,049 proteins from G. japonicum unigenes (Figure 4). In total, 10,917 of these predicted proteins showed similarity in the nr database, which is similar to the number observed with CDS (Table 1). Predicted proteomes from G. asiaticum and G. yamadae were retrieved from [22] and compared by reciprocal best hits (BLASTp; e-value ≤ 1e-5). We found 5,947 proteins conserved in all three Gymnosporangium species (Figure 5; Additional file 10: Table S4). Moreover, 7,548 and 7,322 orthologous proteins were found between G. japonicum and G. asiaticum; and G. japonicum and G. yamadae, respectively. There were 6,249; 6,613 and 5,461 proteins specifically detected in G. japonicum, G. asiaticum and G. yamadae, respectively (Figure 5; Additional file 10: Table S4).
Rust fungi possess large secretomes, which contain candidate virulence effectors that play a key role in fungal pathogenicity [7]. These effectors are often small secreted proteins of unknown function. Secreted proteins (SPs) were identified in the transcriptome of G. japonicum using a dedicated pipeline of secretion predictors as previously described [26]. Among the 15,049 predicted proteins, 854 (5.67%) corresponded to SPs (Additional file 11: Table S5). The predicted SPs had a size varying between 99 and 1930 amino acids, with a median length of 215aa (Figure 6). Within SPs, four are homologues of the conserved rust transferred protein RTP1 [27]. Typical secreted proteins of CAZyme, protease and lipase categories were searched in dedicated databases. In summary, 20 SPs belonged to 18 CAZymes families, 60 SPs were annotated in 24 proteases families and six SPs were part of two lipases families (Figure 4). Among the top 65 highly expressed unigenes, 19 corresponded to predicted SPs and 16 showed similarity to M. larici-populina or P. graminis f. sp. tritici proteins, including a glycoside hydrolase and a protease (Table 2). The same secretome prediction pipeline was applied to the previously published transcriptomes of G. yamadae and G. asiaticum teliospores [22] to allow comparison within Gymnosporangium species (Additional file 11: Table S5). In total, 775 SPs were predicted for G. yamadae, with a size ranging from 99 to 1083 amino acids and a median size of 215. In G. asiaticum, 722 SPs were predicted, with a size ranging from 99 to 1981 amino acids, with a median length of 200 (Figure 4 and Figure 6; Additional file 11: Table S5). The size distribution for the secreted proteins of the three Gymnosporangium species is relatively uniform and small SPs account for the largest proportion (Figure 6). Some SPs were annotated as CAZymes (114 for G. yamadae and 93 for G. asiaticum), proteases (35 for G. yamadae and 38 for G. asiaticum) and lipases (not found in G. yamadae and 4 for G. asiaticum) using dedicated databases (Figure 4).
Based on the secretome prediction, unigenes of unknown function corresponding to SPs were classified in a manually annotated category termed “secreted protein” determined after [16]. A total of 577 G. japonicum unigenes were classified in this “secreted protein” KOG category, whereas, 480 and 504 G. asiaticum and G. yamadae unigenes were classified in the same KOG category, respectively. In the three Gymnosporangium spp., the “secreted protein” category is among the most abundant KOG categories after signal transduction, representing 1.9, 1.4 and 1.3 % of the respective unigenes (Figure 3).
To conclude, proteome and secretome predictions in the three Gymnosporangium teliospore transcriptomes identified hundreds of SPs of unknown functions that may represent putative candidate virulence effectors in each rust species. More particularly, many SPs are among highly expressed transcripts in teliospores and show similarity to other rust fungi. Interestingly, SPs of unknown function are similarly relatively abundant in the three Gymnosporangium spp.
5. Comparison of Gymnosporangium spp. transcriptomes with Pucciniales genomes supports taxonomic proximity to the Pucciniaceae.
In order to identify lineage-specific proteins in Gymnosporangium spp. and beyond within the Pucciniales, we compared the three available Gymnosporangium transcriptomes with deduced proteomes of selected basidiomycetes. Three other rust fungi were selected as representatives of different taxonomical families within the order Pucciniales for which a reference genome was available (i.e. the Cronartiaceae, the Melampsoraceae, and the Pucciniaceae) [28-30]. Microbotryum lychnidis-dioicae was selected as an outgroup fungal pathogen in the Microbotryomycetes, a sister class of Pucciniomycetes within the Pucciniomycotina [31]. A total of 1,042 single copy genes conserved across the selected Pucciniales genomes, the three Gymnosporangium spp. transcriptomes and the outgroup genome in Microbotryomycetes were predicted by BUSCO [32] and were used to build a maximum likelihood tree with RAxML [33] (Figure 7). Two rust clades were evident: C. quercuum f. sp. fusiforme clustered with M. larici-populina whereas the Gymnosporangium species were grouped with Puccinia graminis f. sp. tritici; G. japonicum branching more closely to G. yamadae than to G. asiaticum.
6. Comparison of Gymnosporangium spp. transcriptomes with Pucciniales identifies new specific rust fungal transcripts and putative candidate effectors
FastOrtho was used to conduct a Markov Cluster Algorithm (MCL) analysis of predicted proteomes of the seven selected fungal species [34]. Then, the specificity of proteins identified at any lineage level was assessed by comparison to the NCBI nr database (blastp, e-value ≤1e-5). A total of 1,514 clusters (10,117 proteins) were identified for the six Pucciniales included in this study (Figure 7; Additional file 12: Table S6). After filtering against the nr database, 1,970 proteins within 305 clusters were considered as Pucciniales-specific (Figure 7; Additional file 12: Table S6). To pinpoint protein families specific to the genus Gymnosporangium, we selected 1,905 clusters, representing a total of 6,010 proteins, which were only present in the three Gymnosporangium species after the MCL analysis (Figure 7; Additional file 12: Table S6). After filtering against the nr database, 204 clusters representing a total of 635 proteins, were deemed as specific of Gymnosporangium spp. and specifically expressed in the life cycle stage teliospores (Figure 7; Additional file 12: Table S6). Finally, at the species level, we found 180, 169 and 174 clusters (corresponding to 479, 403 and 467 proteins, respectively) that were specific to G. japonicum, G. yamadae and G. asiaticum, respectively (Figure 7; Additional file 12: Table S6). After comparison to the nr database, 205 orphan proteins were specific to G. japonicum, of which 14 were predicted to encode SP, representing putative specific candidate effectors for this species. Additionally, there were 152 and 200 orphan proteins found in G. yamadae and G. asiaticum, respectively (Figure 7; Additional file 12: Table S6). A total of 11 and 12 predicted secreted proteins were found in each of the later species, indicating candidate secreted effector proteins in G. yamadae and G. asiaticum, respectively. Previous genomic studies have revealed the presence of large secreted protein gene families in rust fungi that may represent diversifying candidate effector families [7, 35]. To identify such families within the three Gymnosporangium spp. transcriptomes, we selected clusters containing one or more SPs in each species (Additional file 13: Table S7). Among clusters with SPs, specific protein families containing only 3 to 4 SP members were retrieved (Additional file 13: Table S7). To conclude, the comparison of the teliospore transcriptomes with reference rust fungal genomes identified hundreds of orphans in each Gymnosporangium spp., as well as core proteins shared at the genus level or at the Pucciniales order, which informs us about the involvement of specific rust proteins at this stage of the life cycle.