Genome features
The genome of E. arachidis LNFT-H01 was sequenced at 100× coverage, yielding 6.28 Gb of high-quality sequencing data. De novo assembly with CANU produced 16 scaffolds (N50, 3,376,838 bp) with a total size of about 33.18 Mb, larger than the genomes of E. australis (23.34 Mb; GenBank: NHZQ00000000) and Sphaceloma murrayae (20.72 Mb; GenBank: NKHZ00000000). All sixteen scaffolds are longer than 1 kb; their combined length is 33,184,353 bp, and the longest scaffold is 4,426,246 bp.
The completeness of the E. arachidis LNFT-H01 genome was estimated to be > 99%. The genome encodes 9174 protein-coding genes, similar to E. australis (9223) and more than S. murrayae (8256). Genes encoding secreted proteins account for 8.0% of the total (734 proteins), which is close to the 7–10% range. The sixteen scaffolds are displayed as a Circos plot (Fig. 1). The gene density is 285 genes per Mb, and 127 non-coding RNAs and 13 pseudogenes were predicted in the genome.
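The scaffold N50 and N90 values reported for the assembly follow the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover 50% (or 90%) of the assembly. The following is a minimal sketch of that calculation in Python. The individual scaffold lengths are not listed here, so the function name `nx` and the toy lengths are illustrative assumptions rather than the actual assembly data.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic for a list of scaffold lengths (bp).

    Scaffolds are taken from longest to shortest until their cumulative
    length reaches `fraction` of the total; the length of the last
    scaffold taken is the Nx value.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    threshold = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical scaffold lengths, for illustration only (not the real assembly)
toy_lengths = [10, 8, 6, 4, 2]
n50 = nx(toy_lengths, 0.5)
n90 = nx(toy_lengths, 0.9)
print(n50, n90)
```

Applied to the sixteen real scaffold lengths, the same function would be expected to reproduce the N50 given in Table 1.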
Phylogenetic Analysis And Collinear Analysis
Phylogenetic analysis shows that E. arachidis is closely related to Sphaceloma murrayae and E. australis (Fig. 2A). In addition, synteny analysis reveals that the E. arachidis genome shares high synteny with that of E. australis. For example, scaffolds 1, 5, 6 and 17 of E. australis correspond well to scaffold 1 of E. arachidis, and scaffolds 32 and 37 show good synteny with scaffold 10 of E. arachidis (Fig. 2B).
Repetitive DNA Sequences And Methylation Sites
In eukaryotic genomes, repetitive DNA sequences play a critical role in gene function and genome structure, and the types and proportions of repetitive sequences differ between species [48, 49]. Across the 16 scaffolds of the E. arachidis LNFT-H01 genome, a total of 7,033,311 bp of repeat sequences were identified, including DNA transposons and LTR retrotransposons (Table S2). Repeats account for 21.4% of the genome, of which LTRs account for 78.46%.
DNA methylation is important in epigenetic and cellular processes [50, 51]. Fungal genomes contain a variety of DNA modifications, the most common of which are adenine methylation and cytosine methylation. In the E. arachidis LNFT-H01 genome, 1,033,888 4mC (4-methyl-cytosine) and 28,762 6mA (6-methyl-adenosine) sites were identified. 4mC sites make up the majority of methylation (97.3%), whereas 6mA sites make up only 2.7%. Methylase-specific motifs were identified from DNA polymerase kinetic information, and fungus-specific motifs were detected (Table S3).
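The methylation shares quoted above follow directly from the two site counts. As a quick check, the short Python sketch below recomputes them; the variable names are illustrative.

```python
# Site counts reported for the E. arachidis LNFT-H01 genome
site_counts = {"4mC": 1_033_888, "6mA": 28_762}

total_sites = sum(site_counts.values())

# Share of each modification among all detected methylation sites, in percent
shares = {
    name: round(100 * count / total_sites, 1)
    for name, count in site_counts.items()
}

for name, share in shares.items():
    print(f"{name}: {share}% of {total_sites} sites")
```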
Functional annotation of E. arachidis
Functional annotation assigned a function to 8644 of the 9174 predicted protein sequences of E. arachidis (94.22%). GO, KEGG and KOG analyses annotated 3237, 3055 and 4958 genes, accounting for 35.28%, 33.30% and 54.04% of the predicted genes, respectively.
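The annotation rates above are each database's annotated gene count divided by the 9174 predicted protein-coding genes. The Python sketch below reproduces that arithmetic; the dictionary layout and the helper name `percent_of_genes` are illustrative choices.

```python
TOTAL_GENES = 9174  # predicted protein-coding genes

# Number of genes annotated by each resource, as reported in the text
annotated = {
    "any database": 8644,
    "GO": 3237,
    "KEGG": 3055,
    "KOG": 4958,
}


def percent_of_genes(count, total=TOTAL_GENES):
    """Return `count` as a percentage of `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


rates = {name: percent_of_genes(n) for name, n in annotated.items()}

for name, rate in rates.items():
    print(f"{name}: {annotated[name]} genes ({rate:.2f}%)")
```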
To clarify the secondary metabolic pathways of E. arachidis, KEGG was used to identify its biological pathways. The KEGG annotation shows that metabolism in this pathogen is active, covering not only the formation of nutrients such as amino acids and sugars but also the synthesis of some secondary metabolites. In the KOG annotation, 4958 genes of E. arachidis were assigned to 24 functional categories, and the number of genes varied considerably between categories (Fig S1). The categories that account for a high proportion of the annotated genes are posttranslational modification, protein turnover, signal transduction mechanisms, carbohydrate transport and metabolism, amino acid transport and metabolism, and secondary metabolites biosynthesis. Genes involved in transport and catabolism are abundant, including genes related to lipid transport and metabolism and to the transport and metabolism of ions and coenzymes.
In the GO analysis, 3237 genes were further divided into 42 GO functional classes across biological process, cellular component and molecular function (Fig S2). The proportions of catalytic activity and metabolic process are high, and detoxification and antioxidant activity, which are related to pathogen self-detoxification, are also represented. These functional annotations provide a rich data resource for further study of secondary metabolite biosynthesis and toxin transport in E. arachidis.
Gene Family Analysis
Analyses of the E. arachidis genome for pathogenicity proteins
To explore potential pathogenicity genes of E. arachidis, the predicted proteins were aligned against the pathogen-host interaction (PHI) database using Blastp. A total of 2,752 genes were screened from the E. arachidis genome, including key genes for secondary metabolite synthesis, cytochrome P450 genes, ATP-binding cassette (ABC) superfamily transporter genes, Major Facilitator Superfamily (MFS) transporter genes and other related genes, which suggests that the disease process is complex (Table S4).
Genes Associated With Detoxification
ESC, which is produced by E. arachidis, generates a large amount of reactive oxygen under light. Reactive oxygen acts on cell membranes and destroys their structure. E. arachidis nevertheless grows and develops under these conditions, indicating that it has some capacity for detoxification.
MFS transporters and ABC transporters are the two largest families of fungal transporters [52–53]. A total of 57 ABC superfamily transporter genes and 190 MFS superfamily transporter genes were identified in the genome of E. arachidis. In addition, the cytochrome P450 enzyme system is a multifunctional oxidoreductase system [54]. In the genome of E. arachidis, 78 cytochrome P450 enzymes were predicted; these may be involved in the synthesis and detoxification of toxins.
The CAZyme
The carbohydrate-active enzymes (CAZymes) secreted by pathogenic fungi are involved in the infection of host plant cells [57] and play an essential role in the breakdown of monosaccharides and polysaccharides and in the synthesis and modification of carbohydrates [58]. Sugars are one of the four major classes of organic molecules in organisms, and sugar metabolism is central to overall metabolism. Sugars are not only important structural components in the growth and development of plants, but also signal molecules for communication between cells. The E. arachidis genome was searched against the CAZy database to detect CAZymes. A total of 602 genes that may encode CAZymes were identified (Table 2), including glycosyl transferases (114), carbohydrate esterases (109), glycoside hydrolases (271) and polysaccharide lyases (16).
Table 1
Gene annotation summary statistics
Genome features | |
Scaffold Number | 16 |
Scaffold Length (bp) | 33,184,353 |
Scaffold N50 (bp) | 3,376,838 |
Scaffold N90 (bp) | 2,306,82 |
Scaffold Max (bp) | 4,426,246 |
Gap total Length (bp) | 0 |
Genome assembly (Mb) | 33.18 |
Number of coding sequence genes | 9,174 |
Average Exons length | 641.1 |
Average Introns length | 101.9 |
Total Genes length | 15,965,221 |
CDSs Percentage of genome | 43.9427% |
GC Content (%) | 48.24 |
Secreted protein | 734 |
Transmembrane protein | 1,829 |
Proteins with signal peptide | 949 |
PHI | 2,752 |
TCDB | 124 |
Table 2
The carbohydrate-active enzymes
Classification | Number |
Glycoside Hydrolases (GHs) | 271 |
Polysaccharide Lyases (PLs) | 16 |
Carbohydrate Esterases (CEs) | 109 |
Glycosyl Transferases (GTs) | 114 |
Auxiliary activities (AAs) | 84 |
Carbohydrate-Binding Modules (CBMs) | 66 |
Additional Files
Fig S1. KOG annotation of E. arachidis.
Fig S2. GO annotation of E. arachidis.
The cuticle and cell wall on the plant surface are the first barriers against pathogen invasion. Most pathogens breach these host defenses by producing cutinases and cell wall degrading enzymes. Further analysis detected pectin lyases (15), cutinases (13) and cellulases (19); their function in the pathogenic process remains to be studied.
Secondary Metabolite Gene Clusters
E. arachidis produces the secondary metabolite ESC [5]. Although ESC is an important pathogenicity factor in E. arachidis, the core gene for ESC synthesis has not been identified. The genome of E. arachidis makes it possible to search for the core genes of ESC biosynthesis. To identify gene clusters responsible for polyketide biosynthesis in E. arachidis, antiSMASH2 was used to identify all secondary metabolite clusters. In total, 86 secondary metabolite clusters were predicted, including polyketide synthase (PKS), nonribosomal peptide synthetase (NRPS), NRPS-PKS hybrid and other clusters. The number and distribution of coding genes differ between clusters.
Identification and analysis of PKS Genes in E. arachidis
To further clarify the PKS gene cluster that regulates the biosynthesis of ESC in E. arachidis, a total of 19 polyketide synthase protein sequences from different species were analyzed and a phylogenetic tree was constructed (Fig. 3). EVM0003759, designated ESCB1 (Elsinochrome Biosynthesis gene 1), is involved in ESC synthesis.
To analyze the differences between polyketide synthase genes, InterProScan was used to identify the conserved domains of the polyketide synthases in E. arachidis, and the protein domain architecture was visualized with the software DOG 2.0. The conserved domains of polyketide synthases in the E. arachidis LNFT-H01 genome include AT, KS, DH, KR, MeT, ACP and ER (Fig. 4). Among them, EVM0004732, EVM0003759 and EVM0005880 all contain KS, AT, ACP and TE domains arranged in the same order, and differ only in sequence length. In addition to KS and AT, EVM0005988, EVM0002563 and EVM0006869 contain two other domains, ER and KR, which have reducing activity. Based on their domain content, the PKSs were therefore classified as non-reducing PKSs (EVM0004732, EVM0003759, EVM0005880) or reducing PKSs (EVM0005988, EVM0002563, EVM0006869).
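The classification rule described above can be expressed in a few lines of Python. This is a minimal sketch, assuming that a PKS is called reducing when it carries at least one of the two domains named in the text as having reducing activity (ER or KR). The domain sets list only the domains explicitly mentioned for each gene, and the function name `classify_pks` is hypothetical.

```python
# Domains named in the text as having reducing activity
REDUCING_DOMAINS = {"ER", "KR"}

# Domains explicitly mentioned in the text for each PKS gene.
# Other domains may be present in the real proteins (see Fig. 4).
pks_domains = {
    "EVM0004732": {"KS", "AT", "ACP", "TE"},
    "EVM0003759": {"KS", "AT", "ACP", "TE"},  # ESCB1
    "EVM0005880": {"KS", "AT", "ACP", "TE"},
    "EVM0005988": {"KS", "AT", "ER", "KR"},
    "EVM0002563": {"KS", "AT", "ER", "KR"},
    "EVM0006869": {"KS", "AT", "ER", "KR"},
}


def classify_pks(domains):
    """Return 'reducing' if any reducing domain is present, else 'non-reducing'."""
    return "reducing" if domains & REDUCING_DOMAINS else "non-reducing"


pks_classes = {gene: classify_pks(d) for gene, d in pks_domains.items()}

for gene, label in sorted(pks_classes.items()):
    print(gene, label)
```

With these inputs the sketch places ESCB1 (EVM0003759) among the non-reducing PKSs, matching the grouping given in the text.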
RT-qPCR analysis of ESCB1
The biosynthesis of ESC in E. arachidis differed markedly between light conditions [47]. Under light, ESC production was 16 nmol/plug, whereas in the dark no toxin synthesis was detected. To clarify whether ESCB1 participates in the biosynthesis of ESC, the expression of ESCB1 under different light conditions was examined. The expression pattern of ESCB1 was consistent with that of toxin production (Fig. 5).
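This section does not state how relative expression was calculated from the RT-qPCR data. One widely used approach is the 2^-ΔΔCt method, sketched below in Python as an illustration only. The Ct values, the use of a single reference gene, and the function name `relative_expression` are all assumptions and are not taken from this study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    Each dCt is the target Ct minus the reference-gene Ct in the same
    sample; ddCt is the treated dCt minus the control dCt.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: "treated" = light, "control" = dark
fold_change = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
print(f"Target expression in light relative to dark: {fold_change:.1f}-fold")
```

A fold change above 1 in light relative to dark would be the pattern expected if expression tracks toxin production.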
Distribution of the ESCB1 gene cluster of ESC
Further analysis of the ESCB1 cluster identified 13 putative ORFs, including ESCB1 (Fig. 6). EVM0001135 and EVM0007299 encode putative polypeptides that are similar to O-methyltransferase and have an FAD-binding domain, which is found in a number of enzymes. EVM0006582 and EVM0006794 encode products with similarity to major facilitator superfamily transporters. A cytochrome P450 gene (EVM0002495) and a zinc finger transcription factor gene (EVM0002638) are also located in this gene cluster. The specific biological functions of the genes in this cluster in ESC biosynthesis need further study.