Genome sequencing of the neotype strain CBS 554.65 yielded 5.3 Gbp in 287,000 subreads. The mean length of the longest subreads was 18.4 kbp, and half of the data was in reads longer than 29 kbp. The assembly consisted of 17 contigs with a total of 40 Mbp and 55.2-fold coverage. Half of the genome is contained in 4 scaffolds (L50), the smallest of which has a length of 4.07 Mbp (N50). The GC content is 50.3%. The nuclear genome was annotated with Augustus, using the genome of strain ATCC 1015 as reference. Based on this automated annotation, 12,240 protein-coding genes were predicted. Table 1 reports some basic characteristics of the CBS 554.65 nuclear genome, calculated with CLC, in comparison to the characteristics of three other sequenced A. niger strains, CBS 513.88, ATCC 1015 and NRRL3, obtained from JGI.
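As an illustration of how the N50 and L50 values reported above are defined, the following minimal Python sketch computes both from a list of contig or scaffold lengths. The function name and the toy lengths are hypothetical; they are not the CBS 554.65 contigs and do not reproduce the CLC calculation used in this study.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # length is the N50, count is the L50
            return length, count


# Hypothetical toy assembly (arbitrary length units), for illustration only.
toy_contigs = [6, 5, 4, 3, 2, 1]
n50, l50 = n50_l50(toy_contigs)
print(n50, l50)  # 5 2
```

In this toy case the two longest contigs (6 + 5 = 11) cover at least half of the total length (21), so L50 is 2 and N50 is the length of the smaller of the two.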
The CBS 554.65 genome sequence is of higher quality than the sequences of the other strains, with higher coverage, a higher N50 value and a lower L50 value. Compared to the other sequenced strains, CBS 554.65 appears to have a larger genome, while the GC content is similar in the four strains. For each of the 8 chromosomes, a putative centromeric region of between 88 and 100 kb was identified. These regions are marked in Figure 2 by two vertical black lines. They have a GC content between 17.1 and 18.4%, markedly lower than that of the total genome (50.3%), and do not contain any predicted ORF. The only exception is the centromere of chromosome 1, in which an ORF of 219 nucleotides is annotated. This ORF is found in a 7 kb region of the centromere with a higher GC content than the centromere as a whole, suggesting the presence of a mobile element. A conserved domain search [25] on this sequence gave as hits CHROMO and chromo shadow domains (accession: cd00024), a ribonuclease H-like superfamily domain (accession: cl14782), an integrase zinc binding domain (accession: pfam17921), a reverse transcriptase domain (accession: cd01647), an RNase H-like domain found in reverse transcriptase (accession: pfam17919) and a retropepsin-like domain (accession: cd00303). The presence of the last four domains suggests that the analyzed sequence has a retroviral or retrotransposon origin. Similar sequences with domains for reverse transcriptase were also found in the centromeres of chromosomes 5, 6 and 7. A BLAST analysis of the individual chromosomes of strain CBS 554.65 against the complete genome of strain NRRL3 showed that the putative centromeres are almost completely lacking from the genome assembly of NRRL3 (Figure 2, grey areas in the BLAST graph). Although difficult to identify, centromeric regions in filamentous fungi are composed of complex and heterogeneous AT-rich sequences which can stretch up to 450 kb [26,27].
Due to the likely presence of near-identical long repeats, centromeres are difficult to sequence and assemble [27], which explains why they are lacking in strain NRRL3. Transposons and retrotransposons have been identified in the centromeres of other eukaryotes, including fungi [26,28]. The BLAST analysis against NRRL3 showed that, besides the putative centromeric regions, other large regions of the CBS 554.65 genome have no homologous sequence in NRRL3, which explains the difference in size between the strains. To confirm that these unique regions are not artifacts, the sequencing reads of CBS 554.65 were remapped to the genome. In total, 192,283 reads were remapped, with a mean read length of 15,215.97 bp (see total coverage graph in Figure S1, Additional file 3). High coverage was also obtained for the CBS 554.65 regions that are not found in NRRL3, such as those present in chromosome 4 (chr4_000001F) and chromosome 5 (chr5_000008F) (Figure S2, Additional file 3). Moreover, two PCR reactions were successfully performed on the non-homologous region in chromosome 5 (indicated by the dots in chr5_000008F, Figure 2). Sequencing of the PCR products confirmed the sequence obtained by genome sequencing. The longer reads obtained with PacBio sequencing make it possible to cover repetitive sequences that are probably missing from previous A. niger genome sequences obtained with Illumina, which explains the observed difference in genome size. The number of protein-coding genes in CBS 554.65 is in the range found in ATCC 1015 and NRRL3. The large difference in the number of protein-coding genes in strain CBS 513.88 is likely caused by overpredictions, as previously suggested [5].
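The markedly lower GC content of the putative centromeres suggests a simple way to screen an assembly for candidate regions. The sketch below is a minimal Python illustration assuming a fixed-size window scan; the function names, window size, step, threshold and toy sequence are hypothetical and do not reproduce the procedure used in this study.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def low_gc_windows(seq, window, step, threshold):
    """Return (start, end, gc) for every window with GC fraction below threshold."""
    hits = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc < threshold:
            hits.append((start, start + len(chunk), gc))
    return hits


# Hypothetical toy sequence: a GC-rich stretch, an AT-only stretch, a GC-rich stretch.
toy = "GC" * 10 + "AT" * 10 + "GC" * 10
print(low_gc_windows(toy, window=20, step=20, threshold=0.2))  # [(20, 40, 0.0)]
```

On real chromosomes the window would be on the order of kilobases, and adjacent low-GC windows would need to be merged before comparing their extent with the 88 to 100 kb regions described above.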
Mitochondrial DNA
Many genome projects focus on the nuclear genome, while the mitochondrial DNA is often neglected. In A. niger only one mitochondrial DNA (mtDNA) assembly has been reported, for the strain N909 [30]. In this study, the mtDNA of strain CBS 554.65 was assembled de novo from PacBio reads as a circular DNA with a length of 31,363 bp. mtDNA is abundant in whole-genome sequencing projects and the read coverage of the assembly (average: 1,220x, min: 328x, max: 1,674x) is thus higher than for the nuclear genome. In total 18 ORFs, 26 tRNA and 2 rRNA sequences were annotated (Fig. 3). All 15 core mitochondrial genes reported for Aspergillus species were identified with a comparable gene organization [31]. In addition, three accessory genes, orf1L, orf3 and endo1, were annotated. The gene endo1 is located in the intron of cox1 and encodes a putative homing endonuclease belonging to the LAGLIDADG family, which is frequently found in the cox1 intron of other filamentous fungi [31]. The gene orf3 encodes a hypothetical protein of 191 residues, which is also present in the mtDNA of strain N909 but was not annotated there. Surprisingly, this unknown protein has a good hit against an unknown protein of Staphylococcus aureus (99% identity), but not against other proteins of Aspergillus species. In A. niger strain N909 two other unknown proteins are encoded by orf1 and orf2. In A. niger CBS 554.65 these two open reading frames are joined into a single long one, yielding a potential protein product of 739 amino acid residues. This is comparable to an open reading frame located at the same position between nad1 and nad4 in the mtDNA of A. flavus NRRL 3357 (AFLA_m0040), with a size of 667 amino acid residues. In the N-terminal region of both putative proteins, transmembrane regions can be predicted, suggesting a location in a mitochondrial membrane; however, the C-terminal regions are not conserved between the A. niger and A. flavus proteins.
We suggest using the mitochondrial assembly of CBS 554.65 as a reference sequence for A. niger mitochondria because strain N909 is known to be resistant to oligomycin. This resistance is typically linked to mutations in the mtDNA, in either atp6 or atp9, and indeed two mutations are found in atp6 of strain N909 (L26W and S173L).
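The substitution notation used above (for example L26W, a leucine at position 26 replaced by a tryptophan) can be derived by comparing two aligned protein sequences. The following Python sketch is a minimal illustration under the assumption of equal-length, gap-free sequences; the function name and the short peptides are hypothetical and are not the Atp6 sequences of either strain.

```python
def substitutions(reference, variant):
    """List substitutions as <reference residue><1-based position><variant residue>."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, variant), start=1)
        if ref != alt
    ]


# Hypothetical short peptides, for illustration only.
ref_peptide = "MKLSTA"
var_peptide = "MKWSTA"
print(substitutions(ref_peptide, var_peptide))  # ['L3W']
```

For sequences of different length, a proper pairwise alignment would be needed before positions can be compared.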
Discovery and sequencing of a MAT1-2 A. niger strain
The genome sequencing and analysis of strain CBS 554.65 made it possible to determine the mating type of this strain. The sequence of the putative MAT1-2-1 gene (g9041) was searched against the whole nucleotide database using BlastN, giving as hits the mating-type HMG-box protein MAT1-2-1 of other aspergilli, including A. neoniger (with an identity of 93.25%) and A. tubingensis (with an identity of 93.07%). As such, we consider gene g9041 to be homologous to the MAT1-2-1 gene of other Aspergillus species.
This is in line with a previous study which indicated the presence of a MAT1-2-1 sequence in the CBS 554.65 strain through a PCR approach [8]. Here we report the first complete genome sequence of an A. niger strain having a MAT1-2-1 gene. The availability of this genome sequence represents an important tool for further studies investigating the sexual potential of A. niger. The presence of both opposite mating-type genes in different strains belonging to the same species represents a strong hint of a sexual lifestyle [10].
MAT1-2 locus analysis and comparison to MAT1-1
The locus of strain CBS 554.65 containing the MAT1-2-1 gene was compared in silico to the locus of strain ATCC 1015 containing the MAT1-1-1 gene, to determine whether the genes flanking the MAT1-1-1 gene are also present in the genome of the MAT1-2 strain and vice versa. A region of 40,517 bp, spanning from gene Aspni7|39467 (genomic position 2504615 in version 7 of the ATCC 1015 genome) to gene Aspni7|1128148 (genomic position 2545131), was aligned to the corresponding region of strain CBS 554.65 (Fig. 4). In CBS 554.65 the two genes homologous to Aspni7|39467 (g9051) and Aspni7|1128148 (g9036) are contained in a sequence of 43,891 bp, about 3.4 kb longer than in ATCC 1015. In Fig. 4, genes found in both strains are indicated with a box of the same color, MAT genes are indicated with a circle, and genes not surrounded by a box or a circle are unique to one strain. The green lines below the genomic region of each strain indicate the sequences homologous in both strains, while the black dotted lines indicate the sequences that have no homology in the other strain. The gene identifiers are indicated on top of each gene and are also reported in Table 2, together with their predicted function, retrieved from FungiDB or by BLAST analysis. The alignment shows that the MAT genes occupy the same genomic location on chromosome 7. The genes in the analyzed loci are mostly conserved between the two strains, with the exception of genes Aspni7|1178859 (MAT1-1-1), Aspni7|1128137 and Aspni7|1160288, unique to ATCC 1015, and g9046, g9041 (MAT1-2-1) and g9040-2 (MAT1-2-4), unique to CBS 554.65. Aspni7|1128137 has predicted metal ion transport activity and is found in other Aspergillus species, whether heterothallic (with a MAT1-1-1 or a MAT1-2-1 gene) or homothallic, but not in proximity to the MAT gene, with the exception of A. brasiliensis and A. ochraceoroseus.
Aspni7|1160288 has a domain with a predicted role in proteolysis, and its homolog in other aspergilli is present at another genomic locus, not in proximity to the MAT gene. A homolog of gene g9046 was found by BlastN search in Aspergillus vadensis, at a genomic location other than the MAT locus. These results suggest that these unique genes are likely not part of the “core” MAT locus. The gene g9040-2 is a putative homolog of the MAT1-2-4 gene of A. fumigatus, an additional mating-type gene required for mating and cleistothecia formation [32]. Another difference between ATCC 1015 and CBS 554.65 concerns the gene putatively encoding a HAD-like protein. While this gene is complete in CBS 554.65 (g9045), it appears disrupted in ATCC 1015 and is therefore annotated twice in this strain (Aspni7|1095364 and Aspni7|1128138). The other genes present in the selected genomic region show a high level of conservation, with higher synteny further away from the MAT genes (genes in the purple and blue boxes). Moreover, genes encoding the DNA lyase apnB, the cytoskeleton assembly control factor slaB and the anaphase-promoting complex protein apcE are present in both MAT loci. These genes are normally found in the MAT loci of other fungi, including yeast [17], and their presence in the MAT loci of A. niger further confirms the high level of conservation characterizing this locus. In heterothallic ascomycetes the MAT genes are commonly located between the genes apnB and slaB [17]. The alignment in Fig. 4 shows the position of the MAT genes relative to apnB and slaB. In CBS 554.65 the MAT1-2-1 gene (g9041) is flanked by apnB upstream and by slaB seven genes downstream. In contrast, in the MAT1-1 locus of strain ATCC 1015 the MAT gene is flanked downstream by apnB and upstream by a conserved sequence including adeA, while slaB is found on the same side as apnB.
The entire genomic locus containing the MAT1-1-1 gene and eight other genes (23 kbp, indicated by the red arrow in Fig. 4) shows a flipped orientation compared to the corresponding locus in CBS 554.65 containing the MAT1-2-1 gene (indicated by an orange arrow in Fig. 4). The ORF direction of the conserved genes apnB, coxM and apcE additionally confirms the different orientation of this locus in the two strains. Sequence analysis revealed a repetitive 7 bp DNA motif (5′-TTACACT) in the MAT1-1 locus (orange triangles in Fig. 4) at the points where the homology between the MAT1-1 and MAT1-2 loci breaks (in proximity to adeA and slaB). An additional copy of this motif was found in the gene encoding a HAD-like hydrolase (Aspni7|1128138). This motif is present at similar positions in two other sequenced MAT1-1 strains of A. niger (N402, CBS 513.88). In contrast, the MAT1-2 strain carries this motif only at the site close to the adeA gene and in the putative HAD-like hydrolase gene (g9045), but not at the site close to the slaB gene.
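Locating a short motif such as 5′-TTACACT in a locus sequence can be sketched as follows. This is a minimal Python illustration, assuming that both strands are searched (the source does not state whether this was done); the function names and the toy sequence are hypothetical and do not correspond to the actual MAT locus sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq, motif="TTACACT"):
    """Return sorted (0-based start, strand) hits of motif on both strands."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            # Continue one base further so that overlapping hits are not missed.
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical toy sequence with one forward and one reverse-strand copy.
toy_locus = "GGTTACACTCCAGTGTAAGG"
print(find_motif(toy_locus))  # [(2, '+'), (11, '-')]
```

Note that a 7 bp motif is expected to occur by chance many times in a region of tens of kilobases, so positions relative to the homology breakpoints, rather than mere presence, are what make the hits described above informative.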
Table 2
List of genes included in the genomic region comprising the MAT genes.
ATCC 1015 | CBS 554.65 | Predicted function retrieved from FungiDB or blast
Aspni7|39467 | g9051 | Hypothetical protein
Aspni7|1167974 | g9050 | CIA30-domain containing protein – Ortholog(s) have role in mitochondrial respiratory chain complex I assembly
Aspni7|1225150 | g9049 | SAICAR synthetase (adeA)
Aspni7|1187920 | g9048 | Homolog in CBS 513.88 has domain(s) with predicted catalytic activity, metal ion binding, phosphoric diester hydrolase activity
Aspni7|39471 | g9040-1 | Hypothetical protein
Aspni7|1178859 | - | Mating-type protein MAT1-1-1
Aspni7|1187921 | g9042 | DNA lyase Apn2|Hypothetical protein
Aspni7|1147272 | g9043 | Hypothetical cytochrome C oxidase|Mitochondrial cytochrome c oxidase subunit VIa
Aspni7|1187923 | g9044 | Ortholog(s) are anaphase-promoting complex proteins
Aspni7|1128137 | - | Homolog in CBS 513.88 has domain(s) with predicted metal ion transmembrane transporter activity, role in metal ion transport, transmembrane transport and membrane localization
Aspni7|1095364 | g9045 | HAD-like protein; Homolog in CBS 513.88 has domain(s) with predicted hydrolase activity
Aspni7|1128138 | g9045 | HAD-like protein; Homolog in CBS 513.88 has domain(s) with predicted hydrolase activity
Aspni7|1187925 | g9047 | Glycosyltransferase Family 8 protein - Ortholog(s) have acetylglucosaminyltransferase activity, role in protein N-linked glycosylation and Golgi medial cisterna localization
Aspni7|1160288 | - | Aspartic protease|Hypothetical aspartic protease
Aspni7|39480 | g9040 | WD40 repeat-like protein
Aspni7|1187926 | g9039 | Aldehyde dehydrogenase
Aspni7|53077 | g9038 | CoA-transferase family III
Aspni7|1187928 | g9037 | Salicylate hydroxylase
Aspni7|1128148 | g9036 | Cytoskeleton assembly control protein Sla2
- | g9046 | Hypothetical protein
- | g9041 | Mating-type HMG-box protein MAT1-2-1
- | g9040-2 | Hypothetical protein – Putative homologue of MAT1-2-4 of A. fumigatus
Methods to identify the opposite mating type in natural isolates often rely on primers designed to bind to apnB and slaB, since these are the genes that commonly flank the MAT gene itself [33,34]. In A. niger strains of both mating types, slaB is found more than 12 kbp away from the MAT gene, and this might help explain why the MAT1-2 locus was never previously described for this species, with only one study mentioning it [8].
Both the particular orientation of the MAT locus and the presence of a repetitive motif in the MAT loci suggest that a genetic switch or a flipping event might have occurred, or is still ongoing, in A. niger, which might affect the expression of the MAT genes. Genetic switching events at the MAT locus are known for other ascomycetes, particularly yeasts. For instance, in S. cerevisiae a switching mechanism involving an endonuclease and two inactive but intact copies of the MAT genes allows the cell to switch its mating type [35]. In the methylotrophic yeasts Komagataella phaffii and Ogataea polymorpha, expression of the MAT gene is instead regulated via a flip/flop mechanism [36,37]. In these species, a 19 kbp sequence including both mating-type genes is flipped so that one MAT gene will be close to the centromere (5 kbp from the centromere) and therefore silenced, while the other will be transcribed. In CBS 554.65 the region comprising the MAT1-2-1 gene is located around 280 kbp downstream of the putative centromere, which is much further away than what is observed for K. phaffii and O. polymorpha. However, in certain basidiomycetes, such as Microbotryum saponariae and Microbotryum lagerheimii, the mating-type locus HD (containing the homeodomain genes) is around 150 kbp distant from the centromere and linked to it [38]. It was proposed that the proximity to the centromere in these species might be enough to reduce recombination events [38]. The effect of the distance between the centromere and the MAT genes in A. niger merits further attention, especially in view of a potential sexual cycle in this species.
Inversions at the MAT locus have been described for certain homothallic filamentous fungi such as Sclerotinia sclerotiorum and Sclerotinia minor [39,40]. Field analysis of a large number of isolates showed that strains belonging to these species can present either a non-inverted or an inverted MAT locus. In the inverted orientation, two of the four MAT genes at the locus have the opposite orientation and one gene is truncated. In the case of S. sclerotiorum, differences in gene expression were observed between inverted and non-inverted strains. This inversion, induced by crossing-over between two identical inverted repeats present in the locus, likely happens during the sexual cycle before meiosis [39]. The analysis of a larger number of A. niger natural isolates is required to investigate whether opposite orientations of both MAT loci exist for this species as well, and what the implications of such inversions might be. Chromosomal inversions are considered to prevent recombination between sex-determining genes in higher eukaryotes, such as animals and plants [41]. Further studies are therefore required to investigate whether a mechanism similar to those already described in other fungal species also operates in A. niger, which might help to explain the difficulty in establishing whether this species has a sexual cycle.
Genetic comparison of MAT loci in different aspergilli and additional A. niger strains
Due to the particular configuration observed in this study for the MAT1-1 locus of strain ATCC 1015, the orientation of the MAT locus was analyzed in additional Aspergillus species for which a genome sequence is available (Table 3). First, the genes adeA and slaB were retrieved because they are conserved and often found at the right and left flank of the MAT gene, respectively (Fig. 4). Subsequently, the position of the MAT gene was checked relative to the three conserved genes apnB, coxM and apcE. The MAT gene could be located either between adeA and apnB, as in ATCC 1015 (flipped position), or between apnB and slaB, as in CBS 554.65 (conserved position). The results of this analysis are reported in Table 3. A complete table with the identifiers of all genes analyzed is provided in Additional file 4.
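The classification rule described above can be expressed compactly. The following Python sketch is a minimal illustration that assumes the locus is given as an ordered list of gene names along the chromosome; the function name, the label "MAT" and the simplified gene orders are hypothetical and omit the intervening genes shown in Fig. 4.

```python
def mat_position(gene_order, mat="MAT"):
    """Classify a MAT locus from the order of its genes along the chromosome."""
    index = {name: i for i, name in enumerate(gene_order)}
    if any(name not in index for name in ("adeA", "apnB", "slaB", mat)):
        return "n.a."  # conserved genes not found in the locus

    def between(first, second):
        low, high = sorted((index[first], index[second]))
        return low < index[mat] < high

    conserved = between("apnB", "slaB")
    flipped = between("adeA", "apnB")
    if conserved and not flipped:
        return "conserved"
    if flipped and not conserved:
        return "flipped"
    return "n.a."  # ambiguous or unexpected arrangement


# Simplified, hypothetical gene orders based on the description in the text.
print(mat_position(["adeA", "apnB", "MAT", "slaB"]))  # conserved
print(mat_position(["adeA", "MAT", "apnB", "slaB"]))  # flipped
```

Because only the relative order matters, the same result is obtained if a locus is read from the opposite strand.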
Table 3
MAT gene identifiers of the analyzed Aspergillus strains and their position in the MAT locus.
Section | Species | Strain | Mating-type gene - MAT | Mating-type | MAT position | Sexual cycle described for the species
Nigri | A. welwitschiae | CBS 139.54 | 172181 | MAT1-1 | flipped | No
Nigri | A. kawachii (A. luchuensis) | IFO 4308 | AKAW_03832 | MAT1-2 | conserved | No
Nigri | A. luchuensis | 106.47 | ASPFODRAFT_180958 | MAT1-1 | conserved | No
Nigri | A. tubingensis | G131 | Not annotated | MAT1-2 | conserved | Yes [42]
Nigri | A. tubingensis | CBS 134.48 | ASPTUDRAFT_124452 | MAT1-1 | conserved | Yes [42]
Nigri | A. niger | CBS 554.65 | g9041 | MAT1-2 | conserved | No
Nigri | A. niger | ATCC 1015 | ASPNIDRAFT2_1178859 | MAT1-1 | flipped | No
Nigri | A. brasiliensis | CBS 101740 | ASPBRDRAFT_167991 | MAT1-2 | flipped | No
Nigri | A. carbonarius | ITEM 5010 | ASPCADRAFT_1991 | MAT1-2 | conserved | No
Nigri | A. aculeatus | ATCC 16872 | ASPACDRAFT_1867751 | MAT1-2 | conserved | No
Nidulantes | A. versicolor | CBS 583.65 | ASPVEDRAFT_82222 | MAT1-2 | conserved | No
Nidulantes | A. sydowii | CBS 593.65 | ASPSYDRAFT_87884 | MAT1-2 | conserved | No
Ochraceorosei | A. ochraceoroseus | IBT 24754 | P175DRAFT_0477739 | MAT1-1 | conserved | No
Flavi | A. flavus | NRRL 3357 | AFLA_103210 | MAT1-1 | conserved | Yes [43]
Flavi | A. oryzae | BCC7051 | OAory_01101300 | MAT1-2 | conserved | No
Flavi | A. oryzae | RIB40 | AO090020000089 | MAT1-1 | conserved | No
Circumdati | A. steynii | IBT 23096 | P170DRAFT_349471 | MAT1-2 | conserved | No
Candidi | A. campestris | IBT 28561 | P168DRAFT_313902 | MAT1-1 | conserved | No
Candidi | A. campestris | IBT 28561 | P168DRAFT_285957 | MAT1-2 | conserved | No
Terrei | A. terreus | NIH2624 | ATEG_08812 | MAT1-1 | conserved | Yes [44]
Fumigati | A. novofumigatus | IBT 16806 | P174DRAFT_462167 | MAT1-2 | conserved | No
Fumigati | A. fischeri | NRRL 181 | NFIA_071100 | MAT1-1 | conserved | Yes [45]
Fumigati | A. fischeri | NRRL 181 | NFIA_024390 | MAT1-2 | conserved | Yes [45]
Fumigati | A. fumigatus | Af293 | Afu3g06170 | MAT1-2 | conserved | Yes [46]
Fumigati | A. fumigatus | A1163 | AFUB_042900 | MAT1-1 | conserved | Yes [46]
Fumigati | A. fumigatus | A1163 | AFUB_042890 | MAT1-2 | conserved | Yes [46]
Clavati | A. clavatus | NRRL1 | ACLA_034110 | MAT1-1 | conserved | Yes [47]
Clavati | A. clavatus | NRRL1 | ACLA_034120 | MAT1-2 | conserved | Yes [47]
Aspergillus | A. glaucus | CBS 516.65 | ASPGLDRAFT_89185 | MAT1-1 | n.a.¹ | Yes [48, 49]
Cremei | A. wentii | DTO 134E9 | ASPWEDRAFT_184745 | MAT1-2 | conserved | No
¹ Conserved genes not in the MAT locus
Table 3. MAT genes included between adeA and apnB have a flipped orientation while MAT genes included between apnB and slaB have a conserved orientation. Aspergillus species are grouped in sections based on the most updated classification [50]. For each species it is indicated if a sexual cycle was reported.
In the analyzed Aspergillus sequences the MAT gene (either MAT1-1-1 or MAT1-2-1) was mostly found between the genes apnB and slaB, as in CBS 554.65 (conserved). The only exceptions, showing a configuration similar to the MAT1-1 locus of ATCC 1015, were the MAT1-1-1 gene of A. welwitschiae and the MAT1-2-1 gene of A. brasiliensis. This analysis could not be performed on the MAT1-2 locus of A. welwitschiae or on the MAT1-1 locus of A. brasiliensis, since sequences are not available. Seven of the analyzed species, including the closely related A. tubingensis, were reported to have a sexual cycle. For all of these species the conserved position of the MAT gene was observed, with the exception of A. glaucus, for which the conserved genes were not found in proximity to the MAT gene. These observations suggest that the position of the MAT gene and the orientation of the locus are critical for sexual development to occur.
Since the orientation observed for the MAT1-1 locus of ATCC 1015 might be peculiar to this A. niger strain, additional analyses were performed to determine the orientation of the MAT locus in four other available sequenced strains of A. niger (CBS 513.88, N402, ATCC 13496, NRRL3) and in natural isolates obtained from various sources. All these previously sequenced A. niger strains contain a MAT1-1-1 gene and show the same orientation of the MAT locus observed in ATCC 1015. In addition, 24 natural isolates of A. niger were sequenced and their MAT loci analyzed: 12 contain the MAT1-1 locus and 12 the MAT1-2 locus. The MAT locus configuration of these strains is comparable to that of strain ATCC 1015 in the case of the MAT1-1 strains, and to that of CBS 554.65 in the case of at least 10 out of 12 MAT1-2 strains. In the two remaining MAT1-2 strains (CBS 118.52 and DTO 175-I5) a gap between two genomic scaffolds could not be closed by PCR, probably because it consists of a region with multiple G repeats. However, when the two separate scaffolds of these isolates are aligned to the MAT1-2 locus of CBS 554.65, they appear to have the same locus configuration as the other 10 MAT1-2 isolates. Similarly to what was observed for ATCC 1015 and CBS 554.65, the HAD-like protein-encoding gene appears disrupted in all the MAT1-1 strains and complete in all the MAT1-2 strains. Further studies are required to investigate whether the disruption of this gene in the MAT1-1 strains plays a role in fungal development. Overall, the MAT1-1 configuration described in Fig. 4 is a peculiar feature of A. niger and its close relative A. welwitschiae. Despite this unusual orientation, the 1:1 MAT1-1:MAT1-2 ratio among 24 randomly selected natural A. niger isolates is an important observation, which suggests that sexual reproduction is occurring in this species. Moreover, A. niger was previously shown to be able to form sclerotia [51–54], an important prerequisite for sexual development in closely related species. Therefore, further research should focus on the possibility of efficiently inducing a sexual cycle in A. niger.
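One way to quantify how compatible the observed 12:12 split is with a 1:1 mating-type ratio is an exact binomial test. The Python sketch below is a hypothetical illustration only; no such test is reported in this study, and the function name and the choice of a two-sided exact test are assumptions.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under success probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance guards against floating-point ties.
    total = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, total)


# 12 MAT1-1 isolates out of 24, tested against an expected 1:1 ratio.
print(binomial_two_sided_p(12, 24))  # 1.0
```

A p-value of 1.0 only indicates that the data show no deviation from a 1:1 ratio; with 24 isolates, moderately skewed ratios would not be excluded either.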