CAZyme comparison with other fungi
Plant biomass-degrading and other CAZymes are catalogued into families and subfamilies in the Carbohydrate Active enZymes (CAZy) database (http://www.cazy.org/) (52). The number of CAZy domains and their distribution across CAZy families in T. terrestris LPH172 were analysed and compared to those of other known fungal biomass degraders to assess the propensity for lignocellulose degradation (Table 3). Note that by 'CAZymes' in this article we mean individual CAZyme domains. In total, 411 individual CAZy domains were detected in LPH172 using dbCAN2 (HMMER algorithm). Most CAZy domains were glycoside hydrolases (GHs; 201 candidates), with GH16 (n = 14), GH18 (n = 15), GH3 (n = 12), and GH47 (n = 10) being the most abundant families. There were also 86 glycosyltransferases (GTs), 4 polysaccharide lyases (PLs), 26 carbohydrate esterases (CEs), 83 auxiliary activities (AAs), and 11 carbohydrate-binding modules (CBMs). Compared to strain NRRL 8126, two more GHs (one GH16 and one GH47) were identified in LPH172, as well as one additional AA12, one GT2, and one CE1 (Additional File 3). T. terrestris LPH172 had a relatively low number of PLs compared to other fungi (Fig. 2), but a larger complement of AA family enzymes, particularly AA9 (n = 18), AA8 (n = 3), and AA7 (n = 20) (Fig. 3). Five members of AA11 (chitin-cleaving LPMOs) were detected in both T. terrestris strains, but no AA13 (starch-cleaving LPMOs) or AA14 (xylan-cleaving LPMOs) members were observed. Among the selected fungi, only LPH172 and NRRL 8126 possessed an AA16, a recently characterised C1-hydroxylating LPMO (53). The number of multidomain CAZymes was low; only 15 LPH172 proteins had two predicted CAZy domains, and one had three (Additional File 3).
Putative candidates for CAZymes capable of degrading all major lignocellulosic polymers (cellulose, xylan, xyloglucan, (galacto)glucomannan, pectin, and lignin), as well as starch, inulin, and chitin, were found. This finding was in line with the growth of T. terrestris on all of these carbon sources (Additional File 2).
Table 3
Comparison of the number of CAZy domains in T. terrestris LPH172 and other filamentous fungi.
| Species | GH | GT | PL | CE | AA | CBM | Total |
|---|---|---|---|---|---|---|---|
| Aspergillus oryzae | 292 | 92 | 26 | 31 | 96 | 18 | 555 |
| Myceliophthora thermophila | 185 | 75 | 9 | 26 | 66 | 9 | 370 |
| Malbranchea cinnamomea | 118 | 59 | 4 | 14 | 37 | 5 | 237 |
| Thielavia terrestris LPH172 | 201 | 86 | 4 | 26 | 83 | 11 | 411 |
| Thielavia terrestris NRRL 8126 | 199 | 85 | 4 | 25 | 82 | 11 | 406 |
| Gloeophyllum trabeum | 186 | 64 | 9 | 19 | 57 | 6 | 341 |
| Podospora anserina | 215 | 82 | 7 | 45 | 128 | 15 | 492 |
| Schizophyllum commune | 239 | 73 | 17 | 37 | 83 | 16 | 465 |
| Rhizomucor pusillus | 97 | 99 | 2 | 24 | 17 | 2 | 241 |
| Rhizopus oryzae | 90 | 118 | 4 | 31 | 16 | 7 | 266 |
GH, glycoside hydrolase; GT, glycosyltransferase; PL, polysaccharide lyase; CE, carbohydrate esterase; AA, auxiliary activity; CBM, carbohydrate-binding module. All CAZy domains were identified using dbCAN2 (HMMER algorithm).
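The class counts in Table 3 can be cross-checked programmatically. The following is a minimal sketch, not part of the original analysis: the numbers are transcribed from Table 3, and the helper names (`check_totals`, `class_difference`) are hypothetical.

```python
# Class counts transcribed from Table 3: (GH, GT, PL, CE, AA, CBM, Total).
CLASSES = ("GH", "GT", "PL", "CE", "AA", "CBM")
TABLE3 = {
    "Aspergillus oryzae": (292, 92, 26, 31, 96, 18, 555),
    "Myceliophthora thermophila": (185, 75, 9, 26, 66, 9, 370),
    "Malbranchea cinnamomea": (118, 59, 4, 14, 37, 5, 237),
    "Thielavia terrestris LPH172": (201, 86, 4, 26, 83, 11, 411),
    "Thielavia terrestris NRRL 8126": (199, 85, 4, 25, 82, 11, 406),
    "Gloeophyllum trabeum": (186, 64, 9, 19, 57, 6, 341),
    "Podospora anserina": (215, 82, 7, 45, 128, 15, 492),
    "Schizophyllum commune": (239, 73, 17, 37, 83, 16, 465),
    "Rhizomucor pusillus": (97, 99, 2, 24, 17, 2, 241),
    "Rhizopus oryzae": (90, 118, 4, 31, 16, 7, 266),
}


def check_totals(table):
    """Return the organisms whose class counts do not sum to the reported total."""
    return [name for name, (*counts, total) in table.items() if sum(counts) != total]


def class_difference(table, first, second):
    """Per-class difference in domain counts (first minus second)."""
    a, b = table[first][:-1], table[second][:-1]
    return {cls: x - y for cls, x, y in zip(CLASSES, a, b)}


inconsistent = check_totals(TABLE3)
strain_diff = class_difference(
    TABLE3, "Thielavia terrestris LPH172", "Thielavia terrestris NRRL 8126"
)
```

With the values above, every row sums to its reported total, and the per-class difference between the two T. terrestris strains (two GHs, one GT, one CE, and one AA) agrees with the five additional domains described in the text.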
Regulation of plant cell wall-degrading enzymes
Regulation of (hemi)cellulolytic enzymes in filamentous fungi occurs mainly at the transcriptional level (54–56). Here, we used BLASTn and BLASTp to detect possible homologues of known transcription factors (TFs) from regulatory cascades described in other filamentous fungi. TF genes related to lignocellulose degradation in T. terrestris LPH172 included the transcriptional (hemi)cellulase activator XYR1/XLNR1 (TT_07823), the cellulase activators Clr-1 (TT_06796) and Clr-2 (TT_06838), the carbon-catabolite repressor CreA (TT_07794), the cellulase repressor ACE1 (TT_01416), and the arabinose-responsive Ara1 (TT_09773). The homology search also revealed the presence of the positive cellulase regulator McmA (TT_02138), the C-derepressing VIB1 (TT_03515), and the Hap-complex protein Hap5 (TT_04392).
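As an illustration of how such a homology search can be post-processed, the sketch below selects the best hit per query from BLAST+ tabular output (`-outfmt 6`, whose twelve default columns are listed in the code). The article does not state the output format or the cut-offs that were used, so the e-value threshold, the function name `best_hits`, and the example lines are all hypothetical and do not represent results from this study.

```python
import csv
import io

# The twelve default columns of BLAST+ tabular output (-outfmt 6).
COLUMNS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def best_hits(handle, max_evalue=1e-10):
    """Return {query: (subject, evalue, bitscore)} keeping the top-bitscore hit.

    max_evalue is an assumed cut-off, not one reported in the article.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue, bitscore = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # hit too weak to count as a putative homologue
        query = hit["qseqid"]
        if query not in best or bitscore > best[query][2]:
            best[query] = (hit["sseqid"], evalue, bitscore)
    return best


# Hypothetical example lines with placeholder identifiers.
EXAMPLE = (
    "tf_query_A\tsubject_1\t62.5\t800\t300\t4\t1\t800\t5\t804\t1e-150\t540\n"
    "tf_query_A\tsubject_2\t35.0\t200\t130\t6\t10\t209\t40\t239\t2e-12\t70.1\n"
    "tf_query_B\tsubject_3\t28.0\t90\t65\t2\t5\t94\t12\t101\t0.5\t30.2\n"
)
hits = best_hits(io.StringIO(EXAMPLE))
```

In this toy input, `tf_query_A` keeps its stronger hit and `tf_query_B` is dropped because its only hit exceeds the assumed e-value threshold. In practice, candidate homologues would still need manual inspection, for example by checking domain architecture.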
Transcriptome analysis
Highly expressed genes on Avicel, rice straw, and beechwood xylan
To verify genome annotation and analyse gene expression, in particular CAZyme-encoding gene expression, the transcriptome was analysed under different growth conditions. The fungus was grown in shake flasks on four substrates (glucose, Avicel, rice straw, and beechwood xylan), and total mRNA was extracted and sequenced. Glucose was chosen as the reference monosaccharide because its utilisation involves a limited number of CAZymes and should, therefore, reflect expression of mostly constitutive genes. Beechwood xylan, comprising a xylan backbone with 4-O-methyl glucuronic acid side groups, was selected to detect CAZymes required for hardwood hemicellulose degradation (43). Rice straw, which contains approximately 12% lignin, 38% cellulose, and 25% hemicellulose (57), was chosen to represent a complex, heterogeneous substrate requiring a large array of different CAZymes for degradation. Importantly, rice straw also has vast potential as a feedstock in biorefinery applications. Finally, Avicel, which is up to 98% cellulose (58, 59), was selected to identify enzymes required to degrade a highly crystalline and recalcitrant cellulosic substrate. Transcriptome data from RNAseq experiments were used to refine gene annotation through ab initio training with GeneMark v4.3 and an evidence-guided build with MAKER package v3.01.1. Results are summarized in Tables 4–6.
Table 4
Forty most highly expressed genes during T. terrestris LPH172 growth on Avicel.
| Transcript ID | fpkm | CAZy domain(s) | Putative function |
|---|---|---|---|
| TT_06621 | 6586 | - | NA |
| TT_06050 | 6458 | - | NA |
| TT_00578 | 4005 | - | Respiratory supercomplex factor 2 homolog |
| TT_05797 | 3343 | GH7-CBM1 | Endoglucanase |
| TT_03518 | 2876 | - | NA |
| TT_08370 | 2417 | AA9 | Endo-β-1,4-glucanase |
| TT_06655 | 2353 | GH6 | 1,4-β-D-glucan cellobiohydrolase |
| TT_03075 | 2251 | GH11-CBM1 | Endo-1,4-β-xylanase |
| TT_06499 | 2110 | CBM1 | Feruloyl esterase |
| TT_09215 | 2012 | - | Lactose permease |
| TT_00215 | 1665 | - | Oxidoreductase |
| TT_07455 | 1622 | AA9 | LPMO |
| TT_07008 | 1467 | - | NA |
| TT_08166 | 1450 | CE5-CBM1 | Acetylxylan esterase |
| TT_05599 | 1337 | - | Mitochondrial oxidase |
| TT_00225 | 1326 | AA4 | Vanillyl-alcohol oxidase |
| TT_10132 | 1309 | - | Cytochrome c |
| TT_09465 | 1232 | - | Cross-pathway control protein 1 |
| TT_09870 | 1089 | - | Protein FDD123 |
| TT_05357 | 1073 | - | Acyl-CoA desaturase |
| TT_06750 | 1049 | - | NA |
| TT_00529 | 1019 | - | NA |
| TT_07123 | 940 | - | NA |
| TT_01736 | 931 | AA9 | LPMO |
| TT_04350 | 928 | AA9-CBM1 | LPMO |
| TT_03837 | 880 | - | 5'-AMP-activated protein kinase subunit |
| TT_05536 | 815 | - | Elongation factor 3 |
| TT_09000 | 790 | GH45 | Endoglucanase |
| TT_04380 | 777 | AA3_1-AA8 | Cellobiose dehydrogenase |
| TT_06689 | 755 | - | Inositol oxygenase |
| TT_00703 | 682 | - | SDO1-like protein |
| TT_01019 | 679 | GH5_5 | Endoglucanase |
| TT_10041 | 665 | - | Actin-related protein |
| TT_00207 | 640 | - | Voltage-gated potassium channel subunit |
| TT_03870 | 630 | - | Multiprotein-bridging factor |
| TT_09312 | 613 | - | Protein vip1 |
| TT_05898 | 611 | - | NA |
| TT_06609 | 609 | - | Uncharacterized protein C32A11.02c |
| TT_07036 | 571 | - | Transcriptional regulatory protein |
| TT_08478 | 557 | - | Histone H2B |
Fpkm values indicate average normalized transcript levels from three replicates. CAZy domains were predicted by dbCAN2 and functions were annotated by BLASTp search against the UniProt/Swiss-Prot reference dataset. NA, not annotated.
Table 5
Forty most highly expressed genes during T. terrestris LPH172 growth on rice straw.
| Transcript ID | fpkm | CAZy domain(s) | Putative function |
|---|---|---|---|
| TT_10132 | 10224 | - | Cytochrome c |
| TT_06693 | 8469 | - | Stress protein DDR48 |
| TT_06050 | 7056 | - | NA |
| TT_06689 | 4893 | - | Inositol oxygenase |
| TT_08478 | 4739 | - | Histone H2B |
| TT_02247 | 3189 | - | Mitochondrial valine–tRNA ligase |
| TT_00469 | 2782 | - | 60S ribosomal protein |
| TT_09215 | 2701 | - | Lactose permease |
| TT_01345 | 2474 | - | 40S ribosomal protein |
| TT_02932 | 2465 | - | 60S ribosomal protein |
| TT_01839 | 2461 | GH11 | Endo-1,4-β-xylanase |
| TT_01967 | 2044 | - | 60S ribosomal protein |
| TT_01009 | 1893 | - | NA |
| TT_04612 | 1849 | - | 40S ribosomal protein |
| TT_00107 | 1730 | - | NA |
| TT_02482 | 1678 | - | NA |
| TT_02213 | 1642 | - | Elongation factor 1-α |
| TT_01072 | 1633 | - | 60S ribosomal protein |
| TT_07670 | 1577 | - | Peptide chain release factor 1 |
| TT_00703 | 1534 | - | SDO1-like protein C21C3.19 |
| TT_02715 | 1465 | - | NA |
| TT_02172 | 1448 | - | Translation initiation factor |
| TT_06668 | 1436 | - | Hedgehog-interacting protein |
| TT_00918 | 1410 | - | Mitochondrial peptidyl-prolyl cis-trans isomerase |
| TT_02974 | 1404 | - | Heat shock protein |
| TT_08110 | 1381 | - | Glycogen phosphorylase |
| TT_01225 | 1358 | - | THO complex subunit 4A |
| TT_09947 | 1341 | - | Mitochondrial phosphate carrier protein |
| TT_02608 | 1327 | - | DNA-binding protein |
| TT_02868 | 1321 | - | 60S ribosomal protein |
| TT_01583 | 1318 | - | 40S ribosomal protein |
| TT_05389 | 1233 | - | Ubiquitin-60S ribosomal protein |
| TT_04052 | 1181 | - | 60S ribosomal protein |
| TT_00966 | 1179 | - | Allergen Asp f 4 |
| TT_03265 | 1149 | - | Tropomyosin |
| TT_06621 | 1129 | - | NA |
| TT_01563 | 1093 | - | Polypeptide-associated complex subunit α |
| TT_00802 | 1077 | - | 40S ribosomal protein |
| TT_08653 | 1059 | - | Translation initiation factor 3 subunit C |
| TT_01895 | 1046 | - | 40S ribosomal protein |
Fpkm values indicate average normalized transcript levels from three replicates. CAZy domains were predicted by dbCAN2 and functions were annotated by BLASTp search against the UniProt/Swiss-Prot reference dataset. NA, not annotated.
Table 6
Forty most highly expressed genes during T. terrestris LPH172 growth on beechwood xylan.
| Transcript ID | fpkm | CAZy domain(s) | Putative function |
|---|---|---|---|
| TT_05599 | 2440 | - | Mitochondrial oxidase |
| TT_00578 | 2053 | - | Respiratory supercomplex factor 2 |
| TT_03518 | 2024 | - | NA |
| TT_10132 | 1967 | - | Cytochrome c |
| TT_05357 | 1748 | - | Acyl-CoA desaturase |
| TT_06621 | 1234 | - | NA |
| TT_02482 | 1054 | - | NA |
| TT_05010 | 1054 | GH25 | N,O-diacetylmuramidase |
| TT_01009 | 1030 | - | NA |
| TT_05536 | 1028 | - | Elongation factor 3 |
| TT_07036 | 1009 | - | Transcriptional regulatory protein |
| TT_07008 | 1007 | - | NA |
| TT_06689 | 1004 | - | Inositol oxygenase |
| TT_09947 | 913 | - | Mitochondrial phosphate carrier protein |
| TT_09076 | 894 | - | Copper-containing nitrite reductase |
| TT_00107 | 869 | - | NA |
| TT_09465 | 867 | - | Cross-pathway control protein 1 |
| TT_03837 | 822 | - | 5'-AMP-activated protein kinase subunit β-2 |
| TT_09870 | 802 | - | Protein FDD123 |
| TT_06824 | 796 | - | Heat shock 70 kDa protein |
| TT_03870 | 795 | - | Multiprotein-bridging factor |
| TT_09441 | 754 | CE9 | N-acetylglucosamine-6-phosphate deacetylase |
| TT_04420 | 753 | - | Uncharacterized protein C18H10.17c |
| TT_08478 | 748 | - | Histone H2B |
| TT_02247 | 710 | - | Mitochondrial valine–tRNA ligase |
| TT_06668 | 690 | - | Hedgehog-interacting protein |
| TT_09930 | 665 | - | Histone H3 |
| TT_06609 | 634 | - | Uncharacterized protein C32A11.02c |
| TT_04469 | 632 | - | 5-methyltetrahydropteroyltriglutamate–homocysteine methyltransferase |
| TT_00469 | 590 | - | 60S ribosomal protein |
| TT_02932 | 576 | - | 60S ribosomal protein |
| TT_04772 | 573 | - | Melanoma-associated antigen |
| TT_01967 | 562 | - | 60S ribosomal protein |
| TT_06693 | 558 | - | Stress protein DDR48 |
| TT_08166 | 554 | CE5-CBM1 | Acetylxylan esterase |
| TT_03035 | 552 | GH72 | 1,3-β-glucanosyltransferase |
| TT_01345 | 535 | - | 40S ribosomal protein |
| TT_08034 | 534 | - | AN1-type zinc finger protein |
| TT_01037 | 534 | - | Glycerol-3-phosphate dehydrogenase |
| TT_01225 | 525 | - | THO complex subunit 4A |
Fpkm values indicate average normalized transcript levels from three replicates. CAZy domains were predicted by dbCAN2 and functions were annotated by BLASTp search against the UniProt/Swiss-Prot reference dataset. NA, not annotated.
To identify which genes, including CAZyme-encoding genes, were the most highly expressed (by transcript number) on the chosen substrates, we examined the top 40 candidates (an arbitrary cut-off) under each growth condition, ranked by their average fragments per kilobase of transcript per million mapped reads (fpkm) value. The complete list of all expressed genes is available in Additional File 4. The fpkm values of the 40 most abundant transcripts ranged from 6,586 to 557 for growth on Avicel (Table 4), from 10,224 to 1,046 for growth on rice straw (Table 5), and from 2,440 to 525 for growth on beechwood xylan (Table 6). Interestingly, when grown on Avicel, the two most highly expressed genes encoded short peptides of 22 (TT_06621) and 124 (TT_06050) amino acids. TT_06621 was also among the top 40 expressed genes on both rice straw and beechwood xylan, whereas TT_06050 was very highly expressed on rice straw but not on beechwood xylan or glucose. The fourth most highly expressed gene on Avicel encoded a CAZyme: a putative GH7 endoglucanase with a CBM1 (TT_05797). Twelve other putative CAZymes were among the 40 most abundant transcripts on Avicel. These included typical cellulose-active enzymes, such as four AA9 LPMOs (TT_08370, TT_07455, TT_01736, and TT_04350), a GH6 cellobiohydrolase (TT_06655), GH5 and GH45 endoglucanases (TT_01019 and TT_09000, respectively), and an AA3-AA8 cellobiose dehydrogenase (TT_04380), as well as typical xylan-active enzymes, such as a GH11-CBM1 endoxylanase (TT_03075), a BLAST-annotated feruloyl esterase with a CBM1 (TT_06499), and a CE5 acetylxylan esterase (TT_08166). CAZymes with cellulose-binding CBM1 domains were overrepresented among the 40 most highly expressed genes on Avicel, and included all five CAZymes with a CBM1 in T. terrestris LPH172. Interestingly, a lactose permease (TT_09215) was also very abundant on both Avicel and rice straw.
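The ranking described above can be sketched as follows. This is a minimal illustration using the standard FPKM definition, not the authors' pipeline (the quantification software is not named in this section); the gene names and replicate values are hypothetical placeholders.

```python
from statistics import mean


def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def top_n(replicate_fpkm, n=40):
    """Rank genes by mean FPKM across replicates and return the n highest.

    replicate_fpkm maps a gene ID to a list of per-replicate FPKM values.
    """
    averaged = {gene: mean(values) for gene, values in replicate_fpkm.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical example: 500 fragments on a 2 kb transcript in a library
# of 25 million mapped fragments.
example_fpkm = fpkm(500, 2000, 25_000_000)

# Hypothetical triplicate values for three placeholder genes.
replicates = {
    "gene_a": [10, 20, 30],
    "gene_b": [100, 110, 90],
    "gene_c": [1, 2, 3],
}
ranked = top_n(replicates, n=2)
```

Averaging before ranking means that a gene with one outlier replicate can enter the list, so the per-replicate values in Additional File 4 remain the better basis for judging individual candidates.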
Growth on rice straw seemed to favour gene expression and translation processes, as indicated by the high number of ribosome- and histone-related gene products. Of the 40 most highly expressed genes, 12 encoded ribosomal subunits, which could coincide with the generally higher fpkm values on rice straw. Given the diverse polymer composition of this substrate, it was surprising to see only one CAZyme among the top 40 transcripts: a GH11 endo-1,4-β-xylanase (TT_01839) (Table 5). AA9 LPMOs, xylanases, and acetylxylan esterases were also expressed on this substrate, but at lower levels (Additional File 4). A stress-response protein (TT_06693) and a heat shock protein homologue (TT_02974) were highly expressed on rice straw, suggesting stress conditions during growth.
On beechwood xylan, four CAZymes were found among the 40 most highly expressed genes: a GH25 N,O-diacetylmuramidase (TT_05010), a CE9 N-acetylglucosamine-6-phosphate deacetylase (TT_09441), a CE5-CBM1 acetylxylan esterase (TT_08166), and a GH72 1,3-β-glucanosyltransferase (TT_03035). Three of those CAZymes are not involved in lignocellulosic biomass degradation but in growth and remodelling of the fungal cell wall (GH72) (60), fungal amino sugar metabolism during chitin degradation (CE9) (61), and defence against bacteria (GH25) (62). Similar to growth on rice straw, most highly expressed genes on beechwood xylan were related to general cellular metabolism (e.g. mitochondrial proteins), gene expression (histones and ribosomal proteins), and stress response (heat shock proteins and stress proteins). Biomass-degrading CAZymes, such as xylanases (GH10 and GH11), mannosidases (GH76), AA2 and AA3 oxidoreductases, and a GH13 amylase, were generally expressed at lower levels on beechwood xylan (Additional File 4).
Upregulated CAZymes on Avicel, rice straw, and beechwood xylan
Gene expression levels do not show the full spectrum of available lignocellulose-degrading enzymes in the organism, because many of them are sufficiently active at low concentration. Therefore, to understand which genes were induced under the tested conditions (Avicel, rice straw or beechwood xylan), we examined the differential expression of CAZymes with respect to glucose as reference. In particular, we focused on transcripts that were significantly more abundant (i.e. upregulated) compared to glucose.
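The idea of expressing induction relative to the glucose reference can be sketched as a simple ratio. This is an illustration only: the statistical method behind the reported fold changes and p-values is not described in this section, and the pseudocount, the 2-fold threshold, the function names, and the input values below are assumptions.

```python
import math


def fold_change(fpkm_condition, fpkm_reference, pseudocount=1.0):
    """Ratio of expression on a substrate to expression on the reference.

    The pseudocount (an assumption) avoids division by zero for genes
    that are silent on the reference.
    """
    return (fpkm_condition + pseudocount) / (fpkm_reference + pseudocount)


def upregulated(condition, reference, min_fold=2.0):
    """Return (gene, fold, log2 fold) for genes at or above min_fold, highest first."""
    result = []
    for gene, value in condition.items():
        fold = fold_change(value, reference.get(gene, 0.0))
        if fold >= min_fold:
            result.append((gene, fold, math.log2(fold)))
    return sorted(result, key=lambda item: item[1], reverse=True)


# Hypothetical mean FPKM values for placeholder genes.
on_substrate = {"gene_a": 799.0, "gene_b": 15.0, "gene_c": 9.0}
on_glucose = {"gene_a": 0.0, "gene_b": 3.0, "gene_c": 9.0}
induced = upregulated(on_substrate, on_glucose)
```

A plain ratio like this says nothing about significance. Deciding whether a change is statistically supported requires a replicate-aware test, which is why a large fold change can still be reported as not significant.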
On Avicel, AA9 LPMOs, in combination with an AA3-AA8 cellobiose dehydrogenase and GH7, GH5, and GH45 endoglucanases, were the most highly upregulated CAZymes (Fig. 4, Additional File 5). AA9 LPMOs showed the highest differential expression, particularly in the case of TT_01736 (4644-fold), TT_08370 (1793-fold), and TT_04350 (468-fold). All three enzymes also had very high fpkm values, indicating both high upregulation and high expression levels. Four more AA9 LPMOs were also significantly more abundant on Avicel: TT_07455 (84-fold), TT_06268 (64-fold), TT_04352 (43-fold), and TT_02354 (5-fold). Notably, TT_06268 exhibited an fpkm value of only 5, whereas the other AA9s had fpkm values between 25 and 2417. Interestingly, many non-cellulose-acting CAZymes were also upregulated on Avicel; these included a feruloyl esterase (TT_06499, 525-fold), GH11 xylanases (TT_03075, 151-fold; TT_01839, 102-fold; TT_08161, 30-fold), CE16 and CE5 acetyl (xylan) esterases (TT_06012, 147-fold; TT_08166, 54-fold; TT_05762, 26-fold), and an AA4 vanillyl-alcohol oxidase (TT_00225, 426-fold). The expression levels of these genes varied widely.
The set of highly upregulated CAZymes on rice straw shared some candidates with Avicel, such as several AA9 LPMOs (TT_06268, 1134-fold; TT_08370, 265-fold; TT_04352, 150-fold; TT_01736, 140-fold), the hemicellulose-active GH11 xylanases TT_01839 (1610-fold) and TT_02489 (105-fold), the CE5 acetylxylan esterase TT_05762 (55-fold), and the CE16 acetyl esterase TT_06012 (52-fold). Generally, more hemicellulose-active enzymes were highly upregulated on rice straw than on Avicel, reflecting the more diverse composition of the former. Interestingly, the highest upregulation on rice straw was detected for the mannan endo-1,4-β-mannosidase TT_06537 (1951-fold), even though mannan is not a major polymer in this substrate. Notably, this gene had a low fpkm value of 53. The second most upregulated gene, the GH11 endo-1,4-β-xylanase TT_01839 (1610-fold), had an fpkm of 2461. A few putative cellulose-acting enzymes were upregulated on rice straw but not on Avicel, such as the AA9 LPMO TT_03770 (12-fold) and the AA8 TT_09190 (19-fold). Another putative AA8 cellobiose dehydrogenase (TT_02325) was upregulated 1168-fold, although not at a statistically significant level (p = 0.135) (Fig. 4, Additional File 5).
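The contrast between TT_06537 and TT_01839 shows why fold change and transcript abundance are best read together. The sketch below labels a gene by both criteria; the fold changes and fpkm values are those quoted above for rice straw, whereas the function name and the thresholds are illustrative assumptions (the fpkm threshold of 1,046 is simply the lowest value in Table 5, and the 100-fold threshold is arbitrary).

```python
def classify(fold, fpkm, min_fold=100.0, min_fpkm=1046.0):
    """Label a gene by induction (fold change) and abundance (fpkm).

    Both thresholds are illustrative assumptions, not values from the study.
    """
    induced = fold >= min_fold
    abundant = fpkm >= min_fpkm
    if induced and abundant:
        return "induced and abundant"
    if induced:
        return "induced, low abundance"
    if abundant:
        return "abundant, weakly induced"
    return "neither"


# (fold change vs glucose, fpkm) on rice straw, as reported in the text.
RICE_STRAW = {
    "TT_06537": (1951.0, 53.0),    # mannan endo-1,4-β-mannosidase
    "TT_01839": (1610.0, 2461.0),  # GH11 endo-1,4-β-xylanase
}
labels = {gene: classify(fold, value) for gene, (fold, value) in RICE_STRAW.items()}
```

Under these assumed thresholds, the mannosidase is classed as strongly induced but scarce, whereas the xylanase is both induced and abundant.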
On beechwood xylan, upregulation of CAZymes was more muted, and fewer overlaps with other substrates were detected. Despite beechwood xylan being a pure xylan substrate, only a fraction of upregulated CAZymes were xylan-acting, such as the CE5 acetylxylan esterases TT_05762 (67-fold) and TT_08166 (21-fold), the GH11 endo-1,4-β-xylanases TT_01839 (35-fold) and TT_03075 (10-fold), and the GH10 endo-1,4-β-xylanases TT_08161 (5-fold) and TT_09033 (5-fold). Acetylxylan esterase TT_05762 showed its highest upregulation and expression on beechwood xylan, whereas the other candidates were more highly upregulated and expressed on Avicel, rice straw, or both. Several enzymes active on chitin and possibly involved in fungal cell wall modulation were upregulated on beechwood xylan, such as the GH18 chitinases TT_05685 (28-fold) and TT_04717 (11-fold), the endo-chitosanase TT_08109 (3-fold), and the GH72 and CE9 enzymes mentioned above. Transcripts of several AA9 LPMOs were also more abundant on beechwood xylan than on glucose (TT_06268, 36-fold; TT_01736, 36-fold; TT_08370, 5-fold), although again at much lower levels than on the other substrates. A variety of cellulose-, mannan-, pectin-, and arabinan-active CAZymes were upregulated at low levels (2- to 4-fold); the same was observed for some enzymes typically associated with lignin degradation (Fig. 4, Additional File 5).