Morphological characteristics of walnut endocarp
To identify the critical lignification periods during development of young fruit, the timing and pattern of lignin deposition was studied. Within 30 DAFB, the pericarp was clearly stratified, and the endocarp appeared milky white and the Wiesner reaction revealed that no lignification occurred during this period (Fig.1 A, C, E, G). Only the vascular tissue at the junction of the mesocarp and endocarp was stained. At 45 DAFB, the endocarp near the outer pericarp began to lignify. The Wiesner reaction showed that lignification gradually occurred out from the outside to the inside of the endocarp (Fig.1 B, D, F, H). The apex (blossom end) and the junction of the endocarp and mesocarp were most stained. Pink gradually diffused inward from the junction and became shallow. Little or no staining was observed in tissues other than the endocarp with the exception of scattered small vascular elements in the mesocarp, trichomes present on the fruit surface and residual floral structure. Seed kernels were still water-like at this time. The key point at which the lignification of the endocarp begins to develop is between 30 and 45 DAFB, suggesting that the lignification of the endocarp begins relatively early in the development of the fruit. To ensure there was no lignin deposition in the whole endocarp, the endocarp of ZM and L7 of 20 DAFB was collected for transcriptome sequencing, when the endocarp and mesocarp had differentiated, ensuring the accuracy of sampling. Forty-five DAFB is a time of rapid lignin formation in the endocarp. Therefore, samples of 45 DAFB were also collected for transcriptome sequencing. As seen from the fresh fruit pictures in each dimension, the mesocarp of ZM is thicker and oval in shape, while the mesocarp of L7 is thinner and round in shape. The difference in the endocarp between the two cultivars, however, was not obvious.
Transcriptome sequencing and quality control
In total, 501,163,896 raw reads were obtained from the transcriptome sequencing of all samples, and 423,289,427 clean reads were mapped to the walnut reference genome (https://www.ncbi.nlm.nih.gov/genome/?term=txid51240[orgn]) after filtering out the low-quality reads (Table 1). The results of the mapping rate, Q30 rate and GC content suggested the high quality of our data.
Table 1 Data output quality list
Sample name
|
Clean reads number
|
HQ clean reads number
|
HQ clean data(bp)
|
Q20(%)
|
Q30(%)
|
GC(%)
|
Total mapped
|
Multiple mapped
|
Uniquely mapped
|
T-ZM-1
|
35615318
|
35374570
|
5291868071
|
96.13
|
90.61
|
45.78
|
35285776(85.50%)
|
1264394 (3.58%)
|
28905769 (81.92%)
|
T-ZM-2
|
25050350
|
24890318
|
3723545540
|
96.20
|
90.69
|
45.76
|
24773372(85.70%)
|
888828 (3.59%)
|
20341281 (82.11%)
|
T-ZM-3
|
27408692
|
27205416
|
4068797895
|
96.00
|
90.30
|
45.88
|
27113566(85.18%)
|
964252 (3.56%)
|
22131934 (81.63%)
|
F-ZM-1
|
28753730
|
28587726
|
4277543832
|
96.39
|
91.09
|
45.32
|
28516602(86.08%)
|
1134118 (3.98%)
|
23413987 (82.11%)
|
F-ZM-2
|
24711764
|
24530414
|
3671111290
|
96.01
|
90.33
|
45.42
|
24472276(85.18%)
|
932310 (3.81%)
|
19911981 (81.37%)
|
F-ZM-3
|
26865506
|
26702752
|
3995147175
|
96.22
|
90.71
|
45.33
|
26633542(85.61%)
|
1045948 (3.93%)
|
21754542 (81.68%)
|
T-L7-1
|
25572784
|
25419584
|
3801704427
|
96.29
|
90.90
|
45.83
|
25336008(86.08%)
|
897638 (3.54%)
|
20910521 (82.53%)
|
T-L7-2
|
30463302
|
30261640
|
4525656120
|
96.10
|
90.52
|
46.05
|
30188378(85.55%)
|
1061016 (3.51%)
|
24764898 (82.03%)
|
T-L7-3
|
22644096
|
22460148
|
3359569046
|
95.76
|
89.82
|
45.76
|
22395556(84.57%)
|
755580 (3.37%)
|
18184535 (81.20%)
|
F-L7-1
|
26090952
|
25933396
|
3878988211
|
96.18
|
90.73
|
45.48
|
25875950(85.23%)
|
994914 (3.84%)
|
21058716 (81.38%)
|
F-L7-2
|
27939918
|
27764000
|
4153061801
|
96.03
|
90.35
|
45.55
|
27676730(85.16%)
|
1065578 (3.85%)
|
22503546 (81.31%)
|
F-L7-3
|
24382998
|
24230820
|
3625142180
|
96.28
|
90.96
|
45.39
|
24122168(85.48%)
|
952856 (3.95%)
|
19665864 (81.53%)
|
Clean reads: the reads removing low-quality reads. HQ clean reads: the reads from the second filter. Q20 and Q30: the mistake rate lower than 1% and 0.1%, respectively. GC content: the GC ratio of total base number. Total mapped: the number of reads which can be mapped to the reference genome. Multiple mapped: the number of reads with multiple alignment positions on the reference sequence. Uniquely mapped: the number of reads with unique alignment positions on the reference sequence.
Supplementary Fig. S1 showed the Pearson’s correlation coefficient results of all groups. The correlation of each group was higher than 0.95, suggesting the high quality of the sequencing data. PCA is largely used to show the structure and relationship of the samples; the results of PCA in this study showed good correlation in each group and the difference between the two periods in each cultivar (Supplementary Fig. S2).
Analysis of genes between groups
There were 36,861 reference genes in total, 30,276 of which were known. The new gene number of all the samples through sequencing was 1,870. The expressed gene number of each group was clearly illustrated in Table 2.
Table 2 Gene statistics for each group
Group name
|
Known gene number
|
New gene number
|
All gene number
|
T-ZM
|
27494 (74.59%)
|
1762
|
29256
|
F-ZM
|
27663 (75.05%)
|
1774
|
29437
|
T-L7
|
26899 (72.97%)
|
1744
|
28643
|
F-L7
|
27453 (74.48%)
|
1768
|
29221
|
To identify the overall difference, differential expression analysis was performed for the different periods and cultivars. The (T-ZM vs F-ZM) and (T-L7 vs F-L7) groups revealed the shells’ DEGs in different developmental stages and showed that they were related to lignin biosynthesis, which had the most DEGs. As a result, 3502 and 3663 genes were individually up- and downregulated between T-ZM and F-ZM while there were 3418 and 2727genes between T-L7 and F-L7 (Fig. 2a). Compared with L7, ZM had more DEGs during the lignin formation period. In the (T-ZM vs F-ZM) group, there were more downregulated genes than upregulated genes. But in the (T-L7 vs F-L7) group, there were more upregulated genes than downregulated genes. The two groups had 4,537 DEGs in common (Fig. 2b) from which the same regulatory genes controlling the growth and development of the two varieties can be found. Apart from these common differences, the (T-ZM vs F-ZM) and (T-L7 vs F-L7) group had 2,628 and 1,608 genes, respectively, which may hold the key to the differences between the two varieties. The (T-L7 vs T-ZM) and (F-L7 vs F-ZM) groups showed the shells’ DEGs in different cultivars at the same development stages, in which 1672 and 881 genes were individually up and downregulated between T-L7 and T-ZM, respectively, and 1,233 and 1,025 genes were up and downregulated between F-L7 and F-ZM. Compared with L7, ZM had more upregulated genes in each period. The (T-ZM vs F-ZM), (T-L7 vs F-L7) and (F-L7 vs F-ZM) groups had 516 common DEGs, including DEGs that play a role in the different varieties and at different stages of development. Since there was no lignin at 20 DAFB, we did not compare the (T-L7 and T-ZM) group with the other groups. The above results showed that the number of DEGs in different developmental stages of ZM was higher than that of L7, indicating that the developmental regulation of ZM was more complicated than that of L7.
Function prediction
To learn the potential functions of the DEGs between the different groups, GO (http://www.geneontology.org/) and KEGG (https://www.genome.jp/) enrichments were conducted for functional classification. GO analysis showed that all DEGs can be classified into biological processes (BP), cell components (CC) and molecular functions (MF) and each of them contains several exact functions and GO items (Fig. 3).
The overall GO analysis showed that the differential genes for CC were mainly concentrated in the cell parts, organelles and membranes, the MF included catalytic activity and binding; and the BP were metabolic processes, cellular processes and single-organism processes, etc. The proportion of upregulated genes was significant in (T-L7 vs T-ZM) and (F-L7 vs F-ZM) groups, while the proportion of downregulated genes was significant in the other two groups, and the total number of DEGs was much larger. In comparing the (T-ZM vs F-ZM) and (T-L7 vs F-L7) groups, the number of downregulated genes of ZM in each component was significantly higher than that of L7, while the upregulated genes were not significantly different. After FDR<0.05 (adjusted p value) condition screening and refinement classification, further functional analysis could be performed (Supplementary Table S1). Except for the (F-L7 vs F-ZM) group, the lignin metabolic process pathway and phenylpropanoid metabolic processes were significantly enriched in the rest of the groups. The lignin metabolic process pathway was the most significant in the (F-L7 vs F-ZM) group. No lignin was found in the endocarp at 30 DAFB (Fig. 1), nor 20 DAFB. The enrichment of the lignin metabolic process pathway and phenylpropanoid metabolic process in the (T-L7 and T-ZM) group may have been due to the growth and development of cell walls in the endocarp. It may also have been due to the development and formation of vascular tissue. There were more different pathways in the (T-ZM vs F-ZM) and (T-L7 vs F-L7) groups, and the cell wall related pathways and DEGs were also significantly increased. Secondary metabolic processes, responses to hormone and hormones transport were also significantly enriched. A large number of hormones regulated and participated in regulating the changes in the different developmental stages, indicating that the growth and development at this time were due to secondary metabolism.
The top 20 KEGG pathway enrichment analyses of four groups are shown in Fig. 4. In the (T-ZM vs F-ZM) and (T-L7 vs F-L7) groups, most DEGs were enriched in plant hormone signal transduction, phenylpropanoid biosynthesis, tryptophan metabolism and starch and sucrose metabolism (Fig. 4A, B). A large number of hormones were involved in the transformation of the endocarp from growth to secondary metabolism. The differences in these metabolic pathways were consistent with the results in GO enrichment. In the (T-ZM vs F-ZM) group, fatty acid elongation and DNA replication were also significantly enriched. In the (T-L7 vs F-L7) group, photosynthesis, ascorbate and aldarate metabolism, thiamine metabolism and ABC transporters were significantly enriched. These differences might explain the developmental status and nutrient supply differences between the two varieties. Glutathione metabolism and taurine and hypotaurine metabolism showed significant differences in the (F-L7 vs F-ZM) group, while DNA replication, glutathione metabolism and purine metabolism showed significant differences in the (T-L7 and T-ZM) group (Fig. 4C, D).
Expression analysis of DEGs related to lignin biosynthesis and validation of DEG expression by RT-qPCR
The continuous deposition of lignin in walnut endocarp and its eventual lignification is complicated. A series of biological processes are involved, including the synergistic action of many genes to regulate various pathways, of which the phenylpropanoid biosynthesis pathway is one of the most important.
The relative expression of the DEGs in the (T-ZM vs F-ZM) and (T-L7 vs F-L7) groups in the phenylpropanoid biosynthesis pathway are intuitively showed in Fig.5. The expression of the genes corresponding to the key enzymes in the metabolic pathway at different stages of ZM and L7 are also visually illustrated. In total, seventy-six DEGs were annotated in this pathway in ZM and forty DEGs were highly expressed, including, 4CL (6), CAD (3), CCR (1), C4H (2), HCT (2), C3’H (2), COMT (2), CCoAOMT (1), F5H (2), CSE (1) and POD (18). Most of the genes were upregulated and the most downregulated gene was POD. A total of sixty-five DEGs in L7 were enriched in this pathway, of which thirty-two were highly expressed, including PAL (1), 4CL (5), CAD (3), CCR (1), C4H (2), HCT (1), C3’H (2), COMT (1), CCoAOMT (1), F5H (1) and POD (14). As in the case of ZM, most of the DEGs in L7 were upregulated, although there were fewer DEGs than in ZM. The expression of most of the DEGs was consistent in the two varieties, and the differences were mainly concentrated in PAL, HCT, POD, CSE, etc. PAL is located at the entrance of the phenylpropanoid pathway and is involved not only in the synthesis of lignin monomers but also in the synthesis of other phenolic substances [34]. Therefore, there was no expression difference between the two periods in ZM. PAL, 4CL and C4H often form multi-enzyme complexes to efficiently regulate the phenylpropanoid metabolism pathway [35]. Both 4CL and C4H showed an upregulated trends. HCT controls the biosynthesis of lignin monomers and is an important control point for the biological conversion of H monomer and G/S monomer [36, 37]. During the period of lignin synthesis, the downregulation of the gene expression might be related to the regulation of lignin monomer type. POD enriched the most DEGs in both ZM and L7. POD is a key enzyme involved in the last step of lignin synthesis, which catalyses the dehydrogenation of the monomers metabolized by phenylpropane to form lignin. POD is encoded by multiple gene families. Due to the differences in gene structure and function, POD also participates in a variety of metabolic pathways, such as auxin metabolism, cell wall extension and thickening, metabolism of reactive oxygen species and reactive nitrogen, and various resistance processes of plants [38, 39]. CSE has been demonstrated to have the ability to catalyse the conversion of caffeoyl shikimate into caffeate in Arabidopsis thaliana [40], and Wout [41] confirmed its role in poplar lignin synthesis. However, due to a lack of further evidence, its function has not been fully explored.
The transcriptome results narrowed the scope for further screening. RT-qPCR was performed on some identified genes related to the lignin pathway. The specific gene primers involved in the RT-qPCR analysis and corresponding gene information are listed in Supplementary Table S2. Eighteen genes encoding key enzymes in the phenylpropanoid biosynthesis pathway were selected. Flavonoids are important secondary metabolites, and involved in plant growth and development, disease resistance and other important processes [42]. The synthesis of flavonoids is derived from the phenylpropanoid pathway, of which chalone synthase (CHS) is the first step. Thus, a gene encoding CHS was also being examined.
The results showed that the expression profile of the pathway related to DEGs determined by RT-qPCR was consistent with the corresponding log2 (F/T) value of the RNA-seq analysis (Table 3).
The expression level of genes involved in the phenylpropanoid pathway was downregulated in L7 compared with the ZM cultivar in both the development stages and lignin synthesis. The two COMT genes showed the same expression pattern in ZM and L7. They were highly expressed at the growth stage in ZM, and they continuously increased in the lignin formation stage until they reached the peak value of 65DAFB. This gene showed the same change trend in L7, but the difference multiple was much lower than that of ZM (Fig.6). Except for CCR, the expression levels of other genes in the phenylpropanoid biosynthesis pathway were all greater in ZM than in L7. C3'H (gene11203) showed the same expression trend in ZM and L7, but the peak value of L7 appeared later (70DAFB). C3'H (gene21068) was always in highly expressed in ZM, reaching its maximum value at 65DAFB, which was much higher than that in L7. HCT (gene 11284), CAD (gene39775), and C4H (gene40342) had a similar situation with C3'H (gene21068). CCR1 was highly expressed at the beginning of lignin formation in L7 and reached its peak at 50 DAFB. The trend of CCoAOMT (gene 32982) was similar. The expression levels of POD varied greatly. The large differences in these key genes may be the key reason for the differences in endocarp lignification between the two varieties.
Table 3 Expression trends of 18 genes by real-time PCR and sequencing
Gene ID
|
F-ZM
|
F-L7
|
Note
|
gene10640
|
1.2
|
1.34
|
RNA-seq
|
|
9.26
|
1.13
|
qPCR
|
gene11284
|
1.66
|
0.98
|
RNA-seq
|
|
2.48
|
2.61
|
qPCR
|
gene12497
|
-3.43
|
-2.93
|
RNA-seq
|
|
0.12
|
0.11
|
qPCR
|
gene12948
|
2.04
|
1.96
|
RNA-seq
|
|
2.87
|
4.85
|
qPCR
|
gene16314
|
-3.38
|
-2.24
|
RNA-seq
|
|
0.53
|
0.32
|
qPCR
|
gene21068
|
2.36
|
1.58
|
RNA-seq
|
|
24.54
|
9.92
|
qPCR
|
gene22810
|
3.99
|
4.08
|
RNA-seq
|
|
2.42
|
2.39
|
qPCR
|
gene32982
|
4.79
|
5.97
|
RNA-seq
|
|
7.10
|
7.08
|
qPCR
|
gene35880
|
8.21
|
8.92
|
RNA-seq
|
|
8039.50
|
256.15
|
qPCR
|
gene37179
|
6.99
|
7.01
|
RNA-seq
|
|
29.61
|
28.32
|
qPCR
|
gene39775
|
1.88
|
1.93
|
RNA-seq
|
|
1.54
|
1.38
|
qPCR
|
gene40342
|
1.27
|
1.33
|
RNA-seq
|
|
1.68
|
1.26
|
qPCR
|
gene40352
|
1.02
|
0.82
|
RNA-seq
|
|
11.22
|
1.26
|
qPCR
|
gene42443
|
9.70
|
8.13
|
RNA-seq
|
|
82.97
|
629.17
|
qPCR
|
gene8744
|
-1.95
|
-0.49
|
RNA-seq
|
|
0.15
|
0.13
|
qPCR
|
gene35863
|
-1.72
|
-1.03
|
RNA-seq
|
|
0.81
|
0.45
|
qPCR
|
Gene11203
|
2.54
|
1.95
|
RNA-seq
|
|
0.10
|
0.25
|
qPCR
|
gene32581
|
-1.63
|
-0.86
|
RNA-seq
|
|
0.15
|
0.01
|
qPCR
|
Note: The data in RNA-seq is the difference multiple of log2 (F/T) value, and the data in qPCR is the relative expression of 2-ΔΔCT value. The endocarp of 20 DAFB was used as control group.
According to the previous studies, the lignin biosynthetic genes are generally regulated by different families of transcription factors, such as the MYB, NAC,LBD and NST families etc. [27, 43]. We made a heat map analysis of all the screened genes in these families (Supplementary Figure S3). The specific gene information is shown in the Supplementary Table S3.Under the given conditions (FPKM>5, |log2F/T|>1), the lignin-related transcription factors concerned in some studies were not screened out, such as MYB46 and MYB83. NST1 and LBD41 were highly expressed in the lignin generation stage of both varieties, but they were significantly higher in ZM than in L7, which is note worthy. Several transcription factors whose differential multiples did not meet the criteria were also screened because their expression might be involved in lignin metabolism. Most transcription factors were upregulated during lignin formation. Several groups of different transcription factor families had similar expression patterns, which were related to the lignin formation law. NAC100 and NAC12 had significant high expression in F-L7.