Gene family identification
Combined with BLASTP and HMMER, 30 gene families related to anthocyanin metabolism were systematically identified from tea genome, including 2110 independent sequences. Among them, PAL (6), C4H (6), 4CL (34), CHS (16), ACC (2), CHI (7), F3H (2), F3'H (1), F3'5'H (4), DFR (60), FLS (3), ANS (2) are responsible for the catalytic reaction of anthocyanin skeleton synthesis. Gene families involved in anthocyanin structural modification include UFGT (340), OMT (44), BAHD (181), gene families involved in anthocyanin transport include GST (94), MRP (92), MATE (99), SNARE (40), gene families involved in anthocyanin degradation include PPO (6), PRX (119) and β-glycosidase (25). The families related to intracellular iron ion content include FC (7) and VIT (6). Among the transcription factor families involved in anthocyanin regulation, MYB family has the largest number of members, with 343. The WD4O family has 243 members and the BHLH family has 174 members. SUT (12), SWEET (24), and MST (116) are involved in sugar transport (Supplementary Table S1).
C4H, F3'H, and F3'5'H belong to the cytochrome P450 (CYP450) superfamily. C4H belongs to CYP73 subfamily, F3'5'H and F3'H belong to CYP75A and CYP75B subfamily, respectively. Secondly, F3H, FLS and ANS belong to 2-ODD superfamily. In order to further distinguish the subfamily members involved in anthocyanin metabolism, we constructed a phylogenetic tree of superfamily. By comparing the evolutionary relationship with well-defined family members in related species, we finally determined the member composition of C4H/F3'H/F3'5'H and F3H/FLS/ANS (Supplementary Fig. S1).
The sequences of subfamily members were conservatively analyzed. Motif analysis of protein sequences showed strong sequence conservation among members belonging to the same superfamily. Under the conservative search framework of 10 motif, the protein sequence contains at least 4 motif. Compared with F3H and FLS family, ANS family proteins lack Motif7, and F3H family proteins have all Motif. Among the three families of C4H, F3'H and F3'5'H, Motif6 and Motif8 belong to C4H family alone. The promoter sequences of family members were analyzed by functional modules. A fragment with a length of 2000bp upstream of ATG was extracted as a promoter sequence to search for cis-acting elements. The results show that in C4H, F3'H and F3'5'H, light-responsive elements and stress-responsive elements are the most important components(Fig. 1A). The same results also appear in F3H, FLS and ANS(Fig. 1B).
Expression Profile of Anthocyanin Metabolic Genes in Tea Plant
Based on the sample transcriptome sequencing data, we analyzed the expression of anthocyanin metabolism related genes. We selected the members with the highest average expression (TMM) in each family for visual display. On the whole, the genes responsible for skeleton catalytic reaction, structural modification and transcriptome regulation have more intense expression intensity in the process of tea anthocyanin metabolism. In the anthocyanin skeleton structure synthesis family, ANS obviously showed lower expression level in all samples. F3'5'H maintained high expression in purple leaves and low expression in green leaves. Compared with MATE/MRP/SNARE, the anthocyanin transporter gene GST maintained high expression in all samples. PRX, a gene related to anthocyanin degradation, was highly expressed in all samples. In all samples, PPO was only highly expressed in purple leaves of ZJ(Fig. 2A).
Comparative transcriptome analysis
The data of gene expression during anthocyanin deposition and digestion in tea leaves were analyzed by sample difference analysis. In the anthocyanin accumulation stage, we compared the leaf transcriptome data of ZK and N61 at the 15th day. Compared with the leaves of N61 on the 15th day, the leaves of ZKZ on the 15th day were purple and rich in anthocyanins. Of the 2110 identified genes related to anthocyanin metabolism in tea plants, 1727 genes were detected to have transcriptional expression fluctuations. Among them, 351 genes were significantly up-regulated and 425 genes were significantly down-regulated. There were 30 up-regulated and 29 down-regulated genes related to sugar transport (SUT/SWEET/MST), 95 up-regulated and 166 down-regulated genes related to anthocyanin structural modification (BAHD/UFGT/OMT), 22 up-regulated and 17 down-regulated genes related to anthocyanin degradation (PPO/peroxidase/β-glycosidase). Among all differentially expressed genes (logFC>1), F8073952_MYB had the largest up-regulation. F8125006_BAHD has the largest downward adjustment(Fig. 3A).
The young leaves (the second and third leaves) of ZJ tea plant are purple, and gradually turn green with the growth and development (the fourth and fifth leaves). We explored the family members involved in anthocyanin degradation by comparing the variation of genes in the process of leaves changing from purple to green. Transcription levels of 1718 genes were detected. Among them, 585 genes were significantly up-regulated and 495 genes were significantly down-regulated. There were 30 up-regulated and 30 down-regulated genes related to sugar transport (SUT/SWEET/MST), 142 up-regulated and 169 down-regulated genes related to anthocyanin structural modification (BAHD/UFGT/OMT), 46 up-regulated and 29 down-regulated genes related to anthocyanin degradation (PPO/peroxidase/β-glycosidase). Among all differentially expressed genes (logFC>1), F8085057_4CL had the largest up-regulation and F8090744_UFGT had the largest down-regulation(Fig. 3B).
In order to further characterize the overall expression trend of anthocyanin-related gene family, we explored the expression distribution of anthocyanin-related genes in two groups of samples by gene set enrichment analysis (GSEA). According to the functional similarity, 30 family genes related to anthocyanin metabolism were divided into 7 gene sets. They include sugar transport (SUT/SWEET/MST), anthocyanin transport (GST/MRP/MATE/SNARE), transcription (MYB/BHLH/WD40), structural modification (UFGT/OMT/BAHD), skeleton synthesis (PAL/C4H/4CL/ACC/CHS/CHI/F3H/F3'H/F3'5'H/DFR/ANS/FLS), degradation (PPO/PRX/β-glycosidase) and iron ion (VIT/FC). The top five members of each gene set are identified in red font. The results showed that during anthocyanin deposition(15d_Zikui_vs_15d_N61) (Supplementary Fig. S3A-G)and deposition(ZJG_vs_ZJP) (Supplementary Fig. S3H-N), the overall distribution characteristics of different gene sets were similar, and during anthocyanin digestion, the gene sets showed a more obvious apical enrichment trend. At the same time, the composition of the top five members in the gene set changes along with the deposition and degradation of anthocyanins(Supplementary Fig. S3).
GO and KEGG analysis
GO and KEGG databases were used to classify differentially expressed genes during anthocyanin synthesis and degradation in tea leaves. As shown in the figure(Supplementary Fig. S4A), the differentially expressed genes in the process of anthocyanin degradation are mainly involved in biological processes such as phenylpropane metabolism and biopigment synthesis in plants. Among the differentially expressed genes, the expression level of F8000014_MST was up-regulated and that of F8019759_MST was down-regulated.
The differentially expressed genes in anthocyanin synthesis are mainly involved in the biological processes of plant growth and development and bio-pigment synthesis. The expression level of F8117555_PRX was up-regulated and that of F8133422_GST was down-regulated(Supplementary Fig. S4B).
In KEGG pathway analysis, we focus on the biosynthetic pathway of flavonoids. During anthocyanin deposition in tea leaves(Supplementary Fig. S5A), only BAHD (EC:2.3.1.133) and DFR (EC:1.1.1.234) were noted to have the greatest down-regulation. The expression level of F3'5'H (EC: 1.14.14.81) was up-regulated most. During anthocyanin digestion, the expression levels of BAHD (EC:2.3.1.133), CHS (EC:2.3.1.74) and F3'5'H (EC:1.14.14.81) were up-regulated. DFR (EC:1.1.1.234) and F3'H (EC:1.14.14.82) maintained low expression levels(Supplementary Fig. S5B).
PPI
Protein network interaction analysis of family members was carried out. By calculating the median center value, the interaction regulation network between proteins was constructed. The results show that F8133625_4CL, F8010408_OMT, F8111050_ANS, F8104292_ACC, F8131565_CHS, F8027654_DFR, F8078229_MYB, F8126447_MST, F8033662_CHS, F8062972_CHS, F8131575_CHS, F8131570_CHS, F8117377_C4H, F8061752_MYB and F8130616_MYB have the largest intermediate centrality and are at the core of the network(Supplementary Fig. S6).
WGCNA
Transcriptional expression data of all samples were analyzed by gene weighted co-expression network. The coexpression network is constructed by multi-step method, and the soft threshold is set to 14. Genes are divided into 31 modules according to the similarity of expression(Supplementary Fig. S7). Screen the module with the highest correlation with the sample, and check the distribution of gene family members in the module. Focus on genes with more than 0.75 correlation with modules and traits (leaf development stage) (MM > 0.75 and GS > 0.20). In the green tea tree variety N61, ‘coral2’ and ‘bisque4’ modules were selected(Fig. 4A-B). The high correlation between genes and modules and traits indicates that they may play an important role in anthocyanin metabolism. In the leaf samples of tea ZK variety on the 15th day(Fig. 4C), a large amount of anthocyanins were synthesized and accumulated. F8034678_MATE, F8057423_GST and F8115002_MATE are located in the highlighted areas. The leaves of 'Zikui' at 45 days had almost regained their green color, and the module with the highest correlation was 'antiquewhite2'(Fig. 4D). The second and third young leaf blades of 'Zijuan' are purple because they are rich in anthocyanin, and the module with the highest correlation at this time is 'coral3'(Fig. 4E). In ZJG samples of tea ZJ varieties(Fig. 4F), anthocyanins were degraded in large quantities, and leaves recovered to green color. F8132897_MATE, F8008682_MATE, F8027311_SWEET, F8075814_MST, F8054991_MST and F809393_MST are marked.
Mantel test
The effects of transcription regulator (MYB/BHLH/WD40) and sugar transporter (SUT/SWEET/MST) on anthocyanin metabolism genes in tea leaves were analyzed by mantel test(Fig. 5). The results showed that the transcription factor MYB had the highest significant positive correlation with F8084859_β-glycosidase during anthocyanin deposition, and the correlation coefficient was 0.95. BHLH had the highest significant positive correlation with F8012329_F3'5'H, and the correlation coefficient was 0.98. WD40 has the highest significant positive correlation with F8073909_FC, and the correlation coefficient is 0.91(Fig. 5A). MST has the highest significant positive correlation with F8021299_FLS, and the correlation coefficient is 0.99. SWEET had the highest significant positive correlation with F8021299_FLS, and the correlation coefficient was 0.94. SUT had the highest significant positive correlation with F8004560_MRP, and the correlation coefficient was 0.98(Fig. 5B).
During anthocyanin digestion, transcription factor MYB had the highest significant positive correlation with F8091811_4CL, with a correlation coefficient of 0.98. BHLH had the highest significant positive correlation with F8012329_F3'5'H, and the correlation coefficient was 0.99. WD40 had the highest significant positive correlation with F8012329_F3'5'H, and the correlation coefficient was 0.99(Fig. 5C). MST had the highest significant positive correlation with F8012329_F3'5'H, and the correlation coefficient was 0.99. SWEET had the highest significant positive correlation with F8012329_F3'5'H, and the correlation coefficient was 0.99. SUT had the highest significant positive correlation with F8098931_CHI, and the correlation coefficient was 0.92(Fig. 5D).
PLS-SEM
Structural equation modeling was used to analyze the interaction between anthocyanin-related gene families. According to the functional similarity of genes, 30 family members related to anthocyanin metabolism in tea plant were divided into 7 gene sets. Each gene set is treated as a latent variable, and these latent variables characterize different factors that influence anthocyanin metabolic response. They include sugar transport (SUT/SWEET/MST), anthocyanin transport (GST/MRP/MATE/SNARE), transcription (MYB/BHLH/WD40), structural modification (UFGT/OMT/BAHD), skeleton synthesis (PAL/C4H/4CL/ACC/CHS/CHI/F3H/F3'H/F3'5'H/DFR/ANS/FLS), degradation (PPO/PRX/β-glycosidase) and iron ion (VIT/FC). The transcriptional expression levels of family members in different samples were taken as observation variables. We calculated the interaction models between various factors in the process of anthocyanin deposition and depletion in tea leaves.
In the process of anthocyanin deposition in tea leaves(Fig. 6A), sugar transport variables have positive effects on anthocyanin transport variables and anthocyanin degradation variables, and anthocyanin transport variables have positive effects on anthocyanin degradation variables. Iron ion variable has a positive effect on anthocyanin degradation variable. In the process of anthocyanin digestion in tea leaves(Fig. 6B), sugar transport variables had a positive effect on anthocyanin transport variables and a negative effect on anthocyanin degradation variables, and anthocyanin transport variables had a negative effect on anthocyanin degradation variables. Iron ion variable has negative effect on anthocyanin degradation variable. The transcription factor variable had a negative effect on anthocyanin degradation variable during anthocyanin deposition in tea leaves. During anthocyanin digestion in tea leaves, transcription factor variables had a positive effect on anthocyanin degradation variables.
Quantitative real time PCR analysis
Based on structural equation model analysis of transcriptome data, it was found that three potential variables, sugar transport, anthocyanin transport and iron ion, had obvious relationship with anthocyanin degradation in tea leaves. Therefore, we focus on these three potential variables. And further verify their expression at transcription level. Finally, 18 were selected for qRT-PCR analysis(Fig. 7). They all changed significantly during anthocyanin digestion in tea leaves. F8100989_SUT, F8046659_SUT, F8038375_SWEET, F8056930_SWEET, F8102264_MST and F8110687_MST are potential variables of sugar transport. F8072987_GST, F8068893_GST, F8121721_MRP, F8007020_MRP, F8121816_MATE, F8091057_MATE, F8060149_SNARE and F8013611_SNARE are potential variables of anthocyanin transport. Consistent with transcriptome data, their expression levels in tea leaf anthocyanin digestion samples (ZJG) were higher than those in anthocyanin deposition samples (ZJP). F8006548_FC, F8010116_FC, F8103044_VIT and F8001407_VIT are potential variables of iron ion. The expression levels of F8006548_FC and F8010116_FC in tea leaf anthocyanin digestion samples (ZJG) were significantly lower than those in anthocyanin deposition samples (ZJP).