Principal component analysis (PCA) reveals differences in anthocyanin metabolite profiles
Traditionally and practically, the fruiting of C. humilis is classified into four developmental stages. Metabolite content data processing was performed to improve normalization using R (www.r-project.org/) to compare the accumulation of metabolites in different samples by HCA. Based on relative differences in accumulation patterns among different stages, the HCA of anthocyanins resulted in four main clusters in C. humilis (Fig. 1a). Anthocyanins belonging to clusters 1 and 2 accumulated at the highest levels in S4, while anthocyanins within clusters 3 and 4 did not differ significantly among stages.
Metabolite profiles of the fruit peel were then subjected to a principal component analysis (PCA) [32]. The score plots for the PCA (Fig. 1b) showed obvious separation among fruit peel development stages. The first two principal components (PC1 and PC2) for the flowers explained 62.42% and 15.02% of the total variance, respectively. The fruit peel contained 16 anthocyanins based on subsequent HPLC-MS and MRM (multiple reaction monitoring) analyses (Additional file 1: Table S1), with 11 differential metabolites (fold change ≥ 2 or fold change ≤ 0.5), including peonidin O-hexoside, cyanidin 3-O-malonylhexoside, cyanidin O-syringic acid, pelargonidin O-acetylhexoside, cyanidin 3-O-glucoside (kuromanin), procyanidin A1, cyanidin 3,5-O-diglucoside (cyanin), pelargonidin 3-O-beta-d-glucoside (callistephin chloride), cyanidin 3-O-galactoside, and peonidin 3-O-glucoside chloride. Apart from procyanidins, the predominant anthocyanins were cyanidin and its glycoside-derived compounds. Among the anthocyanin compounds discovered in fruit peel, two were found at particularly high concentrations in S4, suggesting that these two compounds [cyanidin O-syringic acid and pelargonidin 3-O-beta-d-glucoside (callistephin chloride)] contribute substantially the mechanism underlying fruit peel coloration.
Overview of transcriptome sequencing
To assess the gene expression profile of the fruit peel at S1–S4, RNAs from 12 C. humilis fruit samples were used for RNA-seq with three replicates per stage. Before using Illumina platform to read the cDNA libraries, the low-quality reads and adapter sequences were deleted. The clean reads for each sample ranged from 40,704,832 to 47,000,394, and a total of 541,781,582 clean reads were obtained. The Q30 values > 92%, meeting the criterion for gene discovery. Based on filtered clean data, the full-length transcript sequence was assembled using Trinity, and the total number of assembled transcripts was 176,109, with a mean length of 1340.05 nt and N50 length of 2,227 nt. These were assembled into 73,035 unigenes with a mean length of 1,018 nt and N50 length of 1,917 nt (Additional file 2: Table S2).
Annotation and identification of unigenes
All assembled unigenes and ORFs were annotated using Trinotate against the public databases NT, NR, UniProt, RNAMMER, eggNOG, GO, and KEGG. Using the Trinotate annotation results, genes were subjected to a GO analysis. According to the GO database, 73,035 unigenes are divided into 60 functional groups,, in the three main functional categories (biological process, cellular component, and molecular function) (Additional file 6: Figure S1). In the biological process category, 42,236 (57.83%) unigenes were associated with metabolic processes.
The assembled unigenes were searched through the COG (Cluster of Orthologous Group, KOG) protein database to evaluate potential functions. By a KOG analysis, 25,127 unigenes were clustered into 25 functional categories (Additional file 7: Figure S2). Among them, general function prediction only (Class R, 5,338 unigenes) was most common, accounting for 21.24% of unigenes. Class R is a category for which only a general prediction regarding protein function is feasible (e.g., biochemical activity). Most of the COGs could be assigned to a well-defined functional category. However, the fact that the largest category remains unspecific in nature indicates insufficient information within the COG database. The next largest categories were signal transduction mechanisms (Class T, 3,071 unigenes, 12.22%), posttranslational modification, protein turnover, chaperones (Class O, 2,593 unigenes, 10.32%), and translation, ribosomal structure and biogenesis (Class J, 1,111 unigenes, 7.02%). The smallest group was cell motility (Class N, 39 unigenes, 0.02%). The COG and GO annotations showed that unigenes expressed in the C. humilis fruit peel encode various proteins related to metabolism.
To gain more insight into the transcriptomic differences, we compared DEGs in C. humilis fruit peel among different developmental stages. Using the DESeq2 package, normalized RPKM (Reads Per Kilobase Million Mapped Reads) values were obtained from RNA-seq data to quantify transcript expression. We identified 7,930 DEGs in comparisons among different fruit developmental stages. Among the three comparisons (S2_S1, S3_S2, and S4_S3), the largest number of DEGs was found between S3 and S2, with 4,333 DEGs, including 1,540 up-regulated and 2,793 down-regulated DEGs. In addition, there were 430 up-regulated DEGs in S2_S1, and 1,179 up-regulated DEGs in S4_S3 (Fig. 2). These results indicated that DEGs were most active during the second developmental stage of the fruit peel and were perhaps largely responsible for anthocyanin biosynthesis. Therefore, we speculate that the transition from S2 to S3 is a key stage in fruit peel development.
GO enrichment and KEGG pathway analyses of DEGs
We functionally categorized the DEGs based on GO terms. A total of 7,927 unigenes were annotated, including 429 up-regulated and 201 down-regulated unigenes in the S2_S1-enriched GO term libraries and 1,539 up-regulated and 2,794 down-regulated DEGs in the S3_S2-enriched GO term libraries. The subcategories with the highest degree of enrichment were all cell part, followed by binding and cellular process, as shown in a histogram in Additional file 8: Figure S3. Apart from this, 1,178 up-regulated and 1,786 down-regulated unigenes in the S4_S3 comparison were enriched for GO terms, particularly cell part, followed by cellular process, and binding.
To classify the DEGs based on related pathways, a KEGG enrichment analysis was performed. In total, 125 biosynthetic and metabolic pathways were enriched in the pairwise comparisons of fruit peel developmental stages. The flavonoid biosynthetic pathway began to be enriched (ko00941, 10 genes) in S3_S2, and the anthocyanin biosynthetic pathway began to be enriched (ko00942, 2 genes) in S4_S3. We obtained 12 DEGs across the three comparisons related to pigment synthesis, suggesting that these genes are associated with fruit peel coloration (Additional file 9: Figure S4).
The changes of TFs in the process of peel coloring
Next, we focused on differentially expressed TFs based on our transcriptome data. The MYB and bHLH TF families were the most abundant TF families during the fruit coloring process, followed by the B3, AP2/ERF, and NAC families (Table 1), many of these were expressed before samples began to show color. Additionally, the TFs were mostly up-regulated before the color change and down-regulated after the color change. Notably, the MYB and bHLH TF families are important transcription factors for anthocyanin transcriptional regulation, and 68% of bHLHs and 70% of MYB genes were down-regulated. Interestingly, we found that 2 MYB genes (TRINITY_DN26515_c0_g1 and TRINITY_DN21536_c0_g1) were significantly up-regulated after the fruit peel color changes, and their expression levels were consistent with the pattern of fruit peel pigment accumulation, indicating that they might be key regulators of fruit peel color.
Table 1
Differentially expressed transcription factors (TFs) in fruit peel
DEG set | TF family | Number of DEGs | Up-regulated | Downregulated | Description |
S2_S1 | ARF/ERF | 20 | 12 | 8 | Ethylene-responsive transcription factor |
| B3 | 12 | 7 | 5 | B3 DNA-binding domain |
| bHLH | 22 | 11 | 11 | Helix-loop-helix DNA-binding domain |
| bzip | 8 | 5 | 3 | Basic region leucine zipper |
| C2H2 | 8 | 5 | 3 | Zinc fifinger C2H2 domain-containing protein |
| C3H | 9 | 7 | 2 | Zinc fifinger CCCH domain-containing protein |
| DBB | 8 | 7 | 1 | double B-box zinc finger protein |
| G2-like | 11 | 5 | 6 | Golden2-Like protein |
| GRAS | 8 | 7 | 1 | C-terminal GRAS domain |
| HSF | 22 | 18 | 4 | Heat stress transcription factor |
| MYB | 28 | 20 | 8 | MYB-related protein |
| NAC | 15 | 14 | 1 | NAC domain-containing protein |
| S1Fa-like | 12 | 6 | 6 | Leucine-rich repeat protein |
| other TFs | 104 | 56 | 48 | |
| total | 287 | 180 | 107 | |
S3_S2 | ARF/ERF | 98 | 24 | 74 | Ethylene-responsive transcription factor |
| B3 | 121 | 27 | 94 | B3 DNA-binding domain |
| bHLH | 152 | 53 | 99 | Helix-loop-helix DNA-binding domain |
| bzip | 62 | 25 | 37 | Basic region leucine zipper |
| C2H2 | 76 | 27 | 49 | Zinc fifinger C2H2 domain-containing protein |
| C3H | 52 | 17 | 35 | Zinc fifinger CCCH domain-containing protein |
| DBB | 17 | 3 | 14 | double B-box zinc finger protein |
| G2-like | 67 | 22 | 45 | Golden2-Like protein |
| GRAS | 47 | 14 | 33 | C-terminal GRAS domain |
| HSF | 64 | 14 | 50 | Heat stress transcription factor |
| MYB | 159 | 48 | 111 | MYB-related protein |
| NAC | 92 | 25 | 67 | NAC domain-containing protein |
| S1Fa-like | 32 | 7 | 25 | Leucine-rich repeat protein |
| other TFs | 717 | 252 | 465 | |
| total | 1756 | 558 | 1198 | |
S4_S3 | ARF/ERF | 78 | 25 | 53 | Ethylene-responsive transcription factor |
| B3 | 91 | 32 | 59 | B3 DNA-binding domain |
| bHLH | 116 | 33 | 83 | Helix-loop-helix DNA-binding domain |
| bzip | 55 | 17 | 38 | Basic region leucine zipper |
| C2H2 | 47 | 13 | 34 | Zinc fifinger C2H2 domain-containing protein |
| C3H | 39 | 13 | 26 | Zinc fifinger CCCH domain-containing protein |
| DBB | 6 | 3 | 3 | double B-box zinc finger protein |
| G2-like | 40 | 17 | 23 | Golden2-Like protein |
| GRAS | 30 | 15 | 15 | C-terminal GRAS domain |
| HSF | 39 | 25 | 14 | Heat stress transcription factor |
| MYB | 112 | 32 | 80 | MYB-related protein |
| NAC | 74 | 32 | 42 | NAC domain-containing protein |
| S1Fa-like | 43 | 4 | 39 | Leucine-rich repeat protein |
| other TFs | 484 | 160 | 324 | |
| total | 1254 | 421 | 833 | |
Notes: q < 0.05 and | log2 (ratio) | ≥ 1 were set as the threshold values for significantly different expression. |
Analysis of unigenes related to anthocyanidin biosynthetic pathways in the fruit peel
Previous studies have suggested that the anthocyanin biosynthetic pathway is an important metabolic branch of the flavonoid pathway, responsible for the production of anthocyanins in different plant tissues. Thus, we evaluated the mechanisms underlying red pigmentation in fruit peel based on a comparative transcriptome analysis. The compositions of compounds in anthocyanin biosynthesis pathways differed significantly depending on the growth stage (Fig. 3). The PAL gene regulates the synthesis of cinnamic acid and consistently showed a gradual decrease over the four growth stages. Seven 4CL genes exhibited significant differences in expression levels, including 2 up-regulated and 5 down-regulated genes. The CHS gene is the first key enzyme in the flavonoid pathway, which regulates the synthesis of chalcone; it consistently showed a gradual increase over the four growth stages. The expression levels of CHI, F3H, DFR, and ANS were significantly up-regulated from S1 to S4, and the expression levels were positively correlated with the accumulation of anthocyanin. UDP-glucose: flavonoid-3-O-glycosyltranferase (UFGT), which mainly catalyzes the transformation of unstable anthocyanin glycosides into stable anthocyanin. By using a transcriptomic analysis, two UFGT genes (TRINITY_DN23815_c1_g1 and TRINITY_DN19893_c1_g5) involved in the anthocyanin biosynthesis pathway were screened (Fig. 3). In order to test the accuracy of the transcriptome data, we selected 12 anthocyanin synthesis genes and 2 MYB genes for qRT-PCR verification. The results showed that the expression trend of qRT-PCR was consistent with the transcriptome data, indicating that the transcriptome data has high reliability (Fig. 4).
Integrated analysis of the transcriptome and metabolome
By transcriptome and metabolome analyses, we constructed a co-expression network to identify interactions (Fig. 5). We found that 11 anthocyanin biosynthesis DEGs and 11 differential metabolites exhibited interactions. Highly positive correlations were obtained among most differential metabolites, and only pmb0542 (cyanidin 3-O-malonylhexoside) was minimally relevant to all other differential metabolites. The correlation analysis between DEGs and differential metabolites also showed that levels of 5 DEGs (TRINITY_DN15939_c0_g1, TRINITY_DN20784_c0_g2, TRINITY_DN21962_c0_g2, TRINITY_DN23815_c1_g1, and TRINITY_DN26276_c0_g1) were positively correlated with all differential metabolites, while the other 6 DEGs (TRINITY_DN17945_c0_g2, TRINITY_DN21141_c0_g1, TRINITY_DN25788_c2_g8, TRINITY_DN22885_c0_g4, TRINITY_DN25099_c1_g5, and TRINITY_DN24108_c0_g2) were negatively correlated with all differential metabolites (Fig. 5a, Additional file 4: Table S4). In addition, the combined metabolome and transcriptome analysis revealed a significant positive correlation between both ANS (TRINITY_DN15939_c0_g1) and UFGT (TRINITY_DN23815_c1_g1) (PCC > 0.82) and most metabolites (Fig. 5b). These data indicated that expression patterns of ANS and UFGT have decisive roles in color development, and their regulatory relationships need to be further studied.