2.1 Flowering induction of pitaya by supplementary light treatment in the winter season
These findings showed that sufficient supplementary light can induce pitaya flowering if the minimum air temperature is ≥ 15 °C. In this supplementary light experiment, NL represents the control group (no light, no flowering) (Fig. 1A), L0 represents no flowering in the light-treated group (Fig. 1B), L1 represents the flower bud stage in the light-treated group (Fig. 1C), and L2 represents 1 week after the bud stage in the light-treated group (Fig. 1D). About 20~25 days after the buds appeared, the pitaya flowers were in full bloom at night (Fig. 1E), signifying the success of this flower induction system with respect to pitaya fruit production in the winter season.
Figure 1 Supplementary light experiment. A: NL represents the control group (no flowering under no light); B: L0 represents the no-flowering stage in the light treatment group; C: L1 represents the flower bud stage after light treatment; D: L2 represents 1 week post bud stage under light treatment; E: flowering of pitaya induced by light treatment in the winter season.
2.2 Sequencing, de novo assembly, and annotation
To understand the underlying molecular events involved in the response to light-induced flowering, we sent pitaya samples from the NL, L0, L1, and L2 stages to the Gene Denovo Biotechnology Co. (Guangzhou, China) for de novo assembly of RNA-seq data. More than 39 Gb raw data reads were retrieved. The high-quality clean reads number of the three biological replicates for the NL sample were 45444076 (97.78%), 42546349 (97.75%), and 42956302 (97.82%); the clean reads for the L0 sample were 35518498 (97.83%), 38856464 (97.71%), and 49626346 (98.11%); the clean reads for the L1 sample were 42262290 (97.78%), 52312394 (97.94%), and 49023546 (98.01%); and the clean reads for the L2 sample were 49899592 (98.17%), 48213826 (98.08%), and 45191594 (98%) (Additional file 1). These findings indicated that the proportion of high-quality clean reads after filtration of each sample was relatively high, signifying that the sequencing quality was good. The de novo assembly resulted in a total of 60580077 bases, but only 68113 unigenes (not less than 200 nt) with an N50 of 1730 nt (Table 1). The average length of our assembly was 889 nt, of which the maximum length of unigene was 36165 nt, the minimum length 201 nt, and the GC (guanine-cytosine) percentage 41.96%. Figure 2 showed the scatter plot of size distribution of the unigenes. In order to search for homologous sequences, annotation analysis was performed on the unigenes using four public databases: KOG, KEGG, Swissprot, and NR. In total, 29959 unigenes (removing duplicates) were found in at least one of these databases (Table 2). There were 29782 unigenes with functional annotations in the NR database, 20716 annotations in SwissProt, 18088 annotations in KOG, and 11059 annotations in KEGG. We also observed that 8618 unigenes were annotated by all four major databases (Fig. 3). At the same time, we found only 6220, 75, 33, and 32 unique annotations in the NR, Swissprot, KOG, and KEGG databases, respectively.
Table 1 Results of assembly
Figure 2 Length distribution of all unigenes.
Table 2 Results of annotations in four databases
Figure 3 Venn diagram of the four database annotations.
2.3 Global analysis of differential expression profiling
The abundance of each gene was determined by counting reads per kilobase per million reads (RPKM) to infer the expression level. Therefore, this method could be used to directly compare the difference in gene expression among samples. The correlation between gene expression levels among the samples was a key criterion in determining whether the experiments were reliable and whether the samples chosen were suitable. If one sample had a high degree of similarity to another, the correlation value between them would be very close to 1. We calculated the correlation values between samples based on the RPKM results. According to the standard recommended by the ENCODE (Encyclopedia of DNA Elements) project[15], the square of the correlation value should be ≥0.92 (under ideal experimental conditions and with suitable samples). The heatmap of correlations for these samples is shown in Figure 4. From the figure, we can see that the coefficient between the three repetitions for the NL stage was found to be above 0.98, and the correlation between the three repetitions for the L0 stage was above 0.96. Similarly, the correlation between the three repetitions for the L1 and L2 stages was found to be above 0.99 and 0.97, respectively, signifying that the levels of expression between all the biological replicates were highly correlated. In other words, we found a low level of correlation between the different samples, indicating that the difference between these four samples was significant.
Figure 4 Relationship analysis of the samples. NL: stage of no light; L0: stage of no flowering after light treatment; L1: flower bud stage after light treatment; L2: 1 week post bud stage under light treatment.
In our study, comparison between different samples resulted in different numbers of DEGs (Fig 5). When NL and L0 were compared, we found only 51 up-regulated genes and 88 down-regulated genes, totaling 139 DEGs. A total of 12352 DEGs were obtained when we compared NL and L1, of which 8640 genes were up-regulated and 3712 genes were down-regulated. In other words, compared with NL, the number of DEGs in the L1 sample was found to be about nine times that in the L0 sample, where the number of up-regulated genes was 169 times that of L1 and the number of down-regulated genes was 41 times higher than that of L1. Similar results were found in other comparison groups. Gene ontology (GO) functional enrichment analysis was applied in our subsequent study in order to annotate the expression patterns of the DEGs to the chosen GO terms: molecular function, cellular component, and biological process. Since a gene often has multiple different functions, the same gene will appear under different terms and each histogram is statistically independent of any others. In our research, 50 GO terms relating to molecular function, 30 GO terms relating to cellular component, and 158 GO terms relating to biological process were enriched with respect to the L0 stage, as compared to the NL stage, after light treatment (Additional file 2,3,4). Under the same light treatment, DEGs associated with the L1 stage were enriched by 399, 162, and 1042 GO terms relating to molecular function, cellular component, and biological process, respectively (Additional file 5,6,7). Moreover, findings regarding the GO analysis of other groups were similar to those described above. Most importantly, we found that the main enriched terms of all groups were the same regardless of ontologies. In particular, ‘cellular component’ mainly related to the cell part, membrane, and organelle; ‘molecular function’ mainly related to binding, catalytic activity, and transport activity; and ‘biological process’ mainly related to cellular, metabolic, single-organism transport, suggesting that DEGs primarily enriched in those terms may be associated with induction of pitaya blossoms. To further investigate the gene expression profiles, we made use of KEGG pathway analysis to determine some of the DEGs involved in important biochemical metabolic and signal transduction pathways during light-induced flowering of pitaya. For example, two auxin response factors (unigene0028447, unigene0002811) were up-regulated, two genes related to jasmonic acid (unigene0030164, unigene0033693) were up-regulated, and five genes related to starch and sucrose metabolism (unigene0052631, unigene0036015, unigene0040385, unigene0041472, unigene0035017) were also up-regulated.
Figure 5 Gene statistics for DEGs in different samples. NL: stage of no light; L0: stage of no flowering after light treatment; L1: flower bud stage after light treatment; L2: 1 week post bud stage under light treatment.
2.4 Energy-related genes in the process of flowering induction
During our investigation of light-induced pitaya flowering, we focused on the differentially expressed genes in the NL, L0, and L1 stages. It is well known that plant flowering is a normal physiological process. Before a plant flowers, some essential organic nutrients accumulate in the plant so as to accommodate the significant uptake of nutrients during flowering. Soluble sugar and soluble protein are important nutrients that are closely associated with plant flowering. Large increases in ATP concentration initiated by changes in environmental conditions, especially by different kinds of light treatments, have already been observed[16]. Our analysis found that two up-regulated genes, unigene 0046195 (beta-glucosidase BoGH3B-like) and unigene 0014135 (ATP synthase), in the Krebs cycle of glucose metabolism, were involved in the L0 stage, but not the NL stage. Moreover, the expression levels of eight genes (unigene0006689, unigene0031088, unigene0044754, unigene0009228, unigene0045801, unigene0047216, unigene0050048, unigene0053090) involved in sucrose synthesis in L1 were found to have increased, and three of these genes were up-regulated in L1, but not in L0. This finding, which relates to energy metabolism, demonstrates that large amounts of sucrose are needed as an energy supply during the flowering process.
2.5 Plant hormone and signal transduction related genes
Three types of phytohormone—auxin, jasmonic acid (JA), and the brassinosteroids (BRs)—have been individually connected to floral timing, though they have not been extensively studied. These hormones can be transported as signal molecules in plants, which in turn produce some signal transduction cascades that direct a series of metabolic activities in plants.
Genetic evidence has revealed that in Arabidopsis, the polar transport of auxin controls flower formation and differentiation[17]. Genes regulating floral organ development and gynoecium vascularization have been discovered, indicating the probable involvement of auxins in flower development. In our study (Fig 6 and Additional file 8), we found an auxin-related gene (unigene 0029163) that was significantly up-regulated in L0, but not in NL. Twenty-three genes were significantly expressed in L1, of which two genes (unigene 0027380, unigene 0030569) were down-regulated and 21 genes were up-regulated. Studies have reported that in Arabidopsis thaliana, the BR biosynthesis-deficient det2 mutant is delayed flowering by at least 10 days compared to the wild type. The level of endogenous BR is lower 10% than that of the wild type[18]. Compared with NL, only one up-regulated gene was expressed in L0 (unigene0026249), but six up-regulated DEGs (unigene0004470, unigene0025032, unigene0030389, unigene0038596, unigene0035058, unigene0042473) were expressed in L1. At the same time, compared with L0, eight up-regulated genes involved in the brassinosteroid metabolism pathway were expressed in L1. Generally speaking, JA is known to activate transcription factors (TFs) that trigger a large-scale process of various abiotic and biotic stresses, but also plays an important role in regulating flower opening. Jasmonate-ZIM domain (JAZ) proteins have been shown to regulate the level of JA, to counteract flower abscission in Nicotiana attenuata plants[19]. In our own study, we found one gene that was down-regulated (unigene 0021973) in L0, and four genes that were up-regulated (unigene 0030164, unigene 0032865, unigene 0034754, unigene 0049406) in L1. In addition, only one up-regulated gene (unigene0033693) was expressed in L1, but not in L0.
Figure 6 Heatmap of the DEGs associated with various plant hormones. NL: stage of no light; L0: stage of no flowering after light treatment; L1: flower bud stage after light treatment; L2: stage of flowering 1 week after light treatment.
2.6 CO and FT or other directly regulating DEGs
In plants, seasonal changes in day length are perceived in leaves, which initiate long-distance signaling that induces flowering at the shoot apex. In Arabidopsis, FLOWERING LOCUS T (FT) is now known as florigen, which is crucial for the proper timing of flowering, and CONSTANS (CO) is a key player in the activation of FT expression. In other words, FT acts partially downstream of CONSTANS (CO), which promotes flowering in response to long days. In this study (Fig. 7 and Additional file 9), relative to NL, in the light treatment group L0, one gene—the CONSTANS-related gene (unigene 0025406)—was found to be significantly down-regulated, and the FT gene was not significantly expressed. In comparison with NL, 13 CONSTANS-LIKE genes were significantly differentially expressed in L1, among which three genes were up-regulated (unigene002346462, unigene0027129, unigene0035118) and 10 genes were down-regulated (unigene0025406, unigene0027332, unigene0028118, unigene0029386, unigene0030366, unigene0034031, unigene0034048, unigene0035131, unigene0043102, unigene0050191). At the same time, two upregulated FT genes (unigene0001751, unigene0005649) were significantly differentially expressed. One FT gene (unigene0025148) was down-regulated, and six FT genes (unigene0002836, unigene0006372, unigene0025249, unigene0038103, unigene0042376, unigene0045680) were up-regulated in L1, but not in L0.
In addition to these well-known CO and FT genes, certain genes are closely related to the regulation of plant flowering processes. For example, the protein encoded by Hd3a, a rice ortholog of FT, moves from the leaf to the shoot apical meristem and induces flowering in rice, suggesting that the Hd3a protein may be the rice florigen[20]. In our analysis, an HD3a-like gene (unigene0019970), whose expression decreased under light treatment, was detected in L0. Maybe the HD3a-like gene was up-regulated during the rice flowering period but down-regulated during the pitaya flowering period.
The LEAFY gene is an important element in the transition from the vegetative to the reproductive phase, as LEAFY is both necessary and sufficient to initiate the growth of individual flowers. On long days, Arabidopsis plants flower soon after germination, in parallel with rapid up-regulation of LEAFY[21]. In this research, we found one LEAFY homolog gene (unigene0024655) to be significantly up-regulated in L1. We also found two genes encoding flowering time control protein to be involved in L1, and FPA (unigene0045587) to be down-regulated, while the reverse was true for the expression of FCA (unigene0049096). One flowering-promoting factor 1-like gene (unigene0023129) was up-regulated in L1.
Light is one of the clock entrainment signals and can induce a plant’s circadian rhythm, acting as a bridge in the signaling network between the environmental stimulation and its internal processes. According to KEGG pathway analysis, some DEGs are associated with circadian rhythm in pitaya. One gene encoding Arabidopsis pseudo response regulator (APRR5)-like protein (unigene0045107) was found to be down-regulated in L0, but not in NL. Two down-regulated circadian clock associated 1-like genes (unigene0036436, unigene0049229), two down-regulated genes encoding EARLY FLOWERING 4-like protein (unigene0022870, unigene0039817), and an up-regulated gene encoding EARLY FLOWERING 1-like protein unigene0043786 were expressed in L1, but not in NL.
Figure 7 Heatmap of the DEGs associated with CO and FT. NL: stage of no light; L0: stage of no flowering after light treatment; L1: flower bud stage after light treatment; L2: 1 week post bud stage after light treatment.
2.7 Transcription factors control the course of flowering
Cycling DoF factor (CDF) proteins are part of the larger family of plant-specific DOF (DNA binding with one finger) transcription factors, which presumably includes a single C2-C2 zinc finger, a highly conserved DNA-binding domain[22]. Several Arabidopsis DoF transcription factors—CDF1, 2, 3, and 5—are implicated in circadian rhythms and photoperiodism, and their expression at elevated levels is sufficient to repress flowering[23]. In this study (Fig. 8 and Additional file 10), we found two CDF genes (unigene0033003, unigene0031426) that were down-regulated in L0, but not in NL. Similarly, three CDF genes (unigene0031426, unigene0026114, unigene0033003) were down-regulated, and two genes (unigene0029432, unigene0006016) up-regulated, in L1.
In the development and evolution of flowering plants, the MADS domain transcription factor plays an important role because it determines the establishment of flower morphology and flower development. The MADS-box gene contains a conserved MADS-box motif and is generally classified into a type I (type I) and a type II (type II) with a conserved K-, I-, and C -domain subfamily. In general, the type II subfamily is divided into ABCDE models during the development of the flower[24]. Compared with NL, there were no significantly differentially expressed MADS-box genes in L0. In contrast, one gene encoding AGAMOUS-like MADS-box protein AGL19 (unigene0027094) was down-regulated, and the other six genes encoding AGAMOUS-like MADS-box protein (unigene0018425, unigene0021927, unigene0025394, unigene0037726, unigene 0047376, unigene 0202108) were up-regulated, in L1.
Other transcription factors that can bind to the CO promoter include the TEOSINTE BRANCHED 1/ CYCLOIDEA/ PROLIFERATING CELL NUCLEAR ANTIGEN FACTOR (TCP) transcription family[25]. Members of this family control multiple traits in diverse plant species, including flower and petal asymmetry[26]. We discovered no DEG-related TCP TFs in L0, while all six genes encoding TCP (unigene0005880, unigene0009351, unigene0025734, unigene0028812, unigene0030793, unigene0033768) were up-regulated in L1, but not in NL.
Figure 8 Heatmap of the three main TFs involved in pitaya flowering. NL: stage of no light; L0: stage of no flowering after light treatment; L1: flower bud stage after light treatment; L2:1 week post bud stage after light treatment.
2.8 Validation of the selected DEGs by quantitative RT-PCR analysis
We consider the control of floral development genes or transcription factors that regulate downstream genes to be more important than other aspects of plant flowering processes. One cyclic DoF factor (CDF) transcription factor gene in each of the L0 and L1 samples was detected by quantitative RT-PCR. One gene encoding response regulator-like APRR5 related to circadian rhythm involved in course in L0 and two genes encoding circadian clock associated protein involved in L1 were selected to aid the detection of mRNA levels by quantitative RT-PCR. One and two CONSTANS-LIKE (CO) genes in L0 and L1, respectively, were chosen for real-time quantitative PCR. Furthermore, one flowering locus T (FT) gene and its ortholog encoding HEADING DATE 3A-like (HD3a-like), involved in L1 and L0 respectively, were detected by quantitative RT-PCR. In addition, two TCP TF genes and a gene encoding phytochrome B in L1 were also selected for quantitative RT-PCR (Table 1). The UBQ gene was used as a reference gene[27]. The quantitative RT-PCR (qRT-PCR) results from the L0 and L1 stages are shown in Figures 9 and 10. The expression profiles of all 13 detected genes revealed a similar trend and consistent results between qRT-PCR and RNA-seq. Three DEGs of L0 displayed significant down-regulation in plants under light conditions compared with NL, including unigene0031426 (cycling DoF factor 1), unigene0025406 (zinc finger protein CONSTANS-LIKE 2), and unigene0019970 (HEADING DATE 3A-like); however, unigene0045107 (response regulator-like APRR5) that concerning circadian clock revealed no remarkable decrease in its mRNA level. In the L1 stage, all nine DEGs showed significant differential expression through qRT-PCR. Two genes exhibiting functional homology to circadian clock associated 1 in the L1 stage showed a significant reduction in transcript levels under light treatment. A similar situation was found for two genes encoding TCP transcription factor, for which the mRNA abundance was substantially enhanced in the L1 sample following light treatment. Interestingly, two CO genes revealed an opposite expression in the L1 stage: one was up-regulated and in the other the mRNA level was reduced. This finding indicates that different CO gene families may be subjected to different types of regulation, resulting in opposite types of expression. In addition, CDF3-like and PHYB were found to be down-regulated, while FT displayed a higher mRNA level during L1 light treatment than NL, as determined by qRT-PCR.
Figure 9 Relative expression of the floral development genes or transcription factors involved in the flowering process for the NL and L0 samples was determined by quantitative RT-PCR analyses. Expression was normalized to that of UBQ. The transcript levels from the NL sample were set at 1. The blue columns represent the expression levels determined by qRT-PCR (left axis), while the lines represent the gene expression levels determined by RNA-seq (right axis). The transcript abundances of the genes encoding cyclic DoF factor 1 protein (A), zinc finger protein CONSTANS-LIKE 2 (B), HEADING DATE 3A-like (C), and response regulator-like APRR5 (D) were determined and compared across the time course of light-induced treatment. Data represent means ± SD for the three replicates (n=3). The lowercase letters above the columns indicate a significant difference at p≤0.05 between the columns by T test using SPSS statistical software. Data in columns with the same letters showed no significant difference (p>0.05).
Figure 10. Relative expression of the floral development genes or transcription factors involved in the flowering process for the NL and L1 samples was determined by quantitative RT-PCR analyses. Expression was normalized to that of UBQ. The transcript levels from the NL sample were set at 1. The blue or red columns represent the expression levels determined by qRT-PCR (left axis), while the lines represent the gene expression levels determined by RNA-seq (right axis). The transcript abundances of the genes encoding cyclic DoF factor 3-like protein (A), CONSTANS-LIKE 15 (B), CONSTANS-LIKE 4 (C), circadian clock associated1 (D, E), flowering locus T (F), phytochrome B (G), transcription factor TCP4 (H), and transcription factor TCP15 (I) were determined and compared across the time course of light-induced treatment. Data represent means ± SD for the three replicates (n=3). The lowercase letters above the columns indicate a significant difference at p≤0.05 between the columns by T test using SPSS statistical software. Data in columns with the same letters showed no significant difference (p>0.05).
2.9 Tissue specific expression of key genes
In this study, in order to further explore whether the genes associated with the flowering process of Pitaya are tissue-specific, different growth stages of stem, flower and fruit in Figure 11 were selected for specific analysis. We selected CDF-3 like(Unigene0026114) and CO-15 like(Unigene0035118) for tissue-specific analysis in Figure 12. The results showed that CDF-3 like had a certain expression level in various tissues and various periods, while its expression was increased in pistils and styles and highest in small green peel compared with in the old stem. The expression level of CO-15 like in flowers and fruits is higher than that of old stems, and the specificity is not obvious.
Figure 11 Different growth periods and tissues of pitaya.1: old stem,2:young stem,3: small flower bud,4: large flower bud, 4A: pistil,4B: stamen,4C: style,4D: flower scales,5: small green fruit, 6: big green fruit, 7: red pitaya, 8E: small green peel, 8F: small green flesh, 9: big green fruit, 9G: big green peel, 9H: big green flesh,10: red fruit, 10I: red peel, 10J: red flesh
Figure 12 The result of CDF-3 like(Unigene0026114) and CO-15 like(Unigene0035118) tissue-specific expression. Expression was normalized to that of UBQ. The transcript levels from the old stem were set at 1.