Identification and chromosomal distribution of TaDGK family in wheat
To identify members of the TaDGK family, DGK sequences from other plants, such as Arabidopsis and rice, were used as queries in local BLAST searches against wheat genome databases. Keyword and protein domain searches were also performed. In total, 20 putative wheat diacylglycerol kinase (TaDGK) genes were identified (Table 1 and S1). The 20 genes were distributed almost evenly along the 14 wheat chromosomes (Chr), with the exception of Chr 4 (Fig. 1). In hexaploid wheat, homologous genes from the A, B, and D subgenomes were regarded as homoalleles derived from a single ancestral TaDGK gene through polyploidization during genome evolution. Every TaDGK gene had a corresponding copy in each of the three subgenomes, with the exception of TaDGK7A, which might have been lost during asymmetric subgenome evolution. To determine when this homoallele was lost, we ran BLAST searches against the genomes of ancestral species of modern wheat, including Triticum urartu (AA diploid genome), Aegilops tauschii (DD diploid genome), and Triticum dicoccoides (AABB tetraploid genome). Notably, we found orthologs in the A genome of Triticum urartu (AA) and in the A subgenome of Triticum dicoccoides (AABB), which suggests that TaDGK7A was lost after the allohexaploidization event.
Based on their chromosomal locations, the 20 TaDGK genes were named TaDGK1A/B/D, TaDGK2A/B/D, TaDGK3A/B/D, TaDGK4A/B/D, TaDGK5A/B/D, TaDGK6A/B/D, and TaDGK7B/D (Fig. 1). The ORFs of these genes were 1,467–2,172 bp long and encoded polypeptides of 488–723 amino acids, with predicted molecular weights of 54.25–80.33 kDa (Table 1). Their theoretical isoelectric point (pI) values ranged from 5.79 to 9.04 (Table 1).
The nucleotide and amino acid sequences of each gene are shown in supplementary Table S1.
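The reported ORF and polypeptide lengths can be cross-checked with simple arithmetic: an ORF of n bp contains n/3 codons, one of which is the stop codon. The following Python sketch illustrates this check. The function names are hypothetical, and the average residue mass of about 110 Da is a general rule of thumb rather than a value taken from this study.

```python
# Sanity check of the ORF and protein length ranges quoted from Table 1.
# AVG_RESIDUE_DA is a rule-of-thumb assumption, not a value from the study.
AVG_RESIDUE_DA = 110.0


def orf_to_aa(orf_bp: int) -> int:
    """Number of encoded residues for an ORF that includes its stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # subtract the stop codon


def rough_mass_kda(n_residues: int) -> float:
    """Very rough molecular weight estimate in kDa."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


for orf_bp in (1467, 2172):
    aa = orf_to_aa(orf_bp)
    print(f"{orf_bp} bp -> {aa} aa, roughly {rough_mass_kda(aa):.1f} kDa")
```

The shortest and longest ORFs give 488 and 723 residues, matching the reported polypeptide range, and the rough mass estimates fall close to the reported 54.25–80.33 kDa.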
Phylogenetic analysis of TaDGK genes
A phylogenetic analysis was conducted using the DGK protein sequences of wheat together with those of Arabidopsis, rice, apple, soybean, and maize. The resulting unrooted phylogenetic tree confirmed that these DGK genes grouped into clusters I, II, and III. The TaDGKs were distributed as follows: TaDGK1A/B/D, TaDGK4A/B/D, and TaDGK5A/B/D fell into cluster I; TaDGK6A/B/D fell into cluster II; and TaDGK2A/B/D, TaDGK3A/B/D, and TaDGK7B/D fell into cluster III (Fig. 2). The phylogenetic tree also revealed that the TaDGKs grouped more closely with DGKs from the monocots rice and maize than with those from the dicots Arabidopsis and soybean, and that TaDGK2 and TaDGK7 formed a clear paralogous pair.
Protein domains and sequence characterization of TaDGK genes
Using IBS, a schematic diagram of the protein domains in all TaDGKs was drawn (Fig. 3). This diagram showed that each TaDGK harbored a diacylglycerol kinase catalytic domain (DGKc; PF00781) and one accessory domain (DGKa; PF00609), and that the domain arrangement differed among the three clusters while these core domains remained conserved throughout their evolution. Furthermore, all TaDGKs belonging to cluster I contained two C1 domains (PF00130), i.e., the trans-membrane domain and the DAG/phorbol ester (PE)-binding domain.
To identify the conserved sequences in TaDGKs, we performed multiple alignments of the cluster I TaDGK domains in three parts: the DGKc domain (Fig. 4A) and each of the two C1 domains (Fig. 4B and 4C, respectively). The alignment revealed that all TaDGKs, like AtDGKs, possessed a conserved DGKc domain containing a putative ATP-binding site with a GXGXXG consensus sequence (the red box in Fig. 4A) [23]. As in other studied plants, TaDGKs have the classical generalized structure (Fig. 5). Additionally, the two C1 domains harbor the sequences HX14CX2CX16–22CX2CX4HX2CX7C and HX18CX2CX16CX2CX4HX2CX11C, respectively (Fig. 5) [11, 13, 14]. Sequence alignment showed that the upstream basic region and the extCRD-like domain were substantially conserved, with only slight variation: in the basic region, the conserved KA residues were replaced by KV in TaDGK1 (A/B/D), and the flanking residue V of the extCRD-like domain was replaced by L in TaDGK4A (Fig. 5).
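Consensus notations such as GXGXXG or HX14CX2C can be translated mechanically into regular expressions, where X stands for any residue and a number after X gives the length of the run. The Python sketch below shows one way to do this. The function name and the toy peptides are hypothetical and are not real TaDGK sequences, and this is not necessarily how the alignments in Fig. 4 and Fig. 5 were produced.

```python
import re

# 'X' alone = one residue of any kind; 'X14' = exactly 14; 'X16-22' = 16 to 22.
# The range separator may be a hyphen or an en dash.
_TOKEN = re.compile(r"X(?:(\d+)(?:[-\u2013](\d+))?)?")


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as 'GXGXXG' or 'HX14CX2C' into a regex."""
    def repl(match):
        low, high = match.group(1), match.group(2)
        if low is None:
            return "."
        if high is None:
            return ".{%s}" % low
        return ".{%s,%s}" % (low, high)
    return _TOKEN.sub(repl, consensus)


atp_site = re.compile(consensus_to_regex("GXGXXG"))
c1_first = re.compile(consensus_to_regex("HX14CX2CX16\u201322CX2CX4HX2CX7C"))

# Made-up peptides used only to exercise the patterns.
toy_kinase = "MKLSGDGSAGAVTR"
toy_c1 = ("H" + "A" * 14 + "C" + "AA" + "C" + "A" * 18 + "C" + "AA" + "C"
          + "A" * 4 + "H" + "AA" + "C" + "A" * 7 + "C")

hit = atp_site.search(toy_kinase)
print(hit.group(0), hit.start())
print(bool(c1_first.fullmatch(toy_c1)))
```

Scanning real protein sequences with such patterns only flags candidate sites; confirming them still requires the alignment and domain annotation described above.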
Structures and protein motifs of TaDGK genes
Gene structure analysis was performed to gain information about duplication events within the gene family in the context of its phylogenetic relationships. The exon-intron distributions of TaDGK genes were analyzed using Evolview, which showed that genes in the same cluster were highly similar, especially those with closer evolutionary relationships. All of the TaDGKs in clusters II and III had 12 exons, while those in cluster I had 7 exons (Fig. 6A, B). The exons of genes within the same cluster were highly conserved in order and size. Within each cluster, not only the wheat homologs on different chromosomes but also the rice DGK genes had very similar exon-intron structures (Fig. S1). This result is consistent with the orthology of DGK genes across different plant species and suggests that TaDGKs have undergone gene duplications throughout their evolution.
However, we found that the fifth introns of TaDGK7B and TaDGK7D were much longer than those of the other homologs, with lengths of 14,797 and 8,325 bp, respectively. The sequences of their second introns were searched by BLAST against the NCBI protein database. Interestingly, we found that the introns of TaDGK7B and TaDGK7D contained partial retrotransposon protein sequences (ABF96702.1 and XP_017609491.1, respectively), which suggests that the second intron in TaDGK7 was formed through the insertion of transposons as potential controlling elements [24].
MEME analysis revealed 15 distinct motifs in the TaDGK family (Fig. 6A, C). The three homoallelic copies of each TaDGK member had the same motif composition, and genes belonging to the same cluster had similar motif compositions. Five motifs (1, 8, 10, 12, and 14) were shared among all TaDGKs. Motifs 4, 5, 11, and 15 were specific to cluster I, and motif 13 was specific to cluster III. Motif 6 was present in clusters II and III. The sequence logos for all motifs are shown in Fig. S2.
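The shared and cluster-specific motifs described above correspond to simple set operations on the motif composition of each cluster. The following Python sketch illustrates the idea. The motif sets are reconstructed only from the statements in the text, so motifs 2, 3, 7, and 9, whose distribution is not described, are left out, and the function names are hypothetical.

```python
# Motif sets per cluster, reconstructed ONLY from statements in the text.
# Motifs 2, 3, 7, and 9 are not assigned in the text and are omitted here.
SHARED = {1, 8, 10, 12, 14}
cluster_motifs = {
    "I": SHARED | {4, 5, 11, 15},
    "II": SHARED | {6},
    "III": SHARED | {6, 13},
}


def shared_motifs(groups):
    """Motifs present in every cluster."""
    return set.intersection(*groups.values())


def specific_motifs(groups):
    """Motifs found in exactly one cluster, keyed by that cluster."""
    result = {}
    for name, motifs in groups.items():
        others = set().union(*(m for n, m in groups.items() if n != name))
        result[name] = motifs - others
    return result


print(sorted(shared_motifs(cluster_motifs)))
print({k: sorted(v) for k, v in specific_motifs(cluster_motifs).items()})
```

With real data, the dictionary would be filled from the per-protein motif assignments reported by MEME rather than typed in by hand.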
Cis-acting elements in the promoter of TaDGKs
Transcription factors regulate the expression of target genes by binding to cis-regulatory elements [25]. The TaDGK promoter regions, defined as the 1,500 bp upstream of the transcription start site, were analyzed to identify putative cis-regulatory elements using the PLACE and PlantCARE databases. CAAT-box and TATA-box elements were overrepresented among all 20 TaDGK promoters (Fig. 7). We then selected representative elements to guide the subsequent expression analyses. These cis-acting elements could be classified into several groups: abiotic stress responsiveness (water, dehydration, and temperature), biotic stress responsiveness (disease and pathogens), responses to plant hormones (ethylene, auxin, abscisic acid [ABA], gibberellic acid [GA], and salicylic acid [SA]), and metabolic processes (GA biosynthesis) (Fig. 7 and Table S2). In addition, almost all TaDGKs contained MYBCORE (water stress), MYB1AT (dehydration-responsive), and ASF1MOTIFCAMV (abiotic and biotic stress) elements (Fig. 7), which suggests that TaDGKs may mediate stress responses in wheat.
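Scanning a promoter for cis-elements amounts to matching degenerate consensus sequences on both DNA strands. The Python sketch below illustrates this. It is not the PLACE or PlantCARE implementation, the function names and the toy promoter are hypothetical, and the three consensus sequences are given as I recall them from PLACE and should be verified against the database before use.

```python
import re

# IUPAC nucleotide codes needed for the motifs below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}

# ASSUMPTION: consensus sequences as recalled from PLACE; verify before use.
ELEMENTS = {"MYBCORE": "CNGTTR", "MYB1AT": "WAACCA", "ASF1MOTIFCAMV": "TGACG"}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(promoter: str, consensus: str) -> int:
    """Count overlapping matches on both strands.

    A palindromic motif would be counted once per strand.
    """
    # The lookahead lets finditer report overlapping occurrences.
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in consensus) + ")")
    forward = sum(1 for _ in pattern.finditer(promoter))
    reverse = sum(1 for _ in pattern.finditer(revcomp(promoter)))
    return forward + reverse


# Made-up 21 bp sequence, not a real TaDGK promoter.
toy_promoter = "TTCAGTTGAAACCATGACGTT"
counts = {name: count_sites(toy_promoter, seq) for name, seq in ELEMENTS.items()}
print(counts)
```

In practice, the input would be the 1,500 bp region upstream of each transcription start site, and the element list would be taken from the databases themselves.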
Expression profiles of TaDGK in various tissues
We analyzed the expression patterns of TaDGK genes using public microarray-based datasets from the wheat gene expression database hosted by the Triticeae Multi-omics Center. All TaDGK gene members showed some degree of tissue-specific expression, and none were constitutively expressed in all investigated tissues (Fig. 8A and Table S3). TaDGK6A/B/D, TaDGK2A/B/D, TaDGK4A/B/D, and TaDGK5A/B/D showed high expression levels in roots. TaDGK7B/D showed high expression in spikes. TaDGK1A/B/D and TaDGK3A/B/D showed high expression in grain. Almost all TaDGK genes showed low expression in leaves.
Real-time PCR was also carried out to investigate the expression patterns of selected TaDGKs (TaDGK2A, 3A, 4B, and 5A) in various organs. According to the microarray data, TaDGK2A and 3A, which belong to cluster III, were highly expressed in roots and grains, respectively. TaDGK4B and 5A, which both belong to cluster I, were also highly expressed in roots. Our results showed that TaDGK2A, 4B, and 5A had strong expression levels in roots, while TaDGK3A was most highly expressed in stems and spikes. The expression of TaDGK5 was much lower in most tissues/organs and indeed almost undetectable in tissues other than leaves (Fig. 8B). These results suggest that TaDGK2/4/5/6 may be involved in root growth and development, while TaDGK1/3/7 may be related to grain development.
TaDGK expression patterns under salinity and drought stress
The promoters of almost all TaDGKs were enriched in abiotic stress-responsive elements, suggesting potential functions of TaDGK genes in responses to salinity or drought stress. Accordingly, we tested the expression patterns of TaDGKs at the transcriptional level and found that their expression was induced under stress. After only 10 min of salt treatment, the mRNA abundance of the four tested TaDGK genes (TaDGK2A/3A/4B/5A) increased rapidly, reaching about 2-fold higher expression than in controls (at 0 h). The three genes that were highly expressed in roots (TaDGK2A/4B/5A) were significantly induced after 12 h, increasing by 25-, 18-, and 22-fold, respectively, followed by a gradual down-regulation of expression (Fig. 9A).
We also performed real-time PCR to gain insight into the expression patterns of TaDGKs under drought stress. TaDGK2A/3A/4B/5A were all induced at 30 min, by approximately 2.5- to 4.5-fold, with the highest expression (8–32-fold increases) observed after 12 h of stress treatment. In contrast to the pattern observed under salt treatment, the transcript level of TaDGK3A, which is higher in leaves, was increased most strongly under drought stress, by a factor of up to 30, while the other TaDGKs were less strongly induced (Fig. 9B). The control gene TaDREB2, which encodes a transcription factor with major roles in abiotic stress responses, has been demonstrated to be induced under drought and salt stress [26, 27].
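Fold changes such as those reported above are commonly derived from real-time PCR cycle threshold (Ct) values with the 2^-ΔΔCt method. This section does not state which calculation was used, so the Python sketch below is an assumption for illustration only. The Ct values and the function name are hypothetical, and the formula assumes roughly 100% amplification efficiency for both the target and the reference gene.

```python
def fold_change(ct_target_treated: float, ct_ref_treated: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^-ddCt method (assumes ~100% efficiency)."""
    delta_treated = ct_target_treated - ct_ref_treated    # normalize to reference
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values, not measurements from this study.
example = fold_change(ct_target_treated=22.0, ct_ref_treated=18.0,
                      ct_target_control=26.0, ct_ref_control=18.0)
print(example)  # a ddCt of -4 corresponds to a 16-fold induction
```

Under this scheme, each one-cycle decrease in the normalized Ct of the target gene relative to the 0 h control corresponds to a doubling of transcript abundance.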
Subcellular localization of TaDGKs
The subcellular localization of TaDGK proteins was predicted using the online tool WoLF PSORT. All cluster I TaDGKs have a trans-membrane region and were predicted to be distributed among multiple cellular compartments, mainly the nucleus and chloroplast. The cluster II members TaDGK6A/B/D were predicted to localize mainly to the chloroplast and cytoplasm. All TaDGK2/3/7 members were predicted to localize mainly to the nucleus and cytoplasm (Table S4).
We selected the TaDGK2A and TaDGK3A proteins to test their predicted subcellular localization empirically. TaDGK2A and TaDGK3A proteins fused to an N-terminal GFP tag were expressed in tobacco leaves (Fig. S3). The confocal microscopy results were largely consistent with the predicted subcellular localizations (Table S4). TaDGK2A was indeed localized to the nucleus and cytoplasm. TaDGK3A, however, was mostly localized to the cytoplasm based on confocal microscopy (Fig. 10), although its accumulation was predicted to be highest in the nucleus (Table S4).