Identification of UGT gene family members in walnut
To identify JrUGT genes, the walnut genome database was screened for UGT proteins using the HMM profile PF00201 as a query, with the screening criterion set at an e-value < 1. Two screening methods, BLASTP and hmmsearch, were used, which identified 174 potential UGT genes in walnut. Conserved domains were then verified with TBtools, and any UGT protein lacking a PSPG box was manually excluded. Ultimately, 124 JrUGT genes were obtained and designated JrUGT1 to JrUGT124.
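The screening logic described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the text does not state whether BLASTP and hmmsearch hits were combined as a union or an intersection (a union is assumed here), and the gene identifiers, function name, and input dictionaries are hypothetical.

```python
from typing import Dict, List, Set


def screen_ugt_candidates(
    hmm_evalues: Dict[str, float],
    blast_evalues: Dict[str, float],
    has_pspg_box: Set[str],
    max_evalue: float = 1.0,
) -> List[str]:
    """Return candidate genes that pass the e-value cutoff and carry a PSPG box."""
    # Keep hits below the e-value threshold for each search method.
    hmm_hits = {gene for gene, e in hmm_evalues.items() if e < max_evalue}
    blast_hits = {gene for gene, e in blast_evalues.items() if e < max_evalue}
    # Assumption: candidates from the two methods are pooled (union).
    candidates = hmm_hits | blast_hits
    # Exclude candidates without a PSPG box (done manually in the study).
    return sorted(gene for gene in candidates if gene in has_pspg_box)


# Hypothetical toy input; real values would come from the search outputs.
toy_hmm = {"geneA": 1e-50, "geneB": 0.5, "geneC": 3.0}
toy_blast = {"geneA": 1e-40, "geneD": 1e-10}
toy_pspg = {"geneA", "geneD"}
retained = screen_ugt_candidates(toy_hmm, toy_blast, toy_pspg)
```

In this toy input, geneC fails the e-value cutoff and geneB is dropped for lacking a PSPG box, mirroring the reduction from 174 candidates to 124 JrUGT genes.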
The 124 JrUGT genes encoded proteins with diverse physicochemical properties (Table S3). Protein length ranged from 216 to 611 aa, with an average of 474 aa. The theoretical molecular weight ranged from 29.29 to 61.50 kDa, with an average of 52.90 kDa. The theoretical isoelectric point (pI) ranged from 5.17 to 8.1, with an average of 5.85. According to subcellular localization prediction with the PSORT II online tool, the majority of walnut UGT proteins (78.23%) were predicted to localize in the cytoplasm, followed by mitochondria (12.10%). Additionally, eleven UGT proteins were predicted to localize in the endoplasmic reticulum and one was predicted to be extracellular, in the cell wall.
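For orientation, a theoretical molecular weight of the kind reported in Table S3 can be approximated from sequence alone. The sketch below assumes the standard approach of summing average residue masses and adding one water molecule; the tool actually used for Table S3 is not stated here, and the function name is hypothetical.

```python
# Average residue masses in Da (amino acid mass minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # Da, added once for the free N- and C-termini


def molecular_weight_kda(sequence: str) -> float:
    """Approximate molecular weight (kDa) of an unmodified protein sequence."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    total_da = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS
    return total_da / 1000.0
```

The reported averages (474 aa and 52.90 kDa) correspond to about 112 Da per residue, in line with the common rule of thumb of roughly 110 Da per amino acid.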
Phylogenetic and chromosomal localization analysis
The classification and phylogenetic relationships of UGT proteins in walnut were investigated by constructing a phylogenetic tree from UGT protein sequences of walnut, Arabidopsis thaliana, Zea mays, and Pilosella officinarum. The UGT family clustered into 18 previously identified subgroups, into which all 124 JrUGTs were classified (Fig. 1). Notably, JrUGTs were absent from groups F and Q, while the majority clustered within groups E (25), D (19), G (16), L (15), and A (15). AtUGT71B1, AtUGT72B1, and AtUGT88A1 of Arabidopsis have been reported to be involved in flavonoid biosynthesis. Sequence analysis showed that JrUGT4, JrUGT13, JrUGT31, JrUGT32, JrUGT43, JrUGT44, JrUGT45, JrUGT57, JrUGT58, JrUGT59, JrUGT60, JrUGT61, JrUGT62, JrUGT67, JrUGT68, JrUGT69, JrUGT80, JrUGT81, JrUGT93, JrUGT108, JrUGT111, JrUGT114, JrUGT117, JrUGT118, and JrUGT121 clustered with AtUGT71B1, AtUGT72B1, and AtUGT88A1, suggesting that these JrUGTs may be involved in flavonoid biosynthesis in walnut.
All 124 JrUGTs were mapped to the 16 chromosomes of walnut (Fig. 2). The distribution was uneven: chromosome 1 contained 17 UGT genes, chromosome 7 contained 15, and chromosome 3 contained 13, whereas chromosomes 2 and 15 each contained only 2 and chromosome 9 contained only one. This uneven chromosomal distribution suggests that the UGT gene family underwent genetic variation during walnut evolution.
Gene duplication and collinearity analysis
Gene duplication events play a pivotal role in the formation of gene families. To elucidate the expansion and evolution of the UGT gene family in walnut, potential gene duplication events in the walnut genome were investigated using TBtools. A total of five genes located on chromosomes 1, 2, 4, 9, 10, and 14 were identified as having undergone four gene duplication events (Fig. 3). Notably, chromosomes 1 and 10 exhibited the highest frequency of tandem duplication events (three each), implying that some UGT genes may have originated from such events, which could be key drivers of UGT evolution. Furthermore, comparing DNA sequence similarity between walnut UGT genes and their homologs in Arabidopsis thaliana (Fig. 4) revealed collinear relationships between 39 walnut genes and 36 Arabidopsis genes. These conserved genes are likely to have important functions across species.
Conserved sequence and gene structure analysis
To further characterize the conserved domains of the walnut UGT family, 10 motifs were identified using the online tool MEME and numbered from 1 to 10. Notably, motif 1 and motif 3 corresponded to the highly conserved PSPG box of the UGT family. The distribution of these motifs among the different groups of walnut UGT members is shown in Fig. 5. Members of the same group had identical or similar conserved motifs. The characteristic sequence formed by motifs 1 and 3, present in all walnut UGT proteins, is considered to be the recognition site for the glycosyl donor. With a few exceptions, walnut UGT proteins shared the following features: motif 6 was located close to motif 2, and motif 5 was located close to motif 8. In the majority of sequences, motif 4 appeared at the beginning and motif 7 occurred at the end. The typical motif order was 4-5-8-6-2-10-1-3-9-7, although some proteins deviated from this pattern.
Diversity in intron-exon structure often plays a pivotal role in the evolution of gene families and provides supplementary evidence for phylogenetic classification. To better understand gene structure, the intron-exon organization of the walnut UGT genes was analyzed. Of the 124 UGT genes identified in this study, 70 lacked introns, 47 contained a single intron, and 7 contained two introns.
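The intron tally above can be reproduced from exon coordinates in a genome annotation. The following is a minimal sketch that assumes one transcript per gene and non-overlapping exons, so that the number of introns equals the number of exons minus one; the record layout, gene identifiers, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def intron_counts(exon_records: Iterable[Tuple[str, int, int]]) -> Dict[str, int]:
    """Map each gene to its intron count from (gene_id, start, end) exon records."""
    exons = defaultdict(set)
    for gene_id, start, end in exon_records:
        exons[gene_id].add((start, end))  # a set ignores duplicated records
    # Assumption: a single transcript per gene, so introns = exons - 1.
    return {gene: len(coords) - 1 for gene, coords in exons.items()}


def summarize_by_intron_number(counts: Dict[str, int]) -> Counter:
    """Count how many genes have 0, 1, 2, ... introns."""
    return Counter(counts.values())


# Hypothetical toy annotation: g1 has no intron, g2 has one, g3 has two.
toy_exons = [
    ("g1", 100, 900),
    ("g2", 100, 400), ("g2", 600, 900),
    ("g3", 100, 200), ("g3", 300, 400), ("g3", 500, 600),
]
toy_summary = summarize_by_intron_number(intron_counts(toy_exons))
```

Applied to the full annotation, a summary of this kind would be expected to give the 70, 47, and 7 genes with zero, one, and two introns reported above.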
Expression patterns of UGT genes
To gain further insight into the expression patterns of UGT genes in walnut, transcriptome sequencing data from the kernel pellicle of two walnut varieties, ‘NH1’ and ‘FH4’, at different developmental stages were analyzed. Of the 118 UGT genes analyzed, JrUGT2, JrUGT11, JrUGT68, JrUGT113, JrUGT119, and JrUGT121 were not detected in the transcriptome data. A total of 19 JrUGTs were highly transcribed (FPKM > 10) in ‘NH1’ and ‘FH4’ (Fig. S1). JrUGT39, JrUGT58, JrUGT75, and JrUGT118 were consistently highly expressed across all three developmental stages in both varieties, suggesting their potential involvement in diverse biological processes throughout growth and development. Genes in group E, the largest subset of the walnut UGT family, were expressed predominantly during the hard core stage in both ‘NH1’ and ‘FH4’. Additionally, some genes showed higher expression in the mature stage than in the hard core stage and the fatty stage. Similarly, group G genes showed peak expression during the hard core stage, with greater abundance in ‘FH4’ than in ‘NH1’. Most UGT genes had elevated transcript levels during early fruit development, and these levels decreased as the fruit matured.
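The FPKM-based screening described above can be sketched as follows. The text does not state whether a gene had to exceed FPKM > 10 in every sample or in at least one, so both options are exposed here as an explicit assumption; treating a gene as undetected when its FPKM is zero in all samples is likewise an assumption, and the sample labels and gene names are hypothetical.

```python
from typing import Dict, List

# Expression table layout: {gene: {sample: FPKM}}
ExpressionTable = Dict[str, Dict[str, float]]


def highly_expressed(
    fpkm: ExpressionTable, threshold: float = 10.0, require_all: bool = False
) -> List[str]:
    """Genes whose FPKM exceeds the threshold in any sample (or in all samples)."""
    combine = all if require_all else any
    return sorted(
        gene
        for gene, values in fpkm.items()
        if values and combine(v > threshold for v in values.values())
    )


def undetected(fpkm: ExpressionTable) -> List[str]:
    """Genes with zero FPKM in every sample (assumed meaning of 'not detected')."""
    return sorted(
        gene for gene, values in fpkm.items() if all(v == 0 for v in values.values())
    )


# Hypothetical toy table with two samples per gene.
toy_fpkm = {
    "geneA": {"NH1_stage1": 25.0, "FH4_stage1": 40.0},
    "geneB": {"NH1_stage1": 12.0, "FH4_stage1": 3.0},
    "geneC": {"NH1_stage1": 0.0, "FH4_stage1": 0.0},
}
high_any = highly_expressed(toy_fpkm)
high_all = highly_expressed(toy_fpkm, require_all=True)
silent = undetected(toy_fpkm)
```

The stricter `require_all=True` setting corresponds to genes that are consistently elevated across stages and varieties, as described for JrUGT39, JrUGT58, JrUGT75, and JrUGT118.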
In this study, the walnut UGT gene family was analyzed comprehensively through phylogenetic analysis and screening of differentially expressed genes in the transcriptome. Subsequently, 13 differentially expressed walnut UGT genes were randomly selected for validation by real-time fluorescence quantitative PCR (qRT-PCR), and the results are shown in Fig. 6. The expression levels of JrUGT39, JrUGT58, JrUGT88, and JrUGT95 were higher in August, whereas JrUGT118 exhibited the lowest expression level during this period. These findings were consistent with the transcriptome data.
Analysis of phenolic substances in walnut kernel pellicle at different periods
HPLC was used to determine the phenolic content of the kernel pellicle of ‘NH1’ and ‘FH4’ at different periods. Phenolic content changed as the walnut fruit grew and developed (Table 1). Among the 11 substances, gallic acid had the highest content and GCG the lowest. Among the catechins, C had the highest content, and its content in ‘FH4’ was significantly higher than in ‘NH1’. The contents of EC and GC increased continuously overall, whereas the contents of EGC and EGCG in both varieties first increased and then decreased; the EGC content of ‘NH1’ at the fatty stage was significantly higher than in other periods. The syringate content of ‘FH4’ was higher than that of ‘NH1’ across the periods.
Correlation analysis of UGT gene expression and phenolic content in walnut
Pearson correlation coefficients were used to investigate the correlation between UGT gene expression and phenolic content in walnut (Fig. 7). No significant correlation was observed between JrUGT36 and any phenolic substance. Only JrUGT118 was positively correlated with EGCG and chlorogenic acid (P < 0.05). JrUGT6, JrUGT38, JrUGT39, JrUGT58, JrUGT69, JrUGT75, and JrUGT82 were positively correlated with vanillic acid (P < 0.01). Additionally, JrUGT38, JrUGT39, JrUGT67, JrUGT69, JrUGT75, and JrUGT82 were positively correlated with C, GC, and EC. These findings suggest that these UGTs may play a role in the biosynthesis of phenolic substances in walnut, providing a basis for further study.
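For reference, the Pearson coefficient between a gene's expression values and a compound's content across samples can be computed as in the sketch below. The function name and the paired input lists are hypothetical, and the significance levels reported above would additionally require a t-distribution-based test, for example via `scipy.stats.pearsonr`, which is not reproduced here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)
```

Here `x` would hold the expression level of one JrUGT gene and `y` the content of one phenolic substance, with entries paired by variety and developmental stage.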