Identification, characterization and phylogenetic analysis of rice DUF247 proteins among monocot plants
A total of 179 protein sequence entries with a predicted DUF247 domain were found by searching public databases through the NCBI and Phytozome websites with the Pfam domain accession PF03140. To eliminate redundancy, sequences encoding the same protein and non-representative transcripts were excluded from the analysis. The remaining protein sequences were then screened with HMMER, SMART, CDD, and Pfam to ensure the presence of a complete DUF247 domain. After removing proteins lacking a DUF247 domain and those with estimated E-values above 1 × 10⁻¹⁰, we identified 69 non-redundant DUF247 genes in the rice genome (MSU7.0). Detailed information regarding the identified DUF247 domains is provided in Supplementary Table 2.
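The screening step can be illustrated with a minimal sketch. It assumes the domain hits have already been parsed into simple records. The gene names, transcript names, and E-values below are hypothetical, and choosing the lowest E-value transcript as the representative one is an assumption made for illustration, not necessarily the rule used in this study.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    gene_id: str      # locus identifier (hypothetical examples below)
    transcript: str   # transcript/protein identifier
    evalue: float     # E-value reported for the DUF247 domain hit


E_VALUE_CUTOFF = 1e-10  # threshold stated in the text


def filter_hits(hits):
    """Drop hits above the cutoff and keep one hit per gene.

    Assumption: the transcript with the lowest E-value stands in
    for the gene's representative transcript.
    """
    best = {}
    for hit in hits:
        if hit.evalue > E_VALUE_CUTOFF:
            continue  # weak domain match, discarded
        current = best.get(hit.gene_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.gene_id] = hit
    return sorted(best.values(), key=lambda h: h.gene_id)


# Hypothetical input: two transcripts of geneA, one weak hit, one strong hit
hits = [
    DomainHit("geneA", "geneA.1", 3e-45),
    DomainHit("geneA", "geneA.2", 1e-30),
    DomainHit("geneB", "geneB.1", 5e-4),
    DomainHit("geneC", "geneC.1", 2e-12),
]

kept = filter_hits(hits)
print([h.transcript for h in kept])
```

In this toy input, geneB is removed because its E-value exceeds the cutoff, and geneA is represented by a single transcript.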
The length of rice DUF247 proteins ranges from 75 (LOC_Os06g08760) to 711 (LOC_Os02g15430) amino acids. The conserved DUF247 domain spans about 66–455 amino acids, which accounts for the divergent lengths of the DUF247 genes. The molecular weight of rice DUF247 proteins ranges from 8.68 kDa (LOC_Os06g08760) to 80.03 kDa (LOC_Os02g15430). The DUF247 genes are distributed across all 12 chromosomes, except for one gene (PRR), which could not be anchored onto any chromosome. A total of 1,439 pseudogenes were identified with pseudogene models, and 12% of these genes were expressed within intergenic regions, lacking definable open reading frames. This may explain why the PRR gene could not be anchored onto any chromosome. The predicted isoelectric points of these proteins range from 5.03 (LOC_Os08g42570) to 11.2 (LOC_Os07g47520). Information about the DUF247 gene family members, including their chromosomal positions and predicted subcellular localizations, is presented in Supplementary Table 3.
To investigate the evolutionary relationships among DUF247 members in rice, we used the 69 rice DUF247 proteins to construct an unrooted phylogenetic tree using MEGA X with 1000 bootstrap replicates. The resulting phylogenetic tree (Fig. 1A) showed that the 69 DUF247 proteins were classified into four clades, representing distinct phylogenetic lineages, each supported by a bootstrap value over 80%. Groups A and C contained 14 and 15 members, respectively, while groups B and D contained 20 members each. Some genes located on the same chromosome clustered together in the phylogenetic tree, indicating that tandem duplication events may have occurred on that chromosome.
To investigate the conservation of DUF247 gene family members among monocot species, we used Pfam (PF03140) and HMMER to search the whole genomes of other monocots, including brachypodium, sorghum, maize, and barley. A total of 57, 43, 28, and 59 non-redundant DUF247 genes were detected in these genomes, respectively (Supplementary Table 4). Phylogenetic analysis revealed that DUF247 proteins from rice, brachypodium, sorghum, maize, and barley could be grouped into six clades, with most of the DUF247 proteins located in clade I (Fig. 1B). The DUF247 family appears to have expanded more rapidly in rice than in the other species, in a manner independent of genome size.
Gene structure, conserved domain and motif analysis of DUF247 gene family
Given the importance of gene organization in the evolution of gene families (Xu et al., 2012), we determined the gene structures and phases of introns/exons in DUF247 genes by aligning genomic DNA and full-length cDNA sequences. The number of introns varied from 0 to 4 among the DUF247 genes: 18 (26%) genes lacked introns, 36 (52%) contained only one intron, and the remaining 15 (22%) had more than one intron (Fig. 2A). In plants, the number of introns has been shown to be related to gene expression levels, with more introns generally associated with higher expression levels. However, compact genes may allow rapid expression in response to environmental conditions (Jeffares et al., 2008; Ren et al., 2006).
We identified ten motifs in the conserved domains of DUF247 proteins using the MEME program (Supplementary Table 5). Each protein contained a varying number of conserved motifs, ranging from 2 to 11, and the arrangement of motifs in genes within the same clade was almost identical. Each motif except for motif 10 was found only once in each DUF247 protein sequence. Most members of Clade A shared all ten motifs, arranged in the pattern 3-9-4-2-10-5-6-1-8-7. Some members of Clades B and C carried two copies of motif 9. Clade D could be further divided into three sub-groups based on their motif patterns and numbers (Fig. 2B). Sub-group D1 had the fewest motifs, sub-group D2 had an incomplete version of the 3-9-4-2-10-5-6-1-8-7 pattern, and sub-group D3 had an almost complete pattern, lacking only motif 10.
As observed through gene domain analysis, members within the same group of DUF247 genes shared similar domain numbers and lengths. All DUF247 gene members were found to have only one domain, and the length of these domains varied among genes within the same cluster, consistent with their phylogenetic relationships (Supplementary Table 2).
In general, closely related DUF247 proteins in adjacent clades or sub-clades of the phylogenetic tree had the same or similar motif structures. The extensive sequence diversity observed in the conserved domain suggests that domain shuffling after genome duplication may have occurred (Morgenstern and Atchley, 1999). However, as the domain range in this gene family is extensive, almost all motifs were included in the domain. This may explain why the motif structure among the four sub-groups was so similar, indicating that no domain shuffling occurred in the structure of the DUF247 protein family.
Chromosome location and syntenic analysis of DUF247 genes
The distribution of DUF247 genes on the chromosomes was uneven, with varying numbers of genes across chromosomes (Fig. 3). The number of DUF247 genes on each chromosome ranged from 1 to 15. Chromosome 8 contained the most DUF247 genes, with a total of 15, while only one gene was located on chromosome 7. These DUF247 genes were observed to be distributed on both the distal and proximal ends of chromosomes.
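A per-chromosome tally like the one above can be derived directly from the locus identifiers. The sketch below assumes the MSU naming convention, in which the two digits after "Os" give the chromosome number, and it uses only a handful of loci named elsewhere in this section, so the counts it prints are not the family-wide counts. The helper names are hypothetical.

```python
import re
from collections import Counter

# Assumed MSU-style locus format: LOC_Os<2-digit chromosome>g<gene number>
LOCUS_PATTERN = re.compile(r"^LOC_Os(\d{2})g\d+$")


def chromosome_of(locus_id):
    """Return the chromosome number encoded in an MSU-style locus ID."""
    match = LOCUS_PATTERN.match(locus_id)
    if match is None:
        # Covers genes that are not anchored to a chromosome
        raise ValueError(f"not an MSU-style locus ID: {locus_id!r}")
    return int(match.group(1))


def genes_per_chromosome(locus_ids):
    """Count how many of the given loci fall on each chromosome."""
    return Counter(chromosome_of(locus) for locus in locus_ids)


# A small subset of loci mentioned in this section (not the full family)
example_loci = [
    "LOC_Os08g26220", "LOC_Os08g26710",
    "LOC_Os09g09550", "LOC_Os09g12840",
    "LOC_Os01g21650", "LOC_Os01g21670",
    "LOC_Os07g47520",
]

counts = genes_per_chromosome(example_loci)
for chrom in sorted(counts):
    print(f"chromosome {chrom}: {counts[chrom]} gene(s)")
```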
Gene duplication plays a crucial role in plant evolution, and gene family expansion depends largely on gene duplication events. In this study, we identified gene duplication events in the DUF247 gene family. DUF247 gene pairs resulting from segmental and tandem duplications are marked with red lines and red arcs, respectively. As shown in Fig. 3, three tandem duplications, accounting for about 10% of the gene family, were found on chromosomes 8, 11, and 12. Gene pairs such as LOC_Os08g26220-LOC_Os08g26710 and LOC_Os09g09550-LOC_Os09g12840 have been duplicated several times, forming more than one gene pair with other genes. The duplicated genes on chromosome 12 were too close together to resolve their duplication order (detailed duplication information is listed in Supplementary Table 6).
In addition, synteny analysis of the DUF247 gene family revealed over 18 segmental duplication events (Fig. 4). Duplication occurred not only among genes on the same chromosome but also among different chromosome segments, indicating that duplication events were the primary mechanism responsible for expanding the DUF247 gene family in rice. In total, 18 segmentally duplicated gene pairs (86%) and 3 tandemly duplicated gene pairs (14%) were identified. Of these, 7 of the 18 segmental pairs (39%) and 1 of the 3 tandem pairs (33%) came from the same phylogenetic group. The other gene pairs were very close in the phylogenetic tree, suggesting that their protein domains had significant similarities.
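The proportions quoted for the duplicated pairs follow from the pair counts alone. As a small arithmetic check, using only the counts given in the text (the function name is illustrative):

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


# Pair counts reported in the text
segmental_pairs = 18
tandem_pairs = 3
total_pairs = segmental_pairs + tandem_pairs

# Pairs whose two members fall in the same phylogenetic group
segmental_same_group = 7
tandem_same_group = 1

summary = {
    "segmental share": percent(segmental_pairs, total_pairs),
    "tandem share": percent(tandem_pairs, total_pairs),
    "segmental, same group": percent(segmental_same_group, segmental_pairs),
    "tandem, same group": percent(tandem_same_group, tandem_pairs),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```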
To gain more insight into the evolutionary constraints acting on the DUF247 gene family, we calculated the Ka/Ks ratio of DUF247 gene pairs. Most segmental and tandem duplicated DUF247 gene pairs (86%) had a Ka/Ks ratio < 1, indicating that the rice DUF247 gene family likely experienced strong purifying selective pressure during evolution. Only 3 of the gene pairs had a Ka/Ks ratio > 1, suggesting that positive selection also played a role in the evolution of the DUF247 gene family (Supplementary Table 6).
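The interpretation of Ka/Ks used here can be written as a simple rule: a ratio below 1 is read as purifying selection, above 1 as positive selection, and equal to 1 as neutral evolution. The sketch below applies that rule to made-up Ka and Ks values. The pair names and numbers are hypothetical and are not taken from Supplementary Table 6.

```python
from collections import Counter


def classify_selection(ka, ks):
    """Classify a gene pair from its Ka and Ks values."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


# Hypothetical (Ka, Ks) values for illustration only
example_pairs = {
    "pair_1": (0.12, 0.48),
    "pair_2": (0.30, 0.20),
    "pair_3": (0.05, 0.60),
    "pair_4": (0.50, 0.50),
}

labels = {name: classify_selection(ka, ks)
          for name, (ka, ks) in example_pairs.items()}
tally = Counter(labels.values())

for name, label in labels.items():
    print(f"{name}: {label}")
print(dict(tally))
```

Applied to the 21 duplicated pairs described above, 3 pairs with Ka/Ks > 1 leave 18 pairs (86%) below 1, which matches the reported share.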
To gain further insight into the phylogenetic mechanisms of the OsDUF247 family, we constructed two comparative syntenic maps of japonica rice with two smaller genomic monocot species, including brachypodium and sorghum (Fig. 5 and Supplementary Table 7). Twelve and thirteen OsDUF247 genes showed a syntenic relationship with those in brachypodium and sorghum, respectively. All genes were associated with their own syntenic gene pairs or syntenic blocks between rice and the other two species. Interestingly, 11 collinear pairs were identified between rice and the other two species, indicating that these orthologous pairs may have existed before their ancestral divergence. These collinear pairs were in the same phylogenetic groups, such as LOC_Os01g19610.1-Sobic.009G030701.1 and LOC_Os02g15430.1-Sobic.004G112400.1, which both belonged to clade IA (Fig. 1B). The Ka/Ks ratios of these collinear pairs were also calculated, and the majority of the orthologous DUF247 gene pairs had a Ka/Ks ratio < 1, suggesting that the OsDUF247 gene family likely experienced strong purifying selective pressure during evolution. However, there were also two collinear pairs (LOC_Os01g19610.1-Sobic.009G030701.1 and LOC_Os05g03972.1-Sobic.009G030701.1) with a Ka/Ks ratio > 1, which suggested that these two pairs likely experienced strong positive selection during evolution (detailed information of collinear gene pairs is listed in Supplementary Table 7).
The expression pattern of DUF247 genes in rice
To examine the expression patterns of DUF247 genes in various tissues of the Minghui 63 (MH63) and Nipponbare (Nip) cultivars, we analyzed Affymetrix GeneChip transcriptome data for seedlings, roots, stems, leaves, young panicles, endosperm, and stamen at different developmental stages (Wang et al., 2010). The majority of DUF247 genes had relatively low expression levels across the tested organs and tissues (Fig. 6, Supplementary Table 8). Among the 69 DUF247 genes, LOC_Os08g26220 and LOC_Os02g15500 showed no or very low expression in all tested organs, indicating that they may be pseudogenes or have special temporal and spatial expression patterns not examined in this study. Around 44 genes showed constitutive expression across all detected tissues, while some genes exhibited tissue-specific expression patterns. For example, both LOC_Os01g21650 and LOC_Os01g21670 exhibited high expression in stamen and endosperm, but low expression in leaves. Overall, most DUF247 genes from the same phylogenetic group shared similar expression patterns. Only 13 genes were expressed at higher levels (above 200) in at least one tissue (Supplementary Table 8). We also analyzed three other microarray datasets (GSE6901, GSE3053, and GSE4438). Of these 13 genes, 6 (46.2%) were present in the GeneChip datasets, 3 (23.1%) were common to all three microarrays, and 4 (30.8%) were detected in two of the GSE datasets (Fig. 7a).
To validate the transcriptome profiles of DUF247 genes in different cultivars and tissues, we selected seven genes detected in the chips for qRT-PCR analysis. The results demonstrated that expression levels varied among tissues (Fig. 7b) and that expression patterns also differed between the indica and japonica rice cultivars (Fig. 7d). For instance, LOC_Os03g19700, LOC_Os05g04060, and LOC_Os08g26820 were highly expressed in stems and weakly expressed in leaves in both the indica cultivar MH63 and the japonica cultivar Nip, while LOC_Os01g2167 and LOC_Os08g26850 were highly expressed specifically in the stem of MH63 but expressed at a low level in Nip. Notably, all genes were expressed at specifically high levels in panicles in MH63 but at low levels in Nip, indicating that some DUF247 genes had low transcriptional activity, although several of them exhibited tissue-specific expression.
Abiotic stress response of DUF247 genes
Our analysis of the microarray database revealed that DUF247 genes were inducible by salt treatment in the indica cultivar Minghui 63 (Fig. 7c). To further investigate their response to abiotic stresses, we searched the SRA database of rice for transcriptome data under salt, drought, gibberellin (GA), and paclobutrazol (PB) treatments (PRJEB4672, PRJNA408068, and PRJNA272723). We found that 26 DUF247 genes were induced by these treatments (Fig. 8, Supplementary Table 9). Interestingly, some genes were significantly induced or repressed by multiple treatments. For example, LOC_Os11g33394 was significantly repressed by drought and slightly repressed by GA, but remained insensitive to salt stress. Most genes were induced by salt stress, including the seven genes detected in the microarray database, such as LOC_Os03g19700, LOC_Os08g26220, LOC_Os08g26850, and LOC_Os08g26840.
To gain further insight into the functions of these genes in the evolutionary process, we investigated the response of all seven genes to different abiotic treatments using qRT-PCR (Fig. 7e). Among them, LOC_Os01g21670 and LOC_Os08g26840 were repressed by multiple treatments, while LOC_Os03g19700 was repressed by all treatments except ABA. LOC_Os09g13410 was repressed under cold, drought, heat, and salt treatments, but not under GA or ABA treatments. Notably, three genes (LOC_Os05g04060, LOC_Os08g26820, and LOC_Os08g26850) detected in all gene chips were significantly induced by salt treatment. Additionally, LOC_Os05g04060 was also induced by heat and ABA treatments, LOC_Os08g26820 was induced by GA treatment, and LOC_Os08g2685 was induced by heat, GA, and ABA treatments.
To further investigate the expression patterns of these three genes, qRT-PCR was performed under 200 mM NaCl treatment at different time points (Fig. 7f). The results showed that, compared to the control, all three genes were significantly upregulated after two hours of salt treatment. Their expression levels then returned to normal within 48 hours of treatment. These results suggest that these genes are involved in the response to salt stress and may play a role in the adaptation of rice plants to saline environments.