Protein domains are spatially discrete segments that are considered functional and/or structural units, and they form the basis of protein family classification. Because they are structurally and functionally important, protein domains are also of considerable evolutionary interest: (i) they are useful for identifying homologs in different species; (ii) they inform investigations of protein evolution, including proteins that carry different permutations and combinations of domains; and (iii) they help in understanding structure-function relationships. It is increasingly evident that functional conservation and divergence across protein orthologs and paralogs are governed by the retention or loss of domains (Jakubec et al., 2018; Ponting and Russell, 2002). Novel domains emerge from the existing gene repertoire through different mechanisms and give rise to structural and functional novelty and to new gene families (Dangwal et al., 2013; Malik et al., 2012; Marsh and Teichmann, 2010). Ovate Family Proteins (OFPs) constitute a plant-specific multigene family characterized by a conserved OVATE domain. Homologs of the OFP gene family have been reported from early land plant lineages through to monocots and dicots (Chahar et al., 2021; Dangwal and Das, 2018; Wang et al., 2016). Although OFPs have been reported in several plant genomes, limited information is available on their structural and functional diversity, particularly from an evolutionary perspective. Previously, we catalogued and examined the genomic features and phylogeny of OFP homologs in the genomes of early land plants represented by Marchantia, Physcomitrella, Selaginella and Sphagnum (Dangwal and Das, 2018), and across members of the Brassicaceae (Chahar et al., 2021).
In the present study, we identified and analyzed OFP homologs in selected grasses, namely Oryza sativa (31 OsOFPs), Brachypodium distachyon (32 BdOFPs), Brachypodium stacei (31 BsOFPs), Panicum hallii (30 PhOFPs), Panicum virgatum (58 PvOFPs), Setaria italica (34 SiOFPs), Setaria viridis (34 SvOFPs), Sorghum bicolor (37 SbOFPs), Triticum aestivum (105 TaOFPs) and Zea mays (43 ZmOFPs) (Fig. 2A). No direct correlation was observed between OFP family size and genome size (except in T. aestivum), consistent with earlier observations in Arabidopsis (Hackbusch et al., 2005), tomato (Liu et al., 2002), pepper (Tsaballa et al., 2011), apple (Li et al., 2019), banana (Zhang et al., 2020) and early land plants (Dangwal and Das, 2018). The large family size in T. aestivum can be attributed to its hexaploid genome. A recent study in wheat reported that groups of proteins harboring specific domains or tandem repeats are generally more prone to duplication than proteins lacking them (Mishra et al., 2018). Other studies have also shown that dicot genomes, on average, contain fewer OFPs than monocot genomes (Hackbusch et al., 2005; Zhang et al., 2020).
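The attribution of the large wheat family to hexaploidy can be checked with simple arithmetic: dividing the wheat count by its three homoeologous subgenomes (A, B and D) and comparing the result with the other grasses. The sketch below uses only the family sizes reported above; treating every other species as having a single subgenome is a simplifying assumption for illustration, not a claim about their ploidy.

```python
# OFP family sizes reported in this study (Fig. 2A).
OFP_COUNTS = {
    "Oryza sativa": 31,
    "Brachypodium distachyon": 32,
    "Brachypodium stacei": 31,
    "Panicum hallii": 30,
    "Panicum virgatum": 58,
    "Setaria italica": 34,
    "Setaria viridis": 34,
    "Sorghum bicolor": 37,
    "Triticum aestivum": 105,
    "Zea mays": 43,
}

# Number of homoeologous subgenomes; T. aestivum is hexaploid (A, B and D).
# Every other species defaults to 1 here, a simplifying assumption for illustration only.
SUBGENOMES = {"Triticum aestivum": 3}


def per_subgenome(counts: dict[str, int], subgenomes: dict[str, int]) -> dict[str, float]:
    """Divide each family size by the assumed number of subgenomes."""
    return {sp: n / subgenomes.get(sp, 1) for sp, n in counts.items()}


if __name__ == "__main__":
    normalized = per_subgenome(OFP_COUNTS, SUBGENOMES)
    others = [n for sp, n in OFP_COUNTS.items() if sp != "Triticum aestivum"]
    print(f"Total OFPs across ten grasses: {sum(OFP_COUNTS.values())}")
    print(f"Other grasses: {min(others)}-{max(others)} OFPs per genome")
    print(f"T. aestivum per subgenome: {normalized['Triticum aestivum']:.1f}")
```

With this normalization, wheat has 35 OFPs per subgenome, which falls within the 30-58 range of the other nine grasses.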
Transcripts are considered the operational units of a genome (Gingeras, 2007). For most OFPs in rice and the other grasses, transcript length was nearly identical to the corresponding genomic length, suggesting the absence or loss of introns; the exceptions were LOC_Os04g33870 in rice and some homologs in Zea mays and wheat (Fig. 1, 2A).
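The inference from transcript and genomic lengths to intron content can be sketched as follows. This is a minimal illustration, assuming a single transcript per gene with non-overlapping exons given as 1-based, inclusive (GFF-style) coordinates; the helper name and the exon coordinates are hypothetical and do not correspond to any gene analyzed here.

```python
# Hypothetical helper: infer intron content from a gene model's exon coordinates.
# Coordinates are 1-based and inclusive (GFF-style); example values are invented.


def intron_summary(exons: list[tuple[int, int]]) -> dict[str, int]:
    """Return genomic span, transcript length, intron count and total intron length."""
    exons = sorted(exons)
    genomic_span = exons[-1][1] - exons[0][0] + 1
    transcript_length = sum(end - start + 1 for start, end in exons)
    return {
        "genomic_span": genomic_span,
        "transcript_length": transcript_length,
        "intron_count": len(exons) - 1,
        # Everything in the span that is not exonic is intronic.
        "intron_length": genomic_span - transcript_length,
    }


if __name__ == "__main__":
    single_exon = [(1000, 1900)]             # transcript equals genomic span: intronless
    two_exon = [(1000, 1400), (1651, 2150)]  # one 250-bp intron between the exons
    for name, model in (("single_exon", single_exon), ("two_exon", two_exon)):
        print(name, intron_summary(model))
```

A gene whose transcript length equals its genomic span has no introns, whereas any difference corresponds to the summed length of its introns.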
Previous studies have uncovered several interesting features of gene families. For instance, a comparative analysis across early land plant lineages showed a general trend toward reduced average transcript and protein length from bryophytes to pteridophytes (Dangwal and Das, 2018), and transcript size has been reported to be inversely correlated with expression level (Caldwell et al., 2015; Smith and Eyre-Walker, 2002). Introns are believed to increase regulatory and transcriptional diversity in protein families (Chorev and Carmel, 2012; Meng et al., 2021; Rose, 2019), and analysis of OFPs across angiosperms such as A. thaliana, O. sativa, Malus spp. and Musa spp. has revealed them to be intron-poor (Li et al., 2019; Yu et al., 2015; Zhang et al., 2020). Intron loss, gain and retention are outcomes of selection pressure: functionally significant introns are retained and fixed by Darwinian selection, whereas other introns are lost in the absence of positive selection (Belshaw and Bensasson, 2006). An earlier report in Arabidopsis and rice suggested that, following segmental duplication, introns are lost faster than they are gained (Lin et al., 2006). Low rates of intron gain have been reported in a variety of plant lineages, with intron losses being more frequent than gains. For instance, the rate of intron loss is ~12.6 times higher than the rate of gain in A. thaliana, and ~9.8 times higher in rice (Roy and Penny, 2007).
Domains and motifs are key elements that contribute to the structural and functional diversity of proteins (Forslund and Sonnhammer, 2012; Moore et al., 2008). They are therefore under negative (purifying) selection, which prevents the accumulation of deleterious mutations that may impair protein function (Camps et al., 2007; Neduva and Russell, 2005). Intra- and intergenomic rearrangements, together with the loss and gain of domains, generate structural and functional diversity among protein homologs (Bornberg-Bauer and Albà, 2013; Kersting et al., 2012). The presence of DNA-binding and RPT domains in OFPs of rice and other Gramineae members suggests that these domains are functionally significant. Our analysis of domain distribution across plant lineages revealed independent gains and losses of the OVATE domain, the DNA-binding domain and internal repeats in some OFPs during plant evolution (Fig. 2B).
OFPs are known to be plant-specific transcriptional repressors; such repressors typically have two functional domains, a DNA-binding domain and a repressor domain (Ohta et al., 2001). Although members of a protein family, and transcriptional regulators with similar functions, harbour similar domains, changes in specific amino acid residues within these domains have also been observed, possibly providing functional plasticity (Gonzalez, 2016). RPT domains are stretches of amino acids present across species that mediate protein-protein interactions in several vital biological processes, such as de-ubiquitination and stress responses, which are linked through protein folding via the ubiquitination/de-ubiquitination cycle (Jaiswal et al., 2014; Kouranti et al., 2010; Sharma and Pandey, 2016). In hexaploid wheat, presumably owing to its higher degree of duplication, RPT domains are proposed to have evolved through genetic recombination and intragenic tandem duplication; subsequent frequent duplications account for diversity in both the sequence and the number of repeats, even between orthologous genes, which may provide functional diversification of RPT domain-containing proteins (Andrade et al., 2001a; 2001b; Ma et al., 2017; Mishra et al., 2018; Sharma and Pandey, 2016).
Analysis of protein structure and function is essential for understanding the underlying molecular mechanisms (Vandromme et al., 1996). Subcellular localization prediction with CELLO and LOCALIZER placed most OFPs of rice and other grasses in the nucleus (Fig. 3A), consistent with their role as transcriptional repressors (Jian-ping et al., 2012; Reynolds et al., 2013; Wang et al., 2011; Withers et al., 2012; Yu et al., 2015). Several OFPs were predicted to localize to organelles, which may indicate that they act as nuclear-encoded regulators of organellar transcription. Several proteins were predicted to have multiple localizations, including sites outside the nucleus (Fig. 3A), possibly reflecting their roles during developmental events, as has been reported earlier (Vandromme et al., 1996; Xin et al., 2021).
The 3-D structure of a protein is critical for its function and its interactions with other biomolecules, and it is altered as the protein sequence changes during evolution. OFP homologs across members of the Gramineae exhibited structural similarity to the PDCD4 C-terminal MA3 domain, the Co-type nitrile hydratase alpha subunit and MIF4G domain-like folds (Fig. 3B; Supplementary Table 2), features that have also been reported from lower plants, indicating structural conservation (Dangwal and Das, 2018). PDCD4 comprises two MA3 domains, is recognized as a key module in translation initiation, and has been implicated in ethylene-mediated signaling and abiotic stress responses, particularly in higher plants (Cheng et al., 2013; Lei et al., 2011). Co-type nitrile hydratases are metalloenzymes involved in the conversion of nitrile-containing compounds, a pathway that ultimately yields ammonia and organic acids, and they are postulated to have been acquired by eukaryotes through lateral gene transfer (Marron et al., 2012). MIF4G domain-containing proteins are reported to be similar to the PDCD4 C-terminal MA3 domain and are involved in translation initiation (Virgili et al., 2013). Besides these major classes, several unique similarities were also observed, such as to alpha-hemoglobin stabilizing protein (AHSP) in Setaria italica (Si.9G558100); phosphoribosylformylglycinamidine synthase and hemoglobin-binding protease (hbp) (Si.5G128600); RNA polymerase-binding protein RBPA (Pv.Db02258); and synaptonemal complex central element protein 3 (Pv.Ib02475) (Supplementary Table 2). The presence of different types of domains in homologs of the OFP gene family provides an opportunity to investigate the relationship between structural and functional diversity in protein families; the origin of structural variants in the OFP gene family remains to be investigated (Hackbusch et al., 2005; Liu et al., 2002; Schmitz et al., 2015).
Spatio-temporal expression analysis of O. sativa OFP members across eleven vegetative and reproductive stages using qRT-PCR, together with in-silico analysis of available microarray data from different vegetative and reproductive stages, suggests tissue- and/or stage-specific roles. Most OFPs are differentially expressed, predominantly at reproductive stages (early or mature panicle, and young or mature seed), whereas a few members are upregulated at vegetative stages (Fig. 9). Protein interaction network prediction for O. sativa OFPs identified putative interactions with homeobox protein KNOTTED1 and other homeobox domain-containing proteins, which play key roles in regulating plant growth and development and hormone signaling (Fig. 10) (Chan et al., 1998; Jain et al., 2008; Kamiya et al., 2003; Sakamoto et al., 1999). The indirect evidence from expression analysis and interaction network prediction, which points to a coordinated mode of action between OFP homologs and homeobox protein KNOTTED1, needs to be validated experimentally.
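The normalization scheme behind the qRT-PCR profiles is not described in this section. As an illustration only, the sketch below implements the widely used 2^-ΔΔCt relative quantification, assuming near-100% amplification efficiency, one reference gene and one calibrator stage; the function name and all Ct values are hypothetical placeholders, not data from this study.

```python
# Sketch of 2^-ΔΔCt relative quantification (assumes ~100% amplification efficiency).
# Assumption: one reference gene and one calibrator sample; this is not necessarily
# the normalization used for the rice OFP profiles.


def fold_change(ct_target: float, ct_reference: float,
                ct_target_calibrator: float, ct_reference_calibrator: float) -> float:
    """Relative expression of a target gene in a sample versus a calibrator sample."""
    delta_sample = ct_target - ct_reference                            # ΔCt in the sample
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator  # ΔCt in the calibrator
    delta_delta = delta_sample - delta_calibrator                      # ΔΔCt
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Invented Ct values: relative to the reference gene, the target crosses threshold
    # two cycles earlier in a panicle sample than in a vegetative calibrator.
    print(fold_change(22.0, 18.0, 24.0, 18.0))  # ΔΔCt = -2, i.e. 4.0-fold
```

Each cycle of difference corresponds to a two-fold change in starting template, so a ΔΔCt of -2 indicates roughly four-fold higher relative abundance in the sample than in the calibrator.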
A protein's constituent amino acids determine its three-dimensional conformation as well as its molecular weight, isoelectric point (pI) and hydrophobicity/hydrophilicity, and thereby influence structural variation (Aftabuddin and Kundu, 2007; Brown et al., 2010; Mishra et al., 2018). Several reports have investigated how protein mass and charge relate to spatio-temporal dynamics and interaction patterns; similarly, charge and polarity are indispensable for protein solubility and for proximity to ligands (Mohanta et al., 2019; Xu et al., 2013). One report also examined correlations between the pI, size and molecular mass of proteomes and the taxonomy and ecological niche of organisms (Kiraga et al., 2007). Prediction of intrinsic disorder with RAPID indicated ~19–48% disorder in rice OFPs and ~12–66% in the other grasses (Fig. 4D). Two models have been developed to relate protein structure to function. The 'structure-function model' holds that a protein must adopt its native three-dimensional structure under physiological conditions in order to function, whereas the more recent 'disorder-function model' is based on proteins that carry out cellular tasks under physiological conditions without attaining a stable three-dimensional structure (Trivedi and Nagarajaram, 2022). Despite lacking a unique structure, such proteins are implicated in several biological processes, including signaling, regulation of gene expression, cell-cycle regulation and stress responses (He et al., 2009; Radivojac et al., 2007; Vucetic et al., 2007). Our earlier study of OFPs from early land plants also showed that these physicochemical parameters, individually or collectively, play a decisive role in species-specific diversity and are likely to be evolutionarily conserved (Dangwal and Das, 2018).
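How sequence determines molecular weight, pI and hydropathy can be made concrete with a short sketch. It sums average residue masses plus one water for molecular weight, finds the pI by bisecting pH until the Henderson-Hasselbalch net charge reaches zero, and averages Kyte-Doolittle values for GRAVY. The EMBOSS-style pKa set and the example peptide are assumptions for illustration; the tools and parameters used in this study may differ, so values are indicative only.

```python
# Sketch: molecular weight, isoelectric point (pI) and GRAVY for a protein sequence.
# Assumptions: EMBOSS-style pKa values, average residue masses, uppercase one-letter codes.

PKA_POS = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_NTERM, PKA_CTERM = 8.6, 3.6

AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da): sum of residue masses plus one water."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from Henderson-Hasselbalch partial charges."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_NTERM))
    positive += sum(seq.count(aa) / (1.0 + 10 ** (ph - pka)) for aa, pka in PKA_POS.items())
    negative = 1.0 / (1.0 + 10 ** (PKA_CTERM - ph))
    negative += sum(seq.count(aa) / (1.0 + 10 ** (pka - ph)) for aa, pka in PKA_NEG.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect pH in [0, 14]; net charge decreases monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def gravy(seq: str) -> float:
    """Grand average of hydropathy on the Kyte-Doolittle scale."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    peptide = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence, not an OFP
    print(f"MW {molecular_weight(peptide):.1f} Da, "
          f"pI {isoelectric_point(peptide):.2f}, GRAVY {gravy(peptide):.3f}")
```

Bisection is a natural choice here because net charge falls steadily with pH, so the single zero crossing is always bracketed by pH 0 and 14.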
In conclusion, the present study provides a comprehensive catalogue of the OFP gene family across the genomes of ten selected species of the Gramineae. Analysis of various features revealed variability in copy number, gene and protein structure, intron content and domain composition, suggesting probable functional divergence. Phylogenetic reconstruction indicates that members of the Gramineae inherited the entire OFP family from their last common ancestor, as all species harbour at least one copy of the homologs, and lineage-specific expansions were rare. Spatio-temporal expression analysis revealed differential transcript abundance across developmental stages, with the highest steady-state levels during reproductive stages in O. sativa. Interactome prediction identified homeodomain-containing proteins as the major interacting partners of most OsOFPs. To the best of our knowledge, the present study thus provides a comprehensive collation of Gramineae OFPs with an exhaustive comparative analysis that should serve as a framework for evo-devo studies of multigene families, for functional and cross-species comparisons, and for identifying candidates for functional analysis aimed at crop improvement.