Housekeeping genes are highly conserved within the genomes of the members of a kingdom and evolve slowly compared with tissue-specific genes (Zhang and Li 2004). Plant genomes have likewise retained genes that play key developmental roles and are important for plant survival despite domestication pressures (Meyer and Purugganan 2013; Wendel et al. 2016). Such sequences can be used to analyse relationships among different genera. Within the kingdom Plantae, sequences from the nuclear, mitochondrial and chloroplast genomes have assisted studies of molecular systematics and DNA barcoding through their conserved and variable sites. The inferred relationships are considered successful if the resulting clades are supported by high probability values and by morphological evidence of similarity (Orozco-Sifuentes et al. 2023). Over time, it has been found that a combination of such conserved regions from the nuclear genome (18S rDNA) and the chloroplast genome (the rbcL and matK (maturase K) genes and the atpB-rbcL intergenic spacer) gives a good approximation of the divergence of species over time. Nuclear genes remain important because they are biparentally inherited, whereas the chloroplast genome is maternally inherited. Because of this maternal inheritance and occasional hybridization, phylogenies inferred from a chloroplast region alone may not reflect the true relationships. The supporting data provided by nuclear genes thus enhance the resolution and internal support of studies of plant evolutionary pathways (Morton 2011).
Both rcd1 and its component wwe domain are encoded in the nuclear genome, have important regulatory roles and show conservation of several important sites within plant families (Jaspers et al. 2010a; Mur et al. 2006; Siddiqua et al. 2016). To examine this at the level of molecular phylogeny, the trees we obtained using wwe sequences from a dicot family (Brassicaceae) and a monocot family (Poaceae) were compared with the trees obtained using rbcL gene and atpB-rbcL intergenic spacer sequences. All of these trees were well resolved, with the monocot family Poaceae and the dicot family Brassicaceae forming highly supported independent clades. This supports the view that wwe evolves uniformly over time, as do the popular rbcL gene and atpB-rbcL intergenic spacer that are widely used in phylogenetic reconstruction. The universal tree-based comparison, in which sequences from divergent genera of the plant kingdom were combined, also showed wwe effectively resolving the plant families Brassicaceae, Fabaceae, Solanaceae, Asteraceae and Poaceae into strongly supported independent clades. The same clades were resolved by the rbcL-based trees, strengthening the view that the wwe sequence, being appropriately short and easily amplified, can be used to assess relationships among plant genera at the sub-familial level (Siddiqua et al. 2016). It was also noted that when a sequence from Amborella trichopoda was included in the analysis, it was separated from the angiosperm clade and placed near Physcomitrella patens, which was used as the outgroup in the current analyses. On an evolutionary timescale, this separation may be justified by evidence suggesting that Amborella trichopoda represents an early-diverging lineage relative to the other angiosperms (Soltis and Soltis 2013).
The separation of Amborella trichopoda might also be linked to its divergence from modern plants that have evolved over time through the practice of selective breeding. Physcomitrella patens has been used as the outgroup in the phylogenetic reconstructions because it belongs to the division Bryophyta, which is believed to include some of the first land plants. It is therefore sufficiently divergent from the other plants used in these analyses (Cuming 2011).
The evolutionary pattern observed with rbcL, which places A. thaliana on a lineage separate from the Brassica genus, agrees best with the morphological data (He et al. 2021). The Brassica genus is scattered over two branches in the wwe and atpB-rbcL intergenic spacer based trees, whereas it forms an independent clade with the rbcL gene sequences. The “Triangle of U” explains the evolutionary relationships between Brassica species: B. nigra (genome designated BB) and B. oleracea (CC) are the ancestors of B. carinata (BBCC), B. nigra (BB) and B. rapa (AA) are the ancestors of B. juncea (AABB), and B. rapa (AA) and B. oleracea (CC) are the ancestors of B. napus (AACC) (Nagaharu 1935). The wwe-based tree places AACC and AA in close proximity and shows them diverging in a separate event from A. thaliana. By contrast, the atpB-rbcL intergenic spacer based tree shows BB diverging together with A. thaliana in a separate event and places BBCC at a great distance from BB, which does not agree with the known history. On this point, the best representation of the actual relationship comes from the rbcL-based tree, followed by the wwe-based tree. According to the Triangle of U, the genomes of B. nigra (BB) and B. oleracea (CC) combined to give rise to the hybrid B. carinata (BBCC) genome of 17 chromosomes (Murat et al. 2015; Nagaharu 1935). Represented on a tree, this should ideally put BBCC close to the BB and CC genomes. In the wwe-based tree, BBCC is indeed placed between the BB and CC genomes, with BB above it and CC below it. The rbcL-based tree, on the other hand, shows the BB and BBCC genomes evolving independently on a separate lineage while CC evolved separately, and places the AA and AABB genomes on either side of the CC genome.
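The Triangle of U relationships described above can be written down as a small lookup table, which makes the expected placement of each hybrid explicit. The following is a minimal Python sketch and not part of the original analysis: the function names and the toy distances are hypothetical illustrations, and only the species names and genome designations are taken from the text.

```python
# Triangle of U (Nagaharu 1935) as described in the text: three diploid
# Brassica genomes and the three hybrids derived from pairs of them.
DIPLOIDS = {"B. rapa": "AA", "B. nigra": "BB", "B. oleracea": "CC"}
HYBRIDS = {
    "B. juncea": ("B. rapa", "B. nigra"),        # AABB
    "B. napus": ("B. rapa", "B. oleracea"),      # AACC
    "B. carinata": ("B. nigra", "B. oleracea"),  # BBCC
}


def hybrid_genome(species):
    """Build the genome designation of a hybrid from its two parents."""
    return "".join(sorted(DIPLOIDS[parent] for parent in HYBRIDS[species]))


def parents_nearest(hybrid, distances):
    """Return True if the two diploids closest to `hybrid` are its parents.

    `distances` maps frozenset({taxon_a, taxon_b}) to a pairwise tree
    distance. The values used below are invented for illustration only.
    """
    ranked = sorted(DIPLOIDS, key=lambda d: distances[frozenset((hybrid, d))])
    return set(ranked[:2]) == set(HYBRIDS[hybrid])


# Hypothetical distances for one hybrid (not measured values).
toy_distances = {
    frozenset(("B. carinata", "B. nigra")): 0.02,
    frozenset(("B. carinata", "B. oleracea")): 0.03,
    frozenset(("B. carinata", "B. rapa")): 0.08,
}

print(hybrid_genome("B. carinata"))
print(parents_nearest("B. carinata", toy_distances))
```

Such a check is an idealization: because the chloroplast genome is maternally inherited, a chloroplast marker might be expected to place a hybrid near only one of its parents, so the criterion is more naturally applied to nuclear sequences.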
For the B. carinata (BBCC) hybridization event, we find the wwe-based tree better at resolving the relationships among Brassica species than the rbcL and atpB-rbcL intergenic spacer based trees. However, the origin of AACC as a hybrid of AA and CC is best represented by the rbcL-based tree, followed by the wwe-based tree. The rbcL-based tree places these three genomes in close proximity, while the wwe-based tree shows the BB genome on a lineage separate from that of the hybrid AACC. The third hybridization event, the origin of B. juncea from a cross between B. nigra and B. rapa, is best represented by the atpB-rbcL intergenic spacer based tree, where these three are placed in close proximity. Within the monocot family Poaceae, the relationship of S. bicolor and Z. mays is best described by the wwe-based tree, where they are grouped together in one clade while H. vulgare, T. aestivum and O. sativa form a separate clade (Fig 8). This agrees with the data gathered and analysed using a combination of different marker genes by the Grass Phylogeny Working Group (Barker et al. 2001). The rbcL and atpB-rbcL intergenic spacer trees group S. bicolor with S. officinarum in a separate clade and do not resolve S. bicolor as a sister taxon to Z. mays (Fig 2 and 3).
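Whether a set of taxa forms its own clade, as described above for S. bicolor and Z. mays, can be checked mechanically on a rooted tree. The sketch below is a hedged illustration and not the method used in this study: the nested-tuple encoding and the function names are assumptions, and the toy topology is a simplified rendering of the Poaceae grouping reported for the wwe-based tree, with the branching order inside the second clade left unresolved because the text does not state it.

```python
def leaves(node):
    """Return the set of leaf names under a node.

    A tree is a nested tuple; a leaf is a plain string.
    """
    if isinstance(node, str):
        return {node}
    names = set()
    for child in node:
        names |= leaves(child)
    return names


def is_monophyletic(tree, group):
    """Return True if some subtree contains exactly the taxa in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Simplified, hypothetical encoding of the wwe-based Poaceae grouping,
# rooted with the outgroup P. patens.
wwe_poaceae = (
    "P. patens",
    (
        ("S. bicolor", "Z. mays"),
        ("H. vulgare", "T. aestivum", "O. sativa"),
    ),
)

print(is_monophyletic(wwe_poaceae, {"S. bicolor", "Z. mays"}))
print(is_monophyletic(wwe_poaceae, {"Z. mays", "O. sativa"}))
```

The same test could be applied to a whole family, for example to ask whether all Asteraceae sequences in a tree fall within a single clade.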
Analysis using universal trees, in which candidate rcd1 sequences from all the plant taxa were combined, revealed that both rcd1 and its wwe domain have evolved uniformly over time. Both candidates grouped the individual plant taxa into clades corresponding to their families. These families were historically established by plant taxonomists after detailed study of each plant’s morphology. Among these sequences, the wwe-based tree gives a clear picture of the speciation event that separated monocots and dicots into separate branches. This dichotomy is apparent in angiosperm seed morphology: dicot seeds bear two cotyledons, while monocot seeds bear a single cotyledon. The rcd1-based tree, in contrast, gave a polytomy whose many branches left the relationship between the two major groups of angiosperms ambiguous. Taking into account the large size of the whole rcd1 sequences (on the order of kb) and this apparent polytomy, it is considered better to use wwe as a marker in phylogenetic reconstructions. wwe has a smaller size (on the order of bp rather than kb), conserved status across the kingdom Plantae, low copy number and a prediction efficiency comparable to that of the rbcL and atpB-rbcL intergenic spacer sequences. Another discrepancy concerned a member of the family Asteraceae, Lactuca sativa, which fell within the Asteraceae clade in the wwe-based tree but within the Solanaceae clade in the rcd1-based tree.
The rcd1-based tree also clusters Nicotiana species away from each other in the Solanaceae clade, whereas the wwe-based tree groups the members of the sub-families Nicotianoideae and Solanoideae within sub-clades, giving a better approximation of their morphological relationship. This suggests that wwe is more appropriate for phylogenetic analysis than complete gene sequences, as a complete gene sequence may be long enough to harbour several phylogenetically misleading sites. Hence, this research suggests that wwe can be used effectively, in combination with different marker genes, as a candidate for high-precision assessment of phylogenetic relationships at both the familial and sub-familial levels.