Identification and taxonomic distribution of UC genes
To systematically examine the phylogenetic distribution of UC genes across kingdoms, we performed local BLAST and HMM searches against the proteome sequences of 50 genomes and verified the results with online searches. The detailed taxonomic distribution is shown in Figure 1 and Table S1. UC genes appear to have an ancient origin, as evidenced by their presence in prokaryotes, including both Archaea and Bacteria. Among the 50 species investigated, a complete set of UC genes is present in most SAR species (except the Alveolata species Plasmodium falciparum and Babesia bovis and the oomycete Phytophthora infestans), vertebrates, Streptophyta, fungi, two Archaea (Natrinema pellirubrum and Haloterrigena turkmenica), and three bacteria (Deinococcus maricopensis, Marinithermus hydrothermalis and Bacillus subtilis). The complete set is absent from invertebrates, green and red algae, cyanobacteria, two Archaea (Hyperthermus butylicus and Sulfolobus solfataricus), and three bacteria (Treponema brennaborense, Mycobacterium tuberculosis, and Escherichia coli).
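As a rough illustration of the hit-screening step, the sketch below parses BLAST tabular output (the standard 12-column `-outfmt 6` layout) and keeps the best-scoring hit per query under an E-value cutoff. The cutoff of 1e-5, the function and class names, and the example rows are assumptions made for illustration; the thresholds actually used in this study are not stated in this section.

```python
from typing import Dict, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    evalue: float
    bitscore: float


def best_hits(tabular: str, max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Return the highest-bitscore hit per query from BLAST -outfmt 6 text.

    max_evalue is an assumed cutoff, not a value taken from the study.
    """
    best: Dict[str, Hit] = {}
    for line in tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # skip malformed rows
        # Columns: 0 qseqid, 1 sseqid, 2 pident, 10 evalue, 11 bitscore
        hit = Hit(cols[0], cols[1], float(cols[2]), float(cols[10]), float(cols[11]))
        if hit.evalue > max_evalue:
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best


# Hypothetical example rows (identifiers and scores are made up).
example = "\n".join([
    "OTC_query\tspA_prot1\t45.2\t300\t150\t4\t1\t300\t5\t310\t1e-80\t250.0",
    "OTC_query\tspA_prot2\t30.1\t120\t80\t3\t10\t130\t2\t121\t0.5\t32.0",
    "ASL_query\tspA_prot9\t52.0\t450\t200\t6\t1\t450\t1\t455\t1e-120\t410.0",
])
hits = best_hits(example)
for query in sorted(hits):
    print(query, hits[query].subject, hits[query].evalue)
```

A query with no surviving hit is a candidate absence only; it would still need confirmation by the HMM search and the online verification described above.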
CPS catalyzes the first step of both pyrimidine and arginine/urea biosynthesis. The BLAST search revealed two CPS copies in SAR, vertebrates, red algae and fungi, and recovered both fused proteins and separate large and small subunits. SAR and vertebrates carry two fused CPS genes, uCPS and pCPS, whereas red algae and fungi carry one separated and one fused gene. In uCPS, the small amidotransferase subunit (carA, residues 1-376 in Ectocarpus siliculosus uCPS) and the large synthase subunit (carB, residues 392-1445 in E. siliculosus uCPS) are fused into a single gene. Likewise, the two subunits of pCPS are fused in SAR and metazoans. In metazoans, fungi and red algae, pCPS is additionally fused with a PyrB domain, which encodes the catalytic chain of aspartate carbamoyltransferase involved in pyrimidine biosynthesis. Invertebrate species have only one fused CPS. Green plants, Archaea and Bacteria have a single CPS copy, except for one Gram-positive bacterium, Bacillus, which has two. Notably, CPS in plants and prokaryotes always consists of separate large and small subunits, which represents the ancestral state of CPS genes.
The other three UC members, OTC, ASS and ASL, generally occur as single copies in all species, except for several protists and lower metazoans that lack them. OTC is absent from the invertebrates surveyed, and the Alveolata species P. falciparum and B. bovis lack all three intermediate enzymes. The distribution of the final enzyme, ARG, is more complicated. ARG is completely absent from all red and green algae investigated. SAR species, except the Alveolata, have one ARG. The number of ARG copies in prokaryotes ranges from 0 to 2, and duplicated ARG genes are also found in metazoans, fungi and higher plants. In addition to the five UC enzymes, we retrieved other UC-related enzymes. These include arginine decarboxylase (ADC), agmatine iminohydrolase (AIH, also known as agmatine deiminase), N-carbamoylputrescine amidohydrolase (NCPAH) and agmatinase (AGM), all of which take part in the alternative pathway of urea production that uses arginine as substrate. They also include ornithine cyclodeaminase (OCD) and ornithine decarboxylase (ODC), which catalyze the formation of proline and polyamines, respectively, from ornithine. ADC was found only in green plants and bacteria. AGM exists in SAR, cyanobacteria, Archaea, fungi, vertebrates and some proteobacteria. OCD was found only in brown algae, diatoms and Archaea. ODC exists in most organisms but is absent from Arabidopsis, Zostera marina and Spirodela polyrhiza. Although we performed comprehensive BLAST and HMM searches, we cannot rule out the possibility that some genes are missing from the genome assemblies rather than truly absent.
Phylogenetic events of the five enzymes
To further understand the evolutionary events that gave rise to the UC pathway, we constructed phylogenetic trees from the protein sequences of the five enzymes in 50 species ranging from prokaryotes to eukaryotes. The topology of all trees was generally consistent with the results reported by Horák et al. (2020). Of the five UC genes, CPS showed the most intricate evolutionary history. Because this enzyme is composed of two subunits that have undergone fusion or division in different species, we spliced and concatenated the subunit sequences before tree construction to allow accurate comparison. Trees were then constructed from the complete CPS sequences, the small subunit alone, and the large subunit alone. Trees based on the small subunit alone were poorly resolved, whereas the trees based on complete sequences and on large-subunit sequences had the same topology, with the complete-sequence tree showing higher bootstrap support (Figure 2A, Figures S1 and S2). The tree was divided into two large clades, one comprising green plants, red algae and prokaryotes, and the other comprising archaea, fungi, red algae, SAR and metazoa. Green plants, together with prokaryotes, occupied a basal position, suggesting that the ancestral CPS enzyme was heterodimeric. A gene duplication event occurred in the common ancestor of SAR and metazoa. The duplicated genes then underwent gene fusion and sub-functionalization, giving rise to pyrimidine-specific and arginine/urea-specific enzymes. Gene loss has occurred frequently in lineages lacking the UC, such as invertebrates, in which only one pyrimidine-specific CPS was retained.
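The splice-and-concatenate step for CPS can be sketched as follows. Coordinates are 1-based and inclusive, matching the residue ranges quoted for E. siliculosus uCPS (1-376 for carA, 392-1445 for carB). The function names and the toy sequence are hypothetical, and in practice the domain boundaries would come from a domain annotation or an alignment rather than being hard-coded.

```python
from typing import Tuple


def excise(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq (1-based, inclusive)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"range {start}-{end} is outside a sequence of length {len(seq)}")
    return seq[start - 1:end]


def concat_from_fused(fused: str,
                      small_range: Tuple[int, int],
                      large_range: Tuple[int, int]) -> str:
    """Cut the small and large subunit regions out of a fused CPS and
    join them, dropping the linker between the two regions."""
    return excise(fused, *small_range) + excise(fused, *large_range)


def concat_from_separate(small: str, large: str) -> str:
    """Join separately encoded small and large subunits in the same order."""
    return small + large


# Toy fused protein of 1445 residues: 'S' = small subunit, 'x' = linker,
# 'L' = large subunit. A placeholder, not a real CPS sequence.
toy_fused = "S" * 376 + "x" * 15 + "L" * 1054
joined = concat_from_fused(toy_fused, (1, 376), (392, 1445))
print(len(toy_fused), len(joined))
```

Processing fused and separated CPS forms into the same small-then-large layout is what makes them comparable in a single alignment.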
The phylogenetic trees of the three enzymes OTC, ASS and ASL are shown in Figure 2B, C and D and Figures S3, S4 and S5. In general, prokaryotes occupy more basal positions in these trees, suggesting an ancient origin. Interestingly, many of these enzymes, as well as CPS, in Stramenopiles species are animal-like: they are more closely related to those of metazoans than to those of green plants or red algae, indicating that they were possibly derived from the heterotrophic secondary host. A similar conclusion was drawn by Horák et al. (2020), who found that stramenopiles grouped with metazoans/opisthokonts.
The tree of the last enzyme, ARG, showed an unexpected picture (Figure 3A). Genes from SAR, fungi and metazoans clustered together and then grouped with prokaryotes, whereas ARG from green plants was only distantly related to them. When AGM sequences were added to the tree, ARG of green plants robustly formed a clade with AGM of prokaryotes. In terms of sequence identity, values within the ARG clade and within the AGM clade ranged from 23.81% to 90.32% and from 25.64% to 90.84%, respectively, while identity between the two clades ranged from 20.37% to 63.88%. This suggests that ARG and AGM are two different enzymes or duplicated paralogs, and that the ARG of green plants is more closely related to AGM. Consistently, ADC, which cooperates with AGM to complete the catabolism of arginine, was found only in green plants (but not green algae) and bacteria (Figure 3B), further indicating that green plants use the ancient ADC and AGM route for arginine metabolism.
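Identity ranges of this kind can be derived from a multiple sequence alignment, as in the minimal sketch below. It assumes that identity is counted over columns where neither sequence has a gap; other definitions (for example, dividing by alignment length) give different numbers, and the definition used in this study is not stated here. The toy alignment and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length,
    counted over columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_range(alignment: Dict[str, str]) -> Tuple[float, float]:
    """Minimum and maximum pairwise identity within one group of sequences."""
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    values = [
        percent_identity(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    ]
    return min(values), max(values)


# Toy alignment (made-up sequences, not ARG or AGM).
toy = {
    "seq1": "MKT-LLVA",
    "seq2": "MKTALLVA",
    "seq3": "MRT-LIVG",
}
low, high = identity_range(toy)
print(f"{low:.2f}% to {high:.2f}%")
```

The between-clade range would be obtained the same way, iterating over pairs that take one sequence from each clade instead of pairs within a single group.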
Selection pressure results
We evaluated the selection pressure on the five enzymes in the above-mentioned species by calculating ω (dN/dS). The ω values for all organisms were far below 1 (0.0027 < ω < 0.0355), suggesting that these enzymes have undergone purifying selection during evolution (Figure 4). Furthermore, the ω values in Stramenopiles were generally higher than in other organisms, indicating relatively relaxed selective pressure. Because the ratio averaged over all lineages is almost never greater than 1, since positive selection is unlikely to affect all sites over prolonged time, we further applied the site model to detect positively selected sites. Most enzymes in Stramenopiles have several positively selected sites, suggesting that episodic positive selection may have occurred during the evolution of UC enzymes in Stramenopiles. To compare the five enzymes further, we calculated pairwise dN/dS, dN and dS values within Stramenopiles. The dN/dS and dN values of ARG are significantly higher than those of the other enzymes, suggesting that this enzyme has been under more relaxed pressure during evolution. The dS values are much higher than the dN values and are similar among genes; dS can therefore serve as a molecular clock for estimating evolutionary time. Comparing dS values, we found that both pCPS and uCPS have larger values than the other four enzymes, indicating that CPS is a more ancient enzyme.
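The interpretation of ω and the test behind a site-model comparison can be sketched as below. This is an illustration under stated assumptions, not the pipeline used in this study: which pair of nested site models was compared is not given here, so the sketch assumes a pair that differs by two free parameters (as is the case for commonly used comparisons such as M1a/M2a or M7/M8 in PAML's codeml). For two degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed. The log-likelihood values in the example are made up.

```python
import math


def omega(dn: float, ds: float) -> float:
    """Ratio of nonsynonymous to synonymous substitution rates (dN/dS)."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def selection_regime(w: float, tol: float = 1e-9) -> str:
    """Conventional reading of omega: below 1 purifying, about 1 neutral,
    above 1 positive selection."""
    if w < 1 - tol:
        return "purifying"
    if w > 1 + tol:
        return "positive"
    return "neutral"


def lrt_pvalue_df2(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood-ratio test p-value for nested models that differ by two
    free parameters (an assumption of this sketch).

    The chi-square survival function with 2 degrees of freedom is exp(-x/2).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        return 1.0
    return math.exp(-stat / 2.0)


# Upper bound of the omega range reported in the text.
print(selection_regime(0.0355))

# Hypothetical log-likelihoods for a null and an alternative site model.
p_value = lrt_pvalue_df2(lnl_null=-1000.0, lnl_alt=-995.0)
print(f"LRT p-value: {p_value:.4f}")
```

A gene-wide ω far below 1 and a significant site-model test are compatible: the first averages over all sites, while the second asks whether a subset of sites has ω above 1.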
Expression of UC-related enzymes in the model plant Arabidopsis
To further understand the potential role of the OUC in green plants, we investigated the expression of UC members and related enzymes in different tissues and under different stress conditions in the model plant Arabidopsis (Figure 5). In general, the enzymes were differentially expressed across ten tissues, and some showed tissue-specific expression patterns. ASL was highly expressed only in seedlings, and ARGAH1 was up-regulated only in old leaves and flowers. ARGAH2 was up-regulated in many tissues but down-regulated in roots and stems. Nevertheless, the complete set of UC genes was highly expressed in seedlings, indicating active arginine and urea metabolism at the seedling stage. Under stress conditions, the first four UC genes were not stress-responsive in most conditions, except for ASS, which was up-regulated under light and hormone stimuli. In contrast, ARG was clearly influenced by biotic and abiotic stresses: it was up-regulated under salt, drought, wounding, hormone, pathogen infection and light stimuli, suggesting that arginine or ornithine, as precursors of polyamines, play important roles in the defense response. Notably, ADCs, which form one important arginine catabolism pathway, were up-regulated in seedlings and under most stress conditions, whereas AIH and NCPAH were not. ADC thus exhibited expression profiles similar to those of ARG. Taken together, the ARG and ADC enzymes play significant roles in both seedling development and stress responses.
Protein interaction network associated with UC enzymes
To further evaluate UC function in different organisms, we predicted the protein-protein interaction (PPI) networks associated with the ARG enzymes. Among the organisms examined, we obtained networks for the diatom T. pseudonana, the model plant Arabidopsis, and the model animal Mus musculus (Figure 6 and Figure S6). In the T. pseudonana network, ARG was associated with ten proteins, two of which (OTC and ASL) are members of the UC. Other proteins are involved in UC-derived pathways, such as URE, which degrades urea. ODC, OCD and spermine synthase, which participate in polyamine and proline synthesis, are also connected. The Arabidopsis ARGAH1 and ARGAH2 genes show different networks. The ARGAH1 network includes two UC members, OTC and ASL, and URE is also directly connected with ARG. In addition, ADC1 and ADC2, which synthesize agmatine from arginine, are part of this network. Compared with ARGAH1, ARGAH2 interacted with more diverse proteins that participate in a wider range of biological processes, such as spermidine synthesis, xylem specification, lysine synthesis and transcription factor interaction. This indicates that the diverged ARG paralogs in Arabidopsis have undergone neo-functionalization and acquired various functions. In M. musculus, by contrast, ARG interacted with several NO synthases in addition to the UC-related enzymes. Although it is connected with ADC, we note that ADC in animals actually acts as an antizyme inhibitor and prevents degradation of ODC. More detailed descriptions of the interacting proteins are given in Table S2.
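The kind of summary made for each network (which direct partners of ARG are themselves UC members) can be sketched with a plain adjacency map. The edge list below contains only the T. pseudonana partners named in the text; the remaining partners are listed in Table S2 and are not reproduced here. The label `SPMS` for spermine synthase and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# The five UC enzymes discussed in this study.
UC_MEMBERS = {"CPS", "OTC", "ASS", "ASL", "ARG"}


def build_network(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def uc_partners(graph: Dict[str, Set[str]], protein: str) -> List[str]:
    """Direct partners of `protein` that are UC members, sorted by name."""
    return sorted(graph.get(protein, set()) & UC_MEMBERS)


# Partial edge list: only the ARG partners named in the text for T. pseudonana.
edges = [
    ("ARG", "OTC"),
    ("ARG", "ASL"),
    ("ARG", "URE"),
    ("ARG", "ODC"),
    ("ARG", "OCD"),
    ("ARG", "SPMS"),  # hypothetical label for spermine synthase
]
network = build_network(edges)
print(uc_partners(network, "ARG"))
print(sorted(network["ARG"] - UC_MEMBERS))
```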