Genome-Wide Identification and Analysis of Physcomitrella patens NLP Homologues
In the present study, three genome databases (NCBI, Phytozome v12, and Phytozome v13) and one plant TF database (iTAK) were screened to identify NLPs in the Physcomitrella patens genome (Taxonomy ID: 3218), using Arabidopsis thaliana NLP protein sequences and the Pfam accessions of the RWP-RK (PF02042) and PB1 (PF00564) domains as queries. Initially, 62 sequences were obtained: 25 from NCBI, 24 from Phytozome, and 13 from iTAK. The sequences and associated information retrieved from the updated version of Phytozome (v13) matched those in v12, differing only in their accession numbers. Splice variants, redundant sequences, and short or incomplete fragments were excluded, and the remaining sequences were validated through conserved domain identification. Finally, 6 PpNLPs containing both the RWP-RK and PB1 domains were identified (Table S3) and numbered 1 to 6 according to their chromosomal location. Accession numbers of identical or redundant sequences found in the selected databases are listed in Table 1, while the physical and chemical properties of the A. thaliana and P. patens NLP gene families are summarized in Table 2.
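The domain-based selection step can be illustrated with a minimal sketch. It assumes the domain scan has already been run and its hits collected per sequence; the function name, the candidate IDs, and the example hits are hypothetical placeholders, not data from this study.

```python
# Pfam accessions named in the text.
RWP_RK = "PF02042"
PB1 = "PF00564"


def select_nlp_candidates(domain_hits):
    """Return sorted IDs of sequences carrying both the RWP-RK and PB1 domains.

    domain_hits maps a sequence ID to the collection of Pfam accessions
    detected in that sequence.
    """
    required = {RWP_RK, PB1}
    return sorted(
        seq_id
        for seq_id, domains in domain_hits.items()
        if required <= set(domains)  # both required domains must be present
    )


# Hypothetical domain-scan results; the IDs are placeholders, not real accessions.
example_hits = {
    "candidate_A": {"PF02042", "PF00564"},
    "candidate_B": {"PF02042"},                       # RWP-RK only
    "candidate_C": {"PF00564"},                       # PB1 only
    "candidate_D": {"PF02042", "PF00564", "PF00069"}, # both, plus an extra domain
}

print(select_nlp_candidates(example_hits))
```

Only candidates with both accessions are kept, so a sequence with an RWP-RK domain but no PB1 domain is excluded from the NLP set.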
The gene lengths, protein lengths, and molecular weights (MW) of PpNLPs were higher than those of AtNLPs, whereas the pI and GRAVY values of the two species were close to each other. The average gene lengths of AtNLPs and PpNLPs were 4141 and 6471 bp, respectively. Likewise, a considerable difference was observed in protein length, with averages of 880 and 1218 amino acids for AtNLPs and PpNLPs, respectively. The average MW of AtNLPs was 97357 Da (about 97.4 kDa), while that of PpNLPs was 131511 Da (about 131.5 kDa). All AtNLPs (except AtNLP3) and PpNLPs (except PpNLP6) had pI values below 7, indicating acidic proteins, while AtNLP3 and PpNLP6, with pI values of 8.14 and 7.30, respectively, were predicted to be basic proteins. Sub-cellular localization prediction placed all A. thaliana and P. patens NLPs in the nucleus, and all NLPs from both plants had negative GRAVY values, indicating that they are hydrophilic proteins.
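For readers unfamiliar with GRAVY, the sketch below shows how the value is defined: the mean Kyte-Doolittle hydropathy over all residues, where a negative result indicates a hydrophilic protein. The tool used in this study is not stated here, so the helper functions and the toy peptide are illustrative assumptions only.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Raises ValueError for an empty sequence and KeyError for residues
    outside the 20 standard amino acids (for example X).
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def da_to_kda(mass_da):
    """Convert a molecular weight from daltons to kilodaltons."""
    return mass_da / 1000.0


# Hypothetical toy peptide, rich in charged residues; not a real NLP sequence.
toy_peptide = "MKDEERSK"
print(gravy(toy_peptide) < 0)  # negative GRAVY means hydrophilic
print(da_to_kda(97357))        # average AtNLP MW reported above, in kDa
```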
Sequence Alignment and Phylogenetic Relationship of the PpNLP Gene Family
The percent similarities among PpNLPs and AtNLPs were compared to confirm the appropriate selection and the uniqueness of each identified PpNLP gene used for further analysis (Table S4). All AtNLPs and PpNLPs shared less than 78% similarity in their protein sequences, which confirmed the uniqueness of each gene as well as the evolutionary diversity among members of the PpNLP gene family. The alignment of the PpNLP gene family with the NLP gene families of Arabidopsis thaliana, Oryza sativa, and Zea mays was used to construct a rooted neighbor-joining phylogenetic tree in MEGA-X v10.1.8 with default parameters and 1000 bootstrap replicates (Fig. 1). The NLP gene families of the four selected plants clustered into three clades. The NLP gene family of the non-vascular P. patens showed evolutionary divergence from those of the three vascular plants. AtNLP8 and -9, OsNLP2 and -5, and ZmNLP2 and -9 were the closest members in the clade of the PpNLP gene family. This distribution of NLP gene families indicates substantial evolutionary divergence between vascular tracheophytes and non-vascular bryophytes.
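The uniqueness check against the 78% threshold can be sketched as a pairwise comparison over an existing alignment. This sketch computes percent identity over ungapped columns, which is a simplification: a similarity score in the strict sense would also credit conservative substitutions via a substitution matrix. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip any column containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pairs_at_or_above(alignment, threshold):
    """List sequence pairs whose identity reaches the threshold (possible duplicates)."""
    return [
        (x, y)
        for x, y in combinations(sorted(alignment), 2)
        if percent_identity(alignment[x], alignment[y]) >= threshold
    ]


# Hypothetical toy alignment; not real NLP sequences.
toy_alignment = {
    "seq1": "MKT-LLA",
    "seq2": "MKTALLA",
    "seq3": "MRS-VLA",
}

print(pairs_at_or_above(toy_alignment, 78.0))
```

An empty result at the chosen threshold would correspond to the situation described above, where no two genes are similar enough to be treated as redundant.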
Gene Structure, Consensus Motifs and Chromosomal Distribution of PpNLPs
Structural components of AtNLPs and PpNLPs were analyzed using the gene and coding sequences. Identification of introns, exons, and UTRs in the genic regions (Fig. 2) showed that PpNLP2 and -4 contain 3 exons, while the remaining PpNLPs possess 4 exons each. The number of exons ranges between 4 and 6 in AtNLPs, and AtNLP3 does not have a 5' UTR.
Up to 15 consensus motifs were identified in PpNLP proteins using MEME (Fig. 3, Table S5) and compared with those in AtNLPs. The motifs were significantly conserved across both A. thaliana and P. patens proteins. All PpNLPs and most AtNLPs contained all 15 motifs; the exceptions were AtNLP4, -8, and -9, which contain 14 motifs, and AtNLP3, which has 11 motifs.
Mapping of the genes onto chromosomes (Fig. 4, Table S6) revealed that the 6 PpNLPs are located on different chromosomes (Chr. 9, 12, 15, 17, 19, and 22).
Identification of cis-Regulatory Elements in Promoter Regions of PpNLPs
The recognition of cis-regulatory elements in upstream promoter regions (2000 bp) is an important approach for proposing gene function and regulation. The cis-elements identified in the promoter regions of AtNLPs and PpNLPs were grouped into three categories: phytohormone responsive (PR), stress responsive (SR), and plant growth and development (PGD), as shown in Table 3. Overall, AtNLPs possess a higher number of regulatory elements than PpNLPs. The largest group of cis-elements identified in AtNLPs (87) was responsive to phytohormones, while the total numbers of AtNLP cis-elements in the SR and PGD groups were 45 and 46, respectively (Fig. 5). In all AtNLPs, PR cis-elements were the most numerous, except in AtNLP7, in which PGD cis-elements outnumbered SR and PR cis-elements. Likewise, among PpNLPs, PpNLP4 possesses more PGD cis-elements, while the remaining PpNLPs have more cis-elements in the PR group. The total numbers of PGD, SR, and PR cis-elements identified in PpNLPs are 19, 21, and 35, respectively.
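The category totals reported above can be summarized with a short tally. The counts are taken directly from the text; the dictionary layout and function names are illustrative choices and not part of the original analysis.

```python
# Total cis-element counts per category, as reported in the text.
CIS_ELEMENT_TOTALS = {
    "AtNLPs": {"PR": 87, "SR": 45, "PGD": 46},
    "PpNLPs": {"PR": 35, "SR": 21, "PGD": 19},
}


def category_shares(counts):
    """Return each category's share of the family total, in percent."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


def dominant_category(counts):
    """Return the category with the highest count."""
    return max(counts, key=counts.get)


for family, counts in CIS_ELEMENT_TOTALS.items():
    shares = {c: round(s, 1) for c, s in category_shares(counts).items()}
    print(family, sum(counts.values()), dominant_category(counts), shares)
```

Expressing the counts as shares makes the two families comparable despite their different totals; in both, the phytohormone-responsive group is the largest.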
Protein-Protein Interaction of PpNLPs
Interaction networks of the NLP proteins were predicted online using STRING (Table S7). All PpNLP proteins were predicted to interact with a plethora of N-related genes. Among them, 10 genes interacted with all PpNLP proteins. Most of these 10 genes encode un-annotated predicted proteins; however, three nitrate reductase (NIA) genes (PP1S58_252V6.1, PP1S58_249V6.1, and PP1S79_76V6.2) were identified as significant putative N-related genes interacting with PpNLPs. Figure 6 shows a schematic model of all PpNLPs interacting with cellular proteins.
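Finding the partners shared by every PpNLP amounts to a set intersection over the per-protein interaction lists. The sketch below assumes those lists have already been exported from the prediction tool; the function name and all partner IDs are hypothetical placeholders rather than entries from Table S7.

```python
def common_interactors(partners_by_protein):
    """Return the set of partners predicted to interact with every query protein."""
    partner_sets = [set(partners) for partners in partners_by_protein.values()]
    if not partner_sets:
        return set()
    return set.intersection(*partner_sets)


# Hypothetical predicted partners; the IDs are placeholders, not real gene models.
example_network = {
    "PpNLP1": ["partner_1", "partner_2", "partner_3"],
    "PpNLP2": ["partner_2", "partner_3", "partner_4"],
    "PpNLP3": ["partner_3", "partner_2", "partner_5"],
}

print(sorted(common_interactors(example_network)))
```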
Expression Pattern of the PpNLP Gene Family
Real-time quantitative PCR was performed to assess the expression levels of PpNLPs in the rhizoids, stem, and phylloids of P. patens, with Actin3 as the internal control. Three N treatments, 0 (deficient), 5 (limiting), and 10 mM (sufficient), were applied for 0, 6, 12, 24, 48, and 72 hours. The results indicated a significant differential expression pattern common to all PpNLPs in the rhizoids, stem, and phylloids (Figs. 7, 8). Expression of PpNLPs increased with treatment duration from 6 to 72 hours under limiting (5 mM) and sufficient (10 mM) N supply, while no changes were observed under N-deficient (0 mM) conditions. This indicates that PpNLPs are strongly regulated by N availability. Overall, all PpNLPs were significantly up-regulated, with an immediate response evident as an increase in expression within 0 to 6 hours in all three plant parts.
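The quantification method is not stated in this section. As an illustration only, the sketch below assumes the widely used 2^-ΔΔCt approach, in which the target gene is normalized to the internal control (here Actin3) and then compared against a reference condition such as the 0 hour time point. The function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt method (assumed here, not confirmed by the text).

    ct_target_*: Ct of the gene of interest (for example a PpNLP).
    ct_ref_*:    Ct of the internal control gene (for example Actin3).
    'treated' is the sample of interest; 'control' is the baseline sample.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_treated - delta_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies 2 cycles earlier after treatment,
# while the internal control is unchanged.
fold_change = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(fold_change)
```

A fold change above 1 corresponds to up-regulation relative to the baseline, and a value of 1 corresponds to no change, as described for the N-deficient condition.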