3.1 The wheat genome consists of 412 nitrate transporter genes belonging to four different families
A total of 412 nitrate transporter genes (excluding splice variants) were identified in the IWGSC wheat genome assembly (RefSeq V2.0): 292 TaNPF genes, 34 TaCLC genes, 40 TaSLAC1/TaSLAH genes and 46 TaNRT2 genes. The TaNPF genes could be divided into eight subgroups (TaNPF1 to TaNPF8) based on the presence of conserved domains (Table 1). TaNPF5 was the largest subgroup with 97 genes, followed by TaNPF8 (70 genes), TaNPF2 (41 genes), TaNPF4 (33 genes), TaNPF6 (22 genes), TaNPF3 (12 genes) and TaNPF7 (11 genes). TaNPF1 was the smallest subgroup, consisting of 6 genes located on the homoeologous group 3 chromosomes (3A, 3B and 3D). TaNRT1/TaNPF genes were distributed throughout the genome (Figure 1), and their chromosomal location varied with subfamily size. Genes of the larger subfamilies (e.g., TaNPF5, TaNPF8 and TaNPF2) were predominantly arranged in tandem on the distal regions of chromosomes, where they formed clusters of closely spaced genes, whereas genes of the smaller subfamilies (TaNPF1, TaNPF7 and TaNPF3) were located on proximal chromosomal regions. Most TaNRT2 genes were clustered at the distal ends of homoeologous chromosomes 6A, 6B and 6D. TaCLC genes were distributed across the wheat genome, whereas TaSLAC1/TaSLAH genes were found only on homoeologous chromosomes 1A, 1B, 1D, 2A, 2B, 2D, 3A, 3B and 3D. The predicted gene structures of many TaNPF, TaCLC and TaSLAC1/TaSLAH genes contained several introns (Supplementary Figures 1a-k), whereas all TaNRT2 genes were intronless. Predicted gene sizes ranged from 1 kb to 25 kb, and several truncated and duplicated genes were also predicted.
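As a quick consistency check, the subgroup and family counts reported above can be tallied. The minimal Python sketch below only transcribes the numbers from this section; the variable and function names are illustrative and not part of the study's pipeline.

```python
# Gene counts transcribed from Section 3.1; names are illustrative only.
TANPF_SUBGROUPS = {
    "TaNPF1": 6, "TaNPF2": 41, "TaNPF3": 12, "TaNPF4": 33,
    "TaNPF5": 97, "TaNPF6": 22, "TaNPF7": 11, "TaNPF8": 70,
}

FAMILY_COUNTS = {
    "TaNPF": 292,
    "TaCLC": 34,
    "TaSLAC1/TaSLAH": 40,
    "TaNRT2": 46,
}


def check_counts(subgroups: dict, families: dict) -> int:
    """Verify that TaNPF subgroups add up to the TaNPF total; return the genome-wide total."""
    npf_sum = sum(subgroups.values())
    if npf_sum != families["TaNPF"]:
        raise ValueError(f"TaNPF subgroups sum to {npf_sum}, expected {families['TaNPF']}")
    return sum(families.values())


if __name__ == "__main__":
    print(check_counts(TANPF_SUBGROUPS, FAMILY_COUNTS))  # 412
```

The eight subgroups sum to 292 TaNPF genes, and the four families sum to the 412 genes reported for the genome.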
3.2 Phylogenetic relationships among nitrate transporter genes
The maximum likelihood phylogenetic tree of all nitrate transporter genes indicated that wheat contains all the major subfamilies present in Arabidopsis and rice (Oryza sativa) (Figure 2a). The TaNRT1/TaNPF and TaNRT2 genes could be classified into five subclades. The subclades followed the species phylogeny, with Arabidopsis genes displaying a sister-group relationship with wheat genes. Based on these phylogenetic relationships, the TaNRT1/TaNPF genes fitted well into eight subfamilies (TaNPF1 to TaNPF8) following the Arabidopsis model. The larger subclades (TaNPF5, TaNPF8 and TaNPF2) had a more complex topology than the smaller ones because they were more expanded in wheat than in Arabidopsis and rice (Figure 2a, Supplementary Figure 2). TaNRT2 genes formed a separate subclade closely related to the TaNPF2 subfamily. The TaCLC and TaSLAC1/TaSLAH genes were analyzed separately. TaCLC genes could be classified into six groups according to their phylogenetic relationships with Arabidopsis and rice genes (Figure 2b), and TaSLAC1/TaSLAH genes were divided into four subclades. The largest TaSLAC1/TaSLAH subclade was closely related to rice SLAC1/SLAH genes but not to Arabidopsis genes (Figure 2c).
3.3 Homoeolog retention and gene duplication in nitrate transporter genes
The number of nitrate transporter genes in each family was considerably higher in wheat than in Arabidopsis and rice (Table 1, Supplementary Table 3). Comparison with T. dicoccoides (AABB), T. turgidum (AABB), T. urartu (AA) and Ae. tauschii (DD) suggested that most homoeologs were retained during the evolution of hexaploid wheat (Figure 6, Supplementary Table 3). Gene numbers and phylogenetic data also provided evidence of gene duplication in tetraploid and hexaploid wheat (Figure 2, Supplementary Figure 1a-k). Most duplicated genes belonged to the subfamilies with larger numbers of genes (TaNPF5, TaNPF8, TaNPF2 and TaNRT2). Based on phylogeny, nitrate transporters could be grouped into 13 triads, 26 diads, 2 tetrads and 48 singleton genes (Table 2). Of the 292 TaNPF genes, about 74% could be grouped into 72 triads of homoeologous genes (A, B, D) based on phylogenetic relationships. Similarly, 71% of TaNRT2 genes, 97% of TaCLC genes and 80% of TaSLAC1/TaSLAH genes could be grouped into homoeologous triads.
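The 74% figure for TaNPF follows directly from the triad count, because each triad holds three homoeologs (one each from the A, B and D subgenomes): 72 × 3 = 216 genes, and 216/292 ≈ 74%. A minimal Python sketch of that arithmetic (the function name is illustrative):

```python
def percent_in_triads(n_triads: int, n_genes: int) -> float:
    """Percentage of genes assigned to complete A/B/D triads (three homoeologs per triad)."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return 100 * 3 * n_triads / n_genes


if __name__ == "__main__":
    # 72 TaNPF triads out of 292 TaNPF genes, as reported in Section 3.3
    print(f"{percent_in_triads(72, 292):.1f}%")  # 74.0%
```

The same function can be applied to the other families once their triad counts are known; only the TaNPF triad count is stated explicitly in this section.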
3.4 Nitrate transporter proteins contain multiple transmembrane helices
To study the structural features of nitrate transporters, we predicted the 3D structures of all 412 protein sequences. All nitrate transporters were predicted to be transmembrane proteins containing multiple transmembrane segments (Figure 3 i), and most comprised 12-14 transmembrane helices (TMs), with some variation. The basic structure of TaNRT/TaNPF proteins consisted of N- and C-terminal segments flanking multiple TMs, which were connected by alternating cytoplasmic and extracellular loops (Figure 3 ii). In the TaNRT1/TaNPF family, approximately 67% of proteins contained 14 TMs, 21% contained 13 TMs, 7% contained 12 TMs and 4% contained fewer than 12 TMs (Supplementary Table 1). At the subfamily level, all TaNPF1 proteins contained 13 TMs and all TaNPF7 proteins contained 14 TMs, whereas in the remaining subfamilies (TaNPF2-6 and TaNPF8) most proteins contained 14 TMs but the number varied. Proteins with an even number of TMs had both the N and C termini on the cytoplasmic side of the membrane, whereas proteins with an odd number of TMs had one terminus on the cytoplasmic side and the other on the extracellular side (Figure 3 ii). All TaNRT2 proteins contained 12 TMs, with both termini on the cytoplasmic side of the membrane (Supplementary Table 1, Figure 3 ii). Both TaCLC and TaSLAC1/TaSLAH proteins contained 10 TMs, with both termini on the cytoplasmic side. TaCLC proteins were characterized by a 30-40 amino acid re-entrant helix on the cytoplasmic side (Figure 3 ii), which was not observed in proteins of the other nitrate transporter families.
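The even/odd rule for terminus orientation follows from the fact that each full membrane crossing moves the polypeptide chain to the opposite side. The Python sketch below encodes that parity rule under the assumption that every counted TM crosses the membrane exactly once; a re-entrant helix such as the one in TaCLC proteins enters and leaves from the same side, so it would not be counted. The function name and side labels are illustrative.

```python
FLIP = {"cytoplasmic": "extracellular", "extracellular": "cytoplasmic"}


def c_terminus_side(n_tms: int, n_terminus_side: str = "cytoplasmic") -> str:
    """Infer the C-terminus side from the TM count.

    Assumes every counted helix crosses the membrane exactly once
    (re-entrant helices are excluded from n_tms).
    """
    if n_tms < 0:
        raise ValueError("n_tms must be non-negative")
    if n_terminus_side not in FLIP:
        raise ValueError(f"unknown side: {n_terminus_side}")
    # An even number of crossings returns the chain to its starting side;
    # an odd number leaves it on the opposite side.
    return n_terminus_side if n_tms % 2 == 0 else FLIP[n_terminus_side]


if __name__ == "__main__":
    for n in (10, 12, 13, 14):
        print(n, c_terminus_side(n))
```

With a cytoplasmic N terminus, proteins with 10, 12 or 14 TMs place the C terminus on the cytoplasmic side, while proteins with 13 TMs place it on the extracellular side, matching the topologies described above.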
3.5 Expression patterns of nitrate transporter genes across developmental stages of wheat
To elucidate the expression patterns of nitrate transporter genes, we analyzed and compared expression data from Chinese Spring and Azhurnaya across different developmental stages. Approximately 77% of TaNPF genes, 30% of TaNRT2 genes, 85% of TaCLC genes and 36% of TaSLAC1/TaSLAH genes were expressed in at least one developmental stage, with a wide expression range of 1-103 tpm (Supplementary Table 2); the remaining genes showed very low or no expression (tpm < 1). Overall, we identified 48 genes in 20 triads showing tissue-specific expression: 8 triads were root-specific, 5 were leaf/shoot-specific and 7 were grain/spike-specific (Supplementary Table 4). TaNPF1 genes showed tissue- and stage-specific expression, being expressed only in spikes and grains at the reproductive stage (Figure 4A). TaNRT2 genes were predominantly expressed in roots at both vegetative and reproductive stages (Figure 4A). TaSLAC1/TaSLAH genes were predominantly expressed in roots and leaves, with some genes also expressed in spikes (Figure 4B), whereas TaCLC genes showed mostly ubiquitous expression (Figure 4B). In the remaining subfamilies, genes within a subfamily differed considerably in their expression patterns. Among TaNPF2 genes, spike/grain-specific (3 genes), leaf-, spike- and grain-specific (5 genes) and ubiquitous expression (6 genes) were observed (Figure 4A). TaNPF3 genes showed spike/grain-specific or leaf-specific expression, while TaNPF4 genes showed leaf/root-specific (4 genes) and ubiquitous expression (10 genes) (Figure 5A). TaNPF5 and TaNPF8 genes mostly showed ubiquitous expression, although a few genes were root-specific (Figure 4A). TaNPF6 genes showed ubiquitous (6 genes), leaf- and root-specific (6 genes), spike-specific (3 genes) and root-specific expression (Figure 4A).
TaNPF7 genes showed ubiquitous (3 genes), grain-specific (2 genes) and root-specific (1 gene) expression (Figure 4A).
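The percentages of expressed genes above rest on a simple rule stated in the text: a gene counts as expressed if it reaches at least 1 tpm in at least one developmental stage. The Python sketch below applies that rule to a per-gene table of tpm values; the gene IDs, stage names and values are hypothetical toy data, not values from Supplementary Table 2.

```python
# Genes below this value at every stage are counted as not expressed (tpm < 1).
EXPRESSION_THRESHOLD_TPM = 1.0

# Hypothetical gene IDs, stages and tpm values, for illustration only.
TOY_TPM = {
    "geneA": {"root_seedling": 12.5, "leaf_flag": 0.2, "grain_milk": 0.0},
    "geneB": {"root_seedling": 0.4, "leaf_flag": 0.3, "grain_milk": 0.1},
    "geneC": {"root_seedling": 0.0, "leaf_flag": 0.0, "spike_heading": 3.1},
    "geneD": {"root_seedling": 0.9, "leaf_flag": 0.8, "grain_milk": 0.5},
}


def is_expressed(tpm_by_stage: dict, threshold: float = EXPRESSION_THRESHOLD_TPM) -> bool:
    """True if the gene reaches the threshold in at least one stage."""
    return any(tpm >= threshold for tpm in tpm_by_stage.values())


def percent_expressed(genes: dict, threshold: float = EXPRESSION_THRESHOLD_TPM) -> float:
    """Percentage of genes expressed in at least one stage."""
    if not genes:
        raise ValueError("no genes supplied")
    n_expressed = sum(is_expressed(stages, threshold) for stages in genes.values())
    return 100 * n_expressed / len(genes)


if __name__ == "__main__":
    print(f"{percent_expressed(TOY_TPM):.0f}% expressed")  # 50% expressed
```

In the toy table, geneA (root) and geneC (spike) pass the threshold, while geneB and geneD stay below 1 tpm at every stage.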
To determine the extent to which homoeologs differ in their expression patterns, we performed a triad expression analysis. In all tissues, most triads (55.6% to 65.2%) showed balanced expression (Figure 5A). In roots, 54 of the 83 triads were expressed; of these, 55.6% showed balanced expression, 18.5% showed A-suppressed, 11.1% D-suppressed and 9.3% B-suppressed expression, and the remaining three triads showed A-, B- and D-dominant expression (one each) (Figure 5B). In leaf/shoot, of 51 triads, 64.7% showed balanced expression, 9.8% each showed A-suppressed and B-suppressed expression and 3.9% showed D-suppressed expression; 5.9% each showed A-dominant and D-dominant expression, and no B-dominant expression was observed (Figure 5B). In spikes, 61.9% of 42 triads showed balanced expression; D-dominant expression was the only dominant pattern observed (9.5% of triads), while A-suppressed, B-suppressed and D-suppressed expression occurred in about 16.7%, 7.1% and 4.8% of triads, respectively (Figure 5B). Only 23 triads were expressed in grains at the reproductive stage; of these, 65.2% showed balanced expression, 8.7% each showed A-, B- and D-suppressed expression, and 4.3% each showed B-dominant and D-dominant expression (Figure 5B).
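The section does not state how triads were assigned to the balanced, dominant and suppressed categories. One commonly used scheme for wheat triads, assumed here rather than taken from this study, normalizes the A, B and D expression values to relative proportions and assigns the category whose idealized profile is nearest in Euclidean distance. A minimal Python sketch of that assumed scheme, including the per-tissue percentages:

```python
import math
from collections import Counter

# Idealized relative-expression profiles (A, B, D) for the seven categories.
# These reference profiles are an assumption, not taken from this study.
IDEAL_PROFILES = {
    "balanced": (1 / 3, 1 / 3, 1 / 3),
    "A dominant": (1.0, 0.0, 0.0),
    "B dominant": (0.0, 1.0, 0.0),
    "D dominant": (0.0, 0.0, 1.0),
    "A suppressed": (0.0, 0.5, 0.5),
    "B suppressed": (0.5, 0.0, 0.5),
    "D suppressed": (0.5, 0.5, 0.0),
}


def classify_triad(a_tpm: float, b_tpm: float, d_tpm: float) -> str:
    """Assign a triad to the category with the nearest idealized profile."""
    total = a_tpm + b_tpm + d_tpm
    if total <= 0:
        raise ValueError("triad has no expression")
    relative = (a_tpm / total, b_tpm / total, d_tpm / total)
    return min(IDEAL_PROFILES, key=lambda cat: math.dist(relative, IDEAL_PROFILES[cat]))


def category_percentages(triads) -> dict:
    """Percentage of triads in each category for one tissue."""
    counts = Counter(classify_triad(*triad) for triad in triads)
    return {cat: 100 * n / len(triads) for cat, n in counts.items()}


if __name__ == "__main__":
    print(classify_triad(10, 10, 10))  # balanced
    print(classify_triad(1, 10, 10))   # A suppressed
```

Because the proportions are normalized by the triad total, the classification depends only on the relative contribution of each homoeolog, not on the absolute expression level.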
3.6 Nitrate transporter genes are located in close proximity to NUE-associated SNPs
In a parallel study in our laboratory, nested synthetic wheat introgression libraries capturing novel genetic variation from wild wheat for nitrogen use efficiency (NUE)-related traits were developed and genotyped using a high-density SNP array (Sandhu et al. 2021). These libraries were phenotyped for root traits and agronomic performance under three nitrogen input levels (0, 60 and 120 kg N ha⁻¹) in the field over two years (2018 and 2019). Genome-wide association mapping was then used to identify marker-trait associations (MTAs) for root and agronomic traits that improve NUE in wheat. We compared the 322 NUE-associated MTAs identified by Sandhu et al. (2021) with the nitrate transporter genes identified in the present study and found 67 SNPs in close proximity to nitrate transporter genes. In total, 93 nitrate transporter genes were located near NUE-linked SNPs: 63 TaNPF, 15 TaNRT2, 11 TaCLC and 4 TaSLAC1/TaSLAH genes (Table 4, Supplementary Figure 4).
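Matching genes to nearby SNPs is essentially an interval search on each chromosome. The section does not say what distance counted as "close proximity", so the window below is a free parameter, and the chromosome positions, gene IDs and window values in the example are hypothetical. A minimal Python sketch using sorted SNP positions and binary search:

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical SNP and gene coordinates, for illustration only.
TOY_SNPS = [("6A", 1_000_000), ("2B", 500_000)]
TOY_GENES = [
    ("gene1", "6A", 1_200_000, 1_203_000),
    ("gene2", "6A", 5_000_000, 5_002_000),
    ("gene3", "2B", 480_000, 490_000),
    ("gene4", "3D", 1_000_000, 1_001_000),
]


def genes_near_snps(snps, genes, window_bp: int) -> list:
    """Return sorted IDs of genes within window_bp of at least one SNP on the same chromosome.

    snps:  iterable of (chromosome, position) tuples
    genes: iterable of (gene_id, chromosome, start, end) tuples, with start <= end
    """
    positions = defaultdict(list)
    for chrom, pos in snps:
        positions[chrom].append(pos)
    for chrom_positions in positions.values():
        chrom_positions.sort()

    hits = []
    for gene_id, chrom, start, end in genes:
        pos_list = positions.get(chrom, [])
        # First SNP at or after the left edge of the padded gene interval.
        i = bisect_left(pos_list, start - window_bp)
        # The gene is "near" if that SNP also lies before the padded right edge.
        if i < len(pos_list) and pos_list[i] <= end + window_bp:
            hits.append(gene_id)
    return sorted(hits)


if __name__ == "__main__":
    print(genes_near_snps(TOY_SNPS, TOY_GENES, window_bp=500_000))  # ['gene1', 'gene3']
```

Sorting SNP positions once per chromosome keeps each gene lookup logarithmic in the number of SNPs, which matters when a high-density array is compared against hundreds of genes.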
The existing genetic variability, across wild and cultivated wheat accessions and varieties, in the 48 tissue-specific genes and the 93 genes located near NUE-associated SNPs identified in the present study may be further utilized in genomics-assisted breeding programs targeting improved nitrogen use efficiency in wheat. Improved breeding lines or wild accessions carrying promising variants of these nitrate transporter genes could serve as novel donors in genomics-assisted introgression programs aimed at developing nitrogen-efficient wheat varieties. The identified nitrate transporters may contribute to efficient nitrogen uptake and its transport from source to sink.