Orthologous groups of NAC genes in H. annuus genome
The genomic analysis showed that the InterPro protein domain IPR003441 was present in 164 out of 71289 automatically predicted proteins in the H. annuus genome V2. Additional genomic regions containing the domain IPR003441 were also detected and analyzed in this study. The sequences were manually curated to improve structural annotation, when necessary. Our analysis resulted in 151 potentially functional NAC genes (Supplementary material 1) and 15 pseudogenes/remnants.
A BLASTp analysis on whole NAC sequences set of four species representative of monocotyledonous (M. acuminata, O. sativa) and dicotyledonous species (A. thaliana, V. vinifera) assigned the 151 potentially functional NAC genes into 38 of 40 orthologous groups (OGs) defined by Cenci et al. (28) (Table 1). No gene could be assigned to OG-NAC-2c or OG-NAC-4f. Although some genes remained without unequivocal assignation to any OG, 11 of these genes were close to the group of genes assigned to OGs 3. Six additional NAC genes, with divergent sequences, also remained unassigned to any OG. Two of these genes were tentatively grouped on the OG-NAC-Y group, whereas the other four genes were grouped in the OG-NAC-X.
Table 1
Number of HaNAC genes assigned to the Orthologous Groups
Orthologous group
|
a
|
b
|
c
|
d
|
e
|
f
|
g
|
h
|
|
OG-1
|
3
|
2
|
1
|
4
|
1
|
1
|
3
|
6
|
|
OG-2
|
8
|
3
|
|
2
|
2
|
6
|
|
|
|
OG-3
|
4
|
4
|
2
|
5
|
6
|
|
|
|
11
|
OG-4
|
3
|
2
|
2
|
7
|
4
|
|
2
|
|
|
OG-5
|
10
|
3
|
1
|
|
|
|
|
|
|
OG-6
|
2
|
3
|
2
|
|
|
|
|
|
|
OG-7
|
2
|
4
|
1
|
4
|
2
|
8
|
|
|
|
OG-8
|
3
|
6
|
|
|
|
|
|
|
|
OG-X
|
|
|
|
|
|
|
|
|
4
|
OG-Y
|
|
|
|
|
|
|
|
|
2
|
A phylogenetic tree of the 305 NAC protein sequences of H. annuus, V. vinifera and A. thaliana was built based on 88 positions retained after alignment cleaning (Supplementary material 2–6). Three clusters with very high support (aLRT = 1) were connected to the remaining tree by long branches. One of these three clusters (Fig. 1a) included all the sequences (8 of H. annuus and 1 of V. vinifera) assigned to the OG-NAC-7f. Another cluster (in blue; Fig. 1a) contained all the sequences (11 of H. annuus, 7 of A. thaliana and 5 of V. vinifera) assigned to the OGs-NAC-7a, 7b, 7c and 7d. The third cluster (in green; Fig. 1a) included the two H. annuus NAC sequences assigned to OG-NAC-Y. Another phylogenetic tree built with 181 aligned positions of the 23 sequences in the blue cluster. Four OG-specific clusters contained all the analyzed sequences, with aLRT support ranging between 0.87 and 1 (Supplementary material 2–6).
After removing the 34 above mentioned diverging sequences, we performed the phylogenetic analysis with the remaining sequences (271). The new phylogenetic tree, which was based on 94 aligned positions, resulted in clusters consistent with most of the OG assignations (Supplementary material 2–6); the phylogenetic information in the analyzed alignment, however, was insufficient to resolve some of the OGs in specific clusters.
To gain insight in the NAC sequence phylogeny, we divided the tree in two main clusters, one containing the sequences assigned to OGs-NAC-1 and 2 (in red; Fig. 1b) and another with the remaining sequences. New rounds of phylogenetic analysis were performed to construct trees with the sequences of the two clusters.
The tree containing the 86 sequences from OGs-NAC-1 and − 2 (125 aligned positions) was subdivided in two very well-supported clusters (aLRT = 1), one containing the sequences assigned to OGs-NAC-1 and the other to OGs-NAC-2 (Supplementary material 2–6; Fig. 1c). Two phylogenetic trees built for OGs-NAC-1 (131 positions in 44 sequences) and OGs-NAC-2 (158 positions in 43 sequences) led to OG assignments that were consistent with the sequence clustering, even if some OG specific clusters had low aLRT support (Supplementary material 2–6). The clusters including sequences assigned to each OG-NAC-1h and OG-NAC-2a could be split into two sub-clusters containing at least a sequence from each considered species.
The analysis of the 185 sequences included in the largest cluster (in black; Fig. 1b) and based on 92 aligned positions allowed the identification of two small and three large clusters (Supplementary material 2–6; Fig. 1d). The cluster represented in fuchsia (aLRT = 0.99) contained all the sequences assigned to OG-NAC-5e, whereas the cluster shown in green (aLRT = 0.97) contained the V. vitis sequence VvNAC43 (previously assigned to OG-NAC-3d) and 12 H. annuus sequences, which had remained unassigned to any OG with the previous analysis.
The cluster in red contained all the sequences assigned to OGs-NAC-6 and − 8, whereas the one indicated in blue consisted of all the sequences assigned to OGs-NAC-4 and − 5. The one in black, on the other hand, contained the sequences assigned to OGs-NAC-3 as well as three H. annuus non-assigned sequences (OG-NAC-X).
A phylogenetic tree was built using the NAC sequences on each of the three clusters containing sequences assigned to more than one OG. The tree built with the 39 sequences assigned to OGs-NAC-6 and − 8 was based on 119 aligned positions. Four out of five clusters corresponding to the five OGs had an aLRT support higher or equal to 0.9, whereas the cluster containing all the sequences assigned to the OG-NAC-6b displayed very low support (Supplementary material 5). The tree built with the 82 sequences (124 positions) assigned to the OGs-NAC-4 and − 5 contained ten clusters consistent with an aLRT support higher or equal to 0.88 (Supplementary material 6).
The tree built with the 46 sequences (132 positions), which were mainly assigned to the OGs-NAC-3, consisted of six main clusters (Supplementary material 2–6). Each of five of them contained all the sequences assigned to OG-NAC-3a, 3b, 3c, 3d and 3e. The sixth cluster contained two V. vinifera sequences, previously unassigned to a specific OG but considered close to the OGs-NAC-3 (VvNAC42 and VvNAC45), and three unassigned H. annuus NAC genes.
Chromosomal location of HaNAC genes
In this study, we localized the eight HaNAC genes previously described by our group (HaNAC001, HaNAC002, HaNAC003, HaNAC004, HaNAC005, HaNAC006, HaNAC007 and HaNAC008) in the latest version of the genome (Supplementary material 1)(30). The criterion for identifying the remaining 138 HaNAC genes consisted of numbering these genes successively according to their position in the genome. The first gene annotated in the genome was HaNAC009 and the last one was HaNAC0146. Five other genes that had not been previously annotated into the genome were named as HaNAC147, HaNAC148, HaNAC149, HaNAC150, HaNAC151, and subsequently localized in the genome.
A schema of the localization of the NAC genes into the sunflower genome by using the KaryoploteR package in R revealed an apparent random OG distribution of HaNAC genes in each of the 17 sunflower chromosomes (Fig. 2). Chromosomes 2 and 13 contained the largest number of HaNAC genes (15 each; 10% of total genes). Chromosome 6, on the other hand, contained the minimum (four; 2.7%). Chromosomes 17, 11 and 5 contained detectable clusters but no chromosome region enrichment. Both p and q chromosome arms contained genes close to the centromeric or telomeric regions.
NAC member duplications in H. annuus
H. annuus genome contains traces of multiple rounds of genome multiplication: the most recent, a whole genome duplication (WGD) specific to Helianthus spp.: a whole genome triplication (WGT), specific of the Asterids II clade (31) and a more ancient γ-WGT, which is shared between asterids and rosids (32). According to our analysis, there were NAC genes originated by polyploidization events and tandem duplications. Eighteen couples of NAC genes occurred in regions duplicated by the last H. annuus WGD detected in the first version of its genome sequence (31) (Table 2).
Table 2
HaNAC paralogs originating from the Whole Genome Duplication specific of Helianthus spp.
Orthologous groups
|
Paralogs
|
OG-NAC-1a
|
Ha_NAC082
|
Ha_NAC118
|
OG-NAC-1b
|
Ha_NAC073
|
Ha_NAC047
|
OG-NAC-1g
|
Ha_NAC099
|
Ha_NAC027
|
OG-NAC-2a
|
Ha_NAC064
|
Ha_NAC117
|
OG-NAC-2a
|
Ha_NAC108
|
Ha_NAC033
|
OG-NAC-2e
|
Ha_NAC049
|
Ha_NAC072
|
OG-NAC-2f
|
Ha_NAC051
|
Ha_NAC137
|
OG-NAC-2f
|
Ha_NAC106
|
Ha_NAC031
|
OG-NAC-3
|
Ha_NAC025
|
Ha_NAC101
|
OG-NAC-3a
|
Ha_NAC008
|
Ha_NAC143
|
OG-NAC-3c
|
Ha_NAC066
|
Ha_NAC007
|
OG-NAC-4c
|
Ha_NAC107
|
Ha_NAC032
|
OG-NAC-5a
|
Ha_NAC074
|
Ha_NAC045
|
OG-NAC-6b
|
Ha_NAC094
|
Ha_NAC094
|
OG-NAC-7b
|
Ha_NAC075
|
Ha_NAC028
|
OG-NAC-7b
|
Ha_NAC085
|
Ha_NAC092
|
OG-NAC-7d
|
Ha_NAC062
|
Ha_NAC114
|
OG-NAC-7f
|
Ha_NAC080
|
Ha_NAC130
|
OG-NAC-8a
|
Ha_NAC119
|
Ha_NAC141
|
OG-NAC-8b
|
Ha_NAC098
|
Ha_NAC026
|
Genes of the same OG and mapped at a distance lower or equal to 100 kb were considered tandem duplications. The analysis retrieved 30 genes organized in 12 different tandems containing two or three genes (Fig. 2).
Exon/intron structure analysis
The basic structure of the NAC gene is composed of three exons. The first exon contains A and B subdomains (33) and ends at the first nucleotide of the first codon after the B subdomain; the second exon contains the C and D subdomains and ends at the third nucleotide of a codon; the third exon begins with the E subdomain and contains all the C-terminal region of the gene that includes the TRR (28). The basic structure of three exons was found in the genes assigned to OGs 1, 2, 3, 6 and 8 as well as for the OG-NAC-4b, 5a and 7e. The unique exception was HaNAC146, whose first two exons were merged. The genes assigned to the OG-NAC-7a had also three exons; however, they corresponded to exons 1 and 2 of the basic structure, respectively. OG-NAC-7b showed a similar composition. In the OG-NAC-7c and − 7d sequences, which have an additional exon containing few codons at the beginning of the coding sequence, the third and fourth exons corresponded to exons 1 and 2 of the basic structure. The sequences of OG-NAC-7a, -7b, 7c and − 7d, in addition to the C and D subdomain, the third exon contained, also the E subdomain. This finding suggests an exon merge in the common lineage of these four OGs. As in other species (28), sequences included into OG-NAC-7b, -7c and − 7d had additional exons in the C-terminal region containing the highly variable transcription regulatory regions (TRRs). Similarly, in most of the sequences assigned to OGs 4, 5 and OG-NAC-7e, a variable number of exons were also present in the C-terminal, in addition to the three exons of the basic structure (Supplementary material 7a and 7b).
Structural analysis of NAC proteins
NAC proteins had a characteristic NAC domain in the N-terminal with the capacity to interact with the CGT[AG] core sequence in target genes. Despite the conserved NAC domain in the NAC TF family, the members of this family presented variability within the group. For example, some NAC genes had a variable number of NAC domains (named chimeric NAC domains), whereas others had extra domains, such as transmembrane domains (26).
According to a protein sequence analysis by Pfam, NAC domains of all the analyzed genes were of approximately 126 amino acids, except for HaNAC091 (26 amino acids), and all located in the N-terminal of the protein (Supplementary material 1). All the analyzed proteins had a unique NAC domain, although in some cases they contained other extra domains. Some OGs were characterized for these extra domains. For example, in the N-terminal, four of five genes of the OG-X had a Claterim_lim domain, whereas OG-7a had a CpXC domain. On the other hand, OG-3 had a Ring-hydroxyl-B domain in the C-terminal.
Additionally, according to a transmembrane domain analysis, HaNAC101 and HaNAC111 contained transmembrane domains at the N-terminal, whereas HaNAC061, HaNAC067, HaNAC050, HaNAC82 (all within the OG-4 subfamily) and HaNAC110 presented these domains at the C-terminal (Supplementary material 1).
Using MEME and LOGO software we evaluated the common motifs in the HaNAC proteins. Here we identify ten different motifs, of which four (motif 5, 6, 7, 8) occurred in almost all sequences (Fig. 3a)(34). Motifs 1–7 and 9 were located inside the NAC domain in the N-terminal. The motifs were organized in different structures, named A, B, C and D (Fig. 3b). The distance between the motifs was variable among the HaNAC proteins. However, motifs 2, 3, 4, 5 had a tendency to cluster in structure A (reading from left to right).
Almost all NAC genes of OGs 1, 2, 3, 4a, 4b, 4c, 4d, 5, 6 and X contained the structure A conformed by eight motifs (1–8). Some exceptions showed several motif deletions: motifs in HaNAC016, HaNAC045, HaNAC076, all of the OG-5a group and HaNAC91 from the OG-3. This last gene had two exons, whereas the rest of their OG had 3 exons.
In accordance with the variability in exon numbers, OGs-7 contained variable types of structures. OG-7a, OG-7b, OG-7c and OG-7d belonged to structure C and had an unequal motif, the motif 9. This motif may be functionally redundant to motifs 2, 3, 4 and 5. OG-7e belonged to structure B, whose structure was highly similar to structure A, but without motif 4. Other OG belonging to structure B were OGs-8 and the proteins HaNAC135 and HaNAC021 (of the OG-2b and OG-4e, respectively). Finally, OG-7f, which belonged to structure D, showed a different motif that may be associated with the TRR domain.
All the motifs positioned on the NAC domain were similar to the motifs in Arabidopsis thaliana NAC proteins. According to a comparative analysis between the motifs found here and the subdomains reported in Kikuchi et al. (33) and Puranik et al. (26), motifs 1 and 2 corresponded to the subdomain A and subdomain B, respectively, whereas motifs 3, 4, and 5 corresponded to the subdomain C. In addition, motifs 6 and 7 corresponded to subdomain D, whereas motif 8 corresponded to subdomain E. Motif 10 on the structure D was in the C-terminal and showed no similarity with Arabidopsis thaliana NAC proteins.
Regulatory sequences of NAC genes
TFs regulate cell processes by binding a specific (TFBS) DNA motif on promoter regions and, therefore, affect downstream gene expression. However, not all members of a TF family have the same functions or have the same expression profile in all the tissues or in all developmental stages. In this study, we evaluated the differences in the promoter region of the HaNAC using the database PlantPAN 3.0 (35). The analysis carried out for each OG showed the mean value of TFBS frequency from different TF families (Fig. 4).
Most representative TFBS families along the 151 HaNAC were AP2, ARID, AT-Hook, B3, bHLH, bZIP, C2H2, C3H, Dehydrin, Dof, GATA, HD-ZIP, Homeodomain, MADS, MYB, MYB-related, Myb/SANT, NAC, NF-YB, SBP, TBP, TCP, Trihelix, WRKY and ZF-HD.
Some OGs were defined by specific TFBS families. For example, AP2 TFBS was over-represented in OG-8b and moderately high in OG-4a, OG-4c and OG-5b. At-hook TFBS were over-represented in OG-3b, OG-4d and OG-7e, whereas bHLH and bZIP TFBS were particularly high in OG-5c. Dof TFBS were over-represented in OG-4g and OG-1e, whereas Myb-SANT TFBS were over-represented in OG-X, OG-8a, OG-3c but even more in OG-5c. NFYB, SBP and TBP TFBS were particularly high in OG-3c, OG-5a and OG-3b, respectively.
The heatmap plot showed a unique TFBS pattern for each OGs. In this pattern, some TFBS were more important than others in discriminating between OGs. OG-NAC-1a and OG-NAC-1c had similar TFBS values for all the analyzed TFs, except for NAC and WRKY TFBS. In addition, OG-NAC-1d, -1e, -1f, -1g and 1h had similar TFBS patterns, except for AP2, bHLH, Dof, Myb/SANT, NAC and WRKY TFBS. Moreover, the OGs of subfamily 2 had different values for Dof, NAC, WRKY and bZIP TFBS. Other HaNAC subfamilies showed similar results.
Expression of NAC genes profiles during senescence in cultivated sunflower
NAC TF family regulates several physiological changes during plant development. The senescence is the last stage in plant cycle life and starts after the flowering period (anthesis) and lasts during the grain filling until the complete dehydration of the plant and seeds. This process is a complex mechanism that accompanies the molecular recycling of carbon and nitrogen from leaves to seeds.
To evaluate the functionality of the HaNAC associated with senescence in sunflower, we performed an RNAseq experiment of two contrasting sunflower lines in time-to-leaf senescence traits. Genotype R453 showed an early senescence phenotype, whereas B481-6 had a delayed senescence phenotype. The analysis consisted of two evaluation times, at anthesis (A) and post-anthesis (PA), and four comparisons: line B481-6 post-anthesis vs. anthesis, line R453 post-anthesis vs. anthesis, R453 vs B481-6 at anthesis, R453 vs B481-6 PA post-anthesis (Fig. 5, Supplementary material 8).
Differentially expressed genes between post anthesis vs. anthesis were associated with molecular changes during leaf senescence. R453 (early senescence phenotype) displayed a singular expression profile. For example, genes of OG-1a (HaNAC118, HaNAC082), OG-1b (paralogues: HaNAC047 and HaNAC073) and OG-2b (HaNAC135) had log2 ratio values above or equal to 1.5, whereas a gene of OG-8b (HaNAC098) had log2 ratio values below or equal to 1.5. These HaNAC genes were not differentially expressed in B481-6 (delayed senescence phenotype). Conversely, B481-6 had only one gene of OG-3d with log2 ratio value above 1.5 (HaNAC044) and two genes of OG-3c and OG-2f (HaNAC007 and HaNAC024, respectively) with log2 ratio values below 1.5. HaNAC007 was an interesting gene because of its contrasting expression depending on the genotype: upregulation in R453 and downregulated in genotype B481-6.
Any difference between gene expression patterns of the two contrasting lines in a given time, could also be useful for assessing the senescence process. Indeed, differences in the observed phenotype could be explained by a different expression pattern of a given gene during anthesis (A) or post-anthesis (PA). For example, HaNAC014, HaNAC013 and HaNAC141 displayed downregulated patterns in R453 in comparison to B481-6 (during anthesis and post-anthesis), whereas HaNAC023 and HaNAC027 showed an upregulated expression (during anthesis and post-anthesis). These genes were only differentially expressed between genotypes and not in the PA/A comparison.
Other genes appear to be involved in the senescence process depending on both time and genotype. During post-anthesis, HaNAC082, HaNAC047 and HaNAC007 displayed an upregulated profile, while HaNAC146 and HaNAC98 showed a down regulated profile, in R453. In this line, on the other hand, HaNAC044 was upregulated during anthesis.
Phenotyping assay
Orthologous relationships are useful for inferring gene functionality. We analyzed Gene Ontology for all the Arabidopsis thaliana OGs described by Cenci et al. (28) (information provided by TAIR database) enriched in the terms “leaf senescence” and “programmed cell death” (Supplementary material 9). We found that Arabidopsis thaliana genes belonging to OG-1h, OG-3e and OG-3c had ontology terms associated with leaf senescence. For that reason, we selected four genes belonging to OG-1h, OG-3e and OG-3c: HaNAC001, HaNAC003, HaNAC005 and HaNAC007, for functional validation of HaNAC (18). These genes also showed differential expression in RNAseq assay (Fig. 5).
We generated Arabidopsis thaliana transgenics lines for these four sunflower genes and evaluated their phenotype. Two independent lines of Arabidopsis thaliana were selected for each gene construct. According to a qPCR analysis, the transgenic lines showed different transgene expression levels (Supplementary material 10). Growth and morpho-physiological traits of the eight lines (HaNAC01-1, HaNAC01-4, HaNAC03-3, HaNAC04-5, HaNAC05-1, HaNAC05-5, HaNAC07-2, HaNAC07-5) were quantified in the automated phenotyping platform PHENOPSIS.
The two first components of a PCA analysis performed with leaf number, rosette dry weight, total leaf area, Hue pixel ratio (40–61), dry matter content, leaf mass area, growth duration and maximum expansion rate as active variables explained 78.6% of the total variability of the measured data. Projection of individual observations revealed that individuals from the line HaNAC01-4 were grouped in a single contrasted cluster in comparison with the other lines and the control Col-0. HaNAC01-4 was the line with highest expression levels of the transgene HaNAC001 (Fig. 6).
Compared with Arabidopsis thaliana Col-0, the HaNAC01-4 line showed a similar number of leaves but significantly lower dry weights and leaf area values (Fig. 7). This was associated with a lower growth rate and growth duration in the transgenic line compared to the wild-type (Fig. 7). The mean Hue pixel ratio (40–61), i.e. the yellow and light green proportion of pixels in all the leaves of each plant, of HaNAC01-4 line plants was twice the ratio of Col-0 plants in concordance with the number of senescing leaves at flowering (Fig. 8).