Phenotypic Characterization
The traits recorded for grown-field NILs are shown in Table 1. The difference in flowering time between NILs was about 14 days (44.4 ± 2.8 days for NF10/82-E and 58.0 ± 1.1 days for NF10/82-L). NF10/82-E exhibited reduced vegetative biomass characterized by decreased branching (lower total number of branches and fewer branches in the initial nodes) and shorter plant height with fewer internodes (Fig. 2). Nevertheless, its internode length exceeded that of the late-flowering ones (2.47 ± 0.17 for NF10/82-E vs. 2.23 ± 0.10 for NF10/82-L).
Table 1 Phenotypic characterization of the NILs (Mean ± SD)
Genotype
|
DTF
|
PW
|
PH
|
IPP
|
IL
|
BPP
|
BF
|
BI
|
NF10/82-E
|
44.4 ± 2.8
|
3.79 ± 1.85
|
52.5 ± 6.1
|
22.3 ± 2.6
|
2.47 ± 0.17
|
3.91 ± 2.26
|
0.73 ± 0.65
|
1.25 ± 0.67
|
NF10/82-L
|
58.0 ± 1.1
|
9.14 ± 2.62
|
66.6 ± 3.8
|
30.9 ± 2.4
|
2.23 ± 0.10
|
21.5 ± 6.6
|
2.57 ± 1.02
|
6.01 ± 1.69
|
t-test
|
***
|
***
|
***
|
***
|
**
|
***
|
***
|
***
|
Significant difference Student’s t-test (ns: non-significant, *0.01 < P ≤ 0.05, **0.001 < P ≤ 0.01, ***P ≤ 0.001). DTF, Days to flowering; PW, Plant weight (g); PH, Plant height (cm); IPP, Internodes per plant; IL, Internode length (cm); BPP, Branches per plant; BF, Branches in the first three nodes; BI, Branching index
Genetic Characterization
The sequencing data confirmed a high degree of similarity between NILs. A total of 393,670,345 positions were read, revealing 120,441 different positions constituting the detected variants (Additional file 1). This indicates that the NILs differ in only 0.03% of positions between them. Additionally, for NF10/82-L, 209,276 heterozygous positions were detected, and for the NF10/82-E, 200,084, corresponding to an observed heterozygosity of 0.053% and 0.051%, respectively. Both lines underwent at least 11 generations of self-fertilization during their development (Fig. 1), so the expected residual heterozygosity is 0.098% deduced from Mendelian Genetics for self-fertilizing generations. The observed values being lower than expected are reasonable because the segregating population from which these lines were obtained may have undergone prior refreshing processes to maintain seed viability. In other words, it could have undergone additional generations of self-fertilization to maintain a suitable number of viable seeds before the actual process of obtaining the NILs.
Approximately 64% of the detected variants were successfully mapped to chromosomes (77,170), of which 45,481 met the applied quality criteria (Table 2). Only 15,690 variants were homozygous, with 4,932 being intragenic in 432 loci (HHQ-I variants; Additional file 3). There are 37 variants expected to affect two loci simultaneously.
Among all HHQ-I variants detected on chromosomes, 1,610 are located in CDS or exons (HHQ-I-C/E variants; 849 in CDS, 758 in exon regions and three are located in both depending on DNA strand), affecting 246 protein-coding genes (Table 2). Additionally, there are 176 variants located in non-protein-coding RNA genes, including 17 uncharacterized lncRNA (168 variants), two snRNA (4), one snoRNA (1) and three tRNA (3) (Additional file 4). Finally, 199 variants are positioned in pseudogenes or miscellaneous RNA. Notably, six variants are classified as intragenic, affecting LOC101491595, but in a region devoid of additional features according to the GFF data of C. arietinum. As explained by NCBI staff, this discrepancy is attributed to an artificial extension of this locus resulting from an error in the annotation of the non-protein-coding transcript XR_003470270.1 (personal communication). Consequently, XR_003470270.1 has recently been suppressed by NCBI RefSeq staff.
The distribution of the HHQ-I and HHQ-I-C/E variants did not follow a proportional pattern concerning chromosome size, nor was it uniform along the chromosomes (Fig. 3). Most of them are positioned in a region at the beginning of chromosome 1 (Ca1: 1.78 – 3.15 Mb) and the end of chromosome 6 (Ca6: 57.2 – 58.8 Mb). These are the only two chromosomes with specific regions containing more than 200 variants per 1 Mb window. Notably, chromosome 3 lacks any HHQ-I variant.
Table 2 Number of variants (SNPs and InDels) and protein-coding genes affected per chromosome in the pair of NILs
Chr
|
Size (Mb)
|
Detected Variants
|
HQ Variants
|
HHQ Variants
|
HHQ-I
Variants
|
HHQ-I-C/E Variants
|
Protein-coding Genes with HHQ-I-C/E
|
Ca1
|
48.36
|
31,790
|
23,428
|
12,585
|
4,185
|
1,334
|
165
|
Ca2
|
36.63
|
7,012
|
3,774
|
673
|
134
|
35
|
30
|
Ca3
|
39.99
|
5,311
|
2,095
|
9
|
0
|
0
|
0
|
Ca4
|
49.19
|
5,371
|
2,259
|
23
|
4
|
2
|
2
|
Ca5
|
48.17
|
6,843
|
2,936
|
106
|
17
|
4
|
4
|
Ca6
|
59.46
|
12,594
|
7,507
|
2,232
|
575
|
230
|
40
|
Ca7
|
48.96
|
6,598
|
2,625
|
15
|
3
|
1
|
1
|
Ca8
|
16.48
|
1,651
|
858
|
47
|
14
|
4
|
4
|
|
TOTAL
|
77,170
|
45,481
|
15,690
|
4,932
|
1,610
|
246
|
HQ, High-quality; HHQ, Homozygous high-quality; HHQ-I, Intragenic homozygous high-quality;
HHQ-I-C/E, Intragenic homozygous high quality in CDS or exon of mRNA
Functional annotation using Blast2GO was successfully performed on 216 out of the 246 coding genes affected by HHQ-I-C/E variants (Additional file 5). The distribution of GO Slim terms among protein-coding genes across the ontologies of “molecular function”, “biological process”, and “cellular component” (Fig. 4) revealed no enrichment compared to the entire chickpea annotated genome using Fisher's exact test (P < 0.05).
Candidate Genes
Based on GO Slim enrichment analysis in A. thaliana for the FLOR-ID set of flowering-related genes, there are 146 genes affected by HHQ-I-C/E variants that have any of these enriched GO Slim terms (Additional file 6). Among them, four genes seem to be homologous to those found in the FLOR-ID A. thaliana dataset according to the reciprocal BLASTp results. These genes are LOC101515142, LOC101489432 (also known as CaELF3a), LOC101499101 and LOC101507442 (Table 3).
LOC101515142 (Ca1: 2,285,592 - 2,298,911, complement) is annotated as “Mediator of RNA polymerase II transcription subunit 16-like” (MED16), an homologue of the A. thaliana MED16/SFR6 gene encoding a component of the Mediator complex involved in diversal aspects of gene expression regulation [59]. In chickpea, a second homologue is present on Ca6 (LOC101501202, Ca6: 16,660,218 - 16,679,432) without any detected variants between NILs. LOC101515142 is affected by 94 variants, from which 14 affect exon or CDS regions (Table 3). Most alternative variant alleles are found in NF10/82-E, with only one detected in NF10/82-L (a 21 bp deletion located in an intron in Ca1: 2,297,103). This locus encodes six different isoforms, being affected by three variants in CDS with different impacts. For example, NF10/82-E has a 6 bp deletion (Ca6: 2,298,571) that affects two isoforms with a moderate impact (by the loss of two Glu), whereas another two transcriptional isoforms are affected in the 5’ UTR (Additional file 7. Fig. S1a). However, the other 11 HHQ-I-C/E variants affect all isoforms equally with low or modifier theoretical impacts. In any case, the high number of detected variants affecting this locus could have some implications for its functional activity.
LOC101489432 (CaELF3a, Ca5: 36,011,384 – 36,016,600, complement) is one of the two homologs of A. thaliana ELF3 identified in legumes, previously reported to be involved in the regulation of the circadian clock and to influence the flowering process in chickpea [40]. In this study, an 11 bp deletion located at Ca5: 36,016,064 was detected in NIL10/82-E. This deletion is predicted to affect the first exon of CaELF3a, resulting in six missense amino acids followed by a premature stop codon. Consequently, this alteration reduces the protein length from 699 to 13 amino acids (Additional file 7. Fig. S1b). The absence of other isoforms encoded by CaELF3a suggests a significant impairment in its functionality.
Finally, two loci at the end of Ca6 are affected by HHQ-I-C/E variants. On one hand, LOC101499101 is a B-box finger protein homolog of STO/BBX24, known to link the FRI/FLC and photoperiod/circadian clock pathways, affecting flowering time in A. thaliana [60]. On the other hand, LOC101507442 is a VRN1-like transcription factor containing a B3 domain, encoding a DNA-binding protein involved in the vernalization pathway that represses FLC expression, thus promoting flowering [61]. Both loci are affected by only one SNP with modifier or low theoretical impact. LOC101499101 has a SNP affecting the 3’UTR (Additional file 7. Fig. S1c), while LOC101507442 has a SNP located in the third exon, influencing its two potential protein isoforms as a synonymous variant (Additional file 7. Fig. S1d).
Table 3 Genes affected by HHQ-I-C/E variants that appear homologous to four genes included in the A. thaliana FLOR-ID dataset
C. arietinum ID NCBI
|
Homologous A. thaliana
ID TAIR
|
Number
HHQ-I-C/E Variants
|
Variant Position
|
Ref (0)
|
Alt (1)
|
Variant Impacts
|
NF10/82-L
|
NF10/82-E
|
LOC101515142
Mediator of RNA polymerase II transcription subunit 16-like
(Ca1: 2,285,592 – 2,298,911, complement)
|
AT4G04920
|
14
(+ 80 in intron regions)
|
2,285,696
|
C
|
CAT
|
3_prime_UTR_variant [MODIFIER], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,285,879
|
G
|
A
|
3_prime_UTR_variant [MODIFIER], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,286,112
|
G
|
GA
|
3_prime_UTR_variant [MODIFIER], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,286,256
|
A
|
T
|
3_prime_UTR_variant [MODIFIER], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,286,460
|
C
|
T
|
3_prime_UTR_variant [MODIFIER], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,286,488
|
A
|
G
|
3_prime_UTR_variant [MODIFIER], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,288,143
|
C
|
T
|
synonymous_variant [LOW], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,291,280
|
T
|
G
|
synonymous_variant [LOW], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,292,486
|
A
|
C
|
synonymous_variant [LOW], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,292,528
|
G
|
A
|
synonymous_variant [LOW], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,294,335
|
A
|
G
|
synonymous_variant [LOW], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,298,490
|
G
|
T
|
synonymous_variant [LOW], 5_prime_UTR_variant [MODIFIER], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,298,571
|
CTCTTCT
|
C
|
disruptive_inframe_deletion [MODERATE], 5_prime_UTR_variant [MODIFIER], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
2,298,634
|
T
|
A
|
synonymous_variant [LOW], 5_prime_UTR_variant [MODIFIER], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
LOC101489432
protein EARLY FLOWERING 3a
(Ca5: 36,011,384 – 36,016,600, complement)
|
AT2G25930
|
1
|
36,016,064
|
ATCATCATCTTC
|
A
|
frameshift_variant [HIGH], non_coding_transcript_exon_variant [MODIFIER], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
LOC101499101
B-box zinc finger protein 24
(Ca6: 57,549,424 – 57,552,323, complement)
|
AT1G06040
|
1
|
57,549,449
|
T
|
A
|
3_prime_UTR_variant [MODIFIER], non_coding_transcript_variant [MODIFIER]
|
1/1
|
0/0
|
LOC101507442
B3 Domain-
containing transcription factor VRN1-like
(Ca6: 57,717,926 – 57,721,229)
|
AT3G18990
|
1
|
57,720,344
|
C
|
T
|
synonymous_variant [LOW], non_coding_transcript_variant [MODIFIER]
|
0/0
|
1/1
|
The 11 bp deletion in CaELF3a seems to be distributed across a small proportion of chickpea germplasm, with only two haplotypes identified that specifically differ in this deletion [40]. To gain insights into the genetic variability and conservation of the variants detected in LOC101515142, LOC101499101 and LOC101507442 along chickpea diversity, we analyzed the different accessions represented in the public repository CicerSeq, which contains information about the SNPs detected in cultivated chickpea [51]. Among the 94 variants detected for LOC101515142 in the pair of NILs, 68 are SNPs, with 51 positions registered in CicerSeq. The contrasting haplotypes for LOC101515142 detected in NILs are highly conserved along the 3,171 cultivated accessions for C.arietinum registered in the pangenome (Fig. 5 and Additional file 8). The NF10/82-L haplotype is conserved in approximately 70% of the accessions (H1), while the NF10/82-E is present in about 23.6% (H2). Interestingly, ~3.8% of accessions have the NF10/82-L haplotype except for one SNP variant located in Ca1: 2,295,317 (in the intron region of LOC101515142; H3), and 2% of accessions have an intermediate haplotype (36/51 SNPs like NF10/82-L haplotype; H4). For the SNPs in LOC101499101 (T/A) and LOC101507442 (C/T), the reference alleles are the majority (82.2% T/ 9.4% A and 83.7% C/ 7.3% T, respectively) (Additional file 9).
The DTF data for chickpea accessions, available in the public repository CicerSeq across six locations and two seasons (excluding the IIPR location, which only provided data for one season), were analyzed according to the SNPs present in LOC101515142 haplotype, LOC101499101, and LOC101507442 (Additional file 9). Accession density distribution plots were obtained for each group (Fig. 6 and Additional file 10). In accessions with contrasting haplotypes of LOC101515142, significant differences in DTF were only found in three different locations/seasons. In the other locations, the data reflect a similar DTF distribution for the different lines between the two groups of accessions with contrasting haplotypes. This pattern is also observed for the SNP in LOC101507442, where significant differences were only detected in ICARDA 2015/16. In contrast, for the SNP located in LOC101499101, interestingly significant differences are found in all locations/seasons, except RARI 2015/16. These consistent differences across locations and seasons suggest that LOC101499101 could influence flowering time along chickpea diversity.
In silico Expression Analysis
A total of 132 CDS from the selected 146 protein-coding genes, based on their functional annotation, were unambiguously matched with sequences in the GEO dataset. The TPM data for each transcript ID and their corresponding gene ID can be found in Additional file 11.
The TPM level heatmap categorized LOC101515142, LOC101489432, LOC101499101 and LOC101507442 into three distinct clusters (Fig. 7). LOC101499101 and LOC101507442 (both situated at the end of Ca6) show a similar expression pattern, closely grouped in the same subcluster, characterized by genes with higher expressions at all FB stages (Cluster I). LOC101515142 (Ca1: 2,285,592 – 2,298,911, complement) is in a neighboring cluster (Cluster III) with lower expression levels at the end of the FB stage (FB3 and FB4), but higher levels in SAM. Finally, LOC101489432 (Ca5: 36,011,384 – 36,016,600, complement) shows the most different expression profile, falling into a cluster with high expression levels in SAM and low expression in FB tissues (Cluster V). This locus is somewhat isolated from other genes in its cluster due to its lower expression level in YL and higher level in ML. The detailed description of co-expressed genes for these four genes can be found in Table 4.
In Cluster I, it is remarkable LOC101513952 (CaARF2), which is an auxin response factor protein (ARF). The ARF family members are considered the core of auxin signaling with important functions as regulators of plant growth and developmental processes [62, 63]. NF10/82-E exhibits 14 variants affecting this locus, of which six are located in exon or CDS with moderate and low theoretical impacts. LOC101491064 also stands out in this cluster, encoding a DNA-binding one zinc finger (DOF) protein. DOF transcription factor genes are involved in various fundamental processes in plants, including responses to light, phytohormones, as well as roles in seed maturation or germination [64]. For this locus, a total of 72 variants were detected in NF10/82-E with 13 located in exon or CDS regions. Additionally, LOC101491273 is an ethylene-responsive transcription factor (ERF) affected by four HHQ-I-C/E variants, one of them predicted to have a moderate impact as a missense variant. In Cluster III, no other gene apart from LOC101515142 seems to be prominent for flowering.
Finally, LOC101492009 and LOC101510831 show the most similar expression pattern with CaELF3a in Cluster V. LOC101492009 is the TIFY5A protein, and LOC101510831 is a helicase-like transcription factor CHR28, both with the stress response GO Slim term. Moreover, in this Cluster V fall LOC101504196 (ethylene-responsive transcription factor 12) with 2 SNPs and LOC101500880, another DOF transcription factor (dof zinc finger protein DOF5.3-like) with a SNP in 3’UTR. Thus, there are two members of the ERF family and two members of the DOF family that fall into the same clusters that the genes appearing to be homologous to those found in the FLOR-ID A. thaliana dataset.
Table 4 Co-expressed genes associated with the four genes affected by HHQ-I-C/E variants (highlighted in bold) homologous to genes included in FLOR-ID
Cluster
|
Gene ID
|
Variants
|
GO Slim Name
|
Gene Description
|
I
|
LOC101492907
|
8
|
P: lipid metabolic process
|
enoyl-CoA delta isomerase 2, peroxisomal-like
|
I
|
LOC101495155
|
33
|
F: hydrolase activity
|
GDSL esterase/lipase At5g45920
|
I
|
LOC101511773
|
1
|
F: hydrolase activity
|
ATP-dependent zinc metalloprotease FTSH 4, mitochondrial-like
|
I
|
LOC101497641
|
8
|
F: protein binding
|
heterogeneous nuclear ribonucleoprotein U-like protein 1
|
I
|
LOC101505696
|
8
|
P: response to biotic stimulus; P: response to external stimulus; P: response to stress
|
putative disease resistance protein At3g14460
|
I
|
LOC101507442
|
1
|
F: DNA binding
|
B3 domain-containing transcription factor VRN1-like
|
I
|
LOC101513952
|
14
|
P: response to chemical; P: response to endogenous stimulus; P: biosynthetic process; P: signal transduction; F: DNA binding
|
auxin response factor 2
|
I
|
LOC101491064
|
72
|
P: biosynthetic process; F: DNA-binding transcription factor activity; F: DNA binding
|
dof zinc finger protein DOF 4.7-like
|
I
|
LOC101507871
|
1
|
F: protein binding
|
pentatricopeptide repeat-containing protein At4g20740-like
|
I
|
LOC101499101
|
1
|
P: post-embryonic development; P: response to light stimulus; P: biosynthetic process
|
B-box zinc finger protein 24
|
I
|
LOC101513083
|
10
|
P: biosynthetic process; F: DNA-binding transcription factor activity
|
uncharacterized LOC101513083
|
I
|
LOC101510178
|
6
|
P: biosynthetic process; F: DNA binding
|
homeobox-DDT domain protein RLT1
|
I
|
LOC101491273
|
4
|
P: response to chemical; P: response to endogenous stimulus; P: biosynthetic process; P: signal transduction; F: DNA-binding transcription factor activity;
F: DNA binding
|
ethylene-responsive transcription factor 1B
|
III
|
LOC101504748
|
1
|
P: biosynthetic process
|
spermidine synthase 1
|
III
|
LOC101508062
|
19
|
F: kinase activity; F: protein binding
|
protein STRUBBELIG-RECEPTOR FAMILY 3-like
|
III
|
LOC101496576
|
29
|
P: biosynthetic process; F: DNA binding
|
TATA box-binding protein-associated factor RNA polymerase I subunit B
|
III
|
LOC101495479
|
37
|
P: biosynthetic process
|
THO complex subunit 7A-like
|
III
|
LOC101505912
|
1
|
F: kinase activity
|
protein kinase PINOID-like
|
III
|
LOC101514288
|
37
|
P: biosynthetic process
|
splicing factor U2af large subunit B-like
|
III
|
LOC101488340
|
102
|
P: biosynthetic process; F: protein binding
|
proteinaceous RNase P 2
|
III
|
LOC101515142
|
94
|
P: biosynthetic process
|
mediator of RNA polymerase II transcription subunit 16-like
|
III
|
LOC101501305
|
1
|
P: signal transduction
|
14-3-3-like protein C
|
III
|
LOC101500879
|
41
|
P: biosynthetic process; F: protein binding
|
pre-mRNA-splicing factor SYF1
|
III
|
LOC101491071
|
41
|
F: protein binding
|
phospholipase A-2-activating protein
|
III
|
LOC101501014
|
1
|
F: kinase activity; F: protein binding
|
probable inactive leucine-rich repeat receptor-like protein kinase At3g03770
|
III
|
LOC101504099
|
1
|
P: biosynthetic process; F: hydrolase activity
|
ribosome biogenesis protein BMS1 homolog
|
V
|
LOC101493874
|
15
|
P: response to chemical; P: response to stress; F: protein binding
|
E3 ubiquitin-protein ligase RMA1H1-like
|
V
|
LOC101515146
|
1
|
F: chromatin binding
|
uncharacterized LOC101515146
|
V
|
LOC101491385
|
4
|
F: hydrolase activity
|
non-cyanogenic beta-glucosidase-like
|
V
|
LOC101503100
|
43
|
P: response to chemical; P: response to endogenous stimulus; P: signal transduction; F: transporter activity
|
lysine histidine transporter-like 8
|
V
|
LOC101506337
|
1
|
P: biosynthetic process; F: hydrolase activity
|
U4/U6.U5 tri-snRNP-associated protein 2-like
|
V
|
LOC101504196
|
2
|
P: biosynthetic process; F: DNA-binding transcription factor activity; F: DNA binding
|
ethylene-responsive transcription factor 12
|
V
|
LOC101500880
|
1
|
P: biosynthetic process; F: DNA-binding transcription factor activity; F: DNA binding
|
dof zinc finger protein DOF5.3-like
|
V
|
LOC101509325
|
2
|
P: response to chemical; P: response to endogenous stimulus; P: signal transduction
|
two-component response regulator ARR17
|
V
|
LOC101512564
|
86
|
F: hydrolase activity
|
allantoinase
|
V
|
LOC101507112
|
1
|
P: biosynthetic process
|
SAC3 family protein B
|
V
|
LOC101492009
|
1
|
P: response to chemical; P: response to endogenous stimulus; P: signal transduction; P: response to stress
|
protein TIFY 5A
|
V
|
LOC101489432
|
1
|
P: post-embryonic development; P: response to light stimulus; P: reproduction
|
protein EARLY FLOWERING 3a
|
V
|
LOC101510831
|
3
|
P: response to stress; P: DNA metabolic process
|
helicase-like transcription factor CHR28
|