Data Filtering and High Quality Clean Reads Mapped with Reference Genome
The base composition and quality value before and after filtering is shown in Additional file 1: Figure S1, Additional file 2: Table S1, and Additional file 3: Table S2. The Q20 value of all samples was >98.9% and the Q30 value 96%. The alignment rates to the reference genome for each sample, including the parental cultivars ‘Red Globe’ (R.G) and ‘Centennial seedless’ (C), and pooled seeded (S) and seedless (SL) progeny, were >95% (Table 1). The depth of sequencing coverage is 16X in ‘C’, 19X in ‘R’, 15X in ‘SL’, and 17X in ‘S’.
Table 1. Information of high quality clean reads mapped with reference genome
Sample
|
All Reads
|
Single Mapped Reads
|
Paired Mapped Reads
|
Unmapped Reads
|
Align Ratio(%)
|
C
|
57160754
|
136895
|
27239591
|
2544677
|
95.55
|
R
|
68761692
|
179104
|
32691359
|
3199870
|
95.35
|
SL
|
50978504
|
192890
|
24333081
|
2119452
|
95.84
|
S
|
56263426
|
224997
|
26903173
|
2232083
|
96.03
|
Identification and Classification of Single Nucleotide Polymorphisms (SNPs) and Small Indels Based on Genome Re-sequencing
To catalog genetic diversity in grapevine and its potential influence on ovule development, we carried out whole genome re-sequencing of the parental cultivars (‘Red Globe’ and ‘Centennial seedless’), and their pooled seeded and seedless progeny. A total of 6,379,614 SNPs and 745,433 small indels were identified. All of the identified variants were indexed based on their genomic position (Figure 2a). Most of the variants were found in the intergenic regions, followed by intronic, upstream, downstream and exonic regions. The least number of variants were found in the 5’ UTR regions. A total of 222,652 variants were located within exons. These were further classified based on their potential effects on gene function (Figure 2b). And more attention was given to those variants predicted to cause amino acid changes in protein coding genes (frameshift, substitution, premature termination, and loss of termination). Four kinds of transitions and eight kinds of transversions were characterized, with C to G transversion and G to A transition the most common. The ratio of transition to transversion was 0.48, which is close to the ideal ratio of 0.5.
Gene Ontology (GO) and Pathway Analysis of Functional Variants associated DEGs
Functional variations were identified from genome data and the number of genes containing these variants are shown in Table 2. In addition, a relative comprehensive statistical analysis was conducted in combination of both genome and transcriptome data, which used the same progeny population, and the number of functional variants containing genes identified as DEGs at least at one stage and at all three developmental stages were counted based on function classification of variants (Table 2). With respect to ovule development comparing the seeded and the seedless progeny at Stage 1, Stage 2 and Stage 3, a total of 2,425, 2,988 and 2,328 functional variants containing genes showed significant differential expression pattern during seeded and seedless progeny ovule development, respectively (Additional file 4:Table S3). At each ovule developmental stage, the number of up-regulated DEGs (seedless progeny/seeded progeny) were found to be higher than down-regulated ones. Moreover, the number of DEGs containing functional variations were found to be highest at Stage 2, irrespective of whether the gene was up- or down-regulated.
Table 2. Function classification of genes containing functional variants
Function classification
|
Associated genes
|
Associated DEGs
at least at one stage
|
Associated DEGs
at all three stages
|
frameshift deletion
|
454
|
124
|
35
|
frameshift insertion
|
309
|
91
|
24
|
nonsynonymous SNP
|
11381
|
3790
|
1147
|
stopgain
|
deletion
|
10
|
3
|
1
|
insertion
|
13
|
3
|
1
|
|
SNP
|
298
|
68
|
14
|
stopless
|
deletion
|
6
|
2
|
1
|
insertion
|
15
|
6
|
3
|
|
SNP
|
214
|
52
|
15
|
To explore the biological processes and pathways in which the genes with functional variations involved, we performed a GO and pathway analysis of the DEGs containing functional variants at three developmental stages. In general, results of both enriched GO and pathway were basically consistent and focused on aspects involved in seed development. A total of 162 GO terms were found to be significantly enriched at all three developmental stages (Figure 3a). The GO terms associated with up-regulated DEGs were mainly involved in biological processes like ‘plant defense’, ‘regulation of reproductive development’, ‘regulation of hormone balance’, ‘seed coat development’, ‘endosperm and seed development’, and ‘cell death’ (Figure 3b). On the other hand, GO terms associated with down-regulated DEGs were mainly related to ‘lignin biosynthetic process’, ‘secondary metabolic process’, ‘oxidation reduction’ and ‘iron ion transport’ (Figure 3b). A total of 124 pathways were found to be enriched at all three ovule developmental stages (Figure 4a), and these enriched pathways were mainly related to the regulation of hormone balance (‘cytokinins degradation’, ‘cis-zeatin biosynthesis’, ‘jasmonic acid biosynthesis’ and ‘ethylene biosynthesis from methionine’), seed coat development ( ‘cellulose biosynthesis’ and ‘suberin biosynthesis’), secondary metabolic biosynthesis (‘flavonoid biosynthesis’) and oxidation-reduction (‘glutathione-mediated detoxification’) (Figure 4b).
Functional and Expression Analysis of DEGs Containing Functional Variants
We focused further attention on functional variant-containing genes that were differentially expressed in the developing ovules, between seeded and seedless cultivars, at all three ovule developmental stages (Table 2). We estimated gene function based on their annotation, and focused on genes encoding potential disease resistance protein, protein kinase, transcription factor, cytochrome P450 and other factors affecting seed development.
Nonsynonymous SNP, frameshift deletion, frameshift insertion, stopgain and stopless were found to happen in genes encoding protein kinases (Figure 5a). The expression patterns of these DEGs with functional variants were almost similar at all three ovule developmental stages, i.e., mainly up-regulated in seedless progeny compared with seeded ones. We observed that functional variant (nonsynonymous SNP, frameshift deletion, frameshift insertion and stopless) happened in DEGs encoding disease resistance proteins(Figure 5b). Among which, the expression pattern of all the disease resistance related DEGs was almost same during three developmental stages, and most of the DEGs were up-regulated in the ovules of seedless progeny compared with seeded progeny. Functional variants (nonsynonymous SNP, frameshift deletion, frameshift insertion, stopgain and stopless) were also found to happen in DEGs encoding cytochrome P450-like proteins, and most of these DEGs were up-regulated in seedless progeny compared to seeded ones at all three developmental stages. It is noteworthy that, the expression pattern of these DEGs was almost the same during all developmental stages.
Moreover, many functional variants including nonsynonymous SNP, frameshift deletion, stopgain and stopless happen in DEGs identified as transcription factors (TF)(Figure 6). These TFs were classified into 15 families, among which MYB and BHLH were families with the most number of DEGs havining function variants. Expression of DEGs associated with 10 TF families (WRKY, BHLH, SBP, AP2, Myb, HB, TCP, HSF, CCAAT, and GRAS) were found to be up-regulated at all three ovule developmental stages in seedless progeny compared to seeded progeny. In contrast, a group of genes was down-regulated at all three ovule developmental stages in seedless progeny compared to seeded ones; these included one gene from each family of NAC, GATA, bZIP, ERF and two genes from MADS-box family.
Functional variants also happened in DEGs related to seed development, including heavy-metal-associated protein, cellulose-related protein, ankyrin repeat-containing protein, senescence-associated protein, hormone-related protein, calmodulin, methyltransferase, cell cycle, ABC transporter protein, thaumatin, aspartic proteinase and flavonoid synthase. Expression pattern of these DEGs is shown in Figure 7. Two kinds of functional variants were identified in heavy-metal-associated protein (nonsynonymous SNP and frameshift insertion) and methyltransferase (nonsynonymous SNP and frameshift deletion), while only one kind of function variant (nonsynonymous SNP) was identified in others. Expression of DEGs encoding cellulose-related protein, ankyrin repeat-containing protein, senescence-associated protein, calmodulin and cell cycle were up-regulated at all three ovule developmental stages in seedless progeny compared to seeded ones, while expression of DEGs encoding several of the other functions were down-regulated. Except one DEG encoding methyltransferase, all other DEGs showed consistency in expression pattern at all three stages of ovule development.
Identification of seedless candidate genes based on SNP-index analysis of two extreme pools
For each progeny bulk, the frequency of each SNP variant (SNP-index) was calculated, and a delta SNP-index was determined (Additional file 5: Figure S2). A number of 3 peaks on chromosomes 11, 13, and 17, as well as 3 valleys on chromosomes 6, 7, and 8 under the threshold for delta-SNP index were identified. A total of 7,923 SNPs were identified within the peak regions, while 360 SNPs were identified in the valley regions. A total of 13 nonsynonymous SNPs were identified in the valley regions, including 1 variant on Chromosome 6 associated with 1 gene, 6 variants on Chromosome 7 associated with 4 genes, and 6 variants on Chromosome 8 associated with 4 genes (Additional file 6:Table S4). In the peak regions, a total of 38 functional variants were identified (Additional file 7:Table S5), among which 35 variants (34 nonsynonymous SNP and 1 stopgain) were located on Chromosome 13 and were associated with 18 genes. The other 3 nonsynonymous SNP were found on Chromosome 17 and associated with 2 genes.
In general, based on the SNP-index analysis, a total of 33 genes were identified as potential preliminary candidates containing functional variant and some of them have more than one functional variant. We generated transcriptomics data using ovules from progeny of the same hybrid population as the genomic sample and examined the expression pattern of these genes during three different stages of ovule development. We defined DEGs in at least one of the three stages of ovule development with functional variant as the standard and nine candidate genes were further analyzed in this way and corresponding information can be found in Table 3 and Additional file 8: Figure S3. Among nine selected genes, G2, was down-regulated during all ovule developmental stages in the seedless progeny. Whereas, the expression of the other eight candidate genes was up-regulated at least during one ovule developmental stage in seedless progeny compared to the seeded ones. It is notable that the arginine-197-to-leucine substitution on Chromosome 18 in VviAGL11 [18] was also identified in our results, however, compared with other genes, this gene did not perform well in our SNP-index analysis, so not listed as a key candidate gene for further analysis. According to our transcriptomic analysis, VviAGL11 is not identified as DEGs, and our results are consistent with the previous findings, so we have not considered this gene for further research. Motif analysis of the mutation region was carried out, and results showed that mutation in G1 happened in ‘ML domain’, G2 and G3 in ‘Low complexity region’, and mutation in G6 caused stopgain (Additional file 9: Table S6).
Table 3. Identified candidate genes based on expression analysis from transcriptome
Gene ID
|
Gene name
|
Chr
|
Position
|
Ref
|
Mut
|
Description
|
Ratio(Seedless/Seeded)
|
Stage 1
|
Stage 2
|
Stage 3
|
GSVIVG01036048001
|
G1
|
chr6
|
21418457
|
C
|
G
|
Phosphatidylglycerol/phosphatidylinositol transfer protein
|
1
|
2.19
|
0.89
|
GSVIVG01011008001
|
G2
|
chr7
|
2257286
|
G
|
A
|
Putative auxin response factor 2
|
0.2
|
0.08
|
0.56
|
GSVIVG01011007001
|
G3
|
chr7
|
2266373
|
C
|
T
|
Serine protease
|
2.37
|
1.48
|
6
|
2266664
|
C
|
T
|
GSVIVG01011006001
|
G4
|
chr7
|
2278664
|
A
|
C
|
Monooxygenase, putative
|
5.99
|
3.15
|
6.07
|
GSVIVG01000977001
|
G5
|
chr13
|
11040377
|
A
|
T
|
Histidine kinase-like ATPase domain-containing protein
|
2.75
|
0.12
|
0.83
|
GSVIVG01000993001
|
G6
|
chr13
|
11222492
|
G
|
A
|
Tir-nbs-lrr resistance protein
|
188
|
3.51
|
33.3
|
11223120
|
T
|
C
|
11223228
|
C
|
A
|
11223476
|
C
|
T
|
GSVIVG01000994001
|
G7
|
chr13
|
11229443
|
G
|
A
|
Aluminum-activated malate transporter
|
27.97
|
2.56
|
6.88
|
GSVIVG01000995001
|
G8
|
chr13
|
11240814
|
G
|
A
|
Putative aluminum-activated malate transporter
|
10.62
|
2.88
|
5.59
|
GSVIVG01001005001
|
G9
|
chr13
|
11336347
|
T
|
C
|
Protein trichome birefringence-like 36
|
5.83
|
6.6
|
4.09
|
11337398
|
G
|
C
|
Expression analysis of candidate genes related to seedlessness
To further explore the potential roles of nine candidate genes in ovule development or abortion, their expression patterns were observed in different grapevine tissues (Figure 8) and in ovules at different development stages (Figure 9) from seeded and seedless cultivars. All candidate genes were expressed in reproductive tissues (flowers, tendrils and fruits). Both the G1 and G2 genes showed strong expression in reproductive tissues, compared with nutritional tissues, whereas the G8 gene was expressed more strongly in nutritional tissues (Figure 8). The expression pattern of G1, G8, and G9 genes were similar between seeded and seedless varieties, while G5, G6 and G7 genes were differentially expressed. For example, G5 expression was relatively high in fruit of ‘Red Globe’ and leaves of ‘Thompson Seedless’, while G6 and G7 were relatively highly expressed in the root of ‘Thompson Seedless’.
To further justify and validate our results, we analyzed expression of the nine candidate genes in two seeded (‘Kyoho’ and ‘Red Globe’) and two seedless cultivars (‘Thompson Seedless’ and ‘Flame Seedless’) (Figure 9). The expression pattern of most of the candidate genes was consistent with the transcriptome data, except that some genes showed differences at specific stages, likely related to cultivar difference (Figure 9). Six of the candidate genes exhibited a consistent expression pattern at all three ovule developmental stages between cultivars: four genes (G4, G7, G8 and G9) showed higher expression in seedless cultivars compared to seeded ones, while two genes (G2 and G5) showed the opposite results. In transcriptome data (Table 3), expression of G5 gene was first up-regulated at stage 1 then down-regulated at stage 2 and 3 (seedless progeny/seeded progeny), while G5 gene was down-regulated (seedless cultivars/seeded cultivars) at all three developmental stages in multiple cultivars. Despite G5 gene showing some difference, the expression pattern of the remaining five candidate genes was consistent with transcriptome data, which indicates these candidate genes may be related to seed phenotype differences.
SNaPshot analysis of 5 SNP in 40 grape progeny
A total of 3 SNPs in candidate genes (G2, G4, and G8) were selected to test our genome-sequencing results in 20 seeded and 20 seedless progeny (Table 4). Results showed that SNP in G4 failed in distinguish seed phenotype, while SNP in G2 and G8 showed 30% and 67.5% efficiency in identifying seeded and seedless progeny, respectively. And SNP in G8 may be important and close related to grape seed development.
Table 4. Genotyping results of candidate SNPs in 40 grape progeny
Gene
name
|
SNP
type
|
Genotype
|
1
|
2
|
3
|
4
|
5
|
6
|
7
|
8
|
9
|
10
|
11
|
12
|
13
|
14
|
15
|
16
|
17
|
18
|
19
|
20
|
21
|
22
|
23
|
24
|
25
|
26
|
27
|
28
|
29
|
30
|
31
|
32
|
33
|
34
|
35
|
36
|
37
|
38
|
39
|
40
|
|
G2
|
[C/T]
|
C
|
C
|
..
|
..
|
C
|
C
|
C
|
C
|
..
|
C
|
C
|
C
|
C
|
..
|
C
|
C
|
C
|
C
|
C
|
C
|
C
|
..
|
C
|
C
|
C
|
..
|
..
|
..
|
C
|
..
|
C
|
C
|
C
|
..
|
C
|
C
|
C
|
C
|
C
|
C
|
|
G4
|
[G/T]
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
T
|
|
G8
|
[A/G]
|
G
|
G
|
G
|
G
|
G
|
G
|
..
|
G
|
G
|
G
|
..
|
G
|
..
|
..
|
G
|
G
|
G
|
..
|
G
|
..
|
G
|
..
|
G
|
..
|
G
|
..
|
G
|
..
|
..
|
..
|
..
|
..
|
..
|
G
|
..
|
..
|
G
|
G
|
..
|
..
|
|
Note: ‘..’ Means the genotype consistent with the target SNPs. And sample 1-20 in grey indicate seedless progeny, while sample 21-40 seeded progeny.