1. Frequency and distribution of EST-SSRs in S. oblata
In this study, 104,691 unigenes with a total length of 89.3 Mb were scanned with the Simple Sequence Repeat Identification Tool (SSRIT) [34]. This identified 10,988 potential EST-SSRs in 9,864 (9.4%) unigenes, a density of one SSR per 8.13 kb. Of these unigenes, 977 contained more than one EST-SSR locus. Among the potential SSRs, di-nucleotide repeats were the most abundant (32.86%, 3,611), followed by penta- (23.25%, 2,555), tri- (18.08%, 1,986), hexa- (15.06%, 1,655), and tetra-nucleotide repeats (10.75%, 1,181) (Table 1).
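The search that SSRIT performs can be sketched as a regular-expression scan for perfect tandem repeats. The Python below is a minimal illustration of that idea, not SSRIT's own code. The minimum repeat numbers in `MIN_REPEATS` are assumptions, chosen so that every reported SSR is at least 12 bp, which matches the lower bound of the SSR lengths reported for this dataset. `find_ssrs` and `is_compound` are hypothetical helpers.

```python
import re

# Assumed minimum number of repeat units per motif length (di- to hexa-);
# the thresholds actually used with SSRIT are not stated in the text.
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_compound(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit, e.g. 'ATAT'."""
    k = len(motif)
    return any(k % p == 0 and motif == motif[:p] * (k // p) for p in range(1, k))


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for k, n_min in MIN_REPEATS.items():
        # A k-bp unit followed by at least n_min - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound(motif):  # skip units such as 'ATAT' that reduce to a shorter motif
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


seq = "GGC" + "AG" * 7 + "CGC" + "AAT" * 5 + "GCGC"
print(find_ssrs(seq))  # [(3, 'AG', 7), (20, 'AAT', 5)]
```

The density reported above follows from dividing the total scanned length by the number of SSRs: 89,306,170 bp / 10,988 ≈ 8,128 bp, that is, one SSR per 8.13 kb.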
Table 1 Summary of EST-SSR search results in S. oblata transcripts.
| Item | Value |
| --- | --- |
| Total number of sequences examined | 104,691 |
| Total size of examined sequences (bp) | 89,306,170 |
| Total number of identified SSRs | 10,988 |
| Number of SSR-containing sequences | 9,864 (9.4%) |
| Number of sequences containing more than one SSR | 977 |
| Frequency of SSRs | 1/8.13 kb |
| Di-nucleotide | 3,611 (32.86%) |
| Tri-nucleotide | 1,986 (18.08%) |
| Tetra-nucleotide | 1,181 (10.75%) |
| Penta-nucleotide | 2,555 (23.25%) |
| Hexa-nucleotide | 1,655 (15.06%) |
The major repeat motif types are shown in Figure 1a. AT/TA was the most common motif (11.21%, 1,232), followed by TC/GA (8.2%, 903), AG/CT (7.3%, 798), CA/TG (3.3%, 368), AAAT/ATTT (3.1%, 340), AC/GT (2.8%, 306), AAAAT/ATTTT (2.5%, 278), and AAT/ATT (2.3%, 251) (Figure 1a). Most SSRs (96.6%) were 12 to 30 bp long, with 18 bp being the most common length; the remaining 3.4% ranged from 31 to 75 bp. SSRs with three tandem repeats were the most abundant (31.1%, 3,418), followed by those with six (14.7%, 1,610), four (13.6%, 1,490), and five tandem repeats (13.1%, 1,440), whereas SSRs with more than 15 tandem repeats were rare (2.16%) (Figure 1b).
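The percentages in this paragraph are simple tallies over the list of detected SSRs. The following is a minimal sketch of how the repeat-number distribution and the share of 12 to 30 bp SSRs could be computed. It assumes that each SSR is available as a hypothetical (motif, repeat_count) pair.

```python
from collections import Counter


def summarize(ssrs: list[tuple[str, int]]) -> tuple[dict[int, float], float]:
    """Return (% of SSRs per repeat count, % of SSRs 12 to 30 bp long)."""
    n = len(ssrs)
    by_repeats = Counter(repeats for _, repeats in ssrs)
    pct_by_repeats = {r: round(100 * c / n, 2) for r, c in sorted(by_repeats.items())}
    # SSR length = motif length x number of tandem repeats.
    in_range = sum(1 for motif, repeats in ssrs if 12 <= len(motif) * repeats <= 30)
    return pct_by_repeats, round(100 * in_range / n, 2)


example = [("AG", 7), ("AAT", 5), ("AAAAT", 3), ("AAAAT", 3), ("AC", 20)]
print(summarize(example))  # ({3: 40.0, 5: 20.0, 7: 20.0, 20: 20.0}, 80.0)
```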
2. EST-SSR marker development and screening of polymorphic microsatellite loci
A total of 2,042 EST-SSRs were selected for primer synthesis after ESTs were removed whose flanking sequences were too short or otherwise unsuitable for primer design. Of these, 932 (45.6%) primer pairs produced clear and reproducible bands, comprising 324 di-, 223 tri-, 88 tetra-, 161 penta-, and 136 hexa-nucleotide repeat loci. Details of the 932 EST-SSR primers are given in Table S1. A further 245 primer pairs produced fragments larger than expected. The remaining 865 primer pairs either failed to produce any bands or produced multiple bands under different amplification conditions. This was probably caused by assembly errors in the sequences or by problems with the primers, and these pairs were excluded from further analysis. All 932 primer pairs were then screened for polymorphism in eight S. oblata genotypes. Of these, 248 (12.1% of the 2,042 synthesized) produced reproducible polymorphic products on PAGE, comprising 110 di-, 49 tri-, 18 tetra-, 47 penta-, and 24 hexa-nucleotide repeat loci. The corresponding proportions of polymorphic markers within each repeat class were 34.0%, 22.0%, 20.5%, 29.2%, and 17.6%.
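The per-class polymorphism ratios and overall success rates follow directly from the counts given in this paragraph. The short Python check below uses only those reported numbers.

```python
# Counts reported in the text for each repeat class (di- to hexa-nucleotide).
amplified = {"di": 324, "tri": 223, "tetra": 88, "penta": 161, "hexa": 136}
polymorphic = {"di": 110, "tri": 49, "tetra": 18, "penta": 47, "hexa": 24}
synthesized = 2042

# Proportion (%) of amplified loci in each class that were polymorphic.
ratios = {cls: round(100 * polymorphic[cls] / amplified[cls], 1) for cls in amplified}
print(ratios)  # {'di': 34.0, 'tri': 22.0, 'tetra': 20.5, 'penta': 29.2, 'hexa': 17.6}

# Share of all synthesized primer pairs that amplified cleanly / were polymorphic.
print(round(100 * sum(amplified.values()) / synthesized, 1))    # 45.6
print(round(100 * sum(polymorphic.values()) / synthesized, 1))  # 12.1
```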
3. Genetic diversity and population structure
Thirty polymorphic EST-SSR markers conforming to Hardy-Weinberg equilibrium were used to evaluate the genetic diversity and population structure of 192 S. oblata individuals. A total of 234 alleles were detected, with a mean number of alleles (NA) of 7.8 per locus (range 3 to 16) (Table 2). Observed heterozygosity (HO) ranged from 0.21 to 0.87 (mean 0.52), and expected heterozygosity (HE) ranged from 0.25 to 0.89 (mean 0.56). Mean HO was lower than mean HE, suggesting that inbreeding has affected the cultivated S. oblata population. Shannon's information index (I) ranged from 0.53 to 2.33 (mean 1.15). Polymorphism information content (PIC) ranged from 0.23 for SO711 to 0.88 for SO525 (mean 0.51). Twenty-eight markers (93.3%) were moderately or highly informative (PIC ≥ 0.25), and only two markers (SO310 and SO711) showed low polymorphism (PIC < 0.25). These loci therefore carry substantial genetic information and could be used in genetic diversity studies of Syringa germplasm.
Table 2 Polymorphism information for 30 EST-SSR markers in 192 individuals. NA, number of alleles; NE, number of effective alleles; I, Shannon's information index; HO, observed heterozygosity; HE, expected heterozygosity; PIC, polymorphism information content; UTR, untranslated region; CDS, coding sequence. All 30 polymorphic markers conformed to Hardy-Weinberg equilibrium (significance threshold P ≤ 0.01).
| Primer name | SSR position | NA | NE | I | HO | HE | PIC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SO060 | CDS | 14 | 4.34 | 1.81 | 0.65 | 0.77 | 0.74 |
| SO095 | UTR | 8 | 2.41 | 1.13 | 0.56 | 0.59 | 0.52 |
| SO139 | CDS | 4 | 2.30 | 0.92 | 0.53 | 0.57 | 0.47 |
| SO208 | UTR | 10 | 3.54 | 1.66 | 0.63 | 0.72 | 0.69 |
| SO212 | UTR | 9 | 2.07 | 1.07 | 0.45 | 0.52 | 0.48 |
| SO284 | UTR | 9 | 1.63 | 0.85 | 0.41 | 0.39 | 0.36 |
| SO296 | CDS | 9 | 3.39 | 1.55 | 0.66 | 0.71 | 0.67 |
| SO310 | UTR | 7 | 1.34 | 0.59 | 0.21 | 0.26 | 0.24 |
| SO328 | UTR | 14 | 4.78 | 1.82 | 0.60 | 0.79 | 0.76 |
| SO336 | UTR | 8 | 3.92 | 1.57 | 0.80 | 0.75 | 0.71 |
| SO364 | CDS | 5 | 1.80 | 0.84 | 0.44 | 0.44 | 0.40 |
| SO376 | UTR | 8 | 3.96 | 1.54 | 0.69 | 0.75 | 0.71 |
| SO381 | UTR | 4 | 2.47 | 0.99 | 0.55 | 0.60 | 0.51 |
| SO387 | UTR | 3 | 1.39 | 0.53 | 0.27 | 0.28 | 0.26 |
| SO413 | UTR | 10 | 2.35 | 1.12 | 0.53 | 0.58 | 0.52 |
| SO415 | CDS | 5 | 2.12 | 0.92 | 0.54 | 0.53 | 0.45 |
| SO469 | UTR | 9 | 3.41 | 1.40 | 0.57 | 0.71 | 0.66 |
| SO504 | UTR | 4 | 2.13 | 0.90 | 0.54 | 0.53 | 0.45 |
| SO508 | UTR | 16 | 7.06 | 2.23 | 0.87 | 0.86 | 0.84 |
| SO525 | CDS | 16 | 8.78 | 2.33 | 0.69 | 0.89 | 0.88 |
| SO528 | UTR | 5 | 1.37 | 0.53 | 0.31 | 0.27 | 0.25 |
| SO540 | CDS | 5 | 1.52 | 0.60 | 0.42 | 0.34 | 0.30 |
| SO557 | CDS | 13 | 4.82 | 1.86 | 0.80 | 0.79 | 0.77 |
| SO649 | CDS | 6 | 2.49 | 1.05 | 0.54 | 0.60 | 0.52 |
| SO696 | UTR | 3 | 1.52 | 0.54 | 0.30 | 0.34 | 0.29 |
| SO711 | CDS | 6 | 1.33 | 0.53 | 0.28 | 0.25 | 0.23 |
| SO783 | UTR | 10 | 3.17 | 1.40 | 0.57 | 0.69 | 0.63 |
| SO813 | CDS | 5 | 1.86 | 0.91 | 0.40 | 0.46 | 0.42 |
| SO833 | CDS | 6 | 1.41 | 0.61 | 0.26 | 0.29 | 0.27 |
| SO889 | UTR | 3 | 1.94 | 0.69 | 0.45 | 0.48 | 0.37 |
| Mean |  | 7.8 | 2.89 | 1.15 | 0.52 | 0.56 | 0.51 |
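The indices in Table 2 have standard definitions in terms of the allele frequencies p_i at a locus: NE = 1/Σp_i², I = −Σp_i ln p_i, HE = 1 − Σp_i², and PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j². The Python below is a minimal sketch for a single diploid locus. It uses the simple, uncorrected form of HE, whereas population-genetics packages may apply a small-sample correction, so its values need not match the table exactly. The function name and the allele-pair genotype format are hypothetical.

```python
import math
from collections import Counter


def diversity_indices(genotypes: list[tuple[str, str]]) -> dict[str, float]:
    """Per-locus diversity indices from diploid genotypes given as allele pairs."""
    alleles = [a for g in genotypes for a in g]
    freqs = [c / len(alleles) for c in Counter(alleles).values()]
    sum_p2 = sum(p * p for p in freqs)
    na = len(freqs)
    he = 1 - sum_p2  # expected heterozygosity (gene diversity), uncorrected
    pic = he - sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(na)
        for j in range(i + 1, na)
    )
    return {
        "NA": na,
        "NE": 1 / sum_p2,
        "I": -sum(p * math.log(p) for p in freqs),
        "HO": sum(a != b for a, b in genotypes) / len(genotypes),  # share of heterozygotes
        "HE": he,
        "PIC": pic,
    }


locus = [("A1", "A1"), ("A1", "A2"), ("A2", "A2"), ("A1", "A2")]
print(diversity_indices(locus))  # NA=2, NE=2.0, I=ln 2, HO=0.5, HE=0.5, PIC=0.375
```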
Population structure raises the level of linkage disequilibrium (LD), which can produce spurious correlations between target traits and unlinked loci. Analyzing and correcting for population structure is therefore a prerequisite for association analysis. The population structure of the 192 individuals was analyzed with STRUCTURE 2.3.4 using the 30 polymorphic markers. The ΔK method of Evanno et al. [35] gave a clear peak at K = 2 (Figure 2a). Accordingly, the 192 individuals were divided into two subpopulations: POP1 (red, 41 individuals) and POP2 (green, 151 individuals). In Figure 2b, each individual is represented by a thin vertical line, partitioned according to its estimated membership probability (Q). These Q values were used for the structure-based association mapping. Principal component analysis (PCA) is widely used as an alternative to STRUCTURE for detecting population subdivision. PCA also separated the association population into two subpopulations (Figure S1), which agrees with the STRUCTURE results.
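Evanno et al. [35] define ΔK = m(|L″(K)|)/s[L(K)], where L″(K) = L(K+1) − 2L(K) + L(K−1) and L(K) is ln P(D|K) from STRUCTURE. The sketch below applies the second-order difference to the mean L(K) over replicate runs. This is a common simplification, and some implementations average per-run differences instead. The input values are hypothetical, and the K values must be consecutive.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp: dict[int, list[float]]) -> dict[int, float]:
    """Delta K for each interior K from replicate ln P(D|K) values.

    Uses the mean likelihood at each K for the second-order difference
    |L(K+1) - 2 L(K) + L(K-1)| and divides by the SD of L(K) across runs.
    """
    ks = sorted(lnp)
    m = {k: mean(v) for k, v in lnp.items()}
    return {
        k: abs(m[k + 1] - 2 * m[k] + m[k - 1]) / stdev(lnp[k])
        for k in ks[1:-1]
    }


runs = {  # hypothetical ln P(D|K) from three STRUCTURE runs per K
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4510.0, -4490.0],
    3: [-4480.0, -4470.0, -4490.0],
    4: [-4475.0, -4485.0, -4465.0],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)  # K with the highest Delta K (here 2)
```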
4. Phenotypic trait analysis and single-marker associations
The degree of phenotypic variation in the target traits is an important parameter for association mapping. The coefficients of variation of the quantitative traits ranged from 19.72% to 30.22%, and their distribution statistics are presented in Table S2. Correlation analysis among traits identified 14 significant correlations (P < 0.05), 12 of which were highly significant (P < 0.01) (Table S3). Inflorescence length, inflorescence width, corolla lobe length, corolla lobe width, and corolla tube length were all highly significantly and positively correlated. A highly significant positive correlation was also observed between corolla lobe (state) and corolla lobe (periphery). Further details of the phenotypic correlations among the nine traits in the association population are given in Table S3. In addition, Q-mode cluster analysis of the nine phenotypic traits divided the 192 individuals into two subgroups, similar to the STRUCTURE result based on EST-SSR markers (Figure S2).
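The coefficient of variation and the trait correlations reported here follow standard formulas. The sketch below uses only the standard library and assumes that trait values are plain lists of per-individual measurements. A P value for r comes from the t statistic with n − 2 degrees of freedom; converting t to P requires the t distribution, so that step is omitted.

```python
import math
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """CV (%) = sample standard deviation / mean x 100."""
    return 100 * stdev(values) / mean(values)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two traits."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r: float, n: int) -> float:
    """t statistic (n - 2 df) for testing H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


print(coefficient_of_variation([8.0, 10.0, 12.0]))  # 20.0
```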
Understanding the pattern of LD is an important prerequisite for association mapping. LD was analyzed in the 192 cultivated S. oblata individuals using 119 polymorphic markers with a minor allele frequency (MAF) > 1%. The r² values for all locus pairs ranged from 0.0001 to 0.5154. Overall LD was low, and most markers were in linkage equilibrium (r² < 0.1; P < 0.001). In total, 899 locus pairs showed LD (P < 0.001), and 830 had r² > 0.005 (83.1%) (Figure 3). Nevertheless, strong LD was detected between some SSR loci, such as SO015 and SO428 (r² > 0.3; P < 0.001).
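For two biallelic loci, r² = D²/(p_A p_a p_B p_b), where D = p_AB − p_A p_B. The sketch below assumes phased two-locus haplotypes coded 0/1, which is a simplification. The SSRs here are multi-allelic and the genotypes unphased, so LD software applied to such data uses extensions of this statistic, and its r² values may differ from this biallelic form.

```python
def r_squared(haplotypes: list[tuple[int, int]]) -> float:
    """r^2 between two biallelic loci from phased haplotypes coded 0/1.

    Undefined (division by zero) if either locus is monomorphic.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus 2
    p_ab = sum(a * b for a, b in haplotypes) / n     # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                             # coefficient of LD
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


print(r_squared([(1, 1), (0, 0), (1, 1), (0, 0)]))  # 1.0 (complete LD)
print(r_squared([(1, 1), (1, 0), (0, 1), (0, 0)]))  # 0.0 (linkage equilibrium)
```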
Associations between the 119 SSRs and the nine traits were tested with a mixed linear model (MLM). Of the 1,071 single-marker association tests performed (119 markers × 9 traits), 20 associations were significant (P < 0.01) after multiple-testing correction using the false discovery rate (FDR) method at Q < 0.01. These associations involved 17 SSRs and all nine traits. The number of significant associations per trait ranged from one to four. The associated loci explained 0.36% to 20.76% of the phenotypic variance, with a mean of 5.69% (Table 3). Corolla lobe (state) and corolla lobe width were each significantly associated with four SSR markers. Corolla lobe (periphery) had three significant associations, and inflorescence length, corolla lobe length, and petal color had two each. Inflorescence width, corolla tube length, and florescence had one each (Q < 0.01; Table 3). Three SSRs (SO104, SO695, and SO790) were significantly associated with more than one trait, suggesting pleiotropy or close linkage of the genomic regions controlling these traits. For eight of the 20 associations, the mode of gene action was consistent with under- or overdominance. The remaining 12 associations showed either additive (7) or partial to complete dominant (5) gene action (Table 3).
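The FDR correction can be illustrated with the Benjamini-Hochberg step-up procedure. This is a generic sketch and not necessarily the exact adjustment implemented in the association software used here. Applied to all 1,071 P values, it would return one adjusted value per test to compare against the 0.01 threshold.

```python
def bh_adjust(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted P values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q = bh_adjust([0.01, 0.04, 0.03, 0.005])  # approximately [0.02, 0.04, 0.04, 0.02]
```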
Table 3 Summary of significant SSR marker-trait associations in the S. oblata population after correction for multiple testing. P value: significance level of the association (significant at P < 0.01); Q value: FDR-based correction for multiple testing (significant at Q < 0.01); R²: percentage of phenotypic variance explained; a: additive effect; d: dominance effect; Sp: standard deviation of the phenotypic trait under consideration. Gene-action parameters were calculated as previously reported [36-37].
| Trait | Locus | P value | Q value | R² (%) | 2a | d | d/a | 2a/Sp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Inflorescence length | SO208 | 2.61E-05 | 6.53E-05 | 3.42 | 1.264 | 5.016 | 7.934 | 0.046 |
|  | SO112 | 4.60E-08 | 2.48E-30 | 7.69 | -5.468 | 0.889 | -0.325 | -0.200 |
| Inflorescence width | SO627 | 6.21E-04 | 5.10E-29 | 7.29 | -0.810 | 0.125 | -0.308 | -0.052 |
| Corolla lobe length | SO608 | 3.60E-16 | 5.41E-26 | 4.30 | -0.171 | 0.231 | -2.696 | -0.110 |
|  | SO695 | 8.95E-05 | 1.80E-15 | 4.59 | -0.367 | -0.222 | 1.211 | -0.235 |
| Corolla lobe width | SO060 | 1.24E-31 | 1.15E-09 | 5.36 | -0.705 | 0.116 | -0.328 | -0.705 |
|  | SO311 | 5.18E-04 | 1.53E-06 | 3.65 | 0.052 | 0.055 | 2.104 | 0.052 |
|  | SO531 | 3.45E-04 | 1.53E-06 | 0.36 | 4.043 | 0.362 | 0.179 | 4.043 |
|  | SO695 | 6.50E-03 | 7.16E-05 | 3.68 | -0.161 | -0.049 | 0.609 | -0.161 |
| Corolla tube length | SO649 | 3.58E-05 | 9.40E-05 | 20.76 | -0.059 | 0.008 | -0.258 | -0.025 |
| Corolla lobe (state) | SO790 | 5.35E-07 | 1.49E-04 | 0.93 | 1.602 | 0.913 | 1.140 | 2.347 |
|  | SO503 | 4.78E-04 | 7.16E-05 | 11.78 | 0.277 | 0.146 | 1.057 | 0.406 |
|  | SO415 | 5.17E-05 | 7.35E-04 | 2.30 | 0.182 | 0.208 | 2.288 | 0.267 |
|  | SO663 | 2.88E-10 | 7.40E-04 | 1.89 | 0.034 | 0.045 | 2.634 | 0.050 |
| Corolla lobe (periphery) | SO104 | 1.60E-03 | 8.28E-04 | 4.61 | 0.794 | 0.196 | 0.493 | 0.989 |
|  | SO505 | 8.11E-27 | 8.70E-04 | 9.58 | 0.085 | 0.706 | 16.610 | 0.106 |
|  | SO331 | 6.96E-04 | 8.87E-04 | 7.08 | 0.290 | 0.022 | 0.150 | 0.361 |
| Florescence | SO790 | 5.10E-30 | 1.78E-03 | 7.92 | 2.935 | 1.389 | 0.947 | 7.919 |
| Petal color | SO104 | 7.54E-04 | 4.48E-03 | 2.62 | 0.074 | -0.296 | -8.008 | 0.056 |
|  | SO805 | 4.60E-03 | 6.50E-03 | 4.00 | 0.066 | 0.189 | 5.756 | 0.050 |
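Under the usual single-locus model with genotype means AA, Aa, and aa, the additive effect is a = (AA − aa)/2 and the dominance effect is d = Aa − (AA + aa)/2, which gives the ratio d/a in Table 3. The classification thresholds in the sketch are an assumption based on a convention common in the literature: |d/a| ≤ 0.5 additive, 0.5 < |d/a| ≤ 1.25 partial to complete dominance, and |d/a| > 1.25 over- or underdominance. Applied to the d/a column of Table 3, they reproduce a 7/5/8 split of the 20 associations. They may not be exactly the thresholds of refs [36-37].

```python
from collections import Counter


def gene_action(mean_AA: float, mean_Aa: float, mean_aa: float) -> tuple[float, float, float]:
    """Return (a, d, d/a) from the three genotype-class means at one marker."""
    a = (mean_AA - mean_aa) / 2
    d = mean_Aa - (mean_AA + mean_aa) / 2
    return a, d, d / a


def classify(d_over_a: float) -> str:
    """Mode of gene action from |d/a|, using assumed thresholds."""
    x = abs(d_over_a)
    if x <= 0.5:
        return "additive"
    if x <= 1.25:
        return "partial to complete dominance"
    return "over- or underdominance"


# d/a values from Table 3, in table order.
d_over_a = [7.934, -0.325, -0.308, -2.696, 1.211, -0.328, 2.104, 0.179, 0.609, -0.258,
            1.140, 1.057, 2.288, 2.634, 0.493, 16.610, 0.150, 0.947, -8.008, 5.756]
counts = Counter(classify(x) for x in d_over_a)
print(counts)  # 8 over/underdominance, 7 additive, 5 partial to complete dominance
```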