Genotyping-by-sequencing derived SNP markers
Sequencing on the Illumina NovaSeq™ 6000 generated an average of 4.31 million high-quality read tags for the 165 chile pepper genotypes. After further processing and quality control based on various filtering criteria, 75,839 SNP markers distributed across the 12 chromosomes of Capsicum were discovered. Of these, 66,750 SNP markers (88%) have known map positions in the Zunla-1 reference genome [8]. Only the markers with known positions were used for genetic diversity analysis. The average minor allele frequency for the 66,750 SNP loci was 0.21, and the proportion of heterozygotes was 0.05. Across the SNP sites, the most common allele was ‘G’ (23.84%), followed by ‘A’ (23.79%), ‘T’ (23.55%), and ‘C’ (23.52%). Altogether, 5.31% of the sites had ambiguous nucleotide calls. Chromosomes P3 (9,250 SNP markers), P1 (7,365), and P2 (6,987) had the highest number of markers, whereas P11 (3,915), P9 (4,024), and P5 (3,915) had the fewest SNP loci. In total, 38,587 (57.80%) of the SNP sites were transition substitutions, whereas 28,163 (42.20%) were transversions.
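The marker summary statistics above (minor allele frequency and the transition/transversion split) can be illustrated with a minimal Python sketch. It assumes biallelic SNPs and diploid calls written as two-letter strings such as 'AG'. The function names and the input layout are hypothetical and are not taken from the pipeline used in this study.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a biallelic SNP as a 'transition' or a 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    # Transitions stay within one chemical class (A<->G or C<->T).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def minor_allele_frequency(genotypes):
    """Frequency of the second most common allele at one SNP.

    genotypes: diploid calls such as 'AA' or 'AG'; None or '' marks a
    missing call (assumed encoding, hypothetical).
    """
    counts = Counter(allele for g in genotypes if g for allele in g)
    if len(counts) < 2:
        return 0.0  # monomorphic or entirely missing site
    total = sum(counts.values())
    return counts.most_common(2)[1][1] / total


if __name__ == "__main__":
    print(substitution_class("A", "G"))
    print(minor_allele_frequency(["AA", "AG", "GG", "AA", None]))
```

Averaging `minor_allele_frequency` over all loci would give a panel-wide value comparable in kind to the 0.21 reported above, although the exact filtering applied in the study is not reproduced here.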
Analysis of molecular variance and principal components
Analysis of molecular variance using genome-wide SNP markers revealed that most of the variation was among the Capsicum populations (76.08%) (Table 1). Variation among samples within a population accounted for 14.28%, whereas within-sample variation was 9.64%. Principal component analysis (PCA) revealed four major groups based on species (Fig. 1a). The C. annuum and the chiltepins (C. annuum var. glabriusculum; considered the progenitors of domesticated C. annuum var. annuum) formed a distinct cluster (Group I), whereas C. baccatum and C. chacoense formed the second group. C. frutescens and C. chinense represented Groups III and IV, respectively. The first principal component (PC1) accounted for 53.9% of the total variation, whereas PC2 accounted for 6.3%.
Results from the PCA were consistent with clustering based on a neighbor-joining (NJ) phylogenetic analysis for the Capsicum population (Fig. 1b). An NJ genetic analysis for NMSU chile pepper varieties revealed two distinct clusters based on species (Fig. 2). The C. annuum varieties formed a separate group, whereas C. frutescens and C. chinense clustered together. Within the NMSU C. annuum group (Cluster I), there were seven subclusters differentiated by fruit or pod type. Group A consisted of the chile piquin, whereas the ornamental chile peppers comprised Group B. The jalapeno types comprised Group C, and Group D contained the serrano peppers. Groups E and F consisted of the cayenne and de arbol types, respectively. Finally, Group G comprised the New Mexican chile peppers, including the paprika type. Cluster II (C. frutescens and C. chinense) comprised the tabasco and habanero types, respectively, on separate branches.
Genetic diversity
Various measures of genetic diversity are presented in Table 2. The level of observed heterozygosity (Ho) across the population was 0.06. Both the C. annuum (Group I) and the C. baccatum and C. chacoense (Group II) complexes had an Ho of 0.04. C. frutescens (Group III) and C. chinense (Group IV) had Ho values of 0.05 and 0.10, respectively. The inbreeding coefficient for the Capsicum population was 0.54. Within the groups, Group I (C. annuum) had the highest coefficient of inbreeding (0.70), followed by Group IV (C. chinense) (0.51). Group II (C. baccatum and C. chacoense) had the lowest inbreeding coefficient (0.34). Gene diversity (Hs) was highest in C. chinense (0.20), followed by C. annuum (0.13) and C. frutescens (0.08). The whole Capsicum population had an Hs value of 0.12. Observed nucleotide diversity (π) across the whole population was 0.33. Within the species, C. chinense had the highest π (0.17), followed by the C. annuum var. annuum and C. annuum var. glabriusculum complex (0.12). Expected nucleotide diversity (θ) for the whole Capsicum panel was 0.18. Similarly, within the individual species, C. chinense had the highest value for θ (0.19), followed by the C. annuum and chiltepin complex (0.13).
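To make the relationship between these diversity measures concrete, the following minimal Python sketch computes observed heterozygosity, gene diversity, and an inbreeding coefficient at a single locus. It assumes the common definitions Ho = proportion of heterozygous calls, Hs = 1 − Σp² (without a sample-size correction), and F = 1 − Ho/Hs. The estimators actually used for Table 2 are not stated in this section, so these formulas and all function names are assumptions for illustration only.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of non-missing diploid calls that are heterozygous."""
    called = [g for g in genotypes if g]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for g in called if g[0] != g[1]) / len(called)


def expected_heterozygosity(genotypes):
    """Gene diversity at one locus, 1 - sum(p_i^2), uncorrected for sample size."""
    counts = Counter(allele for g in genotypes if g for allele in g)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no called genotypes")
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def inbreeding_coefficient(ho, hs):
    """F = 1 - Ho/Hs (assumed definition; undefined when Hs is zero)."""
    if hs == 0:
        raise ValueError("Hs is zero; F is undefined")
    return 1.0 - ho / hs


if __name__ == "__main__":
    calls = ["AA", "AA", "GG", "AG"]  # hypothetical genotype calls
    ho = observed_heterozygosity(calls)
    hs = expected_heterozygosity(calls)
    print(ho, hs, inbreeding_coefficient(ho, hs))
```

Under this assumed definition, the rounded Group I values (Ho = 0.04, Hs = 0.13) give F ≈ 0.69, close to the reported 0.70. The small difference may reflect rounding or a different estimator.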
Tajima’s D statistic for the Capsicum population across all chromosomes was D = 2.85 (Fig. 3). Among the individual chromosomes, P8 had the highest value for D (2.97), followed by P1 and P12 (D = 2.91). Chromosome P5 had the lowest value (D = 2.78). Negative values for D were observed for the individual species. Within the clusters, Group II (C. baccatum and C. chacoense) had the lowest value (D = -2.39), followed by Group III (C. frutescens) with D = -1.41. Group I (C. annuum and C. annuum var. glabriusculum) had a D value of -0.19, whereas Group IV (C. chinense) had a value of -0.39. Chile pepper varieties previously released by the NMSU Chile Pepper Breeding Program had a D value of -0.29.
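For readers unfamiliar with the statistic, the following Python sketch shows how Tajima's D compares the mean number of pairwise differences with the Watterson estimate derived from the number of segregating sites, using the standard constants from Tajima (1989). It operates on a small list of aligned haplotype strings, which is a simplifying assumption. The software and data encoding used for Fig. 3 are not described in this section, and the function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for aligned, equal-length haplotype strings (n >= 4)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences over all sequence pairs.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(1 for a, b in zip(s1, s2) if a != b)
        for s1, s2 in combinations(sequences, 2)
    ) / n_pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    # Hypothetical toy alignments, not data from this study.
    print(tajimas_d(["AA", "AA", "AT", "TT"]))          # intermediate-frequency variant
    print(tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"]))  # singletons only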
Population structure and linkage disequilibrium
Inference of the best number of clusters (K) using the Evanno criterion identified K = 2 (ΔK = 6572.84) (Figs. 4a, b; Additional file 1, Table S1) as the optimal number representing the Capsicum population. Cluster 1 comprised C. frutescens and C. chinense (N = 44 genotypes), whereas cluster 2 consisted of C. annuum, C. baccatum, and C. chacoense (N = 121) (Additional file 1, Table S2). In addition, K = 9 and K = 4 showed high ΔK relative to the other values of K, which indicates that these can serve as alternative values to describe the genetic differentiation in the Capsicum population. For K = 4 (ΔK = 110.73; Fig. 4c), C. annuum genotypes were divided into two clusters, where cluster 1 was an admixture of 71 genotypes, including 22 chiltepins and 49 ornamental, chile piquin, de arbol, jalapeno, and serrano types (Additional file 1, Table S3). Cluster 2 comprised 43 C. annuum varieties of either the New Mexican or paprika types. The C. baccatum, C. frutescens, and C. chacoense complexes were grouped in cluster 3, whereas cluster 4 consisted of the C. chinense genotypes.
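The Evanno criterion selects K from the rate of change in the log-likelihood between successive K values. A minimal Python sketch follows, assuming replicate clustering runs that each report ln P(D|K), and assuming the common implementation in which ΔK = |mean L(K+1) − 2 mean L(K) + mean L(K−1)| / sd[L(K)]. The input dictionary, its values, and the function name are hypothetical. The run settings behind Table S1 are not given in this section.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of ln P(D|K) values, one per
    replicate run (at least two replicates per K are needed for the sd).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the ends of the K range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # identical replicates: the ratio is undefined
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta_k[k] = second_diff / sd
    return delta_k


if __name__ == "__main__":
    # Hypothetical log-likelihoods, two replicate runs per K.
    runs = {
        1: [-1000.0, -1002.0],
        2: [-500.0, -502.0],
        3: [-490.0, -494.0],
        4: [-485.0, -495.0],
    }
    result = evanno_delta_k(runs)
    print(result, max(result, key=result.get))
```

Because ΔK cannot be computed for the smallest and largest K tested, the criterion cannot by itself select K = 1, which is one reason secondary peaks such as K = 4 are often examined alongside the maximum.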
Analysis of linkage disequilibrium (LD) identified more than 3.11 M intrachromosomal marker pairs across the 12 chromosomes of chile peppers (Additional file 1, Table S4). Mean values for LD coefficients (r²) ranged between 0.04 (P12) and 0.35 (P4). The average distance (in Mb) of all pairs was lowest for chromosomes P2 (0.59), P8 (0.70), and P3 (0.73). At least 80% of the pairs were in significant LD (P < 0.05) across all chromosomes, with chromosome P1 having the largest percentage of significant marker pairs (84.40%). Chromosome P2 had the lowest average distance of pairs in significant LD (0.61 Mb), followed by P8 and P3 (both 0.77 Mb), and P6 (0.97 Mb). The total number of marker pairs in complete LD (r² = 1.0) was 82,808 (2.65%). Chromosome P3 had the highest number of pairs in complete LD (13,720), followed by P8 and P2, with 10,386 and 9,062 marker pairs, respectively. Chromosome P1 had only 23 intrachromosomal pairs in complete LD. The average distance of marker pairs in complete LD ranged between 0.40 Mb (P1) and 2.12 Mb (P11). Analysis of LD decay by plotting r² against distance revealed extensive LD for the whole population, where LD starts to decay at ~5.59 Mb (Fig. 4d). Within the individual chromosomes, LD extends up to 14.78 Mb for chromosome P5. LD starts to decay at 0.07 and 0.38 Mb for the C. annuum and C. chinense complexes, respectively.
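As an illustration of the pairwise LD coefficient, the sketch below computes r² between two biallelic SNPs as the squared Pearson correlation of allele dosages (0, 1, or 2 copies of one allele per genotype). This genotype-correlation form is a common approximation for unphased data and is an assumption here. The LD estimator used to produce Table S4 is not specified in this section, and the function name and encoding are hypothetical.

```python
def ld_r2(dosage_a, dosage_b):
    """Squared correlation of allele dosages at two SNPs.

    dosage_a, dosage_b: equal-length lists with 0, 1, or 2 per genotype;
    None marks a missing call (assumed encoding).
    """
    pairs = [
        (a, b)
        for a, b in zip(dosage_a, dosage_b)
        if a is not None and b is not None
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two genotypes called at both SNPs")

    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("a monomorphic SNP has no defined r^2")
    return cov * cov / (var_a * var_b)


if __name__ == "__main__":
    # Hypothetical dosages for four inbred lines.
    print(ld_r2([0, 2, 0, 2], [0, 2, 0, 2]))  # identical pattern: complete LD
    print(ld_r2([0, 0, 2, 2], [0, 2, 0, 2]))  # independent pattern
```

An LD decay curve like Fig. 4d would then come from computing this value for every intrachromosomal marker pair and plotting it against the physical distance between the two markers.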