Genome re-sequencing and SNP calling
In this study, a total of 230 potato individuals were used for re-sequencing, among them, 87 individuals were collected from CIP (international potato center) and 143 individuals were coming from different Chinese potato research institutes (CPRI), which having different drought-resistance (Fig. 1). Through the whole genome resequencing, 17.40 billion paired-end reads were generated, and the average sequencing depth was 9.8× (Table S1 and Figure S1). After calling and filtering, a total of 3,986,623 single nucleotide polymorphisms (SNPs) (missing data rate < 20% and minor allele frequency (MAF) > 0.05) and 1,130,197 Indels were obtained (Table 1, Table S2 ~ 4 and Figure S2).
Table 1
Summary of the whole-genome variations from the re-sequencing of 230 potato individuals.
SNPs | Indels |
Exonic | 382,746 | Exonic | 26,998 |
Intergenic | 2,564,208 | Intergenic | 681,423 |
Intronic | 532,439 | Intronic | 200,358 |
Downstream | 162,257 | Downstream | 69,200 |
Upstream | 167,014 | Upstream | 68,515 |
Upstream/downstream | 20,028 | Upstream/downstream | 10,720 |
Splicing | 836 | Splicing | 746 |
3'UTR | 91,589 | 3'UTR | 44,888 |
5'UTR | 62,015 | 5'UTR | 26,599 |
Stop gain | 2,921 | Stop gain | 643 |
Stop loss | 570 | Stop loss | 107 |
Ts | 2,469,746 | Insertion | 9,858 |
Tv | 1,516,877 | Deletion | 16,390 |
Nonsynonymous | 164,488 | Frame-shift | 16,249 |
Synonymous | 218,258 | Non-frame-shift | 999 |
We found that 2,564,208 SNPs were in intergenic and 915,185 SNPs were in genes, and 382,746 SNPs occurred in exonic (9.6%). Within coding regions, nonsynonymous SNPs (164,488) were less than synonymous SNPs (218,258), and the ratio of nonsynonymous-to-synonymous is 0.74 (Table 1). A total of 26,998 (2.4%) Indels were in exonic, and the number of Insertion and Deletion were 9,858 and 16,390, respectively (Table 1 and Figure S2). Besides, we analyzed the whole genome SNP mutation types and the ratio of transition/transversion (Ts/Tv) is 1.63 (Table S3 and Figure S3). What’s more, 5,756 CNVs (copy number variants) and 6,486 SVs (structure variations) were identified in this study (Table S5). Finally, all the variations generating from re-sequencing were shown by a circos image (Fig. 2).
Selective sweep analysis
To identify the potential selective signatures related to drought-resistance in potato, we selected potatoes with significant differences in drought resistance as two groups based on the phenotypic traits under drought conditions, one was drought susceptible (DS) potato, and the other was drought resistant (DR) potato. Then, the selective sweep analyses were performed by population fixation statistics (FST) and nucleotide diversity between the DS group and DR group (πDS/πDR) in 40-kb sliding windows (a step of 5 kb). The windows with the top 5% of FST as well as πDS/πDR were considered as the selective sweeps (Fig. 5). A total of 680 common selective sweeps were identified between the DS group and DR group in this study (Table S9), and 560 annotated genes located in the selective-sweep regions (Table S10), which could potentially play important roles in the drought resistance of potato.
To further understand the functions of the 560 genes, we performed gene ontology analysis (GO) and KEGG enrichment analysis. The result of GO analysis indicated that these genes related to multiple biological process, cellular component, and molecular function (Figure S10), and many genes were involved in “Plant hormone signal transduction”, “Plant-pathogen interaction”, “Neurotrophin signaling pathway” and “Toll-like receptor signaling pathway” through KEGG analysis (Fig. 6). The genes encoding ZFP protein, E3 ubiquitin ligase, auxin-responsive protein, MYB and ERF transcription factor were found in the selective regions (Table S10), which revealed that the genes identified through selective sweep analysis probably play an important role in the drought resistance of potato.
Genome-wide association analysis for traits under drought conditions
Using the high-quality SNPs and phenotypic data under drought conditions, we further detected the genes related to drought resistance through GWAS analysis. In this study, three different statistical models were used to detected significantly association signals, including generalized linear model (GLM), mixed linear model (MLM) and fixed and random model circulating probability unification (FarmCPU). Based on the LD decay distance (about 35 Kb), the candidate genes were searched on downstream and upstream of the significantly association SNPs. The Manhattan plots and quantile-quantile plots of GWAS for 7 traits under drought conditions were displayed in Figure S11-S16, and the detailed information about candidate genes identified by three models were summarized in Table S11-S13.
Through three models, a set of significant SNPs were detected, and the number of the significant SNPs and associated genes were statistics (Table 2). For branch number (BrN), GLM and FarmCPU models identified 8334 and 9 significant SNPs, respectively. We found that the threshold value of the GWAS of BrN by GLM model was too looser. To reduce the false positives, we detected associated genes with a strict threshold value (top 0.01), and 193 genes associated with 55 significant SNPs were identified in this study (Table S11). one SNP (1:72674289) was the common loci, the related candidate gene PGSC0003DMG400000063 encoded a pollen-specific leucine-rich repeat extensin-like protein (Fig. 7B). Our result shown that the most of the accessions carrying 1:72674289-GG had higher BrN than the accessions carrying 1:72674289-AA (Fig. 7C). Three models identified 267, 7 and 43 associated loci of chlorophyll content (ChC), respectively, which detected 699, 20 and 156 candidate genes, 8 common genes were identified both by GLM and MLM models which encoding ERF transcription factor, auxin-responsive protein, embryonic abundant protein, etc., among them, two genes encoding unknown proteins also were identified by FarmCPU model (Figure S12). What’s more, the GWAS identified 280, 13 and 14 significant SNPs associated with plant height (PlH) using GLM, MLM and FarmCPU models (Figure S13). All the 13 significant SNPs identified by MLM model also were detected through GLM model, and 44 genes were found according to the common SNPs, which encoding serine/threonine-protein kinase, UDP-glycosyltransferase, F-box protein, calcineurin B-like protein and so on, and 5 genes were identified through three models at the same time.
Table 2
Summary of the significant sites and associated genes of traits in different models.
Conditions | Traits | Number of significant SNPs | | Number of associated genes |
Normal watering conditions | | GLM | MLM | FarmCPU | GLM& MLM | GLM& FarmCPU | MLM& FarmCPU | | GLM | MLM | FarmCPU | GLM& MLM | GLM& FarmCPU | MLM& FarmCPU |
BrN | 7243 | 16 | 11 | 6 | 2 | 0 | | 5176 | 107 | 36 | 51 | 8 | 0 |
ChC | 82 | 9 | 23 | 6 | 1 | 0 | | 274 | 44 | 97 | 30 | 8 | 0 |
MSD | 38 | 3 | 44 | 3 | 5 | 1 | | 143 | 11 | 134 | 11 | 10 | 6 |
PlH | 401 | 19 | 23 | 16 | 17 | 2 | | 925 | 80 | 72 | 59 | 43 | 6 |
PPN | 158 | 30 | 23 | 26 | 0 | 0 | | 369 | 118 | 151 | 98 | 0 | 0 |
PPW | 3465 | 0 | 60 | 0 | 33 | 0 | | 3123 | 0 | 125 | 0 | 29 | 0 |
RWC | 0 | 3 | 44 | 0 | 0 | 1 | | 0 | 10 | 133 | 0 | 0 | 6 |
StM | 70 | 3 | 19 | 3 | 1 | 0 | | 145 | 9 | 75 | 9 | 5 | 0 |
WRR | 314 | 20 | 34 | 11 | 14 | 5 | | 536 | 113 | 120 | 50 | 27 | 21 |
Drought conditions | BrN | 8334 | 0 | 9 | 0 | 1 | 0 | | 5390 | 0 | 37 | 0 | 1 | 0 |
ChC | 267 | 7 | 43 | 3 | 5 | 1 | | 699 | 20 | 156 | 8 | 8 | 2 |
PlH | 280 | 13 | 14 | 13 | 7 | 2 | | 612 | 44 | 67 | 44 | 50 | 5 |
PPW | 1419 | 0 | 33 | 0 | 19 | 0 | | 2633 | 0 | 104 | 0 | 60 | 0 |
RWC | 0 | 0 | 33 | 0 | 0 | 0 | | 0 | 0 | 148 | 0 | 0 | 0 |
SSI | 38 | 12 | 29 | 10 | 2 | 1 | | 130 | 50 | 182 | 40 | 3 | 4 |
VMF | 64 | 13 | 70 | 12 | 21 | 5 | | 206 | 98 | 189 | 83 | 39 | 23 |
Note: BrN: branch number, ChC: chlorophyll content, MSD: moisture saturation deficit, PlH: plant height, PPN: number of tubers per plant, PPW: weight of tuber per plant. RWC: relative water content, StM: stem number, WRR: water retention rate, SSI: stress susceptibility index, VMF: variety membership function. |
For weight of tuber per plant (PPW), 1419 and 33 significant loci were detected through GLM and FarmCPU models (Figure S14), while MLM model didn’t detect significant loci, and 19 common significant loci associated with 60 genes, including WRKY and MYB transcription factor, proline-rich receptor-like protein kinase, calcium-dependent protein kinase and so on. Our further study identified one interesting genetic variation (2:4478353) (Fig. 7D), the accessions carrying 2:4478353-AA were shown to have higher weight of tuber per plant (PPW) than the accessions carrying 2:4478353-GG (Fig. 7E). The significant SNP 2:4478353 was in the promoter region of the candidate gene PGSC0003DMG400004427, which encoded a Proline-rich receptor-like protein kinase (PERK). In the analysis of relative water content (RWC), only FarmCPU model detected 33 significant SNPs (Figure S15), and 148 genes located in the downstream and upstream of significant SNPs. What’s more, a total of 38, 12 and 29 significant SNPs were identified through the analysis of stress susceptibility index (SSI) using GLM, MLM and FarmCPU models, respectively (Figure S16). And 8:36470394 was the common significant SNP (Fig. 7f) locating in the genomic region of the candidate gene PGSC0003DMG400002204, which encoded a pollen-specific protein. The further analysis indicated that the accessions carrying 8:36470394-GG had higher stress susceptibility index (SSI) than the accessions carrying 8:36470394-AA (Fig. 7G).
We also performed genome-wide association analysis for variety membership function (VMF), almost all significant SNPs detected by MLM model also were detected by GLM model (Fig. 8A, B, C). And 83 common associated genes were identified through the above two methods, among them, 21 genes also were the associated genes identified by FarmCPU model, which encoding serine/threonine-protein phosphatase 2A (PP2A), salicylate carboxymethyl transferase, abscisic acid hydroxylase, AAA-ATPase, E3 ubiquitin-protein ligase, zinc finger protein (ZFP) and so on. The significant SNP (3:826504) was the common loci located in the genomic region of the related candidate gene PGSC0003DMG400013420, which encoded an ABC transporter (Fig. 7H). Our result shown that the accessions carrying 3:826504-GG had higher variety membership function (VMF) than the accessions carrying 3:826504-AA (Fig. 7I).
In addition, we found that 15 genes were identified associated with variety membership function (VMF) both of GLM model and MLM model, as well as located in selective regions (Table 3), which encoding serine/threonine-protein kinase, NAC domain-containing protein, adenylate isopentenyl transferase and so on. Among them, one gene encoded LBR (late blight resistance protein) was identified by all the three models, and the significant SNP 4:1861996 located in the exon region of LBR, and the genetic region of LBR had significantly FST value (Fig. 8D). Most interestingly, the different genotype of 4:1861996 had different drought resistance (Fig. 8E). The 69 potato individuals carrying 4:1861996-AA had significantly better drought resistance than the individuals carrying 4:1861996-AG, and the most of the individuals with 4:1861996-AA were drought resistant (DR) potatoes, while the individuals with 4:1861996-AG were drought susceptible (DS) potatoes.
To better understand the function of candidate genes identified by both the GWAS analysis and selective sweep analysis, the expression levels of 15 important candidate genes were verified by quantitative real time PCR (qRT-PCR) in the materials with stronger drought resistance. Based on the qRT-PCR results, we found that the expression levels of most of the candidate genes were up-regulated after drought treatment for 24 h. Besides, we found that some genes are up-regulated expression in both the shoot and root, such as PGSC0003DMG400005975, PGSC0003DMG400035020, PGSC0003DMG400005970 and so on (Fig. 9). But there were also some genes that were only up-regulated expression in the shoot, while the up-regulated expression levels in the root were not obvious, such as PGSC0003DMG400005978 and PGSC0003DMG400002468, which suggested that some candidate genes might take part drought response in different tissues.
Table 3
Summary of the candidate genes associated with VMF identified by GLM and MLM model.
Gene ID | Chromosome | SNP | Start | End | P value (GLM model) | P value (MLM model) | Gene function |
PGSC0003DMG400035020 | 4 | 1530957 | 1550680 | 1559669 | 5.00E-06 | 6.31E-06 | Unknown |
PGSC0003DMG400006008 | 4 | 1530957 | 1541367 | 1542478 | 5.00E-06 | 6.31E-06 | Adenylate isopentenyl transferase |
PGSC0003DMG400005978 | 4 | 1709353 | 1694262 | 1695176 | 7.12E-06 | 9.19E-06 | Unknown |
PGSC0003DMG400040065 | 4 | 1709353 | 1677219 | 1677845 | 7.12E-06 | 9.19E-06 | NAC domain-containing protein |
PGSC0003DMG400005975 | 4 | 1709353 | 1734305 | 1737409 | 7.12E-06 | 9.19E-06 | Serine/threonine-protein kinase |
PGSC0003DMG400006000 | 4 | 1709353 | 1730627 | 1733353 | 7.12E-06 | 9.19E-06 | R-like protein kinase |
PGSC0003DMG400005976 | 4 | 1709353 | 1710526 | 1718731 | 7.12E-06 | 9.19E-06 | Chaperone protein dnaJ |
PGSC0003DMG400006002 | 4 | 1709353 | 1708936 | 1709799 | 7.12E-06 | 9.19E-06 | Unknown |
PGSC0003DMG400005977 | 4 | 1709353 | 1700475 | 1701857 | 7.12E-06 | 9.19E-06 | Unknown |
PGSC0003DMG400005970 | 4 | 1861996 | 1858431 | 1862567 | 1.33E-06 | 5.45E-06 | Late blight resistance protein |
PGSC0003DMG400040027 | 0 | 32343039 | 32368061 | 32368494 | 3.5418E-06 | 7.72E-06 | Adenylate isopentenyl transferase |
PGSC0003DMG400045608 | 0 | 32343039 | 32366289 | 32366594 | 3.5418E-06 | 7.72E-06 | Adenylate isopentenyl transferase |
PGSC0003DMG400002468 | 0 | 32343039 | 32361429 | 32363751 | 3.5418E-06 | 7.72E-06 | Pentatricopeptide repeat-containing protein |
PGSC0003DMG400002467 | 0 | 32343039 | 32354306 | 32356707 | 3.5418E-06 | 7.72E-06 | Geraniol 8-hydroxylase |
PGSC0003DMG400002470 | 0 | 32343039 | 32350218 | 32351019 | 3.5418E-06 | 7.72E-06 | Geraniol 8-hydroxylase |
Genome-wide association analysis for traits under normal watering conditions
In this study, we also performed genome-wide association analysis for 9 traits under normal watering conditions using three different statistical models (GLM, MLM and FarmCPU). And the Manhattan plots and quantile-quantile plots were displayed in Supplemental Figure S17-S23, and the detailed information about candidate genes identified by three models were summarized in Table S14-S16.
We found that the most of the significant SNPs identified by MLM model also were identified by GLM model (Table 2). When the analysis of chlorophyll content (ChC), 6 common significant SNPs were detected (Fig. 10A), which identified 30 associated genes, including zinc finger protein, ERF transcription factor, glycosyltransferase, late blight resistance protein and so on. The significant SNP (4:8470510) was the common loci located in the genomic region of the related candidate gene PGSC0003DMG400023619, which encoded an ethylene-responsive transcription factor (Fig. 10C). Our result shown that the accessions carrying 4:8470510-GG had higher chlorophyll content (ChC) than the accessions carrying 4:8470510-AA (Fig. 10D). For moisture saturation deficit (MSD), all the three significant SNPs detected through MLM model also were detected through GLM model (Fig. 10B), and 11 genes were the common associated genes, which encoding serine/threonine-protein phosphatase, serine/threonine-protein kinase, ERF transcription factor and so on, among them, 6 genes also were identified through FarmCPU model. The common significant SNP 7:50992935 (Fig. 10E) locating in the promoter region of the candidate gene PGSC0003DMG400017294, which encoded an unknown protein. The further analysis indicated that the accessions carrying 7:50992935-CC had higher moisture saturation deficit (MSD) than the accessions carrying 8:36470394-AA (Fig. 10F).
What’s more, a total of 59 common genes were identified by the analysis of plant height (PlH) using GLM model and MLM model, and 6 genes were also identified by FarmCPU model. What’s more, a total of 158, 30 and 23 significant SNPs were identified through the analysis of number of tubers per plant (PPN) using GLM, MLM and FarmCPU models, respectively, and only GLM and MLM models detected 26 common significant SNPs, which associated with 98 genes encoding protein phosphatase 2A (PP2A), ERF, MYB and bHLH transcription factor, zinc finger protein (ZFP) and so on. Through the GWAS of water retention rate (WRR), 21 common associated genes were identified by three models at the same time, which encoded to leucine-rich repeat receptor-like protein kinase, WRKY transcription factor, xyloglucan glycosyltransferase, UDP-glycosyltransferase and so on.