Detection and evaluation of accession-specific variants
We resequenced the genomes of 50 accessions from the Brassica rapa core collection with the goal of developing markers specific to each accession. This core collection comprises four groups: non-pekinensis, Chinese, Japanese, and Korean breeding lines (Fig. 1 and Supplementary Table 1). The reads from these accessions were mapped to the Brassica rapa reference genome (ver 3.0) [11] with BWA-MEM (ver 0.1.17) using the default parameters. We detected a total of 4,925,742 SNPs across the 50 accessions (Table 1 and Supplementary Data 1). To identify accession-specific variants in the B. rapa core collection, we constructed a variant-identification pipeline that combines variant calling and variant filtering (Supplementary Figure 1). First, SNPs of individual accessions were detected and merged in the joint variant calling step. Then, by comparing the genotypes of all accessions in the core collection, SNPs at which a single accession carried the homozygous alternative allele were identified as accession-specific SNPs. To develop KASP markers, each accession-specific SNP was evaluated for the non-redundancy of its flanking sequences, its overlap with repeat sequences, and its annotation. Finally, SNPs with unique flanking sequences that did not overlap repeat sequences were retained as candidates for KASP marker development. We identified 2,925 accession-specific SNPs as such candidates (Table 1). Almost all of these SNPs were in flanking sequences of genes and 2,806 of them, or approximately 95.9%, were in genic regions (Table 2). Of the 2,925 SNPs, 456 (approximately 15.6%) resulted in non-synonymous mutations, and 19 variants led to abnormal termination of translation. These genetic variants may be important in future investigations of trait-associated genes or markers.
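The accession-specific filtering step can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the VCF-style genotype strings, and the rule that every other accession must be homozygous for the reference allele (so that heterozygous or missing calls disqualify a site) are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Site = Tuple[str, int]       # (chromosome, 1-based position)
Genotypes = Dict[str, str]   # accession name -> VCF-style genotype call


def specific_accession(genotypes: Genotypes) -> Optional[str]:
    """Return the single accession that is homozygous alternative, else None.

    Assumption: "0/0" is homozygous reference, "1/1" is homozygous
    alternative, and any other call ("0/1", "./.") in another accession
    disqualifies the site.
    """
    hom_alt = [acc for acc, gt in genotypes.items() if gt == "1/1"]
    if len(hom_alt) != 1:
        return None
    target = hom_alt[0]
    for acc, gt in genotypes.items():
        if acc != target and gt != "0/0":
            return None
    return target


def find_accession_specific_snps(
    calls: Dict[Site, Genotypes]
) -> Dict[str, List[Site]]:
    """Group accession-specific sites by the accession that carries them."""
    result: Dict[str, List[Site]] = {}
    for site, genotypes in calls.items():
        target = specific_accession(genotypes)
        if target is not None:
            result.setdefault(target, []).append(site)
    return result
```

In practice the genotype table would come from the jointly called variant file; here it is a plain dictionary so that the selection rule itself is easy to inspect.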
Our next step in the development of accession-specific markers was to validate the SNPs detected by genome resequencing, which we did by Sanger sequencing (Fig. 2).
Eight flanking sequences of the accession-specific SNP candidates were selected from the four groups of the core collection, and primers for Sanger sequencing were designed (Supplementary Table 2). From the Sanger sequencing results, we concluded that 7 of the SNP candidates were specific to a single accession (Fig. 2 and Supplementary Figures 2-7). PCR amplification for Sanger sequencing failed for one flanking sequence (Supplementary Figure 8), leading us to conclude that SNPs with conserved flanking sequences are the best candidates for developing accession-specific PCR-based markers. Moreover, candidate SNPs whose flanking sequences are highly conserved and suitable for primer design may be necessary for developing widely applicable KASP markers that also work in crops not in the core collection or in commercial cultivars. The choice of primer sites is therefore important for the development of accession-specific KASP markers.
Development and evaluation of KASP markers
Our next aim was to develop accession-specific KASP markers for the assessment of hybrid seed purity. Five of the accession-specific SNP candidates identified as described above were selected from each accession for further analysis. Because primer sites play an important role in successful marker development, the conservation of the SNP flanking sequences across our core collection was surveyed (Fig. 3a). Flanking regions containing unsequenced sites, shown as N in the reference genome, were removed from the primer candidate sequences (Fig. 3b). Then, five flanking sequences of accession-specific SNPs were selected in each accession for further evaluation as KASP markers. The genomic position of each SNP also had to be considered when developing a wide range of markers, because overlapping genomic positions among markers may lead to inefficiency or false positive results when seed purity is assessed. To avoid this redundancy, the genomic positions of the five candidate SNPs from each accession were examined and the positions unique to each accession were selected (Fig. 4). Finally, two SNPs per accession were selected for validation of KASP markers (Supplementary Table 3). Many of the KASP markers that were in genic regions caused non-synonymous variation, although almost all accession-specific SNPs were detected in the flanking regions of genes (Table 2).
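The two selection criteria described above, usable primer sites and non-overlapping genomic positions, can be sketched as follows. This is an illustrative sketch rather than the procedure used in this study: the function names, the requirement that every accession match the reference flank exactly, and the `min_distance` threshold of 1,000 bp are all hypothetical choices.

```python
from typing import Dict, Iterable, List, Tuple

Candidate = Tuple[str, str, int]  # (accession, chromosome, position)


def flank_is_usable(reference_flank: str, accession_flanks: Iterable[str]) -> bool:
    """Accept a flank only if it has no N and is identical in all accessions."""
    ref = reference_flank.upper()
    if "N" in ref:
        return False  # unsequenced site in the reference genome
    return all(flank.upper() == ref for flank in accession_flanks)


def pick_non_overlapping(
    candidates: List[Candidate],
    per_accession: int = 2,
    min_distance: int = 1000,  # hypothetical spacing threshold in bp
) -> Dict[str, List[Candidate]]:
    """Greedily keep up to `per_accession` SNPs per accession.

    A candidate is skipped when a previously chosen marker, from any
    accession, lies on the same chromosome within `min_distance` bp.
    """
    chosen: List[Candidate] = []
    by_accession: Dict[str, List[Candidate]] = {}
    for cand in candidates:
        accession, chrom, pos = cand
        picked = by_accession.setdefault(accession, [])
        if len(picked) >= per_accession:
            continue
        clash = any(c == chrom and abs(p - pos) < min_distance for _, c, p in chosen)
        if not clash:
            picked.append(cand)
            chosen.append(cand)
    return by_accession
```

The greedy pass depends on the order of the candidate list, so candidates would normally be sorted by whatever quality measure is preferred before selection.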
Validation of the KASP markers was carried out using 50 accessions from the core collection and 35 from non-core collections or commercial cultivars to determine their applicability to a wide range of seed purity assessments (Fig. 5, Table 3, and Supplementary Data 2). Based on the results, we conclude that the accession-specific markers successfully distinguished individual accessions in both the core collection and the outgroup (Fig. 5). We suggest that accession-specific markers developed from a large amount of individual resequencing data can be used to assess the seed purity of non-sequenced accessions or cultivars. The accession-specific markers developed here should be useful in a wide range of seed purity assessments in crop breeding and commercial seed production.
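As an illustration of how such a marker might be applied to a hybrid seed lot, the following sketch estimates purity from KASP genotype calls. It is not a procedure reported in this study. It assumes that the two parental lines are homozygous for different alleles at the marker, so that true F1 hybrids are heterozygous, and the call labels ("het", "hom_ref", "hom_alt", "no_call") and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def hybrid_purity(calls: Iterable[str]) -> Optional[float]:
    """Return the percentage of scored seeds that are heterozygous.

    Seeds labelled with anything other than "het", "hom_ref", or
    "hom_alt" (for example "no_call") are excluded from the denominator.
    Returns None when no seed could be scored.
    """
    counts = Counter(calls)
    scored = counts["het"] + counts["hom_ref"] + counts["hom_alt"]
    if scored == 0:
        return None
    return 100.0 * counts["het"] / scored
```

For example, a lot of 100 scored seeds with 95 heterozygous calls would give a purity estimate of 95%; unscored seeds do not lower the estimate because they are left out of the denominator.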