Identifying the PFRU candidate region using GWAS
To perform a GWAS to identify the genomic region of PFRU, NGS short reads from 25 SF-type cultivars and 24 PF-type cultivars were obtained from the NCBI SRA (Supplementary Table 2). The short reads of each cultivar were independently aligned to the publicly available strawberry genome, the “Camarosa genome”, which was generated from the SF-type cultivar ‘Camarosa’. In total, 17,536,173 positions at which at least one cultivar carried a SNP were identified. These SNP positions were used for the GWAS, which was performed using rrBLUP. This GWAS detected PFRU-associated SNPs in 15 subgenomes spanning all chromosomes except Fvb5 (Supplementary Fig. 2). Because these results are inconsistent with previous reports that PFRU is a single locus on chromosome Fvb4, we reasoned that low genotyping accuracy at the SNP positions used for the GWAS had caused false-positive associations. To exclude these, we strictly redefined the genotypes based on the SNP index values in the VCF file using rrBLUP and then used the revised VCF file to perform the GWAS again. Although a total of 129,175,469 genotypes across the cultivars were redefined, associations were still detected in all subgenomes of chromosome Fvb4, as well as in Fvb3-1, Fvb7-1, and Fvb7-4 (Fig. 1a).
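The genotype redefinition step can be illustrated with a minimal sketch that recalls each genotype from the SNP index, i.e. the fraction of reads supporting the alternative allele at a position. The read-depth and SNP-index cutoffs below (`MIN_DEPTH`, `HOM_REF_MAX`, `HOM_ALT_MIN`) are hypothetical placeholders rather than the thresholds used in this study, and the output follows the {-1, 0, 1} marker coding used by rrBLUP. How reference and alternative read counts are extracted from the VCF file (for example, from an allele-depth field) is assumed and not shown.

```python
from typing import Optional

# Hypothetical cutoffs for illustration only; not the thresholds used in the study.
MIN_DEPTH = 10       # minimum read depth required to make a call
HOM_REF_MAX = 0.1    # SNP index at or below this -> homozygous reference
HOM_ALT_MIN = 0.9    # SNP index at or above this -> homozygous alternative


def snp_index(ref_reads: int, alt_reads: int) -> Optional[float]:
    """Fraction of reads supporting the ALT allele; None when there are no reads."""
    depth = ref_reads + alt_reads
    if depth == 0:
        return None
    return alt_reads / depth


def call_genotype(ref_reads: int, alt_reads: int,
                  min_depth: int = MIN_DEPTH,
                  hom_ref_max: float = HOM_REF_MAX,
                  hom_alt_min: float = HOM_ALT_MIN) -> Optional[int]:
    """Return -1 (hom REF), 0 (het), 1 (hom ALT), or None (missing), rrBLUP-style."""
    depth = ref_reads + alt_reads
    if depth == 0 or depth < min_depth:
        return None  # too few reads for a reliable call; treated as missing
    index = alt_reads / depth
    if index <= hom_ref_max:
        return -1
    if index >= hom_alt_min:
        return 1
    return 0


if __name__ == "__main__":
    # Toy (ref, alt) read counts for one SNP in four hypothetical cultivars.
    for counts in [(20, 0), (10, 10), (1, 19), (2, 3)]:
        print(counts, call_genotype(*counts))
```

Returning a missing value for low-depth sites, instead of forcing a call, is one way to suppress the miscalled genotypes suspected of producing false-positive associations.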
We therefore narrowed the candidate region for PFRU further, using the frequency of candidate SNP positions as supporting information. In a frequency analysis of candidate SNPs with P < 0.01 (Fisher’s exact test), the 2.1–4.1 Mb region of Fvb4-4 contained 5,640 candidate SNPs and formed the highest significant peak (Fig. 1c). By contrast, Fvb4-1, -2, and -3, the homoeologous chromosomes of Fvb4-4, had only 59, 64, and 82 candidate SNPs at their peak regions, respectively. Taken together, we concluded that PFRU is located on chromosome Fvb4-4 of the “Camarosa genome”. The 0.07–4.06 Mb region of Fvb4-4 contained the associated SNPs detected in the GWAS and was therefore defined as the candidate region. Moreover, to identify tightly linked genomic intervals within this 0.07–4.06 Mb region, a linkage disequilibrium (LD) analysis was performed using the SNP positions with significant P-values in the GWAS (Fig. 2). As a result, two LD blocks separated at 3.29 Mb were detected.
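The frequency and LD analyses can be sketched as follows, as a minimal illustration under stated assumptions. Each SNP is tested with a two-sided Fisher’s exact test on a 2×2 table of flowering type (SF/PF) against presence or absence of the alternative allele, which is one plausible layout but is not confirmed by the text. Significant SNPs are then counted in fixed, non-overlapping windows whose 1 Mb default size is hypothetical. Pairwise LD is approximated as the squared Pearson correlation of genotype codes, a genotype-based stand-in for haplotype r² rather than the estimator used in this study.

```python
from collections import Counter
from math import comb
from typing import Dict, Iterable, List, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # A small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def count_in_windows(hits: Iterable[Tuple[str, int]],
                     window_bp: int = 1_000_000) -> Dict[Tuple[str, int], int]:
    """Count significant SNPs per (chromosome, window start) in non-overlapping windows."""
    return dict(Counter((chrom, pos // window_bp * window_bp) for chrom, pos in hits))


def genotype_r2(x: List[int], y: List[int]) -> float:
    """Squared Pearson correlation of two genotype-code vectors (NaN if monomorphic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Toy table: PF carriers, PF non-carriers, SF carriers, SF non-carriers.
    print(fisher_exact_two_sided(20, 4, 2, 23))
    print(count_in_windows([("Fvb4-4", 2_100_000), ("Fvb4-4", 2_900_000), ("Fvb4-1", 500)]))
    print(genotype_r2([-1, 0, 1, 1, -1], [-1, 0, 1, 0, -1]))
```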
Fvb3-1, Fvb7-1, and Fvb7-4 also contained candidate SNPs in the GWAS, and their numbers in the frequency analysis were sufficient to suggest that these regions may also be involved in flowering habit alongside the PFRU locus.
Mapping the genomic region for PFRU in the “Camarosa genome”
The locations of the published SSR markers used in linkage analyses of PFRU were confirmed by BLASTn-short analysis against the “Camarosa genome”. Here, we analyzed eight primers for four SSR markers: Bx083 and Bx215, which mapped PFRU to a 7.3-cM interval (Perrotte et al. 2016), and FxaACA0218C and s2430859, which mapped PFRU to a 1.1-cM interval (Honjo et al. 2016, 2020). These markers were developed independently to narrow the PFRU interval using different segregating progenies derived from crosses between cultivars, which explains why they mapped different intervals. The BLAST analysis showed that all primer sequences had high homology not only to Fvb4-4 but also to other chromosomes (Supplementary Table 4). On Fvb4-4, sequences homologous to the primers were detected only within the candidate PFRU region (0.07–4.06 Mb); however, three of the eight primer sequences showed no homology with Fvb4-4, possibly because the corresponding sequences are not assembled in Fvb4-4. The positions of the markers revealed that the 2.75–3.71 Mb region of Fvb4-4, between Bx083 and Bx215, was the narrowest candidate region for PFRU (Fig. 2).
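The filtering applied to the BLAST results can be sketched in a few lines. The sketch assumes the hits were written in BLAST+ tabular format with the default column set, for example by `blastn -task blastn-short -query primers.fa -db camarosa -outfmt 6` (the file and database names are hypothetical). The sequence ID `Fvb4-4`, the 100% identity cutoff, and the helper names are likewise illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BlastHit:
    query: str      # primer ID
    subject: str    # chromosome/sequence ID in the reference
    pident: float   # percent identity
    length: int     # alignment length
    sstart: int     # subject start (1-based)
    send: int       # subject end (smaller than sstart for minus-strand hits)
    evalue: float


def parse_outfmt6(lines: Iterable[str]) -> List[BlastHit]:
    """Parse BLAST+ tabular output written with the default `-outfmt 6` columns."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(BlastHit(f[0], f[1], float(f[2]), int(f[3]),
                             int(f[8]), int(f[9]), float(f[10])))
    return hits


def hits_in_region(hits: List[BlastHit], chrom: str, start: int, end: int,
                   min_pident: float = 100.0) -> List[BlastHit]:
    """Hits on `chrom` lying entirely within [start, end] (hypothetical identity cutoff)."""
    selected = []
    for h in hits:
        lo, hi = sorted((h.sstart, h.send))
        if h.subject == chrom and h.pident >= min_pident and start <= lo and hi <= end:
            selected.append(h)
    return selected


def primers_without_hit(primer_ids: List[str], hits: List[BlastHit], chrom: str) -> List[str]:
    """Primers with no hit at all on `chrom`."""
    on_chrom = {h.query for h in hits if h.subject == chrom}
    return [p for p in primer_ids if p not in on_chrom]
```

Sorting `sstart` and `send` before the range check keeps minus-strand hits (where BLAST reports `send` < `sstart`) from being discarded.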
Phylogenetic relationships among the PFRU loci of strawberry cultivars
To understand the phylogenetic relationships among the PFRU genotypes of strawberry cultivars, an unrooted phylogenetic tree was constructed for 50 cultivars by the maximum-likelihood (ML) method, using sequences in which each cultivar’s genotypes were concatenated across the 14,698 SNP positions detected in at least one cultivar within the PFRU region (Fvb4-4: 2.75–3.71 Mb). In the phylogenetic tree, 18 of the 26 SF-type cultivars and 15 of the 24 PF-type cultivars were grouped into two clades with short branches, named the SF and PF clades, respectively (Fig. 3). The remaining cultivars were located on independent branches.
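One way to build the concatenated SNP sequences is sketched below, assuming genotypes are coded -1/0/1 per cultivar and that heterozygous calls are written as IUPAC ambiguity codes. The text does not state how heterozygotes were encoded, so that choice, the helper names, and the FASTA output are assumptions. The resulting alignment could then be passed to an ML phylogenetics program; how ambiguity codes are handled depends on the program.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# IUPAC ambiguity codes for the six possible biallelic heterozygotes.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def genotype_to_char(ref: str, alt: str, code: Optional[int]) -> str:
    """Map a -1/0/1 genotype code to a single nucleotide character."""
    if code is None:
        return "N"                       # missing call
    if code == -1:
        return ref                       # homozygous reference
    if code == 1:
        return alt                       # homozygous alternative
    return IUPAC[frozenset((ref, alt))]  # heterozygote as an ambiguity code


def variable_sites(genotypes: Dict[str, List[Optional[int]]]) -> List[int]:
    """Indices of sites where at least one cultivar carries the ALT allele."""
    n_sites = len(next(iter(genotypes.values())))
    return [i for i in range(n_sites)
            if any(codes[i] in (0, 1) for codes in genotypes.values())]


def build_alignment(snps: Sequence[Tuple[str, str]],
                    genotypes: Dict[str, List[Optional[int]]]) -> Dict[str, str]:
    """Concatenate each cultivar's genotypes at the variable SNP sites into one sequence."""
    keep = variable_sites(genotypes)
    return {name: "".join(genotype_to_char(*snps[i], codes[i]) for i in keep)
            for name, codes in genotypes.items()}


def to_fasta(alignment: Dict[str, str]) -> str:
    """Render the alignment as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())
```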
We also constructed phylogenetic trees using the SNPs detected within the corresponding PFRU-like regions on Fvb4-1, -2, and -3 (Supplementary Fig. 3). Unlike the tree for the PFRU locus on Fvb4-4, none of these three trees formed SF- or PF-specific clades. These trees provide further evidence that PFRU is located on Fvb4-4 rather than on the other Fvb4 subgenomes.
Developing a co-dominant DNA marker for PFRU
Because the previously reported markers for PFRU, Bx083 and Bx215, are dominant, we developed a co-dominant marker that can distinguish heterozygous from homozygous PFRU genotypes and can be applied to selection in breeding. Here, amplification refractory mutation system (ARMS) markers, which determine the genotype at a SNP position based on the ability of an allele-specific primer to anneal to the template, were designed as DNA markers.
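The ARMS principle can be illustrated with a small, hypothetical primer-construction helper. It relies on the well-known property that Taq polymerase extends a primer poorly when the primer’s 3′-terminal base mismatches the template, so each allele-specific primer ends on the SNP. The optional extra mismatch near the 3′ end is a design option commonly used to sharpen ARMS specificity and is not described in the text; the function name, default length, and example sequence are assumptions.

```python
from typing import Dict, Iterable, Optional

# Swap each base for its complement to create a deliberate mismatch.
_MISMATCH = str.maketrans("ACGT", "TGCA")


def arms_forward_primers(template: str, snp_pos: int, alleles: Iterable[str],
                         length: int = 20,
                         extra_mismatch_offset: Optional[int] = None) -> Dict[str, str]:
    """Allele-specific forward primers whose 3'-terminal base sits on the SNP.

    template: sequence in 5'->3' orientation; snp_pos: 0-based SNP index.
    extra_mismatch_offset: optional distance (in bases) from the 3' end of an
    additional destabilising mismatch (hypothetical design parameter).
    """
    start = snp_pos - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for the requested primer length")
    if extra_mismatch_offset is not None and not 0 < extra_mismatch_offset < length:
        raise ValueError("extra mismatch must lie inside the primer, upstream of the 3' base")
    primers = {}
    for allele in alleles:
        bases = list(template[start:snp_pos] + allele)
        if extra_mismatch_offset is not None:
            i = len(bases) - 1 - extra_mismatch_offset
            bases[i] = bases[i].translate(_MISMATCH)
        primers[allele] = "".join(bases)
    return primers


if __name__ == "__main__":
    # Toy template with a G/A SNP at index 10; short primers for readability.
    print(arms_forward_primers("ACGTACGTACGTACGTACGTAC", 10, "GA", length=5))
    # -> {'G': 'GTACG', 'A': 'GTACA'}
```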
First, PF-type-specific heterozygous SNP positions located within the PFRU region were selected by visual inspection in Integrative Genomics Viewer (IGV) (Robinson et al. 2011). Here, ‘Earliglow’, ‘Grenada’, and ‘Petaluma’ were used as representative SF-type cultivars and ‘Albion’, ‘Diamante’, and ‘Monterey’ as representative PF-type cultivars for identifying the target SNP positions, because they had sufficient short-read data and were classified into the type-specific clades of the phylogenetic tree described above. Next, two primer pairs were designed for the SF- and PF-specific alleles at the PF-specific heterozygous SNP positions. To avoid amplifying non-target subgenomes, primer specificity was confirmed by aligning the homologous regions from each subgenome of chromosome Fvb4 (Supplementary Fig. 4). Finally, the ARMS markers designed for the SF- and PF-specific alleles were mixed at a 1:4 volume ratio, respectively, and used for genotyping as a co-dominant marker.
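The specificity check and the primer mixing step can be sketched as below. The sketch assumes the primer-binding site of each Fvb4 homoeolog has already been extracted from the alignment in the same orientation and length as the primer, and it flags a non-target subgenome when its binding site differs from the primer at fewer than a hypothetical two positions within the last five 3′ bases. Only the 1:4 (SF:PF) volume ratio comes from the text; the function names and thresholds are illustrative.

```python
from typing import Dict, List, Tuple


def three_prime_mismatches(primer: str, site: str, window: int = 5) -> int:
    """Mismatches within the 3'-terminal `window` bases of a primer and its aligned binding site."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must be aligned to the same length")
    return sum(p != s for p, s in zip(primer[-window:], site[-window:]))


def possible_off_targets(primer: str, homoeolog_sites: Dict[str, str], target: str,
                         window: int = 5, min_mismatches: int = 2) -> List[str]:
    """Non-target subgenomes whose site has fewer than `min_mismatches` 3' mismatches."""
    return [name for name, site in homoeolog_sites.items()
            if name != target
            and three_prime_mismatches(primer, site, window) < min_mismatches]


def mix_volumes(total_ul: float, ratio: Tuple[int, int] = (1, 4)) -> Tuple[float, float]:
    """Split a total volume between the SF- and PF-allele primer sets at the given ratio."""
    parts = sum(ratio)
    return total_ul * ratio[0] / parts, total_ul * ratio[1] / parts
```

Any subgenome returned by `possible_off_targets` would be a candidate for redesigning the primer, for instance by shifting it to a neighbouring PF-specific SNP.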
The developed marker was applied to genotype 13 SF-type and 15 PF-type cultivars obtained in Japan. All SF-type cultivars showed a single band, indicating a homozygous genotype for the SF allele. By contrast, the 15 PF-type cultivars contained the PF allele, of which 2 and 11 cultivars had homozygous and heterozygous genotypes, respectively (Fig. 4). We also applied this ARMS marker to progenies from the selfing of two PF-type cultivars, ‘Ooishi shikinari 2’ and ‘Yotsuboshi’. The genotypes of 16 individuals obtained by selfing ‘Ooishi shikinari 2’ segregated in a 1:2:1 ratio, confirming that only a single locus was targeted and that amplification from other subgenomes was avoided (Supplementary Fig. 5a). Moreover, in the 16 individuals obtained by selfing each seedling of ‘Yotsuboshi’, corresponding to the F3 generation, flowering habit was associated with genotype in all but one individual, suggesting that this marker can be applied to practical breeding (Supplementary Fig. 5b).
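The 1:2:1 segregation can be checked with a chi-square goodness-of-fit test, sketched below as one common option; the text does not state which test, if any, was applied, and the counts in the example are illustrative rather than the observed data. With 16 individuals the expected homozygous classes contain only four individuals, below the usual rule of thumb of five, so an exact multinomial test may be preferable for samples this small.

```python
import math
from typing import Tuple


def chi_square_1_2_1(n_hom_sf: int, n_het: int, n_hom_pf: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test of genotype counts against a 1:2:1 ratio.

    Returns (statistic, p-value). Three classes give 2 degrees of freedom, for
    which the chi-square survival function is exactly exp(-x / 2).
    """
    observed = (n_hom_sf, n_het, n_hom_pf)
    n = sum(observed)
    if n == 0:
        raise ValueError("no individuals")
    expected = (n / 4, n / 2, n / 4)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.exp(-stat / 2)


if __name__ == "__main__":
    # Illustrative counts for 16 selfed individuals (not the study's data).
    print(chi_square_1_2_1(4, 8, 4))  # -> (0.0, 1.0)
```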