Assembling a panel of genome-wide short segments for genotyping
Target fragments harboring polymorphic SNPs were derived from three sources: (1) whole-genome re-sequencing of two landraces, (2) GBS-derived polymorphisms in a global barley diversity panel (Milner et al. 2019), and (3) SNPs reported in the 50K Illumina Infinium iSelect SNP array (Bayer et al. 2017). Flanking sequences were extracted from the barley reference genome (Morex v2) and used for primer design. Of the 587 primer pairs tested by multiplex PCR followed by high-throughput sequencing, 87 were discarded because of poor fragment capture, production of multiple fragments, or excessively high or low PCR amplification efficiency. The remaining 500 primer pairs were retained for barley multiplex PCR amplification (BarPlex v1.0) (Supplementary Table 2). The number of target fragments per chromosome ranged from 54 to 84 (Fig. 1a), with higher density towards the telomeres, where recombination rates are generally higher (Mascher et al. 2017).
We conducted four independent experiments covering 1,068 genotypes: 51 wild barley (H. spontaneum, Hs), 248 Tibetan semi-wild barley, 345 landraces, 329 cultivars, and 95 F2 segregants (Table 1). The detection rate of target fragments per experiment ranged from 99.4% to 99.8% (Fig. 1b), with an average sequencing depth between 467x and 1,010x (Fig. 1c). Across the 1,068 genotypes, the per-sample detection rate ranged from 96.4% to 100% with a mean of 99.7% (Fig. 1d), and the average sequencing depth across all samples was 757x (Fig. 1e). In addition, we analyzed the detection rate and sequencing depth of each of the 500 targets across all samples. Of these, 466 targets (93.2%) were detected in >99.5% of samples, whereas only four target fragments were detected in fewer than 90% of samples (Fig. 1f-g). We did not observe a significant difference in the number or size of detected fragments among the populations of wild, Tibetan semi-wild, and cultivated barley (Table 1). Compared with barley landraces and cultivars, fewer polymorphic target SNPs were observed in the populations of H. spontaneum and Tibetan semi-wild barley. This is likely because the target SNPs were originally identified in cultivated barley. When all polymorphic sites within the 91.1 kb of target sequences were considered, a higher number of SNPs was observed in H. spontaneum, but not in Tibetan semi-wild barley. Considering the detection rate, the sequencing depth, and the number of polymorphisms, we conclude that BarPlex v1.0 is a robust, reduced-complexity assay suitable for genotyping in barley.
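The per-sample and per-target detection rates reported above can be illustrated with a minimal sketch. It assumes a sample-by-target matrix of read depths and treats a target as detected when its depth reaches a threshold; the function name `detection_rates` and the `min_depth` parameter are hypothetical, since the detection criterion is not stated here.

```python
from typing import List, Tuple


def detection_rates(depth: List[List[int]], min_depth: int = 1) -> Tuple[List[float], List[float]]:
    """Return (per-sample, per-target) detection rates.

    depth[i][j] is the read depth of target j in sample i.
    A target counts as detected when depth >= min_depth (an assumed criterion).
    """
    n_samples = len(depth)
    n_targets = len(depth[0])
    # Fraction of targets detected in each sample
    per_sample = [sum(d >= min_depth for d in row) / n_targets for row in depth]
    # Fraction of samples in which each target is detected
    per_target = [
        sum(row[j] >= min_depth for row in depth) / n_samples
        for j in range(n_targets)
    ]
    return per_sample, per_target


# Toy example: 3 samples x 3 targets
toy_depth = [[10, 0, 5], [3, 2, 0], [7, 1, 4]]
per_sample, per_target = detection_rates(toy_depth)
```

Raising `min_depth` makes the criterion stricter and would lower both rates for thinly covered targets.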
BarPlex v1.0 vs. genotyping-by-sequencing (GBS)
GBS is a reduced-complexity, cost-efficient genotyping approach that has been widely applied in genetic studies of crop plants with complex genomes (Milner et al. 2019; Poland et al. 2012). To test whether BarPlex v1.0 performs as efficiently as GBS in barley, we genotyped 96 barley landraces or cultivars using GBS, yielding 0.7 Gb of clean sequence per accession (8-fold more than in the study that sequenced >22,000 genebank barley accessions; Milner et al. 2019). GBS yielded 24,195 qualified polymorphic sites, whereas 1,372 polymorphisms were detected using BarPlex v1.0 (Fig. 2a). In each sample, 99.2%-100.0% (mean = 99.7%) of the target fragments were detectable using BarPlex v1.0, whereas the detection rate using GBS after quality control (removal of sites with a missing rate >20%) was 81.5%-97.6% (Fig. 2b). The sequencing depth using BarPlex v1.0 was 310-fold higher than that of GBS (Fig. 2c). Each SNP in BarPlex v1.0 was detected in 83.3%-100% (mean = 99.9%) of samples, whereas for GBS this value ranged from 80.2% to 100% with a mean of 91.5% (Fig. 2d-e). In principal component analysis (PCA), the BarPlex v1.0 and GBS datasets resolved a similar population structure (Fig. 2f-g). From these results, we conclude that BarPlex v1.0 is a reliable, highly accurate, and informative assay for genotyping in barley.
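The quality-control step mentioned above (removing sites with a missing rate >20%) might look like the following sketch. It assumes genotype calls are stored per site with `None` marking a missing call; the function name and data layout are illustrative and not taken from the actual pipeline.

```python
from typing import Dict, List, Optional


def filter_by_missing_rate(
    sites: Dict[str, List[Optional[str]]], max_missing: float = 0.20
) -> Dict[str, List[Optional[str]]]:
    """Keep sites whose fraction of missing calls does not exceed max_missing.

    sites maps a site identifier to one genotype call per sample;
    None denotes a missing call (an assumed encoding).
    """
    kept = {}
    for site_id, calls in sites.items():
        missing_rate = sum(call is None for call in calls) / len(calls)
        if missing_rate <= max_missing:  # sites with a rate > max_missing are removed
            kept[site_id] = calls
    return kept


# Toy example with hypothetical site identifiers
toy_sites = {
    "site_1": ["A/A", None, "A/G", "G/G", "A/A"],  # 20% missing: kept
    "site_2": [None, None, "C/C", "C/T", "C/C"],   # 40% missing: removed
}
qualified = filter_by_missing_rate(toy_sites)
```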
Applications in heterozygosity discrimination, variety pedigree analysis, linkage mapping, and GWAS
We next examined whether the polymorphisms revealed by BarPlex v1.0 are applicable to genetic studies of barley. Here, 495 target SNPs and the fragments in which they reside (90.3 kb in total), representing 3,220 polymorphic sites, were included. First, heterozygosity was assessed by quantifying allele frequencies at polymorphic sites (Fig. 3a). All individuals in the F2 population were heterozygous at a proportion of polymorphic sites (mean = 6.5%). In comparison, only ten of the 973 samples of natural materials were heterozygous (cutoff: ≥4%). We also analyzed heterozygosity using the target SNPs alone, which gave an overestimate relative to that obtained from all polymorphic sites (Fig. 3a).
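As an illustration of how heterozygosity could be scored from allele frequencies at polymorphic sites, the sketch below classifies a site as heterozygous when the alternative-allele read fraction lies within a window, then applies the ≥4% cutoff from the text to flag a sample. The window (0.2 to 0.8) and the minimum depth are assumptions made for this example; only the 4% cutoff comes from the analysis described above.

```python
from typing import List, Tuple


def het_fraction(
    allele_counts: List[Tuple[int, int]],
    min_depth: int = 10,
    low: float = 0.2,
    high: float = 0.8,
) -> float:
    """Fraction of adequately covered sites that look heterozygous.

    allele_counts holds (ref_reads, alt_reads) per polymorphic site.
    min_depth, low and high are illustrative thresholds, not published values.
    """
    het = called = 0
    for ref, alt in allele_counts:
        depth = ref + alt
        if depth < min_depth:
            continue  # too few reads to score this site
        called += 1
        if low <= alt / depth <= high:
            het += 1
    return het / called if called else 0.0


def is_heterozygous_sample(fraction: float, cutoff: float = 0.04) -> bool:
    """Flag a sample whose heterozygous-site fraction reaches the cutoff (>=4%)."""
    return fraction >= cutoff


# Toy example: one balanced site among four scorable sites
toy_counts = [(50, 50), (100, 0), (0, 80), (3, 2), (90, 10)]
toy_fraction = het_fraction(toy_counts)
```

Because amplicon depth is high, read-fraction thresholds of this kind are fairly stable, but the exact window would need tuning against known homozygous controls.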
Second, we compared landraces and cultivars using both the target SNPs and the polymorphic sites. “Morex”, a six-rowed US malting variety released in 1978, has been used for the barley reference genome sequence (Mascher et al. 2017). “Morex” seeds provided by two breeding units in the lower reaches of the Yangtze River in China differed from the original “Morex” (Fig. 3b). In contrast, three Chinese landraces/historic cultivars that were widely cultivated during the last century (“Chi Ba Da Mai”, “Xiu Ning Ai Jiao Da Mai”, and “Chi Ba Huang”) showed a higher identity (Fig. 3c). The two-rowed malting barley cultivar “Zao Shu 3 Hao” (Kanto Nijo 3), introduced from Japan in the 1960s, was compared with its Co60-radiation variant “Yan Fu Ai Zao 3”, which carries a semi-dwarf mutation. Both “Zao Shu 3 Hao” and “Yan Fu Ai Zao 3” are founder varieties that were widely cultivated in China in the 1970s and 1980s, and their genetic similarity was traceable using BarPlex v1.0 (Fig. 3d). The historic cultivar “Yu Da Mai 1 Hao” was reported to have been developed from a mutant of “Zao Shu 3 Hao”, whereas the genotyping result suggests that it may instead derive from outcrossing (Fig. 3d). Notably, the polymorphic sites from the 90.3 kb of sequences gave better resolution for variety pedigree discrimination than the 495 target SNPs.
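One simple way to quantify the "identity" between two varieties is the proportion of shared genotype calls over sites scored in both. The following is a minimal sketch under that assumption; the metric actually used for Fig. 3 is not specified here, and the function name is hypothetical.

```python
from typing import List, Optional


def pairwise_identity(g1: List[Optional[str]], g2: List[Optional[str]]) -> float:
    """Proportion of identical genotype calls between two samples.

    Sites missing (None) in either sample are skipped.
    Returns NaN when no site can be compared.
    """
    compared = same = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        compared += 1
        if a == b:
            same += 1
    return same / compared if compared else float("nan")


# Toy example: 4 comparable sites, 3 of them identical
variety_a = ["AA", "GG", "CT", None, "TT"]
variety_b = ["AA", "GG", "CC", "AA", "TT"]
identity = pairwise_identity(variety_a, variety_b)
```

With more sites per comparison, as with the full set of polymorphic sites rather than target SNPs alone, small differences between closely related varieties become easier to resolve.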
Third, we examined whether BarPlex v1.0 is applicable to preliminary genetic mapping. Cultivated barley is traditionally classified as hulled or naked, according to whether the hull adheres to the caryopsis in fully mature grains (Taketa et al. 2008). The naked trait (NUD) is inherited qualitatively and is controlled by a loss-of-function allele of an ethylene response factor (ERF) gene that resides on chromosome 7HL (Lei et al. 2020; Taketa et al. 2008). We conducted a genome-wide association study (GWAS) in 973 barley accessions (the 1,068 samples excluding the 95 F2 lines) with 3,220 polymorphic sites, which revealed a single peak associated with the NUD locus (Fig. 4a). In addition, linkage mapping was conducted in a bi-parental F2 population genotyped with BarPlex v1.0 (Fig. 4b-c). Here, 108 target SNPs and 442 polymorphic sites between the two parents were identified across the seven chromosomes (Table 1). The naked trait was delimited to a 40.2 cM genetic interval spanning approximately 20 Mb (Fig. 4d), within which the previously identified NUD locus resides (Taketa et al. 2008).
Collectively, these results indicate that BarPlex v1.0 is applicable to multiple types of genetic studies in barley.
Tibetan semi-wild barley showed genetic similarity to Chinese landraces but lower diversity
Tibetan semi-wild barley, also referred to as Tibetan weedy barley (Zeng et al. 2018), has a characteristic brittle rachis that is also seen in the wild barley H. spontaneum of the Near East. This led to the hypothesis that Tibet may be an independent center of domestication of native wild barley (Aberg 1938; Dai et al. 2014). Although this hypothesis was questioned by re-sequencing and phylogenetic analyses of the brittle rachis domestication genes Btr1 and Btr2 (Pourkheirandish et al. 2018), the population diversity of Tibetan semi-wild barley (mainly H. agriocrithon) relative to wild or cultivated barley populations remained unclear. Using the polymorphic sites revealed by BarPlex v1.0, we investigated the genetic diversity in a large panel of barley accessions including 51 H. spontaneum, 248 representative Tibetan semi-wild barley, 345 Chinese barley landraces, and 329 cultivars. A higher number of polymorphic sites was detected in the H. spontaneum population than in Tibetan semi-wild barley (2,652 vs. 1,496), even though the H. spontaneum population analyzed was considerably smaller (51 vs. 248) (Table 1). Using these polymorphic sites, principal component analysis (PCA) revealed that H. spontaneum forms a group separate from domesticated barley, whereas Tibetan semi-wild barley grouped with Chinese barley landraces (Fig. 5a). A sharp decrease in genetic diversity (specifically, nucleotide diversity, π) from Chinese barley landraces to Tibetan semi-wild barley was also observed (Table 2). Selection signals, reflected in significantly negative estimates of Tajima’s D as well as of Fu and Li’s D* and F*, suggested a founder effect in the Tibetan semi-wild barley population. Statistical analysis using pre-selected target SNPs derived from cultivated barley may underestimate the genetic diversity within the H. spontaneum population (Table 1), thereby distorting the genetic relationships in the principal component analysis (Fig. 5b). Together, the genetic diversity and PCA results for a large panel of Tibetan semi-wild barley, Chinese barley landraces, and H. spontaneum suggest that Tibet was not an independent center of domestication for native wild barley.
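For readers unfamiliar with the diversity statistics cited above, the sketch below computes per-site nucleotide diversity (π), the number of segregating sites (S), and Tajima's D from a set of aligned haplotype sequences, using the standard formulas of Tajima (1989). It assumes equal-length, gap-free haplotypes (a simplification, given that barley is predominantly inbreeding) and does not reproduce the software or settings used for Table 2; Fu and Li's D* and F* are not implemented.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Optional


def diversity_stats(seqs: List[str]) -> Dict[str, Optional[float]]:
    """Per-site nucleotide diversity, segregating sites, and Tajima's D.

    seqs: aligned haplotype sequences of equal length (assumed input format).
    """
    n, length = len(seqs), len(seqs[0])
    if n < 2:
        raise ValueError("at least two sequences are required")
    # Mean number of pairwise differences (k)
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # Number of segregating sites (columns with more than one allele)
    seg = sum(len(set(col)) > 1 for col in zip(*seqs))
    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg + e2 * seg * (seg - 1)
    # D is undefined when there is no variation or the variance term vanishes
    tajima_d = (k - seg / a1) / sqrt(variance) if variance > 0 else None
    return {"pi": k / length, "S": seg, "tajima_d": tajima_d}


# Toy alignment of four haplotypes, four sites
toy_stats = diversity_stats(["AAAA", "AAAT", "AATT", "ATTT"])
```

A negative D indicates an excess of rare variants relative to neutral expectation, which is the pattern interpreted above as consistent with a founder effect.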