Population structure and phylogeographic history of wild soybean (Glycine soja) have implications for its conservation

doi:10.21203/rs.3.rs-2580996/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Glycine soja Sieb. & Zucc. is the wild ancestor from which the important crop soybean was bred. G. soja provides important germplasm resources for the breeding and improvement of cultivated soybean; however, the species is threatened by habitat loss and fragmentation and is experiencing population declines across its natural range. Understanding the patterns of genetic diversity in G. soja populations can help to inform conservation practices. Additionally, wild soybean has a wide distribution across the Sino-Japanese Floristic Region and therefore provides an opportunity to investigate the effects of Quaternary climate fluctuations on the current genetic structure of plant species in this region. In this study, we analyzed the genetic diversity and differentiation of G. soja populations and investigated the phylogeographic history of the species. We obtained 147 wild soybean accessions collected from 16 locations across the natural range of the species in China, Korea and Japan. Samples were analyzed using SLAF-seq (Specific-Locus Amplified Fragment Sequencing), yielding a total of 56,489 highly consistent population SNPs. Our results suggest that wild soybean harbors relatively high diversity and that populations of the species are highly differentiated. The 147 wild soybean individuals clustered into two groups: Lineage I comprised five populations from Northern and Central China as well as one population from Japan, and Lineage II comprised the other 10 populations, which had Northeast Chinese, Korean, Japanese and Yangtze River origins. Analysis of gene flow suggested that historical migrations of wild soybean may have occurred from south to north across the East Asia land-bridge. The phylogeographic history of wild soybean provides new insights into the migration patterns of herbaceous plants across the Sino-Japanese Floristic Region.

Glycine soja

SLAF-seq

genetic diversity

population structure

phylogeography

The Quaternary climatic oscillations are known to have affected the modern distributions and genetics of many northern hemisphere plant taxa, in particular European and North American species (Hewitt 2000; Hewitt 2004). Fossil data and phylogeographic analyses suggest that temperate plant taxa from these areas, in particular trees and shrubs, underwent latitudinal range shifts during glacial, interglacial and postglacial periods (Petit et al. 2003). The Sino-Japanese Floristic Region (SJFR) in the East Asiatic Floristic Kingdom is home to an extremely diverse temperate flora (Wu & Wu 1995). Most of this region was not covered by extensive ice sheets during the last ice age, and glaciers developed only high in the mountains, but the severe climatic oscillations of the Quaternary are thought to have greatly affected the modern distributions and population genetics of its native plant species (Harrison et al. 2001).

How East Asian temperate species responded to Quaternary climatic cycling is still debated. Previous studies suggest that during the Last Glacial Maximum (LGM), most temperate forest flora remained in forest stands in northern China, but that there was extensive postglacial northward dispersal from southern forest refuges. The steppe vegetation and desert flora of this region are thought to have expanded eastward towards the seashore during the LGM, and temperate deciduous forest is believed to have covered the East China Sea (ECS) land-bridge, which at that time connected southern Japan and the Korean Peninsula with East China (the CJK region) (Yu et al. 2000; Harrison et al. 2001). Phylogenetic data suggest that Quaternary climate fluctuations profoundly affected both the genetic structure of plant populations and the patterns of plant distribution in this region; however, these species showed only very limited, if any, latitudinal range expansions and contractions (Qiu et al. 2011b). Instead, multiple cryptic refugia are thought to have existed, and plant taxa suffered range fragmentation, vicariance and population isolation in the CJK region (Li et al. 2008; Qiu et al. 2009a; Qiu et al. 2009b), as well as in northern (Chen et al. 2008; Tian et al. 2009; Bai et al. 2010), southwestern (Wang et al. 2008b; Yuan et al. 2008; Liu et al. 2009; Zhang et al. 2011) and subtropical China (Gao & Innan 2008; Qiu et al. 2009a; Qiu et al. 2009b; Guan et al. 2010) within the SJFR. However, phylogeographic studies have been carried out on only a few taxa from the SJFR, mainly temperate trees or shrubs with relatively limited geographic distributions (Aizawa et al. 2007a; Gao et al. 2007; Chen et al. 2008).
At the same time, most previous phylogeographic studies used cpDNA, mtDNA or other uniparentally inherited DNA markers, and the effects of Pleistocene climate oscillations on the distribution and current genetic structure of widely distributed herbaceous species have rarely been addressed. More phylogeographic studies are needed to investigate the effects of climate shifts in this region, in particular the effect of much lower levels of precipitation on the genetic structure and distribution of plant taxa.

Glycine soja Sieb. & Zucc., the wild soybean, is the ancestor from which the important crop plant soybean was bred (Smil 2000). Wild soybean has a wide distribution throughout the SJFR, between 24° and 53° N, and between 97° and 143° E. The species grows as a weed in cultivated land, on banks and in wetlands, from sea level to 2650 meters altitude (Lu 2004). Outcrossing rates are thought to range from 2.4% to 19% (Kiang et al. 1992; Fujita et al. 1997), with the highest rates attributed to elevated pollinator visitation (Fujita et al. 1997). Outcrossing rates are thought to be lower on average; for example, the mean outcrossing rate of 77 wild soybean populations in Japan was estimated to be 3.4% (although for seven populations the estimated outcrossing rate was over 10%) (Kuroda et al. 2006). Mean seed dispersal distances are only 10 m (Jin et al. 2003). Short distance dispersal occurs mainly through pod dehiscence (Oka 1983), while longer dispersal may be mediated by water or birds (Kiang et al. 1992; Choi et al. 1999; Kuroda et al. 2006).

The evolutionary relationships between G. soja populations have been investigated in the past mainly through the study of isozymes, DNA loci, SSRs and morphological characters (Dong et al. 2001; Li & Nelson 2002; Zhao et al. 2005; Wang & Takahata 2007; Wang et al. 2008a; Li et al. 2009; Zhao et al. 2009; Lee et al. 2010; Wang et al. 2010; Wang & Li 2011; He et al. 2012; Wang et al. 2012; Wang et al. 2014; Nawaz et al. 2017; Zhao et al. 2018). Several molecular marker-based studies have discussed phylogeographic issues such as geographical origins and patterns of dispersal (Choi et al. 1999; Kuroda et al. 2006, 2008; Kuroda et al. 2010; He et al. 2012), but the geographic sampling employed in these studies was limited. One previous study combined nuclear microsatellites and a chloroplast locus with ecological niche modeling to investigate the demographic history of G. soja (He et al. 2016). It found that the distribution of wild soybean during the LGM was limited to southern and central China, and that the species may have undergone extensive range expansion into northern East Asia after the LGM. However, genetic diversity in northeast China is very high, and it is not clear whether this reflects insufficient marker selection or rapid radiation and mutation. The limited number of polymorphic microsatellite loci used also did not provide high resolution among soybean populations.

Another method increasingly employed in the investigation of plant evolution and genetic origins is single nucleotide polymorphism (SNP) genotyping based on high-density whole-genome sequencing (Melegh et al. 2017; Rahmatalla et al. 2017). Specific-locus amplified fragment sequencing (SLAF-seq) is able to generate large SNP datasets (Sun et al. 2013), which have greater power than previous techniques to investigate the genetic structure of plant populations (Narum et al. 2013). In herbaceous species, particularly those that experienced significant contractions in available habitat following glacial cycling, neutral processes, including changes in effective population size and allopatric divergence, are expected to be of particular importance in driving population structure (Maggs et al. 2008). However, loci associated with environmental variables have been found in many studies (Yoder et al. 2014), which suggests that non-neutral processes may also have affected the observed patterns of genetic diversity.

For this study, we obtained 147 wild soybean accessions collected from 16 locations across the natural range of the species from China, Korea and Japan. Samples were analyzed using SLAF-seq (Specific-Locus Amplified Fragment Sequencing) to address: (1) the genetic diversity and population genetics of G. soja and any possible implications for the conservation of this species; (2) the effect of Pleistocene climate oscillations on the current distribution of wild soybean.

Plant materials, preparation of DNA and construction of SLAF library, and high-throughput sequencing

Leaf samples were taken from 12 Chinese populations, two Japanese populations and two Korean populations of wild soybean, across the known distribution of the species. All samples were collected directly from the wild. Young, healthy leaves were collected from individuals separated by at least 15 m and were dried immediately in silica gel. Nine or ten individuals were collected from each population except DQ3, from which only three individuals were collected (Table 1). Voucher specimens were collected from each population and deposited in the herbarium of Yunnan Agricultural University.

Total genomic DNA was extracted from each sample following the cetyltrimethyl ammonium bromide (CTAB) method (Porebski et al. 1997). The concentration and quality of the resulting DNA were examined by electrophoresis on a 1% agarose gel and by spectrophotometry on an ND-2000 (NanoDrop, Wilmington, DE, USA). We used a modified SLAF-seq strategy, with fragment sizes (including adaptors and indexes) ranging from 364 bp to 444 bp. The DNA was cleaned and digested into fragments using the enzymes RsaI+HaeIII (NEB, Ipswich, MA, USA). High-throughput sequencing was performed on an Illumina HiSeq™ 2500 (Illumina, Inc., San Diego, CA, USA) at the Biomarker Technologies Corporation in Beijing.

Sequencing data grouping, genotyping, and genetic diversity analysis

To reconstruct the loci, the raw data were analyzed using the Stacks 1.0 pipeline (Catchen et al. 2011; Catchen et al. 2013b). Data were demultiplexed and sorted according to sample barcodes using process_radtags. Low-quality reads (Phred score ≤ 10) were discarded, and the remaining reads were filtered to remove adapter contamination. The program ustacks (minimum stack depth -m = 5; maximum mismatch distance -M = 2; maximum stacks per locus = 3) was then used to group the reads of each sample into loci. These loci were merged into a catalog with cstacks, and the alleles in each sample were determined by matching each sample's loci against the catalog with sstacks.

Species-level genetic diversity in G. soja was assessed with the Stacks populations program, with all 147 samples treated as a single population; a locus had to be present in at least 67% of all samples to be included in this analysis. For population-level genetic diversity, each collection area was treated as a population, and loci were required to be present in all individuals (r = 1) in at least six populations (p = 6).
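The two locus-inclusion rules above can be sketched as follows. This is a minimal illustration, not the Stacks populations implementation: it assumes a hypothetical per-locus table giving, for each population, the number of individuals genotyped and the number sampled, and treats r as the minimum proportion of a population's individuals that must be genotyped and p as the minimum number of populations meeting that threshold.

```python
from typing import Dict, List, Tuple

# Hypothetical input: locus -> population -> (individuals genotyped, individuals sampled)
PerPop = Dict[str, Tuple[int, int]]


def passes_population_filter(per_pop: PerPop, r: float = 1.0, p: int = 6) -> bool:
    """True if the locus is genotyped in at least a fraction r of individuals
    in at least p populations (population-level analysis)."""
    qualifying = sum(
        1 for genotyped, total in per_pop.values()
        if total > 0 and genotyped / total >= r
    )
    return qualifying >= p


def passes_species_filter(per_pop: PerPop, min_fraction: float = 0.67) -> bool:
    """Species-level rule: pool all samples into one population and require
    the locus in at least min_fraction of them."""
    genotyped = sum(g for g, _ in per_pop.values())
    total = sum(t for _, t in per_pop.values())
    return total > 0 and genotyped / total >= min_fraction


def filter_loci(counts: Dict[str, PerPop], r: float = 1.0, p: int = 6) -> List[str]:
    """Return the IDs of loci that pass the population-level rule."""
    return [locus for locus, per_pop in counts.items()
            if passes_population_filter(per_pop, r, p)]
```

With r = 1, a single missing genotype disqualifies a population for that locus, which is why the population-level dataset is smaller than the species-level one.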

Population genetic analyses and linkage disequilibrium

The populations program in Stacks was used to calculate population genetic statistics for each SNP: number of private alleles, observed heterozygosity (H_O), expected heterozygosity (H_E), nucleotide diversity (π), and Wright's F statistics F_IS and F_ST (Frankham et al. 2002; Catchen et al. 2013a). To investigate potentially hidden structure within each population, we examined the inbreeding coefficient F_IS of each population (Wright 1978; Hartl & Clark 2007). We calculated the average F_ST for pairwise comparisons between all sampled populations to investigate their genetic relatedness, and these F_ST values were then used to reconstruct a neighbor-joining tree in Mega 6.0 (Tamura et al. 2013).
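For a single biallelic SNP, the per-population statistics listed above can be written out as textbook formulas. The sketch below is illustrative only: it assumes genotypes coded as alternate-allele counts (0, 1 or 2) with missing calls already removed, and uses H_E = 2pq, a sample-size-corrected per-site π, and F_IS = 1 - H_O/H_E. The exact estimators implemented in Stacks may differ in detail.

```python
def diversity_stats(genotypes):
    """Textbook per-SNP diversity statistics for one population.

    genotypes: alternate-allele counts (0, 1 or 2) for diploid individuals,
    with missing calls already removed.
    """
    n_ind = len(genotypes)
    n_alleles = 2 * n_ind
    p_alt = sum(genotypes) / n_alleles                 # alternate allele frequency
    h_o = sum(1 for g in genotypes if g == 1) / n_ind  # observed heterozygosity
    h_e = 2 * p_alt * (1 - p_alt)                      # Hardy-Weinberg expectation
    pi = h_e * n_alleles / (n_alleles - 1)             # unbiased per-site diversity
    # Heterozygote deficit relative to expectation; undefined at monomorphic sites
    f_is = 1 - h_o / h_e if h_e > 0 else float("nan")
    return {"H_O": h_o, "H_E": h_e, "pi": pi, "F_IS": f_is}
```

For genotypes [0, 0, 2, 2, 1], the allele frequency is 0.5, so H_E = 0.5 while only one of five individuals is heterozygous (H_O = 0.2), giving F_IS = 0.6. A large positive F_IS of this kind is the heterozygote deficit expected in a predominantly selfing species.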

Structure-format files containing the SNP data were exported from the populations program in Stacks to allow analysis of population-level genetic structure (Pritchard et al. 2000; Hubisz et al. 2009). Similarly, data were exported as Genepop-format files to allow estimation of gene flow among populations using Genepop v4.0 (http://genepop.curtin.edu.au/). To avoid including tightly linked SNPs (Catchen et al. 2013b), only the first SNP at each locus was written to the Genepop and Structure files, using the parameters r = 1 and p = 6.
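Keeping only the first SNP per locus is a simple thinning step. In Stacks this behaviour is available through the populations option --write_single_snp; the sketch below only illustrates the logic and assumes a hypothetical list of (locus ID, position within locus) pairs.

```python
def first_snp_per_locus(snps):
    """Keep the first (lowest-position) SNP at each locus.

    snps: iterable of (locus_id, position_in_locus) pairs (hypothetical format).
    Returns (locus_id, position) pairs sorted by locus ID.
    """
    first = {}
    for locus, pos in snps:
        if locus not in first or pos < first[locus]:
            first[locus] = pos
    return sorted(first.items())
```

SNPs within one short SLAF tag are almost always physically linked, so retaining one per tag brings the data closer to the independent-loci assumption of Structure.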

The Structure-format output from populations was then analyzed using Structure 2.3 (Pritchard et al. 2000). After an initial burn-in of 10,000 steps, 100,000 iterations were run, with 10 replicates for each value of K (1-16, where K is the number of genotypic groups). The optimal K was identified from delta K (Evanno et al. 2005) using Structure Harvester. A principal coordinates analysis (PCA) was also run in SNPRelate (Zheng et al. 2012) to assess genetic relationships among individuals, based on the Euclidean distances between individual genotypes.
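Delta K (Evanno et al. 2005) is the absolute second-order change of the mean LnP(D) across K, divided by the standard deviation of LnP(D) among replicate runs at that K. A minimal sketch, assuming a hypothetical dictionary of replicate LnP(D) values for consecutive K and following the common convention of differencing replicate means; delta K is undefined at the smallest and largest K tested.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from replicate LnP(D) values.

    lnp_by_k: dict mapping consecutive integer K to a list of LnP(D)
    values from replicate runs (at least two replicates per K).
    """
    ks = sorted(lnp_by_k)
    m = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in ks[1:-1]:  # needs both neighbours K-1 and K+1
        second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])  # |L''(K)|
        sd = stdev(lnp_by_k[k])                            # s[L(K)]
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta
```

The K with the largest delta K is taken as the best-supported number of clusters; because the statistic rewards a sharp change in slope, it reflects the uppermost level of structure rather than every level present.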

To estimate LD decay with physical distance, a nonlinear regression of linkage disequilibrium (LD) between polymorphic sites against the distance (in bp) between them was run. Plink (Purcell et al. 2007) was used to calculate LD between pairs of polymorphic sites based on the squared correlation of allele frequencies (r²). Distances between SNPs were plotted against r² to give the LD-decay curve; r² is usually larger when SNPs are close together and smaller when they are far apart. A cut-off value of r² = 0.1 was used to evaluate LD decay in each population, with r² at a marker distance of 0 kb assumed to be 1. The LD decay distance (LDD) is the distance at which r² falls to half of its maximum value. Low recombination frequencies within a given distance tend to produce longer LDDs, whereas higher recombination frequencies over the same distance produce shorter LDDs.
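The LD-decay procedure can be sketched in two parts: an r² value for each SNP pair, and the distance at which the decay curve drops to half its maximum. This is an illustration under stated assumptions, not the pipeline used here: r² is computed as the squared Pearson correlation of 0/1/2 genotype codes (similar in spirit to PLINK's --r2), and the nonlinear regression is swapped for a simpler binned-mean curve. The bin size is a hypothetical choice.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two SNPs' genotype codes (0/1/2)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    return cov * cov / (v1 * v2) if v1 > 0 and v2 > 0 else float("nan")


def half_decay_distance(pairs, bin_size):
    """pairs: (distance_bp, r2) for SNP pairs. Averages r2 in distance bins and
    returns the start of the first bin whose mean r2 is at most half the
    largest binned mean, or None if the curve never decays that far."""
    bins = {}
    for d, r2 in pairs:
        bins.setdefault(int(d // bin_size), []).append(r2)
    curve = sorted((b * bin_size, sum(v) / len(v)) for b, v in bins.items())
    half = max(r for _, r in curve) / 2
    for d, r in curve:
        if r <= half:
            return d
    return None
```

A fitted decay model gives a smoother estimate than binning, but both express the same definition of LDD used in the text.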

Gene flow and migration events between populations

Genepop v4.0 was used to estimate species-level gene flow (Nm), and pairwise population-level Nm values were calculated from the F_ST values obtained in populations using the formula Nm = (1 - F_ST) / (4 F_ST) (Wright 1950). Maximum likelihood trees describing the historical relationships between the study populations, and inferring potential migration events between them, were generated in TreeMix v1.13 (Pickrell et al. 2012). TreeMix was run iteratively with the number of migration events (-m) set to 5 and the SNP block size (-k) set to 10.
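The conversion between F_ST and Nm is simple arithmetic and can be inverted, which is useful for checking reported pairs of values. A minimal sketch of Wright's island-model relation; the function names are illustrative.

```python
def nm_from_fst(fst: float) -> float:
    """Migrants per generation under Wright's island model: Nm = (1 - F_ST) / (4 F_ST)."""
    if not 0 < fst <= 1:
        raise ValueError("F_ST must lie in (0, 1]")
    return (1 - fst) / (4 * fst)


def fst_from_nm(nm: float) -> float:
    """Inverse relation: F_ST = 1 / (4 Nm + 1)."""
    if nm < 0:
        raise ValueError("Nm must be non-negative")
    return 1 / (4 * nm + 1)
```

For example, F_ST = 0.5 gives Nm = 0.25, and Nm exceeds 1 only when F_ST < 0.2. These indirect estimates rest on the island model's assumptions (equilibrium, equal population sizes, symmetric migration), so they are best read as relative rather than absolute migration rates.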

SLAF sequencing and SNP discovery

The genome of the cultivated soybean (G. max) was used as the reference for in silico prediction of SLAF fragments, and the enzyme combination RsaI+HaeIII was selected for digestion. SLAF tag lengths ranged from 364 to 444 bp. After data cleaning, the clean data obtained for each individual ranged between 453 and 2,202 Mb, with most individuals yielding about 800 Mb. The average number of reads per individual was 3,859,551, with a minimum of 2,266,715 and a maximum of 11,010,066 (Table S1). Phred quality scores were high (Q30 ≥ 89.82%), and GC content ranged between 37.9% and 41.4% (Table S1). A total of 1,784,121 SLAFs were predicted, of which 548,804 were heterozygous SLAF tags. The average number of SLAF tags obtained per individual was 202,663, with an overall average depth of 11.9x. A total of 2,436,305 SNPs were discovered. SNPs meeting any of the following criteria were then discarded: (1) minor allele frequency < 1%; (2) Hardy-Weinberg equilibrium p value < 1 × 10⁻⁵; and (3) more than 10% missing genotype data. Individuals missing more than 10% of genotype data were also discarded. A total of 56,489 SNPs were retained for downstream genetic diversity analysis, and these SNPs were distributed largely evenly throughout the genome (Fig. 1).
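The three SNP-level filters can be expressed as a single predicate. This sketch is an assumption-laden illustration rather than the filtering software used: genotypes are coded as alternate-allele counts with None for missing calls, and the Hardy-Weinberg test is a one-degree-of-freedom chi-square test swapped in for simplicity (tools such as PLINK default to an exact test, which behaves better at low allele counts).

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) Hardy-Weinberg test p value from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: test is uninformative
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def keep_snp(genotypes, maf_min=0.01, hwe_p_min=1e-5, max_missing=0.10):
    """Apply the missingness, minor-allele-frequency and HWE filters.

    genotypes: alt-allele counts (0/1/2) per individual, None if missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called or 1 - len(called) / len(genotypes) > max_missing:
        return False
    alt_freq = sum(called) / (2 * len(called))
    if min(alt_freq, 1 - alt_freq) < maf_min:
        return False
    counts = (called.count(0), called.count(1), called.count(2))
    return hwe_chisq_p(*counts) >= hwe_p_min
```

Because the HWE test compares observed to expected heterozygote counts, a strong species-wide heterozygote deficit makes this filter stringent for a largely selfing species.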

Genetic diversity at the species and population levels

Across all loci polymorphic at the species level, the observed heterozygosity (H_O) was 0.0157, the expected heterozygosity (H_E) 0.1459, the nucleotide diversity (π) 0.1465, and the inbreeding coefficient (F_IS) 0.8533. When all nucleotide positions were considered, including non-polymorphic ones, the observed heterozygosity decreased to 0.0004, the expected heterozygosity to 0.0035, the nucleotide diversity to 0.0035, and the inbreeding coefficient to 0.0205.

Statistics for each population are given in Table 2 and Figure 2. Across loci polymorphic in one or more populations, average observed heterozygosity (H_O) ranged from 0.0199 (DQ) to 0.0460 (KR), expected heterozygosity (H_E) from 0.0119 (DQ) to 0.3492 (KR), nucleotide diversity (π) from 0.0130 (JK) to 0.3789 (KR), and the inbreeding coefficient from -0.0003 (JK) to 0.0230 (KR).

When all nucleotides, including non-polymorphic ones, were considered, observed heterozygosity ranged from 0.0005 to 0.0016, expected heterozygosity from 0.0003 to 0.0121, nucleotide diversity from 0.0003 to 0.0131, and the inbreeding coefficient from -0.0003 to 0.0230. The number of private alleles per population ranged from 1,755 (HH) to 12,083 (QQHE). Across all measures, the highest genetic diversity was found in the KR population, followed by QQHE. The lowest nucleotide diversity and heterozygosity were seen in the JK population, and the lowest observed heterozygosity in the DQ population.

Population structure analysis and Linkage disequilibrium

The average pairwise F_ST values between populations were used to reconstruct an UPGMA neighbor-joining tree in Mega v6.0 (Fig. 3).

The 147 wild soybean accessions clustered into two groups: Lineage I comprised five populations (SY, TJ, JN, WN, WH) from Northern and Central China as well as a population from Japan (JK). Lineage II comprised the other 10 populations (HEB, QQHE, CC and DQ from Northeast China; KO and KR from the Korean Peninsula; JT from Japan; NJ, YW and HH from the Yangtze River Region). The Korean and Japanese populations did not form independent lineages: the Japanese population JK fell into Lineage I, whereas JT and the Korean populations clustered with Lineage II. Individuals sampled from population SY formed two groups.

To further investigate the structure of the sampled wild soybean populations, we analyzed the 56,489 SNPs in Structure 2.3 using the “admixture” and “correlated allele frequencies” models. Changes in LnP(D) and delta K were assessed following Evanno et al. (2005), and indicated that the model best fitting our data was K = 10 (Fig. 4A). The posterior probabilities showed that populations DQ, CC and QQHE were the most genetically similar to each other (Fig. 4B). Three individuals from population SY were genetically distinct from the other individuals in SY. The JT population was highly genetically diverse, and the YW and NJ populations were genetically similar.

The principal coordinates analysis was largely consistent with both the reconstructed UPGMA tree and the Structure analysis (Fig. 5). PC1 clearly separated Lineages I and II, although population WH was separated from the rest of Lineage I. PC2 divided Lineage II into two clusters: Cluster I comprised populations YW, HH and NJ, and Cluster II comprised populations KO, KR, JT, CC, DQ, QQHE and HEB.

Linkage disequilibrium decay curves for the 16 G. soja populations are given in Fig. 6, where each colored line represents the observed LD data for a single population. LD declined clearly and rapidly with distance, decaying in all populations to half its initial value within about 250 kb.

Genetic differentiation and gene flow among populations

The total number of migrants (Nm) per generation was estimated at 0.18 (Genepop analysis). Figure 3 shows the pairwise Wright's F_ST values and pairwise gene flow (Nm) for the 16 studied G. soja populations. Genetic differentiation between populations, as measured by F_ST, was quite high. The DQ and JK populations were the most divergent, with an F_ST value of 0.67, and populations YW and JN were the least divergent, with a value of 0.106. The pairwise Nm values calculated from Wright's F_ST values ranged between 0.125 and 2.108. The greatest gene flow (2.108) was between the JN and YW populations, followed by that between the SY and YW populations (2.002). The smallest gene flow (0.125) was between the JK and DQ populations, followed by that between the JK and CC populations (0.141). Overall, levels of gene flow between wild soybean populations were low (Table 3).

To describe the historical relationships between these populations and investigate potential migration events between them, we ran a TreeMix analysis on the 16 sampled wild soybean populations. The results suggest that population splits have occurred and that there has been gene flow between populations. In the TreeMix output (Fig. 7), the DQ and HEB populations cluster together as one group, and there is strong gene flow from the CC population towards QQHE. Populations TJ, JN and WN cluster together as a single group, with strong historical gene flow from this cluster towards the QQHE and JT populations, and modern gene flow from TJ to SY. Overall, the general migration pattern appears to have been from south to north, with populations TJ, JN and WN also contributing gene flow.

Comparison of different molecular markers in revealing genetic diversity and differentiation in populations of wild soybean

The genetic diversity and differentiation of G. soja have previously been investigated with several different molecular markers. Wang and Li (2013) assessed the diversity and structure of 11 wild soybean populations using nuclear microsatellite markers (SSRs), obtaining H_O = 0.029 and H_E = 0.0324. He et al. (2016), analyzing SSRs and a chloroplast locus, obtained H_O = 0.0324 and H_E = 0.426. Zhao et al. (2006) used AFLP, ISSR and SSR markers to investigate wild soybean populations, obtaining H_E = 0.353 (AFLP), 0.226 (ISSR) and 0.157 (SSR). In the current study, we applied the high-throughput sequencing technology SLAF-seq to investigate the genetic diversity of wild soybean populations across the known distribution of the species, and obtained H_O = 0.0157 and H_E = 0.1459. Diversity estimates therefore differ among marker types. Because SLAF-seq markers are genome-wide DNA tags (small fragments adjacent to specific restriction sites), they should represent the sequence characteristics of the entire genome, and they are therefore believed to reflect the true level of genetic diversity more accurately.

Levels of genetic differentiation between different geographical regions of the G. soja range, while significant, are lower than would be predicted for an annual, selfing plant with limited seed dispersal (Sergei et al. 2016; Zhang et al. 2020). Most genetic variation occurs within rather than between populations of G. soja, which is unexpected for a selfing species. Although natural seed dispersal in G. soja is estimated to average less than 4.5 m, long-distance seed dispersal of up to 200 km has been suggested on the basis of molecular data (Kuroda et al. 2006). This may explain the low levels of genetic differentiation observed in wild soybean populations.

Historical demography

Previous phylogeographic studies suggested that, in response to the Quaternary glacial and interglacial cycles in East Asia, no, or only limited, north-south dispersal took place; instead, plant taxa survived in multiple cryptic refugia during glaciations (Qiu et al. 2011a). However, our previous SSR data and ecological niche modelling analyses (He et al. 2016) suggested that wild soybean was restricted to southern and central China during the LGM and expanded its range significantly into northern East Asia afterwards. In this study, the SLAF data suggested that gene flow between wild soybean populations may have occurred across the East Asia land-bridge, in agreement with our previous findings, and that this gene flow ran from south to north. However, the genetic diversity indices showed that the KR and QQHE populations have high genetic diversity. This is not consistent with a large-scale northward range expansion in this species, because recolonized regions would be expected to show reduced genetic diversity. It is therefore possible that wild soybean populations survived in micro-refugia in Northeast China. It has been suggested that the Changbai Mountain region was glaciated only above about 2000 m during the late Pleistocene. If so, the climate at lower elevations may have been mild enough during Pleistocene glaciations for certain plant taxa to survive in microclimatic habitats. The presence of refugia in Northeast China has been suggested by several recent phylogeographic studies (Aizawa et al. 2007b; Hu et al. 2008). Moreover, the current distribution of wild soybean suggests that there was more than one refugium during the glacial periods of the Pleistocene, and wild soybean populations may have persisted in multiple refugia, at least in Northeast China and Korea.

During interglacial and postglacial periods, higher sea levels would have split the CJK region along the East China Sea (ECS), whereas during glacial periods, when sea levels fell by c. 85-130/140 m, a land-bridge would have formed across the exposed ECS basin (Millien-Parra & Jaeger 1999). Temperate deciduous forest is thought to have covered this exposed land-bridge during glacial periods (Zhang et al. 2020). The temperate flora of the area is therefore likely to have been separated into disjunct refugia during warmer periods, but to have had opportunities for admixture during glacial periods. Previous phylogeographic studies of Kirengeshoma (Qiu et al. 2009b), Platycrater arguta (Qiu et al. 2009b) and Croomia (Li et al. 2008) all suggested deep allopatric-vicariant differentiation of disjunct lineages in the CJK region (Qiu et al. 2011a). In contrast with these taxa, wild soybean shows low divergence between different regions of the CJK area: populations from northeastern China, southern Japan and the Korean Peninsula are genetically close. The high differentiation between CJK regions reported in previous phylogeographic studies, and the low differentiation we found in wild soybean, may result from the different habits of the study taxa. Wild soybean has a wide distribution and is sometimes able to colonize high-salinity habitats along the seashore, so it may have had greater opportunity to migrate across the land-bridge and mix with other populations than taxa with limited distributions. Additional taxa with different ranges and habits should be sampled to further investigate the biogeographical history of the CJK region. The gene flow we observed among the 16 study populations provides further support for the East Asia land-bridge dispersal hypothesis.

The Japanese populations JK and JT contained individuals from several different lineages, which suggests that these populations may have been formed by several different colonization events. We suggest that wild soybean may have been introduced to Japan through long-distance dispersal mediated by migratory birds. Alternatively, unconsidered factors, such as human-mediated dispersal or hybridization with cultivated G. max, may be influencing the population structure of the wild species. Wild soybean is widely distributed across Japan, and more extensive population sampling is needed to resolve the origin of Japanese wild soybean.

Implications for conservation

Two major goals in conservation are the preservation of genetic diversity and evolutionary potential, and the prevention of inbreeding depression (Rauch & Bar-Yam 2005). Currently, two main methods are used to determine which populations should receive priority protection. The first uses the level of genetic variation to set priorities; a problem with this method is that it can overlook genetic differentiation between populations, so unique alleles in populations with low genetic variation may not be effectively protected. The second is based on genetic differentiation and considers evolutionarily significant units: priority is assigned according to the degree of genetic differentiation, so that the more distinct a population is, the more valuable it is to protect. However, it can be difficult to identify evolutionarily significant units for groups with unclear pedigrees or geographic patterns.

Our SLAF data suggest that although wild soybean resources have been seriously damaged and a large number of populations have disappeared, G. soja retains high genetic diversity at the species level. However, some populations have only very low genetic diversity; for example, the nucleotide diversity of the CC, DQ and JK populations was below 0.0008. In contrast, other populations are highly diverse; in the KR population, for example, nucleotide diversity was 0.0131. Populations harboring high genetic diversity should be given priority in the protection of wild soybean, and conservation of their original habitats, i.e. in situ protection, should be adopted for these populations.

All wild soybean populations studied here fall into two lineages, and these two lineages should be treated separately when formulating protection policies. Wild soybean has undergone significant habitat fragmentation in recent years, and human activities have led to its local extinction in many areas. Wild populations belonging to Lineage I were often very difficult to find, even in areas where the species had previously been reported. The variation represented by the different wild varieties is important for studying the origin and evolution of the species, as well as for breeding cultivated varieties. However, certain varieties of wild soybean have disappeared from the vast Huanghuai River basin. These include varieties with gray hairs, white flowers and light green pods, and varieties with yellow and brown pods. Land development and the construction of flood-prevention dams are thought to be responsible for these losses. Collecting wild soybean resources that are on the verge of extinction has therefore become urgent.

The most serious damage to Lineage II has been reported from northeastern China, in areas such as the Anbang River in Jixian County, Heilongjiang Province. In 1981, wild soybean covered tens of thousands of square meters along the Anbang River. This area is now farmland, and the wild soybean population has disappeared. Several factors have together caused a nationwide decline in wild soybean: lack of awareness of the importance of these unique resources, indiscriminate farming practices, over-harvesting, overgrazing, rural urbanization and the construction of economic development zones. The species is now considered endangered. To rescue it, "wild soybean original habitat nature reserves" need to be established so that this important plant can continue to provide ecological and social benefits.

Some regions have begun to recognize the importance of wild soybean. In 2005, the Wuqing District of Tianjin City was listed as a wild soybean original habitat protection site and was officially included in the national "protection circle". Also in 2005, experts from the Chinese Academy of Agricultural Sciences (CAAS) discovered a natural population of wild soybean covering about 3000 m² in Tahe County. The environmental protection department designated this area a "wild soybean original habitat nature reserve". However, original habitat nature reserves alone are insufficient to protect wild soybean fully, and protection of the species needs to be strengthened.

Author Contributions: J. M. and X. J. L. directed most of the experimental and analytical work and wrote the manuscript. S. L. H. designed the analytical workflow, directed part of the analytical work and revised the manuscript. X. J. L. and Y. X. participated in the experimental work.

Financial support: The project was supported by the National Natural Science Foundation of China (Project numbers 32060083 & 31500459), and the Fund of Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations (Project number: PSESP2021F02).

Conflicts of Interest: The authors declare no conflicts of interest.

Data archiving statement: The sequencing data generated in this study for the 147 samples are currently being submitted to the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra) under BioProject accession PRJNA798174, with run accession numbers SRR17650031 to SRR17650177.
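The run accession range can be expanded programmatically, which also confirms that it covers exactly the 147 samples. This sketch assumes, as the statement implies, that the runs are numbered consecutively from SRR17650031 to SRR17650177. The function name is hypothetical. The submission was still in progress when this preprint was posted, so some accessions may not yet be publicly available.

```python
START, END = 17650031, 17650177  # numeric parts of the first and last run accessions


def run_accessions(start=START, end=END, prefix="SRR"):
    """Return the consecutive SRA run accessions from start to end, inclusive."""
    return [f"{prefix}{n}" for n in range(start, end + 1)]


runs = run_accessions()
print(len(runs), runs[0], runs[-1])  # 147 SRR17650031 SRR17650177
```

The resulting list could, for example, be supplied to the SRA Toolkit's `prefetch` utility to download the reads.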

Acknowledgements: The authors thank Okada Hiroshi and Naoko Ishikawa for the collection of the Japanese samples and Chunghee Lee for the collection of the Korean samples.

Aizawa M, Yoshimaru H, Saito H, Katsuki T, Kawahara T, Kitamura K, Shi F, Kaji M (2007a) Phylogeography of a northeast Asian spruce, Picea jezoensis, inferred from genetic variation observed in organelle DNA markers. Molec Ecol, 16, 3393-3405. https://doi.org/10.1111/j.1365-294X.2007.03391.x
Bai WN, Liao WJ, Zhang DY (2010) Nuclear and chloroplast DNA phylogeography reveal two refuge areas with asymmetrical gene flow in a temperate walnut tree from East Asia. New Phytol, 188, 892-901. https://doi.org/10.1111/j.1469-8137.2010.03407.x
Catchen J, Bassham S, Wilson T, Currey M, O’Brien C, Yeates Q, Cresko WA (2013a) The population structure and recent colonization history of Oregon threespine stickleback determined using RAD-seq. Molec Ecol, 22, 2864–2883. https://doi.org/10.1111/mec.12330
Catchen J, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013b) Stacks: an analysis tool set for population genomics. Molec Ecol, 22, 3124-3140. https://doi.org/10.1111/mec.12354
Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3-Genes Genom Genet, 1, 171-182. https://doi.org/10.1534/g3.111.000240
Chen KM, Abbott RJ, Milne RI, Tian XM, Liu JQ (2008) Phylogeography of Pinus tabulaeformis Carr. (Pinaceae), a dominant species of coniferous forest in northern China. Molec Ecol, 17, 4276-4288.
Choi IY, Kang JH, Song HS, Kim NS (1999) Genetic diversity measured by simple sequence repeat variations among the wild soybean, Glycine soja, collected along the riverside of five major rivers in Korea. Genes & Genetic Systems, 74, 169-177.
Dong YS, Zhuang BC, Zhao LM, Sun H, He MY (2001) The genetic diversity of annual wild soybeans grown in China. Theor Appl Genet, 103, 98-103. https://doi.org/10.1007/s001220000522
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software structure: a simulation study. Molec Ecol, 14, 2611-2620. https://doi.org/10.1111/j.1365-294X.2005.02553.x
Frankham R, Ballou JD, Briscoe DA (2002) Introduction to conservation genetics. Cambridge University Press, Cambridge.
Fujita R, Ohara M, Okazaki K, Shimamoto Y (1997) The extent of natural cross-pollination in wild soybean (Glycine soja). J Hered, 88, 124-128. https://doi.org/10.1093/oxfordjournals.jhered.a023070
Gao D, Wang Q, Wu Y, Xu H, Yu Q, Liu J (2007) Microsatellite DNA loci from the typical halophyte Thellungiella salsuginea (Brassicaceae). Conserv Genet, 9, 953-955. https://doi.org/10.1007/s10592-007-9403-2
Gao LZ, Innan H (2008) Nonindependent domestication of the two rice subspecies, Oryza sativa ssp indica and ssp japonica, demonstrated by multilocus microsatellites. Genetics, 179, 965-976. https://doi.org/10.1534/genetics.106.068072
Guan RX, Chang RZ, Li YH, Wang LX, Liu ZX, Qiu LJ (2010) Genetic diversity comparison between Chinese and Japanese soybeans (Glycine max (L.) Merr.) revealed by nuclear SSRs. Genet Resources Crop Evol, 57, 229-242. https://doi.org/10.1007/s10722-009-9465-8
Harrison S, Yu G, Takahara H, Prentice I (2001) Palaeovegetation (Communications arising): diversity of temperate plants in east Asia. Nature, 413, 129-130. https://doi.org/10.1038/35093166
Hartl DL, Clark AG (2007) Principles of population genetics. Sinauer Associates, Sunderland, Massachusetts.
He SL, Wang YS, Li DZ, Yi TS (2016) Environmental and Historical Determinants of Patterns of Genetic Differentiation in Wild Soybean (Glycine soja Sieb. et Zucc). Sci Rep, 6, 22795. https://doi.org/10.1038/srep22795
He SL, Wang YS, Volis S, Li DZ, Yi TS (2012) Genetic diversity and population structure: Implications for conservation of wild soybean (Glycine soja Sieb. et Zucc) based on nuclear and chloroplast microsatellite variation. Int J Mol Sci, 13, 12608-12628.
Hewitt G (2000) The genetic legacy of the Quaternary ice ages. Nature, 405, 907-913. https://doi.org/10.1038/35016000
Hewitt GM (2004) Genetic consequences of climatic oscillations in the Quaternary. Philos T Roy Soc B, 359, 183-195. https://doi.org/10.1098/rstb.2003.1388
Hu LJ, Uchiyama K, Shen HL, Saito Y, Tsuda Y, Ide Y (2008) Nuclear DNA microsatellites reveal genetic variation but a lack of phylogeographical structure in an endangered species, Fraxinus mandshurica, across north-east China. Ann Bot, 102, 195-205. https://doi.org/10.1093/aob/mcn074
Hubisz MJ, Falush D, Stephens M, Pritchard JK (2009) Inferring weak population structure with the assistance of sample group information. Mol Ecol Resour, 9, 1322-1332. https://doi.org/10.1111/j.1755-0998.2009.02591.x
Jin Y, He TH, Lu BR (2003) Fine scale genetic structure in a wild soybean (Glycine soja) population and the implications for conservation. New Phytol, 159, 513-519. https://doi.org/10.1046/j.1469-8137.2003.00824.x
Kiang YT, Chiang YC, Kaizuma N (1992) Genetic diversity in natural populations of wild soybean in Iwate prefecture, Japan. J Hered, 83, 325-329. https://doi.org/10.1093/oxfordjournals.jhered.a111225
Kuroda Y, Kaga A, Tomooka N, Vaughan D (2010) The origin and fate of morphological intermediates between wild and cultivated soybeans in their natural habitats in Japan. Molec Ecol, 19, 2346-2360. https://doi.org/10.1111/j.1365-294X.2010.04636.x
Kuroda Y, Kaga A, Tomooka N, Vaughan DA (2006) Population genetic structure of Japanese wild soybean (Glycine soja) based on microsatellite variation. Molec Ecol, 15, 959-974.
Kuroda Y, Kaga A, Tomooka N, Vaughan DA (2008) Gene flow and genetic structure of wild soybean (Glycine soja) in Japan. Crop Sci, 48, 1071-1079. https://doi.org/10.2135/cropsci2007.09.0496
Lee JD, Shannon JG, Vuong TD, Moon H, Nguyen HT, Tsukamoto C, Chung G (2010) Genetic diversity in wild soybean (Glycine soja Sieb. and Zucc.) accessions from southern islands of Korean peninsula. Pl Breed, 129, 257-263. https://doi.org/10.1111/j.1439-0523.2009.01757.x
Li EX, Qiu YX, Yi S, Guo JT, Comes HP, Fu CX (2008) Phylogeography of two East Asian species in Croomia (Stemonaceae) inferred from chloroplast DNA and ISSR fingerprinting variation. Molec Phylogen Evol, 49, 702-714. https://doi.org/10.1016/j.ympev.2008.09.012
Li XH, Wang KJ, Jia JZ (2009) Genetic diversity and differentiation of Chinese wild soybean germplasm (G. soja Sieb. & Zucc.) in geographical scale revealed by SSR markers. Pl Breed, 128, 658-664. https://doi.org/10.1111/j.1439-0523.2009.01625.x
Li ZL, Nelson RL (2002) RAPD marker diversity among cultivated and wild soybean accessions from four Chinese provinces. Crop Sci, 42, 1737-1744. https://doi.org/10.2135/cropsci2002.1737
Liu JQ, Tian B, Liu RR, Wang LY, Qiu Q, Chen KM (2009) Phylogeographic analyses suggest that a deciduous species (Ostryopsis davidiana Decne., Betulaceae) survived in northern China during the Last Glacial Maximum. J Biogeogr, 36, 2148-2155. https://doi.org/10.1111/j.1365-2699.2009.02157.x
Lu BR (2004) Conserving biodiversity of soybean gene pool in the biotechnology era. Plant Spec Biol, 19, 115-125. https://doi.org/10.1111/j.1442-1984.2004.00108.x
Maggs CA, Castilho R, Foltz D, Henzler C, Jolly MT, Kelly J, Olsen J, Perez KE, Stam W, Väinölä R, Viard F, Wares J (2008) Evaluating signatures of glacial refugia for North Atlantic benthic marine taxa. Ecology, 89, S108-S122.
Melegh BI, Banfai Z, Hadzsiev K, Miseta A, Melegh B (2017) Refining the South Asian Origin of the Romani people. Bmc Genet, 18, 82. https://doi.org/10.1186/s12863-017-0547-x
Millien-Parra V, Jaeger JJ (1999) Island biogeography of the Japanese terrestrial mammal assemblages: an example of a relict fauna. J Biogeogr, 26, 959-972. https://doi.org/10.1046/j.1365-2699.1999.00346.x
Narum SR, Buerkle CA, Davey JW, Miller MR, Hohenlohe PA (2013) Genotyping-by-sequencing in ecological and conservation genomics. Molec Ecol, 22, 2841-2847. https://doi.org/10.1111/mec.12350
Nawaz MA, Yang SH, Rehman HM, Baloch FS, Lee JD, Park JH, Chung G (2017) Genetic diversity and population structure of Korean wild soybean (Glycine soja Sieb. and Zucc.) inferred from microsatellite markers. Biochem Syst Ecol, 71, 87-96.
Oka H (1983) Genetic-Control of Regenerating Success in Semi-Natural Conditions Observed among Lines Derived from a Cultivated X Wild Soybean Hybrid. Journal of Applied Ecology, 20, 937-949.
Petit RJ, Aguinagalde I, de Beaulieu JL, Bittkau C, Brewer S, Cheddadi R, Ennos R, Fineschi S, Grivet D, Lascoux M, Mohanty A, Muller-Starck GM, Demesure-Musch B, Palme A, Martin JP, Rendell S, Vendramin GG (2003) Glacial refugia: Hotspots but not melting pots of genetic diversity. Science, 300, 1563-1565. https://doi.org/10.1126/science.1083264
Pickrell JK, Pritchard JK (2012) Inference of population splits and mixtures from genome-wide allele frequency data. Plos Genet, 8, e1002967. https://doi.org/10.1371/journal.pgen.1002967
Porebski S, Bailey LG, Baum BR (1997) Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol Biol Rep, 15, 8-15. https://doi.org/10.1007/bf02772108
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945-959. https://doi.org/10.1093/genetics/155.2.945
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira M, Bender D, Maller J, Sklar P, Bakker P, Daly MJ (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 81, 559-575. https://doi.org/10.1086/519795
Qiu Q, Ma T, Hu Q, Liu B, Wu Y, Zhou H, Wang Q, Wang J, Liu J (2011a) Genome-scale transcriptome analysis of the desert poplar, Populus euphratica. Tree Physiol, 31, 452-461. https://doi.org/10.1093/treephys/tpr015
Qiu YX, Fu CX, Comes HP (2011b) Plant molecular phylogeography in China and adjacent regions: Tracing the genetic imprints of Quaternary climate and environmental change in the world’s most diverse temperate flora. Molec Phylogen Evol, 59, 225-244. https://doi.org/10.1016/j.ympev.2011.01.012
Qiu YX, Qi XS, Jin XF, Tao XY, Fu CX, Naiki A, Comes HP (2009a) Population genetic structure, phylogeography, and demographic history of Platycrater arguta (Hydrangeaceae) endemic to East China and South Japan, inferred from chloroplast DNA sequence variation. Taxon, 58, 1226-1241. https://www.jstor.org/stable/27757014
Qiu YX, Sun Y, Zhang XP, Lee J, Fu CX, Comes HP (2009b) Molecular phylogeography of East Asian Kirengeshoma (Hydrangeaceae) in relation to Quaternary climate change and landbridge configurations. New Phytol, 183, 480-495. https://doi.org/10.1111/j.1469-8137.2009.02876.x
Rahmatalla SA, Arends D, Reissmann M, Said Ahmed A, Wimmers K, Reyer H, Brockmann GA (2017) Whole genome population genetics analysis of Sudanese goats identifies regions harboring genes associated with major traits. Bmc Genet, 18, 92. https://doi.org/10.1186/s12863-017-0553-z
Rauch EM, Bar-Yam Y (2005) Estimating the total genetic diversity of a spatial field population from a sample and implications of its dependence on habitat area. P Natl Acad Sci USA, 102, 9826-9829. https://doi.org/10.1073/pnas.0408471102
Smil V (2000) Magic beans. Nature, 407, 567-567. https://doi.org/10.1038/35036653
Sun XW, Liu DY, F. ZX, Li WB, Zheng HK (2013) SLAF-seq: An Efficient Method of Large-Scale SNP Discovery and Genotyping Using High-Throughput Sequencing. Plos One, 8, e58700. https://doi.org/10.1371/journal.pone.0058700
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular evolutionary genetics analysis version 6.0. Molecular Biology and Evolution, 30, 2725-2729. https://doi.org/10.1093/molbev/mst197
Tian B, Liu R, Wang L, Qiu Q, Chen K, Liu J (2009) Phylogeographic analyses suggest that a deciduous species (Ostryopsis davidiana Decne., Betulaceae) survived in northern China during the Last Glacial Maximum. J Biogeogr, 36, 2148-2155. https://doi.org/10.1111/j.1365-2699.2009.02157.x
Volis S, Ormanbekova D, Shulgina I (2016) Fine-scale spatial genetic structure in predominantly selfing plants with limited seed dispersal: A rule or exception? Plant Divers, 38, 59-64. https://doi.org/10.1016/j.pld.2016.03.001
Wang KJ, Li XH (2011) Genetic differentiation and diversity of phenotypic characters in Chinese wild soybean (Glycine soja Sieb. et Zucc.) revealed by nuclear SSR markers and the implication for intraspecies phylogenic relationship of characters. Genet Resources Crop Evol, 58, 209-223. https://doi.org/10.1007/s10722-010-9563-7
Wang KJ, Li XH (2013) Genetic diversity and gene flow dynamics revealed in the rare mixed populations of wild soybean (Glycine soja) and semi-wild type (Glycine gracilis) in China. Genet Resources Crop Evol, 60, 2303-2318. https://doi.org/10.1007/s10722-013-9998-8
Wang KJ, Li XH, Li FS (2008a) Phenotypic diversity of the big seed type subcollection of wild soybean (Glycine soja Sieb. et Zucc.) in China. Genet Resources Crop Evol, 55, 1335-1346. https://doi.org/10.1007/s10722-008-9332-z
Wang KJ, Li XH, Liu Y (2012) Fine-Scale Phylogenetic Structure and Major Events in the History of the Current Wild Soybean (Glycine soja) and Taxonomic Assignment of Semi-Wild Type (Glycine gracilis Skvortz.) within the Chinese Subgenus Soja. J Hered, 103, 13-27. https://doi.org/10.1093/jhered/esr102
Wang KJ, Li XH, Yan MF (2014) Genetic differentiation in relation to seed weights in wild soybean species (Glycine soja Sieb. & Zucc.). Plant Syst Evol, 300, 1729-1739. https://doi.org/10.1007/s00606-014-0998-8
Wang KJ, Takahata Y (2007) A preliminary comparative evaluation of genetic diversity between Chinese and Japanese wild soybean (Glycine soja) germplasm pools using SSR markers. Genet Resources Crop Evol, 54, 157-165.
Wang M, Li RZ, Yang WM, Du WJ (2010) Assessing the genetic diversity of cultivars and wild soybeans using SSR markers. Afr J Biotechnol, 9, 4857-4866. https://doi.org/10.5897/AJB09.1410
Wang MX, Zhang HL, Zhang DL, Qi YW, Fan ZL, Li DY, Pan DJ, Cao YS, Qiu ZE, Yu P, Yang QW, Wang XK, Li ZC (2008b) Genetic structure of Oryza rufipogon Griff. in China. Heredity, 101, 527-535. https://doi.org/10.1038/hdy.2008.61
Wright S (1950) The Genetical Structure of Populations. Ann Eugenic, 15, 323-354. https://doi.org/10.1111/j.1469-1809.1949.tb02451.x
Wright S (1978) Variability within and among natural populations. Evolution and the genetics of populations, Vol. 4. University of Chicago Press, Chicago.
Wu ZY, Wu SG (1995) A proposal for a new floristic kingdom (realm) – the E. Asiatic kingdom, its delimitation and characteristics. In: Zhang AL, Wu SG (eds) Proceedings of the First International Symposium on Floristic Characteristics and Diversity of East Asian Plants. China Higher Education Press, Beijing, pp. 3-42.
Yoder JB, Stanton-Geddes J, Zhou P, Briskine R, Young ND, Tiffin P (2014) Genomic Signature of Adaptation to Climate in Medicago truncatula. Genetics, 196, 1263-1275. https://doi.org/10.1534/genetics.113.159319
Yu G, Chen X, Ni J, Cheddadi R, Guiot J, Han H, Harrison SP, Huang C, Ke M, Kong Z, Li S, Li W, Liew P, Liu G, Liu J, Liu Q, Liu KB, Prentice IC, Qui W, Ren G, Song C, Sugita S, Sun X, Tang L, VanCampo E, Xia Y, Xu Q, Yan S, Yang X, Zhao J, Zheng Z (2000) Palaeovegetation of China: a pollen data-based synthesis for the mid-Holocene and last glacial maximum. J Biogeogr, 27, 635-664. https://doi.org/10.1046/j.1365-2699.2000.00431.x
Yuan QJ, Zhang ZY, Peng H, Ge S (2008) Chloroplast phylogeography of Dipentodon (Dipentodontaceae) in southwest China and northern Vietnam. Molec Ecol, 17, 1054-1065. https://doi.org/10.1111/j.1365-294X.2007.03628.x
Zhang G, Li J, Zhang J, Liang X, Wang T, Yin S (2020) A high-density SNP-based genetic map and several economic traits-related loci in Pelteobagrus vachelli. Bmc Genomics, 21, 700. https://doi.org/10.1186/s12864-020-07115-7
Zhang P, Li JQ, Li XL, Liu XD, Zhao XJ, Lu YG (2011) Population Structure and Genetic Diversity in a Rice Core Collection (Oryza sativa L.) Investigated with SSR Markers. Plos One, 6, e27565. https://doi.org/10.1371/journal.pone.0027565
Zhao H, Wang Y, Fu X, Liu X, Yuan C, Qi G, Guo J, Dong Y (2018) The Genetic Diversity and Geographic Differentiation of the Wild Soybean in Northeast China Based on Nuclear Microsatellite Variation. International Journal of Genomics, 2018, 1-9. https://doi.org/10.1155/2018/8561458
Zhao LM, Dong YS, Li B, Hao S, Wang KJ, Li XH (2005) Establishment of a Core Collection for the Chinese annual wild soybean (Glycine Soja). Chin Sci Bull, 50, 989-996. https://doi.org/10.1360/982004-657
Zhao R, Cheng Z, Lu WF, Lu BR (2006) Estimating genetic diversity and sampling strategy for a wild soybean (Glycine soja) population based on different molecular markers. Chin Sci Bull, 51, 1219-1227. https://doi.org/10.1007/s11434-006-1219-9
Zhao R, Xia H, Lu BR (2009) Fine-scale genetic structure enhances biparental inbreeding by promoting mating events between more related individuals in wild soybean (Glycine soja; Fabaceae) populations. Amer J Bot, 96, 1138-1147. https://doi.org/10.3732/ajb.0800173
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012) A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics, 28, 3326-3328. https://doi.org/10.1093/bioinformatics/bts606

Table 1 Location and habitat of the sampled G. soja populations

Population	Location	Altitude (m)	Latitude (°N)	Longitude (°E)	Habitat
WH	Wuhan, Hubei province	301	30.533	114.445	Wetland
WN	Weinan, Shaanxi province	379	34.453	109.520	Roadside
HH	Huaihua, Hunan province	890	27.715	110.81	Roadside
YW	Yiwu, Zhejiang province	72	29.338	120.038	Wetland
NJ	Nanjing, Jiangsu province	18	32.065	118.814	Lakeside
JN	Jinan, Shandong province	29	34.646	116.867	Barren mountain
TJ	Tianjin municipality	8	39.080	117.010	Barren land
SY	Shenyang, Liaoning province	57	41.758	123.386	Roadside
CC	Changchun, Jilin province	125	43.871	125.241	Field margin
HEB	Harbin, Heilongjiang province	137	45.784	126.564	Riverside
DQ	Daqing, Heilongjiang province	132	46.526	125.15	Wetland
QQHE	Qiqihar, Heilongjiang province	137	47.285	123.968	Field margin
JK	Kanagawa, Japan	12	34.959	137.139	Wetland
JT	Tokyo, Japan	35	34.828	135.770	Wetland
KO	Gangwon-do, South Korea	520	37.588	128.409	Wetland
KR	Gangwon-do, South Korea	340	37.913	128.499	Wetland

Table 2 Genetic diversity statistics for the 16 populations.

Pop ID	Private alleles	Polymorphic loci (%)	Ho (all pos.)	Ho (variant pos.)	He (all pos.)	He (variant pos.)	π (all pos.)	π (variant pos.)	F_IS (all pos.)	F_IS (variant pos.)
HH	1755	0.30	0.0005	0.0225	0.0014	0.0557	0.0017	0.0693	0.0020	0.0829
WH	6398	0.54	0.0006	0.0252	0.0019	0.0799	0.0021	0.0859	0.0039	0.1595
WN	4317	0.43	0.0005	0.0204	0.0016	0.0674	0.0017	0.0722	0.0031	0.1286
YW	8434	0.82	0.0005	0.0205	0.0027	0.1106	0.0029	0.1181	0.0067	0.2774
NJ	2180	0.28	0.0005	0.0215	0.0011	0.0470	0.0012	0.0502	0.0014	0.0600
JN	5203	0.63	0.0005	0.0203	0.0022	0.0918	0.0024	0.0980	0.0049	0.2042
TJ	10842	0.68	0.0005	0.0204	0.0023	0.0976	0.0025	0.1038	0.0054	0.2262
SY	4891	0.77	0.0005	0.0203	0.0027	0.1123	0.0029	0.1193	0.0063	0.2618
CC	1805	0.26	0.0005	0.0206	0.0008	0.0314	0.0008	0.0334	0.0015	0.0608
HEB	5185	0.34	0.0005	0.0210	0.0012	0.0512	0.0013	0.0546	0.0021	0.0883
DQ	1793	0.18	0.0005	0.0199	0.0005	0.0199	0.0005	0.0211	0.0007	0.0277
QQHE	12083	0.79	0.0007	0.0293	0.0026	0.1101	0.0028	0.1173	0.0058	0.2418
JK	2351	0.07	0.0005	0.0200	0.0003	0.0119	0.0003	0.0130	-0.0003	-0.0126
JT	3356	0.26	0.0005	0.0203	0.001	0.0428	0.0011	0.0475	0.0015	0.0631
KO	2431	0.36	0.0005	0.0214	0.001	0.0398	0.001	0.0420	0.0022	0.0893
KR	2409	2.75	0.0016	0.0460	0.0121	0.3492	0.0131	0.3789	0.0230	0.6665
Total			0.0004	0.0157	0.0035	0.1459	0.0035	0.1465	0.0205	0.8533

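Note: Private alleles, number of private alleles; Ho, observed heterozygosity; He, expected heterozygosity; π, nucleotide diversity; F_IS, inbreeding coefficient of an individual relative to the subpopulation. All pos., statistics calculated over all sequenced positions; variant pos., statistics calculated over variant positions only.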

Table 3 Pairwise F_ST and Nm between the populations sampled in this study.

	HH	WH	WN	YW	NJ	JN	TJ	SY	CC	HEB	DQ	QQHE	JK	JT	KO	KR
HH		0.541	0.415	1.644	0.282	1.162	0.809	1.313	0.249	0.360	0.314	1.059	0.194	0.341	0.303	1.162
WH	0.316		0.676	0.964	0.536	0.766	0.796	0.994	0.505	0.485	0.355	0.988	0.318	0.437	0.570	0.719
WN	0.376	0.270		0.861	0.686	0.673	0.673	0.970	0.597	0.394	0.293	0.805	0.251	0.355	0.715	0.704
YW	0.132	0.206	0.225		0.697	2.108	1.265	2.002	0.612	0.766	0.856	1.658	0.723	0.958	0.666	0.723
NJ	0.470	0.318	0.267	0.264		0.536	0.541	0.796	0.572	0.288	0.202	0.646	0.160	0.248	0.742	0.693
JN	0.177	0.246	0.271	0.106	0.318		1.026	1.523	0.475	0.603	0.683	1.322	0.501	0.676	0.515	0.712
TJ	0.236	0.239	0.271	0.165	0.316	0.196		1.187	0.483	0.727	0.496	1.342	0.479	0.653	0.531	0.701
SY	0.160	0.201	0.205	0.111	0.239	0.141	0.174		0.697	0.727	0.723	1.417	0.656	0.881	0.742	0.697
CC	0.501	0.331	0.295	0.290	0.304	0.345	0.341	0.264		0.256	0.179	0.575	0.141	0.216	0.669	0.653
HEB	0.410	0.340	0.388	0.246	0.465	0.293	0.256	0.256	0.494		0.236	0.842	0.201	0.313	0.296	0.676
DQ	0.443	0.413	0.460	0.226	0.553	0.268	0.335	0.257	0.583	0.514		0.633	0.125	0.223	0.221	0.643
QQHE	0.191	0.202	0.237	0.131	0.279	0.159	0.157	0.150	0.303	0.229	0.283		0.606	0.818	0.627	0.708
JK	0.563	0.440	0.499	0.257	0.610	0.333	0.343	0.276	0.639	0.554	0.667	0.292		0.180	0.184	0.704
JT	0.423	0.364	0.413	0.207	0.502	0.270	0.277	0.221	0.536	0.444	0.529	0.234	0.582		0.264	0.800
KO	0.452	0.305	0.259	0.273	0.252	0.327	0.320	0.252	0.272	0.458	0.531	0.285	0.576	0.486		0.666
KR	0.177	0.258	0.262	0.257	0.265	0.260	0.263	0.264	0.277	0.270	0.280	0.261	0.262	0.238	0.273

Note: The upper triangle gives Nm and the lower triangle gives F_ST.
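The Nm values in the upper triangle agree with Wright's island-model approximation Nm = (1 - F_ST) / (4 F_ST). For example, F_ST = 0.316 for HH and WH gives Nm ≈ 0.541, the value in the table. The table note does not state this formula, so the correspondence is an inference from the values. The sketch below recomputes a few Table 3 entries under that assumption. The helper name `nm_from_fst` is hypothetical.

```python
def nm_from_fst(fst):
    """Migrants per generation under Wright's island model: Nm = (1 - Fst) / (4 * Fst)."""
    if not 0 < fst <= 1:
        raise ValueError("Fst must be in (0, 1]")
    return (1 - fst) / (4 * fst)


# (population pair, F_ST from the lower triangle, Nm reported in the upper triangle of Table 3)
PAIRS = [
    (("HH", "WH"), 0.316, 0.541),
    (("HH", "YW"), 0.132, 1.644),
    (("YW", "JN"), 0.106, 2.108),
    (("HH", "JK"), 0.563, 0.194),
    (("HH", "KR"), 0.177, 1.162),
]

for (a, b), fst, reported in PAIRS:
    estimate = nm_from_fst(fst)
    print(f"{a}-{b}: Fst={fst:.3f}  Nm={estimate:.3f}  (table: {reported:.3f})")
    # Check that the recomputed value matches the table to three decimals.
    assert round(estimate, 3) == reported
```

Because Nm is a monotonic transform of F_ST under this model, the two triangles carry the same information. Nm drops below 1 once F_ST exceeds 0.2.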

No competing interests reported.

SupplementaryMaterial.docx
