2. Microsatellite genotyping
A total of 762 individuals were genotyped using 15 microsatellite loci (Lde01 to Lde15) which were previously isolated from SLF 26. In a preliminary test, all loci developed in the previous study 26 were polymorphic among most population samples and were included in the henceforth analyses.
Total genomic DNA was extracted from single individuals using LaboPass ™ Tissue Genomic DNA mini Kit (COSMOGENETECH, Korea) according to the manual protocol. To avoid the contamination of exotic genomic DNA (bacteria, parasites, endosymbionts, etc.), we only used muscle tissues extracted from ptero-thoracic parts. The muscle tissues were left in the lysis buffer with protease K solution at 55°C for 24 hours and the cleared cuticle dehydrated. Microsatellite amplifications were performed using AccuPower® PCR PreMix K-2037 (BIONEER, Korea) in 20 µl reaction mixtures containing 0.5 µM forward labeled with a fluorescent dye (6-FAM, HEX, or NED), reverse primers and 0.05 µg of DNA template. PCR was performed using a GS482 thermo-cycler (Gene Technologies, Essex) according to the following procedure: initial denaturation at 95°C for 5 m, followed by 34 cycles of 95°C for 30 seconds; annealing at 56°C for 40 seconds; extension at 72°C for 45 seconds, and a final extension at 72°C for 5 m. PCR products were visualized by electrophoresis on a 1.5 % agarose gel with a low range DNA ladder to check for positive amplifications. Automated fluorescent fragment analyses were performed on the ABI PRISM 377 Genetic Analyzer (Applied Biosystems), and allele sizes of PCR products were calibrated using the molecular size marker, ROX labeled-size standard (GenScan™ ROX 500, Applied Biosystems). Raw data on each fluorescent DNA products were analyzed using GeneMapper® version 4.0 (Applied Biosystems).
3. Data analysis
Allele data analysis results were extracted to the text file format, and then, processed in GENALEX 6.503 27 through Microsoft office Excel 2013 (Microsoft). Observed (HO) and expected heterozygosity (HE) values among loci were estimated using GENEPOP 4.0.7 28 among the population datasets as well as between the USA and Asian datasets. Levels of significance for Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium tests were adjusted using sequential Bonferroni correction for all tests involving multiple comparisons 29. Deviations from HWE were tested for heterozygote deficiency or excess. MICRO-CHECKER (Van Oosterhout) 30 was used to test for null alleles 31 and identify possible scoring errors because of the large-allele dropout and stuttering. The program FSTAT 2.93 was used to estimate the gene diversity (HS), a mean number of alleles (NA), and allelic richness (RS).
Groupings based on some biogeographical (spatial) and temporal groups were tested independently with analysis of molecular variance (AMOVA) 32 in ARLEQUIN 3.5.1.2, with significance determined using the non-parametric permutation approach described by Excoffier, et al. 32. We also used ARLEQUIN for calculations of pairwise genetic differentiation (FST) values 33, in which 38 populations were assigned by each local collection. Exact test of population differentiation was done as optioned by 100,000 Markov chains, 10,000 Dememorisation steps, and 0.05 significance level.
To examine genetic relationships between populations of SLF, we used principal coordinate analysis (PCoA) on a genetic distance matrix based on codominant-genotypic distance 34,35 provided in GENALEX 6.503 27. Principal Coordinate Analysis (PCoA) is a multivariate technique that allows one to find and plot the major patterns within a multivariate data set (e.g., multiple loci and multiple samples) 34,35.
The program BOTTLENECK 1.2.02 36 was used in the attempt to recognize in our samples the effect of a recent bottleneck, separately for each population. Two mutation models, considered appropriate for microsatellites 37, were applied as the strictly stepwise mutational model (SMM) and the two-phase model (TPM). For the TPM, a model that includes both 90% SMM and 10% TPM was used for 20,000 iterations. Significant deviations in observed heterozygosity over all loci were tested using a nonparametric Wilcoxon signed-rank test 36.
We performed two analyses to confirm dispersal and Isolation-by-Distance (IBD) using two different datasets of the Korean and Chinese. The Korean dataset included 17 regional populations collected only in 2010 and 2011 while the Chinese dataset consists of all 12 regional populations (Table 1). First, the level of the nation-scale spatial autocorrelation 35,38,39 was analyzed. An individual-by-individual genetic distance matrix 34,35 was generated along with a subsequent geographic distance matrix. The autocorrelation coefficient, r, is calculated through a multivariate approach (i.e., across all loci) and equals the genetic similarity between two individuals. The statistical significance of the spatial autocorrelation hypothesis is determined by generating 95% confidence intervals from 10,000 random permutations. Values of r that lie within the confidence intervals fail to reject the null hypothesis of no spatial genetic structure. Restricted dispersal is suggested when r > 0; alternatively when r < 0, large-scale dispersal is implicated 39. For further evaluation, support for the estimation of r is given through the bootstrapping procedure. According to the geographic scale, we performed analyses at different distance size classes, ‘10 and 20 km’ in the Korean dataset and ‘20 and 40 km’ in the Chinese dataset, with 999 permutations and bootstrapping of r with 1,000 times. Second, the significance of the regression of codominant genotypic distance (GD) on geographic distance (GGD) was tested using the Mantel procedure 40, permuting 1,000 times. Spatial autocorrelation and Mantel tests were run using the program GENALEX v6.503 27. In addition, the population relative (r) test, with 99 permutations for 100 bootstraps, was performed to confirmed the correlation between GD and GGD for 30 regional populations from Korea (17), China (12), and Japan (1) in GENALEX.
STRUCTURE 2.3.4 was used to analyze the genetic structure of 38 SLF populations using a Bayesian clustering approach 41. We set the number of clusters (K) from 1 to 11 and conducted 5 independent runs for each value of K. Each run consisted of a burn-in period of 30,000 steps, followed by 500,000 Markov chain Monte Carlo (MCMC) repetitions with a model allowing admixture. The ΔK value calculated as ‘∆K = m(|L′′(K)|) / s[L(K)]’ was obtained using the ad hoc quantity, which is calculated based on the second-order rate of change of the likelihood 42. To correctly perform this process, ∆K was calculated using the online resource STRUCTURE HARVESTER 0.6.94 43 that explained the structure in data. Visualization of the STRUCTURE results was conducted using DISTRUCT 1.1 44. In addition, GENECLASS 2 was used to perform the assignment/exclusion tests, which were used for the detection of genetic signatures of dispersal and immigration 45. For each individual of population, the program estimated the probability of belonging to any other reference population or to be a resident of the population where it was sampled. The sample with the highest probability of assignment was considered as the most likely source for the assigned genotype. We used a Bayesian method of estimating pop. allele frequencies 46 with Monte-Carlo resampling computation (10,000 simulated individuals) in order to infer the significance of assignments (type I error, alpha = 0.01) 47.
To estimate the relative likelihood of alternative introduction scenarios of the SLF, an approximate Bayesian computation (ABC) was performed for microsatellite data as implemented in DIYABC 1.0.4 48. DIYABC allows the comparison of complex scenarios involving bottlenecks, serial or independent introductions, and genetic admixture events in introduced populations 6. The parameters for modeling scenarios are the times of split or admixture events, the stable effective population size, the effective number of founders in introduced populations, the duration of the bottleneck during colonization, and the rate of admixture 49. The software generates a simulated data set used to estimate the posterior distribution of parameters in order to select the most likelihood scenario 49. DIYABC generates a simulated data set that is then used to select those most similar to the observed data set, and so-called selected data set (nδ), which are finally used to estimate the posterior distribution of parameters 48. The DIYABC 1.0.4 analysis was conducted on the purpose of testing the initial introduction process of SLF from the source to invasive regions. Since multiple introductions were occurred based on the results of PCoA, STRUCTURE, and GENECLASS2 (see Results), several populations could be limitedly estimated as the source and invasive relationships. Therefore, the ABC analyses were used to confirm the various scenarios for simulating the introduction route between the source (China) and invasive populations (Korea) which were selected from a large number of population collections in 2010. We set one source group (SG), Shanghai (CN10-SH), and two invasive groups, a northern invasive group (NG) of South Korea (KR10-SL, KR10-IN, KR10-SW) and a middle group (MG) of South Korea (KR10-CA, KR10-NS, KR10-BA). Four scenarios (1–4) were estimated with comparison to each other in the DIYABC (Fig S1 supplementary information). We produced 1,000,000 simulated data sets for each scenario. We used a generalized stepwise model (GSM) as the mutational model for microsatellites, which assumes increases or reductions by single repeat units 48. To identify the posterior probability of these three scenarios, the nδ = 40,000 (1%) simulated datasets closest to the pseudo-observed dataset are selected for the logistic regression, which are similar to the nδ = 400 (0.01%) ones for the direct approach 49. The summary of statistics was calculated from the simulated and observed data for each of the tested scenarios such as the mean number of alleles per locus (A), mean genetic diversity for each group and between group, genetic differentiation between pairwise groups (FST), classification index, shared alleles distance (DAS) and Goldstein distance.