Insect samples
WBPH individuals were sampled in various geographic regions (China and Southeast Asia) (Table 1, Fig. 1). These comprised six sites in Yunnan Province; four sites in Shandong Province; two sites in Laos, Cambodia, and Vietnam, respectively; and one site in Myanmar and Thailand, respectively (Table 1). Samples were put in 95% ethanol and stored at -20 °C until DNA extraction.
Mitochondrial Coi Sequencing
Insect DNA was extracted using the TIAMamp Micro DNA Kit (Tiangen, Beijing, China) according to the manufacturer protocol. The mtDNA COI gene was amplified using primers 2195-MF(5’-CTGGTTYTTTGGTCATCCRGARGT-3’) [16] and a newly designed reverse primer 2830-R(5’-CAATCAGCATAATCTGAATATCG-3’) (Sangon Biotech, Shanghai), which amplified a 635-bp fragment. The PCR reactions were performed in 20 µl solutions containing 1 × buffer, 0.32 mM of each dNTP, 1.0 mM of each primer, 1.0 unit of Taq DNA polymerase, and 2 µl of template DNA. PCR was performed under the following conditions: initial denaturation at 95 °C for 5 min, followed by 35 cycles of 1 min at 94 °C for denaturation, 1 min at 54 °C, for annealing and 1 min at 72 °C for elongation, and final extension at 72 °C for 5 min. The PCR products were electrophoresed in a 1.0% agarose gel in TAE and were sequenced bi-directionally. Sequencing quality was evaluated, and sequencing results were manually corrected using BioEdit 7.2.6 software [17], followed by BLAST for homology comparison in NCBI. The alignment of sequences was performed using multiple sequences of the Clustal W algorithm in MEGA7.0 [18].
2b-rad Sequencing And Genotyping
The 2b-RAD sequencing and genotyping were outsourced to Shanghai OE Biotech Ltd. (Shanghai, China). Libraries were constructed following the 2b-RAD protocol [13]. Briefly, library preparation began with digestion of DNA samples. The BsaXI restriction enzyme (New England BioLabs, Ipswich, MA, USA) was used to prepare RAD libraries. Next, library-specific adaptors and the digestion products were linked with T4 DNA ligase. Ligation products were amplified by PCR, and the target band was excised from a 2% agarose gel. Finally, the paired-end RAD tags were sequenced on the Illumina Hiseq Xten platform (Illumina, San Diego, CA, USA). Quality filtering was conducted as follows: raw reads were trimmed to remove adaptors, and the terminal 2-bp positions were discarded to eliminate artifacts that might have arisen by ligation. Ambiguous bases (N) or reads of low quality (> 10 bp with quality less than Q20) were removed. SNPs were determined, and genotypes were called using a maximum-likelihood statistical model implemented in the software Stacks v1.32 [19].
Genetic Variation Analysis Based On Mitochondrial Data
Numbers and distribution of haplotypes, composition of haplotypes in each population, numbers of unique haplotypes, within-population mean number of pairwise differences, and nucleotide diversity were assessed using DnaSP v.5.10 [20]. The statistical parsimony network (also known as the TCS network) of haplotypes was analyzed using Popart ver. 1.7 [21, 22].
Genetic Variation Analysis Based On Snp Data
The genotype data contained information for each locus and each individual. The primary SNP loci number was 13,565, which could genotype all 133 individuals. We used Plink version 1.07 [23] to filter SNPs for genetic analysis. SNPs were filtered to meet the following criteria: (a) SNPs that were included in at least 80% samples of a population, (b) SNPs with a minor allele frequency (MAF) higher than 0.05, and (c) loci with strong deviations from the Hardy-Weinberg equilibrium (HWE, P < 0.0001) were removed. We excluded samples from populations that had too many missing data from further analyses reducing our sample size to 129 individuals. The final filtered SNP dataset had 1,108 SNP loci and was used for all downstream analyses. The parameters for population genetic analyses, that is, percentage of polymorphic loci (%poly), Shannon's information index (I), observed heterozygosity (HO), unbiased expected heterozygosity (uHE), and fixation index (F), were estimated by using GenALEx 6.5 [24, 25]. Hardy–Weinberg equilibrium (HWE), heterozygosity excess and deficit were tested by GENEPOP version 4.2.1 [26].
Population structure
We evaluated population genetic structure using five different approaches: (i) measuring genetic differentiation (FST) among populations, (ii) Principle Coordinate Analysis (PCoA) (iii) hierarchical analyses of molecular variance (AMOVA), (iv) Bayesian model-based clustering, and (v) Isolation by distance (IBD).
For mtDNA data, the pairwise Fst were calculated using Arlequin v.3.5.1.2 [27] and using the Tamura-Nei model [28]. For SNP data, the pairwise Fst were calculated using GenALEx. Principal coordinates analysis (PCoA) was used to find and plot the major pattern within a genetic distance matrix dataset. The PCoA using GenALEx, performed on genetic distance, was used to display genetic divergence among the populations. To determine the proportion of genetic variation that could be attributed to differences between sampling sites, hierarchal analyses of molecular variance (AMOVAs) were performed. A hierarchical AMOVA was performed using Arlequin, with 1,000 permutations. Populations were grouped corresponding to two major criteria, i.e., geographical area, and population genetic structure, to test genetic homogeneity in different hierarchies.
The Bayesian approach was used to determine genetically distinct groups (or clusters) using the program STRUCTURE v.2.3.1 [29-32]. We set the length of the Burnin period at 10,000 and number of MCMC Reps after Burniin was 20,000. We set the K value from 1 to 7 and for each K the number of iterations was 10. To estimate the group number, we used the online calculation developed by [33]. We examined the change in Ln P(D) using the deltaK approach [34]. Because the STRUCTURE software showed results of each 10 replications in the case of K = n, we used CLUMPP to average these results [35]. All of the data were visualized through DISTRUCT v.1.1 [36].
To estimate the admixture between geographic populations, we used a Bayesian assignment method as implemented in Geneclass2 [37]. This analysis identifies putative first-generation migrants among populations. To calculate individual probabilities of assignment to each population, we used the Monte-Carlo resampling method [38] with 1,000 simulated individuals at probability thresholds of α = 0.05. Isolation by distance (IBD) analysis was performed using Mantel tests (1,000 permutations) in GenAlex to find the correlation between genetic and geographic distances.