Sample collection
284 yearling fish {<120 mm standard length; Uchida (2005)} were collected in 2018 from eelgrass beds in the Kuma River estuary (K-KMA, N = 96) in Urado Bay and the Shimanto River estuary (K-SMT, N = 46) in Kochi Prefecture, as well as from the Shiomi River estuary (M-SHI, N = 96) and the Oyodo River estuary (M-OYD, N = 46) in Miyazaki Prefecture (Fig. 1; Supplementary Table1). All individuals were collected by using a coarse-meshed beach seine (4-mm × 4-mm-square mesh, 10 m wide and 2 m deep with a 3-m-long purse-bag at its center). Histograms of the standard length at each sampling site are shown in Supplementary Figure 1. Fin clippings were taken noninvasively from all 284 yearling fish, and after the fin clipping and the measurement of standard length all fish were released. This research was conducted with the permission of the Miyazaki and Kochi prefectural governments.
DNA extraction and ddRAD-seq
We used all 192 individuals from K-KMA and M-SHI for genetic analysis. Total genomic DNA was extracted from the fin clippings by using a DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany). The concentration of extracted DNA was quantified by using a Quantus fluorometer (Promega, Madison, WI, USA) and a Qubit dsDNA HS Assay Kit (Invitrogen, Carlsbad, CA, USA) and adjusted to 10 ng/μL. ddRAD-seq was conducted in accordance with the protocol of Peterson et al. (2012), with some modifications. The restriction enzymes EcoRⅠ (6-bp recognition site) and MseⅠ (4-bp recognition site) are frequently used for ddRAD-seq, but we used the restriction enzymes BglⅡ (6-bp recognition site) and MseⅠ to increase the total number of SNPs. Sequencing of 51-bp single-end reads was done by using a Hiseq 2500 sequencing system (Illumina, San Diego, CA, USA) on the 192 samples. Additionally, we prepared ddRAD-seq library in the same procedure on all 92 samples from K-SMT and M-OYD to investigate the population genetic structure between sampling sites within the same prefecture. Sequencing of 151-bp paired-end reads was done by using a Hiseq X system (Illumina) on these 92 samples.
Genotyping of sequence data
The RAD libraries were demultiplexed by using process_radtags in Stacks-2.54 (Catchen et al. 2011). Sequence reads were preprocessed by using the Trimmomatic 0.39 read-trimming tool (Bolger et al. 2014), with the following parameters: ILLUMINACLIP TruSeq3-PE-2.fa:2:30:10, LEADING:19, TRAILING:19, SLIDINGWINDOW:30:20, AVGQUAL:20, and MINLEN:51 or 151 (depending on the sequencing length). After the preprocessing, samples with fewer than 1,500,000 reads were excluded (one sample from M-SHI and one from K-KMA), and the remaining reads were mapped to the draft akame reference genome (Hashiguchi et al. in preparation) by using Nextgenmap 0.5.5 (Sedlazeck et al. 2013) with the default settings. Sorted individual BAM files were created with Samtools 1.9 (Li et al. 2009), and SNP calling was achieved with Stacks by using the reference mapping pipeline ref_map.pl in Stacks (with the default parameters). We used PLINK v1.90 (Purcell et al. 2007) to filter out SNPs with minor allele frequency (MAF) < 0.1, missing data > 0.01, and deviation from Hardy–Weinberg equilibrium (HWE) at P < 0.001. Finally, filtered SNPs were pruned for linkage disequilibrium (LD) in PLINK by using the indep-pairwise 50 5 0.2 command. The genotype data set was exported to genepop and structure format by using PGDSpider v 2.1.1.5 (Lischer and Excoffier 2012) and used in further analyses, except in the case of the genetic diversity.
Genetic diversity
We calculated genetic diversity indices at the sampling-site level with all polymorphic loci and all nucleotide positions. To determine the genetic diversity indices, we used loci that were present in at least 80% of individuals and at no fewer than three sampling sites, along with a minor allele count of > 1 in all individuals. Genetic diversity indices, namely the ratio of polymorphic sites, the number of private alleles (Ap), nucleotide diversity (π), expected heterozygosity (He), and observed heterozygosity (Ho), were calculated by using Stacks software.
Sibship reconstruction
We used Colony v2.0.6.5 (Jones and Wang 2010) to reconstruct the sibships of sampled individuals in order to evaluate the family bias in juvenile sampling and to obtain information about migration distance and spawning area, which are closely related to the population genetic structure of akame. Colony software analyses were run by assuming polygamy for both sexes, using the full-likelihood method with no sibship prior and with the genotyping error rate set to 0.05. The run length was medium, and the precision of the assignment was very high. Only full siblings that had a probability of assignment above 0.99 were considered as true full siblings. Also, we used the package sequoia (Huisman 2017), implemented in R (R Core Team 2020), and we compared the results of the two methods to validate the accuracy of the estimated sibships. The error rate of the sequoia analysis was set to 0.05, and the parameter MaxSibIter = 10.
Population genetic structure
To evaluate the population genetic structure we used Structure v2.3.4 (Prichard et al. 2000). The dataset was analyzed by using K (number of clusters) from 1 to 8, with 10 independent iterations, 100,000 Markov chain Monte Carlo (MCMC) repetitions, and a burn-in of 100,000. The most likely number of populations (K) was estimated by using the ΔK method of Evanno et al. (2005) in Structure Harvester (Earl and von Holdt 2012); we determined the optimal K representing the highest ΔK. The software CLUMPP v1.1.2 (Jakobsson and Rosenberg 2007) was used to average the results of the 10 iterations with the identified optimal number of clusters; the results were displayed graphically in order of individual ID in each site by using distruct v1.1 (Rosenberg 2004). Other investigations of population genetic structure, namely principal component analysis (PCA) and discrimination analysis of principal components (DAPC), were performed by using the dudi.pca and dapc functions, respectively, in the package adegenet v2.1.1 (Jombart 2008), implemented in R. The find.clusters function was used to determine the number of groups (K), with the optimal K selected as that with the lowest BIC (Bayesian information criterion) value. Cross-validation by using the xvalDapc function in the adegenet package was used to identify the optimal number of principal components to retain in DAPC, with default parameters except for n.rep=100. We also calculated the pairwise FST (fixation index) between each site pair by using the package Genepop (Rousset 2008) implemented in R to detect genetic differences between the sampling sites. Anderson and Dunham (2008) reported that analysis by using Structure and a sample that includes family groups may produce a false population structure, even when a population structure is absent. Therefore, all of the above individual-based population genetic structure analyses were conducted once again after the exclusion of full-sib individuals from Colony to eliminate the effects of family bias due to juvenile sampling on the genetic analyses.
Detection of migrants and recent gene flow
Structure software was also used to identify migrants and those individuals with migrant or mixed ancestry. Sampling location information was used in the analysis to help with the clustering process. We categorized the samples into two groups (Kochi and Miyazaki prefectures) in accordance with their sampling sites. Structure was run with K = 2 from a clustering analysis without population information. As we had no information about migration, the migration rate (MIGPRIOR) was assigned as an initial condition in accordance with the method of Prichard and Wen (2004). Migrant ancestry (GENSBACK) was set to 1, and the other parameters were the same as for the runs of Structure analysis without population information.
Recent gene flow between the Kochi and Miyazaki populations was assessed by using BayesAss v3.0.4 software (Wilson and Rannala 2003). The MCMC mixing parameter values for migration rate, allele frequency, and inbreeding coefficient were set to 0.08, 0.19, and 0.012, respectively, to archive the recommended acceptance rates between 20% and 60%, as recommended by Wilson and Rannala (2003). We performed 10,000,000 MCMC iterations, with 1,000,000 iterations to discard as burn-in and sampling every 1,000 iterations.
Estimation of population size
The effective number of breeders (Nb) was estimated for the Kochi and Miyazaki populations by using the sibship assignment (SA) method available in the program Colony, as well as the LD method (Waples 2006) by using NeEstimator v2.1 (Do et al. 2014). Nb corresponds to the effective population size (Ne) estimated from a single age cohort; it is equivalent to the effective number of adults that contributed offspring to the cohort (Wang 2016). The SA method infers Nb from the sibship frequencies with multilocus genotypes of a sample captured at random from a single cohort (Wang 2009). The parameters of estimation were the same as those used in the sibship analysis.