Sampling design and Population characterization
A total of 472 individuals of the swimming crab Liocarcinus depurator (Portunidae) were sampled from the Aegean Sea (Eastern Mediterranean) to South Wales (Northeast Atlantic) (Fig. 1, Table 1). The sampling strategy was designed to include 12 localities, covering putative oceanographic processes which might influence species connectivity. Samples were collected from Wales (WALE; 53.534 N, -4.131 W), Cantabric Sea (CANT; 43.766 N, -7.323 W), Cadiz (CADI; 36.499 N, -6.731 W), Malaga (WALB; 36.547 N, -4.507 W), Almeria (EALB; 36.703 N, -2.840 W), Alacant (ALAC; 38.407 N, 0.002 E), Valencia (VALE; 39.203 N, 0.118 E), Ebro Delta (DELT; 40.079 N, 0.702 E), Central Catalonia (CCAT; 41.972 N, 2.865 E), Gulf of Lion (LION; 43.117 N, 3.655 E), Greece (GREE; 38.117 N, 21.183 E) and Adriatic (ADRI; 45.125 N, 12.445 E). The sampling scheme, with areas encompassing a broad geographic zone has been shown to be adequate in recent genetic studies carried out in the area25,45,46. Localities along the Iberian Peninsula coast were sampled on board R/V ‘Cornide de Saavedra’ during the MEDITS_ES47, MEDITS_FR 2009, DEMERSALES 2010 and ARSA48,49 fishery research surveys. Each haul was performed along a fixed isobath during day-time hours. Samples from Wales, Adriatic Sea and Greece were donated by researchers from Bangor University-Wales (UK), the Istituto Centrale per la Ricerca Scientifica e Tecnologica applicata al Mare (Italy) and the Institute of Marine Biological Resources (Greece) respectively.
After collection, muscle tissue was preserved in 100% ethanol and total genomic DNA extraction was performed with Chelex 10%50. Whenever possible, individual crabs were characterized using both morphometric measurements and multiple genetic markers (mtDNA and/or SSRs). A fragment of the cytochrome oxidase I (COI) gene was amplified and sequenced following previously described methods12. Haplotype sequences were deposited in GenBank under accession numbers (JN564801-JN564803; JN564806; JN564808; JN564809; JN564819; JN564821; JN564825; JN564827; JN564828; KU941957-KU941974; OR414606-OR414631). Eleven microsatellite markers were amplified through two multiplex reactions51. Crab carapace shape was quantified using geometric morphometric techniques; a total of 350 adult male specimens (> 30 mm carapace width measured between the end of the fifth antero-lateral spines) were used (Table 1). Whole specimens were taken to the laboratory, cleaned, and digitized using a HP Scanjet 5590P flat scanner, previously calibrated to avoid angular distortions (Fig. S1).
Diversity analyses
According to the classification of landmarks given by various authors, the landmarks considered for the description of the L. depurator shell shape (Fig. S1) were type 1 (positions 1–14, 17), semi-landmark (position 15) and pseudo-landmark (position 16)52–54. Only the left side of the shell was used to avoid duplication of equivalent landmarks and the computational problems associated with shell symmetry55,56, as well as to avoid intra-individual variation57.
The shape coordinates of both landmarks and semi-landmarks were then rotated, scaled (to unit centroid size) and translated through Generalized Procrustes Analysis in order to separate size and shape components of form variation and to obtain the relative warps using tpsRelw v1.4958. Carapace structure was thus characterized through a Principal Components Analysis (PCA) on the Procrustes coordinates59.
COI sequences were visually inspected, aligned and trimmed with BioEdit v7.0.160. Nucleotide diversity (p), haplotype diversity (h) and their standard deviations were calculated for each area using DnaSP v561. For microsatellites (SSR), the genetic diversity was assessed in terms of allele frequencies, observed (HO) and expected heterozygosity (HE) using GeneAlEx 6.4162. Allelic richness was estimated using FSTAT63.
Population differentiation
To determine patterns of morphometric differentiation among sampled areas, a multivariate analysis of variance (MANOVA) was performed on the relative warp scores. After assessing the degree of variation among areas, differentiation among populations and within areas was also determined, undergoing pairwise comparisons with the post-hoc Tukey’s Honest Significant Difference test. A canonical variate analysis (CVA) was also performed on the relative warp scores to provide an ordination of all the specimens in a morphological space64. Mahalanobis distances (D2) were estimated among the population centroid pairs defined by the CVA. Significance was evaluated using the multivariate Hotelling T2 test as implemented in PAST v2.1265. All statistical analyses were carried out using the R packages66 shapes v1.2.7 (https://cran.r-project.org/web/packages/shapes/index.html), Morpho v2.11 (https://cran.r-project.org/web/packages/Morpho/index.html) and Vegan v2.6-4 (https://cran.r-project.org/web/packages/vegan/index.html).
The Median Joining algorithm was used to construct a COI haplotype network using PopArt67. Pairwise genetic differentiation among sampling sites was estimated measuring GammaST values and its significance was obtained using the Snn statistic as implemented in DnaSP v561. Geographic distances used for isolation by distance analyses were measured along a 200m isobar using the Karto v5.2 software68. For microsatellite markers, genetic differentiation among all pairs of populations was determined using Wright’s F-statistic (FST) implemented in GENEPOP v4.169. The False Discovery Rate (FDR) correction was applied when evaluating the significance of pairwise comparisons in order to avoid Type I errors70.
The number of genetically differentiated groups (K) was detected using STRUCTURE v2.3.3.71. Twenty runs with 200000 Markov Chain Monte Carlo (MCMC) iterations each were carried out in order to quantify the standard deviation (SD) of the likelihood of each K with a range of Ks between 1 and 10. Furthermore, the log probability of the data (LnP(D)) as a function of K across the 20 runs was plotted in order to look for the value of K capturing the major structure in the data.
Cline Analyses
The Q values from the STRUCTURE analyses, carried out independently for mtDNA and SSR data, were used to summarize the variation in genetic markers (SI Appendix). For morphometric variables, an equivalent dimensionality reduction was obtained through Principal Components Analysis54. Each genetic and morphometric variable was rescaled to run from 0 to 1 over the original range values72. The presence of spatial gradients in latitude or longitude was explored using a 2-dimensional linear model of the form:
The LinearModelFit function in Mathematica v10.173 was used to fit the scaled variables for successive coordinate values and to estimate 95% confidence intervals (95%CI).
The Atlantic-Mediterranean contact zone was further analysed by fitting parametric curves to scaled phenotypic and genetic data, so that parameter values can be directly related to the processes of interest (e.g. natural selection). A logistic model of the form:
was used to fit the rescaled variables for successive values of x (spatial location). Spatial location in kilometers was estimated along the minimum path connecting the Iberian Peninsula localities sampled from CADI to CCAT by using the FindShortestTour function as implemented in Mathematica v10.173. Cline width is defined as the inverse of the maximum gradient of a cline74. The width of a cline is then approximately proportional to when selection is low (σ is the gene flow, measured as the standard deviation of parent-offspring distances along one dimension, and s is the selection coefficient; weak selection, s < 0.1, is assumed). Estimates of cline parameters and their confidence intervals (CI) were obtained through the LogitModelFit function73.
Time in generations since secondary contact for a neutral cline is inversely proportional to the variance of parent-offspring dispersal distance2:
Therefore, independent estimates of dispersal can be used to calculate the time since secondary contact under the assumption of neutrality. Liocarcinus depurator is known to be a short-lived species (lifespan = 2-3 years), attaining maturity during the first year of life, and with generation time of approximately 1 year76,77. Gene flow is measured as the average dispersal distance of individual crabs per generation, which is estimated to range from 60 to 171 km42 (see discussion). If a cline is not neutral but maintained by dispersal and selection, the strength of selection can be inferred from the ratio between its width and the variance of parent-offspring distance: The precise relation depends on the way selection acts, but an order of magnitude estimate can be made without knowing such details. Calculations were performed in Mathematica v10.173.
Next-generation sequencing data
Transcriptome sequencing data for 2 specimens of L. depurator collected from Mediterranean waters (Sète, France), was downloaded from the EBI database (Experiment: SRX565184). Total RNA had been extracted from gill tissue78, with cDNA generated by retrotranscription after mRNA enrichment, and amplified by random-primer PCR. Fragments were sized to ca 500 bp before adaptor ligation and RNA-Seq was completed on an Illumina HiSeq 2000 sequencer. The RNA sequences were mapped on the new Ldep11 contig in order to identify areas being transcribed.