Sample collection
We comprehensively sampled leaves of 581 and 114 trees from the Hahajima and Imoutojima Islands (hereafter collectively referred to as the Hahajima Islands), respectively (Fig. 1, Fig. S1). These leaf samples were used to provide data for adult trees. Adult trees included nine populations taken from the Hahajima Islands that were sampled by Sugai et al. (2019) (i.e., SHHA, SHHB, SHHC, SHHD, SHHE, SHHF, SHHG, SHIA, and SHIB). Tree locations were recorded using a GPS receiver (Garmin GPSmap 60CSx). Upon harvest, leaf samples were desiccated using silica gel for DNA extraction.
During this study, we identified four ecotypes in the Hahajima Islands by their distinctive morphological and genetic features. These ecotypes were named “G: Glabrescent,” “T: Tall,” “D: Dwarf,” and “M: Middle” (Table 1). To investigate the current gene flow among ecotypes at the seed stage, in 2013 naturally pollinated seeds were sampled from 76 maternal trees from five sites (i.e., 12–19 trees per site) dominated by each ecotype (Fig. 1, Table S1). For ecotype G, we sampled seeds from two sites, Gn and Gs. Seeds of C. subpubescens are small (i.e., ~2 mm) and it is therefore difficult to extract DNA directly from seeds; we therefore extracted DNA from germinated seedlings. Cold stratification for seeds were performed at 4°C in wet moss for six months and were then germinated at room temperature. We obtained a total of 1,260 seedlings from five sites (201–304 seeds per site; Table S1). These were preserved at −20°C until DNA extraction.
DNA extraction and genotyping
Genomic DNA was extracted from sampled leaves and seedlings using a modified CTAB method. Genotypes of each sample were characterized by the 17 EST-SSR markers listed in Table S2, which were developed for C. subpubescens (Setsukoet al. 2018). PCR was carried out in 6 µl reaction mixtures containing ca. 1 ng genomic DNA, 2.5 µl Type-it Multiplex PCR Master Mix (Qiagen, Hilden, Germany), and 0.2 µM of each primer. PCR conditions were as follows: 95°C for 5 min, then 35 or 38 cycles of 94°C for 30 s, 55°C or 60°C for 90 s, 72°C for 90 s, followed by final extension at 60°C for 30 min. PCR fragments were then separated using a 3130 Genetic Analyzer (Applied Biosystems, CA, USA) and genotyped using GeneMarker software (SoftGenetics, PA, USA).
Characteristics of EST-SSR markers
To check whether each EST-SSR locus met the requirements for population genetic analyses, we used BayeScan 2.1 (1,000,000 simulations) (Foll 2012) to identify outlier loci, which we defined as those with excessively high or low FST compared to neutral expectations. The existence of null alleles was checked using Micro-Checker version 2.2.3 (Van Oosterhoutet al. 2004) and linkage disequilibrium between loci in each population was tested using GENEPOP version 4.7 (Raymond and Rousset 1995; Rousset 2008). For these analyses, seven populations in the Hahajima Islands that did not have a pattern of admixture in Sugai et al. (2019), listed in Table S2, were used because the adult trees we sampled did not always aggregate as a population (Fig. 1) and population admixture might bias these results.
Genetic analysis
We used the Bayesian clustering program STRUCTURE version 2.3.4 (Falushet al. 2007; Pritchard et al. 2000) to identify genetic groups of adult C. subpubescens trees in the Hahajima Islands, then checked whether these genetic groups corresponded to morphological ecotypes. This program assigns individuals to K subpopulations (clusters) based on an admixture model and a correlated allele frequencies model. We used runs involving 100,000 Markov chain Monte Carlo (MCMC) iterations after a burn-in period of 50,000 iterations. The analysis was repeated 30 times for each value of K from 1 to 10. The optimal value of K was selected by assessing the likelihood distribution (mean Ln P(K)) and ΔK values (Evannoet al. 2005). Next, we checked for the existence of minor clusters using the online version of CLUMPAK (Kopelmanet al. 2015). For this analysis, we defined the genetic cluster with the largest Q value as the ecotype of adult trees. Trees assigned at Q ≥ 0.9 to each cluster were then considered to be pure adult trees, and trees with Q < 0.9 were considered to be hybrid adult trees (Kato et al. 2014; Li et al. 2021). The hybridization rate of each ecotype was calculated as the percentage of hybridized trees relative to the total number of trees of that ecotype. For hybrid adult trees, largest and second largest Q values for each type of tree were used to estimate which ecotypes interbreed and formed the hybrid. Hybrid adult trees with 0.4 ≤ largest Q < 0.6 was defined as first or second filial generation hybrids (i.e., F1 or F2), and hybrids with 0.6 ≤ largest Q < 0.9 were considered to be backcross hybrids (Liet al. 2021).
The results of our STRUCTURE analysis suggested that ecotype M resulted from hybridization between ecotypes G and T (see results). Therefore, we also conducted a principal coordinate analysis (PCoA) and a HIest (Fitzpatrick 2012) analysis using pure adults of ecotypes G, T, and M to test hybridization origin possibilities for ecotype M. PCoA was conducted using GenAlEx version 6.501 (Peakall and Smouse 2012). HIest is a program that estimates ancestry (S) and interclass heterozygosity (HI), where both S and HI range from 0 to 1. S is similar to the Q value estimated by STRUCURE and was defined in terms of ancestry of ecotype T—i.e., values of 0 and 1 indicate that a putative hybrid is actually a pure ecotype G or T, respectively. Expected values of HI for pure ecotypes, first filial generation hybrids (F1), second filial generation hybrids (F2), and first-generation backcross hybrids (BC1) were 0, 1, 0.5, and 0.5, respectively. By plotting S and HI on a triangular plot, we can see the hybrid status of the two ecotypes. The reference allele frequencies of ecotypes G and T were calculated using pure adult ecotypes G and T. When calculating allele frequencies for the alleles detected only in ecotype M, these alleles were assumed to be detected only once in both ecotypes G and T. Parameters were estimated using the maximum likelihood method with a simulated annealing algorithm of 10,000 iterations, surf options, and a start-grid value of 100. To evaluate the power of the genetic markers used, we simulated 1,000 individuals each of pure ecotype G, pure ecotype T, F1, F2, BC1 to G (BC1G), and BC1 to T (BC1T) using a sample function of R and assuming an infinite population size.
Current hybridization rates among ecotypes on the seed level were estimated using naturally pollinated seeds sampled at five sites (Fig. 1). We assessed the Q values of each seed using the USEPOPINFO option in STRUCTURE. This analysis was performed using pure adult trees (POPFLAG = 1) and seeds sampled from pure adult trees (POPFLAG = 0). Allele frequencies were updated using only the reference data with POPFLAG = 1. K was fixed at four and we ran models 30 times with 100,000 MCMC iterations after a burn-in period of 50,000 iterations. STRUCTURE analysis of adult trees revealed that 27 of 76 maternal trees were hybrids (Table S1), and therefore we eliminated seeds of hybrid trees from this analysis. We therefore used a total of 847 seeds (i.e., 85–272 seeds per site) from 49 pure maternal trees (i.e., 6–15 trees per site). We defined seeds with an assigned Q ≥ 0.9 for each cluster as pure seeds, and those with Q < 0.9 as hybrid seeds. Hybridization rates for naturally pollinated seeds were calculated as the percentage of the number of hybridized seeds relative to the total number of seeds. For hybrid seeds, the ecotype showing the largest Q value other than that of the ecotype of the maternal tree was defined as the ecotype of the pollen parent.
Paternal correlation was estimated among all seed-parent pairs from the obtained genotypes of maternal trees and seeds using POLDISP (Robledo-Arnuncioet al, 2007). Then, we compared the paternal correlation within sites and between sites. For this analysis, we used all 1,260 seeds genotyped from all 76 maternal trees, including seeds from hybrid trees (Table S1), since paternal correlation is the probability of seeds sired by the same paternal tree in different maternal trees, thus paternal correlation is unaffected by whether the maternal tree is pure or hybrid.
Ecotype characteristics
To investigate the habitat of each ecotype, we extracted the forest type of adult trees that were categorized as pure ecotypes from a vegetation map (Fig. 1) sourced from the Biodiversity Center of Japan (1999-), and obtained elevation and slope data for all pure adult trees using ArcGIS Desktop version 10.8.2 (ESRI Japan, Tokyo, Japan). Elevation and slope values were extracted from a 10 m mesh digital elevation model provided by the Geospatial Information Authority of Japan. We used the medium and fine categories of the vegetation map, which indicated the dominant species, physiognomy, and geographical conditions. For the extracted vegetation categories of C. subpubescens on Hahajima Island, mesic scrub (i.e., a forest height of 1–2 m), dry scrub (1–6 m), and mesic forest (4–20 m) accounted for 69% of all total categories. The rest included 19% that was plantation forest and a remaining 10% that was Freycinetia formosana scrub and alien grassland of Kalanchoe pinnata and other species. We classified these into three major forest types: mesic forest, mesic scrub, and dry scrub (Shimizu, 1992). In this scheme alien grasslands were classified as dry scrub and F. formosana scrub was included in the mesic scrub category based on the ecological characteristics and habitats of each species. Plantation forests were excluded from our analyses since the original forest type was unknown. In addition, we also investigated the forest height, the presence or absence of overstory trees of each adult C. subpubescens pure ecotype, and the species of overstory tree, if any. The relative photosynthetic photon flux density (rPPFD) was calculated from the following equation by simultaneously measuring the photosynthetic photon flux density (PPFD) above the canopy of each adult C. subpubescens pure ecotypeand at a nearby open site: rPPFD = (PPFD above the canopy / PPFD at open site) × 100.
To characterize the leaf morphology of each ecotype, we sampled two to five intact leaves from a total of 28 pure adult trees (i.e., seven trees per ecotype). Moreover, we also examined the following 11 leaf traits: total length, blade length, width of leaf blade, hair density on the upper and lower surface of the leaf (i.e., number of hairs per 4 mm2), number of serrations per 30 mm, thickness of leaf blade, leaf area (LA), leaf mass per area (LMA), ratio of blade length to total leaf length, and the ratio of leaf blade width to length; where applicable, these measurements were taken as described by Kawakubo (1986). These characteristics were subjected to principal component analysis (PCA) to test the morphological aggregation of leaves of each ecotype.
To characterize size distribution of adult trees of each ecotype, we measured the maximum stem length, the maximum diameter at breast height (DBH), and counted the number of stems per tree. Our samples included a total of 81 trees (i.e., 13–24 trees per ecotype) that were categorized as pure ecotypes.
Pre- and post-mating reproductive barriers
To determine whether pre-mating isolation exists among ecotypes, the flowering phenology of 57 trees (i.e., 9–18 trees per ecotype) that were categorized as a pure ecotype were investigated. The number of flowing cymes was counted for each tree once a month for eight months (i.e., May 2014 to January 2015).
To determine whether post-mating isolation exists among ecotypes, we conducted artificial inter-crossings via pollination between different ecotypes, and intra-crossing via pollination within the same ecotypes. These experiments used plants derived from cutting seedlings raised in a greenhouse. For inter-cross pollination, a total 16 cymes from five maternal plants of ecotype D were crossed with two paternal plants of ecotype G (G × D). For intra-cross pollination, a total of 18 cymes from five maternal plants of ecotype D were crossed with eight paternal plants of ecotype D (D × D, Table S3). Fruit set rates were calculated for each cyme using the following equation: (number of fruits / number of flowers in the cyme) × 100. Next a total of 240 seeds from 10 pairs of inter-crosses and 384 seeds from 16 pairs of intra-crosses were sown. Their germination was monitored for six months (Table S3). The germination rate for each crossing pair was calculated using the following equation: number of germinated seedlings / number of sown seeds) × 100. Ninety-six germinated seedlings from each cross type were then transferred to pots and their mortality was tracked for one year in a laboratory environment with LED lighting and regular watering (Table S3). The mortality rate for each crossing pair was calculated using the following equation: (number of dead seedlings / number of seedlings transferred to pots) × 100.
During the course of mortality tracking, we compared differences in soil moisture requirements among ecotypes. In addition to seedlings derived from artificial crossings (i.e., G × D and D × D), natural pollinated seeds from site Gn were sown and grown under the same conditions. Subsequently, an EST-SSR analysis was performed for seedlings from site Gn using the same method as for the adult trees, and only pure ecotype G seedlings were used for this experiment. Three months after transferred to pots, watering was temporarily stopped and the soil moisture content at the moment when seedlings began to wilt was measured using a soil moisture sensor (SM300, Delta-T Devices Ltd, Cambridge, UK). The volumetric soil water content (q% vol.) was calculated using the following equation: q = −27.8V 5 + 30.3V 4 − 0.7V 3 − 9.0V 2 + 3.8V. Here, V is the measured voltage value. This equation was obtained from the relationship between V and the volumetric water content of soil used for cultivation of seedlings.