Assessment of palm SSR marker transferability to B. aethiopum and evaluation of their capacity for characterizing genetic diversity
Of the 80 microsatellite markers selected from the three model palm species E. guineensis, P. dactylifera and C. nucifera and tested for amplification on B. aethiopum DNA, 18 (22.5%) generate amplification products (Table 1). No amplification is observed with any of the 11 C. nucifera markers, whereas 7 (15.9%) of the P. dactylifera markers and 11 (44.0%) of the E. guineensis markers amplify successfully. None of the amplification products generated with P. dactylifera primers displays polymorphism in our B. aethiopum test panel. Among the E. guineensis-derived SSR markers, two (ESSR566 and ESSR652) display polymorphism. It must be noted, however, that depending on the DNA sample the ESSR566 primer pair generates a variable number of amplicons of distinct sizes, which may indicate that more than one locus is targeted.
Table 1
Summary of SSR marker transferability assessment
Species of origin | Number of SSR markers tested | Number of successful amplifications (% of markers) | Number of polymorphic amplicons (% of amplifications) |
Cocos nucifera | 11 | 0 (0) | 0 (0) |
Phoenix dactylifera | 44 | 7 (15.9) | 0 (0) |
Elaeis guineensis | 25 | 11 (44.0) | 2 (18.2) |
Total | 80 | 18 (22.5) | 2 (11.1) |
Overall, during this phase of the study we detect polymorphism in our B. aethiopum test panel with only 2 (11.1% of successfully amplified markers, 2.5% of total) of the palm SSR primer pairs assayed. Only one of these markers, namely ESSR652, enables unambiguous detection of microsatellite locus polymorphism in B. aethiopum, and might therefore be used for studying genetic diversity in this species.
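The percentages quoted above follow directly from the counts in Table 1. The short sketch below re-derives them; the dictionary layout and the helper name `pct` are illustrative choices, not part of the original analysis.

```python
# Counts transcribed from Table 1:
# species -> (markers tested, successful amplifications, polymorphic amplicons)
TABLE1 = {
    "Cocos nucifera": (11, 0, 0),
    "Phoenix dactylifera": (44, 7, 0),
    "Elaeis guineensis": (25, 11, 2),
}


def pct(part, whole):
    """Percentage rounded to one decimal; 0.0 when the denominator is zero."""
    return round(100 * part / whole, 1) if whole else 0.0


tested = sum(t for t, _, _ in TABLE1.values())
amplified = sum(a for _, a, _ in TABLE1.values())
polymorphic = sum(p for _, _, p in TABLE1.values())

for species, (t, a, p) in TABLE1.items():
    print(f"{species}: {pct(a, t)}% amplified, "
          f"{pct(p, a)}% of amplifications polymorphic")

print(f"Total: {amplified}/{tested} amplified ({pct(amplified, tested)}%), "
      f"{polymorphic} polymorphic ({pct(polymorphic, amplified)}% of amplified, "
      f"{pct(polymorphic, tested)}% of tested)")
```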
De novo identification of microsatellite sequences in the B. aethiopum genome and assessment of potential SSR markers
In order to enable a more precise evaluation of genetic diversity in B. aethiopum, we developed specific B. aethiopum markers from de novo sequencing data. A total of 23,281,354 raw reads with an average length of 250 bp were generated from one MiSeq run. Trimming of the raw reads resulted in 21,636,172 cleaned-up reads, of which 493,636 high-quality reads were retained after filtering (Q > 30); from these, 216,475 contigs were assembled.
From the contigs, the QDD software identifies a total of 1,618 microsatellite loci (Additional file 1), of which 1,327 (82.01%) are perfect (i.e. repeat size 4 bp or smaller and repeat number 10–20). Among the perfect microsatellite loci, 83.86% are composed of dinucleotide repeat units, 13.06% of trinucleotide units, 2.39% of tetranucleotide units and 0.67% of units of five nucleotides or more. From these, we selected SSR markers composed of di- (AG) or trinucleotide repeats, using the following criteria for specific amplification of easily scorable bands: primer lengths of 18 to 22 bp, annealing temperatures of 55–60 °C, and predicted amplicon sizes of 90–200 bp.
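To make the notion of a perfect microsatellite concrete, the following minimal sketch scans a sequence for uninterrupted di- and trinucleotide repeats with a regular expression. It is not the QDD algorithm; the function name, the `min_repeats` threshold and the toy sequence are all assumptions made for illustration.

```python
import re


def find_perfect_ssrs(seq, unit_sizes=(2, 3), min_repeats=5):
    """Return (start, repeat unit, repeat count) for uninterrupted tandem repeats."""
    seq = seq.upper()
    hits = []
    for size in unit_sizes:
        # A unit of `size` bases followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((match.start(), unit, len(match.group(0)) // size))
    return sorted(hits)


# Hypothetical contig: an [AG]12 and an [AAT]7 repeat between arbitrary flanks.
toy_contig = "TTGACC" + "AG" * 12 + "CCGTC" + "AAT" * 7 + "GGC"
for start, unit, count in find_perfect_ssrs(toy_contig):
    print(f"position {start}: [{unit}]{count}")
```

The output uses the same [unit]x notation as Table 2. A real pipeline would additionally design and filter primers in the flanking regions, which this sketch does not attempt.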
The characteristics of the 57 selected primer pairs and the results of the test amplifications are presented in Table 2. Successful amplification of B. aethiopum DNA is obtained for 54 (94.7%) primer pairs; of these, 34 (63.0% of amplifying primer pairs) show no polymorphism. The remaining 20 primer pairs amplify polymorphic products, but nine of them yield complex, ambiguous amplification profiles that prevent their use for reliable detection of genetic variation. As a result, 11 putative B. aethiopum SSR markers (representing 20.4% of the primer pairs that amplify successfully and 55.0% of those detecting polymorphic products in our study) are both polymorphic and unambiguously mono-locus in our amplification test panel and may therefore be used for further analyses.
Table 2
List of selected primer pairs targeting putative B. aethiopum microsatellite loci and assessment of their polymorphism detection ability.
Locus name | Repeat motif | Primer sequences (5'-3' orientation) | Expected amplicon size (bp) | Amplification product |
MBo01 | [AGG]7 | CCTATCCTTCCATCCCGATCG | 90 | complex, polymorphic |
TTGCCGTGAATCAGCCTCAA |
MBo02 | [ATC]7 | GGGAGAACAAGGATAACAGCAG | 115 | single locus, monomorphic |
TCCATTTCATCACTAGCTCGGT |
MBo03 | [AGG]7 | CTCCGAGCCCTAGCAACTTT | 131 | single locus, monomorphic |
TCTGGATGACGAAACCTTCACA |
MBo04 | [ACC]7 | GATGTGGCCGCTCTGATCTC | 192 | single locus, monomorphic |
ACATGCTGGCAAGGTATTCT |
MBo05 | [AAG]7 | GTCCTAGCACGCTGGCATTA | 202 | single locus, monomorphic |
TGGGTTGCCAATGAACCCTT |
MBo06 | [ATC]7 | TGGCCATTCAACTGCTTCAC | 202 | single locus, monomorphic |
GAATCTAGCACCAGCAAACCC |
MBo07 | [AAG]7 | GGCACTGGAGTCCACATCAA | 239 | single locus, monomorphic |
TCCTTCTGTACTGGCATCTCT |
MBo08 | [AGG]8 | TGATTGTTTCCTCTTCCCTCCT | 90 | single locus, monomorphic |
TTAATGAGCCGAAGAGGAGCC |
MBo09 | [AGG]8 | TCCCTCACTCCCATCCTCTC | 163 | single locus, monomorphic |
ACTCCACTCCTTCCCTCATACA |
MBo10 | [AAC]8 | GTTAAAGACGCAGGGCTGGA | 166 | single locus, monomorphic |
CCCACTTAGTGAGATAAGACTTGA |
MBo11 | [ATC]8 | GCATCACATGGTTTCAGGCT | 219 | single locus, monomorphic |
GCTCAACCATCGGCAGTGTA |
MBo12 | [ATC]9 | GGAGGAAAGGTTGCCCTAGAA | 102 | single locus, monomorphic |
TCTCAACCTGATGTCATTGCA |
MBo13 | [AAG]9 | CAGGTTGCATCGGCCCATT | 103 | complex, polymorphic |
GGAGCCTAATGCACCCAGAG |
MBo14 | [AAC]9 | ATGGCCGATCCCACTTAGTG | 117 | single locus, monomorphic |
GAGAGAACGGCAATAATTTATGCA |
MBo15 | [AAG]10 | GCTGAAGAGGATGAAGAAGAAGC | 92 | complex, monomorphic |
TCATCATCTCCCTCTCCTTCT |
MBo16 | [AGG]10 | CAGCACTGGCCTCACAGC | 118 | single locus, monomorphic |
CCGTCGATCAGTTGTTGGAGA |
MBo17 | [ATC]10 | ACACAATGACCTTTCGCTGA | 124 | single locus, monomorphic |
CCAAACAGGACCTTATGCCA |
MBo18 | [AAG]10 | ACATCCTCTCCTTCATCTCCTT | 187 | complex, polymorphic |
GTTCCTACAATGCTTGGCGC |
MBo19 | [AAG]10 | TGCTATCACCCAATATCTAGGCT | 202 | single locus, monomorphic |
ACAGTCAACAACTACCATACTGC |
MBo20 | [AAG]10 | TGTGGTTAAAGCAATGGAAGCA | 229 | single locus, monomorphic |
GCCGAACTCCTACTCTCATACG |
MBo21 | [AAG]11 | ACAACAGAAGATCAGTATACGTTCT | 171 | single locus, monomorphic |
TTGAGGAATCATGCTTGTCAGT |
MBo22 | [AAG]14 | AGAAGAATTCGGTTAGGTCACAA | 108 | single locus, monomorphic |
AGATAACATGGGTAAGAATTGCCT |
MBo23 | [AAT]5 | TGAGTTCTTGTCTTGTCTTCGT | 100 | single locus, monomorphic |
GGTTTGGGACACCCTTCAGG |
MBo24 | [AAT]9 | AAAGTCATGTCTGGGTGATGAA | 90 | single locus, monomorphic |
ATGATGAGCACAGCTACAACTCT |
MBo25 | [AAT]6 | TCTTCAGGTGACAAGCAACA | 96 | single locus, monomorphic |
CCTGGGCATGGAGATAGCAT |
MBo26 | [AAT]7 | CCATAGGCCAGCCCACTATA | 134 | single locus, monomorphic |
ACCCTTTCTTCTTCCTCATTTGT |
MBo27 | [AAT]7 | TCTCTATTGCTTGGTGATCCC | 103 | single locus, monomorphic |
TCCAACAAGGGATGGTTATCATG |
MBo28 | [AAT]8 | GCCTTGAGAGTGGAAGAGGC | 205 | single locus, monomorphic |
TCTCTTCTTTGCGCCCTCAT |
MBo29 | [AAT]16 | AGACATGTAGAGGTGGGACT | 211 | single locus, monomorphic |
TCTGTATGAGAGACGTGTTACAGT |
MBo30 | [AAT]8 | TGACCATAACAAGCTACCAGGT | 146 | single locus, monomorphic |
GGTGGAAGCTATTGATATTGCATGT |
MBo31 | [AAT]10 | TGACAATGATGCATGCGATAACA | 187 | single locus, monomorphic |
GCATCACCCATGTCCTTTAGC |
MBo32 | [AAT]10 | TCCGAGGGCAGTATTTGTCG | 117 | single locus, monomorphic |
CACTATTTCGGAAACCTAAGCCC |
MBo33 | [AAT]17 | GCACACTTTGTATCCGACGC | 147 | single locus, monomorphic |
CAGGGATAGTAACCGTCAGGG |
MBo34* | [AG]28 | GTGGCACCTCTGCGGTTT | 192 | single locus, polymorphic |
CGAGATGGAAGCACCTGGAG |
MBo35* | [AG]24 | AGCATGCTTTCTGCTTCATGTG | 137 | single locus, polymorphic |
CCTTTCCCTGACTGCATTGC |
MBo36 | [AG]23 | TCGGAAGTCGAATGTGGCAG | 180 | no amplification |
TCGGAAGAGTGGTCAATCATGG |
MBo37 | [AG]23 | GCTCTACTCCCAGAGACGGA | 142 | complex, polymorphic |
AACAGTCGACGGAATGCTCA |
MBo38* | [AG]20 | AGTCCTCACTGCTGGTGGTA | 130 | single locus, polymorphic |
TCCTTGAATAGTCCATCTTGCA |
MBo39 | [AG]19 | AACGCAGGTTAAGAGGCTCC | 168 | complex, monomorphic |
CCTCCTGGTGCAACCCTTAC |
MBo40 | [AG]19 | TGTGGAGTGTGAGTCGATGG | 193 | complex, polymorphic |
GGCTGCATAATCTCATCACGC |
MBo41* | [AG]18 | TTCTCCACCAGCCTCACAAC | 184 | single locus, polymorphic |
ATACGGCCCATCAACCCTTC |
MBo42 | [AG]18 | CCTGGTGGTACATGTGGTCA | 136 | complex, polymorphic |
TGTGGCACATTCATTTCTGAAGG |
MBo43 | [AG]18 | AGTTTGTTCTGTGTGTTGTCAC | 137 | no amplification |
GCACACATCTTGCTTTGAAGAC |
MBo44 | [AG]17 | AACACACTTTAAATCGACTTCTTCA | 193 | complex, polymorphic |
CACGGCTGCCATGTGAGG |
MBo45 | [AG]17 | TAGATCGGAAGTCAGGCCC | 193 | no amplification |
AGAGAAGTGGGAGGAGAGGTC |
MBo46 | [AG]17 | GCCGATATTAGCTTCTTCTTGGC | 154 | single locus, monomorphic |
GCCTTGTTGATCCCGTTTCAC |
MBo47 | [AG]16 | GGCACCTGACGCCTCTTT | 188 | single locus, monomorphic |
TCACTTCGACTCAATTGTATCCAT |
MBo48 | [AG]16 | AGGACAAAGAGATGAGAAGCCT | 92 | complex, polymorphic |
ACCAATTCCCAGTTAGTTGACCA |
MBo49* | [AG]16 | CATCACCCATTCTCTCTGCCT | 141 | single locus, polymorphic |
GAGAAACCATCCGCACCTCA |
MBo50* | [AG]15 | AGAAGTCATCTTGAGGGCCC | 150 | single locus, polymorphic |
TTGCTAGAATGATACACAAATTGCT |
MBo51* | [AG]15 | TGTGCTATTTGTTGGGAATGCA | 191 | single locus, polymorphic |
GCAAGCTCATGTTCTAGTTTCAAGT |
MBo52* | [AG]15 | ACACATCCTACATGAATAGACCTCC | 122 | single locus, polymorphic |
TCTTGTCATAGCCTAGATTCCCT |
MBo53 | [AG]15 | AGGTTTAAGGGTTTGGGTTAGGG | 131 | single locus, monomorphic |
GGTGGAGTAAGTTTGAGGGTCA |
MBo54* | [AG]11NNN[AG]15 | CATATGCTGATACAAGAGAGAGGG | 124 | single locus, polymorphic |
ACCTTATAAGCAGGATCCAGACA |
MBo55 | [AG]15 | TGGAATCAACCTTGGGTCTACA | 198 | complex, polymorphic |
TCGTCGGTCTTCTAGCCACT |
MBo56* | [AG]15 | ACCAAGATCAAGCACGAGGA | 103 | single locus, polymorphic |
AGGATCACCCTTTCTTTCTTTCT |
MBo57* | [AG]15 | GGGTTCAATCCTGATGAGAGCA | 136 | single locus, polymorphic |
ACCGTTCGATCAACCATGGT |
Loci for which single-locus SSR polymorphism has been detected within our test panel of seven B. aethiopum individuals are marked with an asterisk (*).
Conventionally, microsatellite motifs are written in the form [N1N2]x or [N1N2N3]x for dinucleotide and trinucleotide loci, respectively, where N1, N2 and N3 represent nucleotides included in the elementary unit of the motif and x is the number of unit repetitions. Expected amplicon size is as predicted by QDD.
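As a cross-check, the outcome categories in the last column of Table 2 can be tallied and converted into proportions. The counts below were obtained by counting rows of Table 2; the variable names are illustrative.

```python
# Number of primer pairs per "Amplification product" category in Table 2.
OUTCOMES = {
    "single locus, monomorphic": 32,
    "complex, monomorphic": 2,
    "single locus, polymorphic": 11,
    "complex, polymorphic": 9,
    "no amplification": 3,
}

total = sum(OUTCOMES.values())
amplified = total - OUTCOMES["no amplification"]
monomorphic = OUTCOMES["single locus, monomorphic"] + OUTCOMES["complex, monomorphic"]
polymorphic = OUTCOMES["single locus, polymorphic"] + OUTCOMES["complex, polymorphic"]
usable = OUTCOMES["single locus, polymorphic"]  # polymorphic and unambiguous


def share(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(f"amplified: {amplified}/{total} ({share(amplified, total)}%)")
print(f"monomorphic: {monomorphic} ({share(monomorphic, amplified)}% of amplified)")
print(f"usable markers: {usable} ({share(usable, amplified)}% of amplified, "
      f"{share(usable, polymorphic)}% of polymorphic)")
```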
Microsatellite-based characterization of genetic variation of B. aethiopum in Benin
The previously defined set of 11 B. aethiopum-specific SSR markers has been used to characterize genetic diversity in our full panel of 180 individual samples from nine locations distributed across Benin. As shown in Table 3, the number of alleles per microsatellite locus in our sample set ranges from 2 for marker Mbo41 to 6 for markers Mbo34, Mbo35, and Mbo50, with an average value of 4.27, whereas expected heterozygosity (He) values range from 0.031 (marker Mbo56) to 0.571 (marker Mbo35). The analysis of genetic diversity with these markers (Table 4) shows that the number of polymorphic markers per site ranges from 8 (sites of Togbin and Malanville) to 10 (Savè, Agoua, Pendjari, Pingou and Trois Rivières), with a mean value of 9 ± 0.865. With the exception of Savè, Hounviatouin and Malanville, 1 to 3 private alleles of the targeted microsatellite loci are observed at each sampling location. Regarding the genetic parameters, the number of effective alleles (Ne) ranges from 1.447 to 2.069, with an average of 1.761. He values range from 0.263 (Hounviatouin) to 0.451 (Savè), with an average value of 0.354, whereas the observed heterozygosity (Ho) ranges from 0.234 (Togbin) to 0.405 (Pingou), with an average value of 0.335. Negative values of the fixation index (F) are obtained for Pingou, Malanville and Trois Rivières, whereas positive F values, indicating a deficit of heterozygosity, are observed at all other sites investigated.
Table 3
Characteristics of the 11 polymorphic microsatellite markers used for genetic diversity analysis of B. aethiopum
Locus name | Number of alleles scored/locus | Expected Heterozygosity (He) | Observed Heterozygosity (Ho) |
Mbo34 | 6 | 0.520 | 0.383 |
Mbo35 | 6 | 0.571 | 0.522 |
Mbo38 | 5 | 0.458 | 0.513 |
Mbo41 | 2 | 0.343 | 0.356 |
Mbo49 | 4 | 0.167 | 0.146 |
Mbo50 | 6 | 0.548 | 0.542 |
Mbo51 | 3 | 0.320 | 0.304 |
Mbo52 | 3 | 0.201 | 0.232 |
Mbo54 | 4 | 0.260 | 0.435 |
Mbo56 | 3 | 0.031 | 0.034 |
Mbo57 | 5 | 0.296 | 0.263 |
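The per-locus summary quoted in the text (allele range, mean number of alleles, He range) can be recomputed from the rows of Table 3. This is a small sketch with values transcribed from the table; the data structure is an illustrative choice.

```python
# Rows of Table 3: locus -> (number of alleles, He, Ho)
TABLE3 = {
    "Mbo34": (6, 0.520, 0.383),
    "Mbo35": (6, 0.571, 0.522),
    "Mbo38": (5, 0.458, 0.513),
    "Mbo41": (2, 0.343, 0.356),
    "Mbo49": (4, 0.167, 0.146),
    "Mbo50": (6, 0.548, 0.542),
    "Mbo51": (3, 0.320, 0.304),
    "Mbo52": (3, 0.201, 0.232),
    "Mbo54": (4, 0.260, 0.435),
    "Mbo56": (3, 0.031, 0.034),
    "Mbo57": (5, 0.296, 0.263),
}

allele_counts = {locus: row[0] for locus, row in TABLE3.items()}
he_values = {locus: row[1] for locus, row in TABLE3.items()}

mean_alleles = round(sum(allele_counts.values()) / len(allele_counts), 2)
fewest_alleles = min(allele_counts, key=allele_counts.get)
most_alleles = sorted(l for l, n in allele_counts.items()
                      if n == max(allele_counts.values()))
lowest_he = min(he_values, key=he_values.get)
highest_he = max(he_values, key=he_values.get)

print(f"mean alleles per locus: {mean_alleles}")
print(f"fewest alleles: {fewest_alleles}; most alleles: {', '.join(most_alleles)}")
print(f"He range: {he_values[lowest_he]} ({lowest_he}) "
      f"to {he_values[highest_he]} ({highest_he})")
```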
Table 4
Mean diversity parameters for each of the nine B. aethiopum sampling sites.
Geo-climatic region | Site | Number of polymorphic markers | Na | Ne | Number of private alleles | Ho | He | F |
Guineo-Congolian (South) | Togbin | 8 | 2.273 | 1.584 | 3 | 0.234 | 0.288 | 0.145 |
Guineo-Congolian (South) | Hounviatouin | 9 | 2.182 | 1.447 | 0 | 0.272 | 0.263 | 0.007 |
Sudano-Guinean (Center) | Savè | 10 | 2.909 | 2.069 | 0 | 0.384 | 0.451 | 0.134 |
Sudano-Guinean (Center) | Biguina | 9 | 2.364 | 1.770 | 2 | 0.345 | 0.374 | 0.064 |
Sudano-Guinean (Center) | Agoua | 10 | 2.273 | 1.722 | 1 | 0.329 | 0.358 | 0.059 |
Sudanian (North) | Pendjari | 10 | 2.818 | 1.900 | 3 | 0.368 | 0.396 | 0.055 |
Sudanian (North) | Pingou | 10 | 2.364 | 1.906 | 1 | 0.405 | 0.390 | -0.063 |
Sudanian (North) | Malanville | 8 | 2.455 | 1.627 | 0 | 0.302 | 0.303 | -0.020 |
Sudanian (North) | Trois Rivières | 10 | 2.545 | 1.822 | 2 | 0.373 | 0.360 | -0.055 |
 | Overall mean | 9 ± 0.865 | 2.465 ± 0.103 | 1.761 ± 0.065 | | 0.335 ± 0.023 | 0.354 ± 0.023 | 0.035 ± 0.022 |
Na: average number of different alleles; Ne: effective number of alleles; Ho: observed heterozygosity; He: expected heterozygosity; F: fixation index
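For readers unfamiliar with the parameters in Table 4, the sketch below computes He, Ne and F for a single locus from allele frequencies, using the standard textbook definitions (He = 1 - Σp², Ne = 1/Σp², F = (He - Ho)/He). The source does not state which formulas or software settings were used, so these definitions are an assumption, and the allele frequencies in the example are hypothetical. Site-level values such as those in Table 4 are typically averages over loci, so they cannot be reproduced from the site means alone.

```python
def diversity_stats(allele_freqs, observed_het):
    """Single-locus He, Ne and F from allele frequencies and observed heterozygosity."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)   # expected homozygosity
    he = 1.0 - homozygosity                           # expected heterozygosity
    ne = 1.0 / homozygosity                           # effective number of alleles
    # F > 0 means fewer heterozygotes than expected; undefined for a monomorphic locus.
    f = (he - observed_het) / he if he > 0 else float("nan")
    return he, ne, f


# Hypothetical locus with three alleles and an observed heterozygosity of 0.5.
he, ne, f = diversity_stats([0.5, 0.3, 0.2], observed_het=0.5)
print(f"He = {he:.3f}, Ne = {ne:.3f}, F = {f:.3f}")
```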
Population structure of B. aethiopum in Benin
Nei's genetic distance among locations (Table 5) ranges from 0.073, as observed between Togbin and Hounviatouin (Guineo-Congolian region), to 0.577 between Togbin (Guineo-Congolian region) and Trois Rivières (Sudanian region). Overall, genetic distances between B. aethiopum sampling locations are lowest within the same region, as illustrated by the low genetic distances among the sites of Pendjari, Pingou, and Trois Rivières, which are all located in the Northern part of the country. One interesting exception is the Central (Sudano-Guinean) region of Benin, where we find that, within the region, the location most genetically distant from Savè is the Agoua forest reserve (0.339). Surprisingly, Savè displays its highest genetic identity values when compared to the other two collection sites located within protected areas, namely Pendjari (0.870) and Trois Rivières (0.882), which are both located in the Sudanian region. This is an unexpected finding considering the geographic distances involved.
Table 5
Pairwise location matrix of Nei’s genetic distance and genetic identity values
| Togbin | Hounviatouin | Savè | Biguina | Agoua | Pendjari | Pingou | Malanville | Trois Rivières |
Togbin | - | 0.073 | 0.477 | 0.253 | 0.337 | 0.517 | 0.494 | 0.487 | 0.577 |
Hounviatouin | 0.929 | - | 0.419 | 0.110 | 0.215 | 0.435 | 0.317 | 0.375 | 0.535 |
Savè | 0.621 | 0.658 | - | 0.270 | 0.339 | 0.140 | 0.265 | 0.238 | 0.126 |
Biguina | 0.776 | 0.896 | 0.763 | - | 0.152 | 0.241 | 0.161 | 0.186 | 0.316 |
Agoua | 0.714 | 0.806 | 0.713 | 0.859 | - | 0.408 | 0.304 | 0.359 | 0.490 |
Pendjari | 0.596 | 0.647 | 0.870 | 0.786 | 0.665 | - | 0.167 | 0.108 | 0.103 |
Pingou | 0.610 | 0.728 | 0.767 | 0.851 | 0.738 | 0.846 | - | 0.174 | 0.175 |
Malanville | 0.614 | 0.688 | 0.788 | 0.831 | 0.699 | 0.898 | 0.841 | - | 0.145 |
Trois Rivières | 0.561 | 0.585 | 0.882 | 0.729 | 0.613 | 0.902 | 0.840 | 0.865 | - |
Above the diagonal: Nei's genetic distance; below: genetic identity.
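The two halves of Table 5 are linked by the definition of Nei's standard genetic distance, D = -ln(I), where I is the genetic identity. The sketch below checks this relation on the pairs discussed in the text; the tolerance of 0.001 is an assumption chosen to absorb three-decimal rounding in the table.

```python
import math

# (site A, site B, Nei's distance D, genetic identity I), transcribed from Table 5.
PAIRS = [
    ("Togbin", "Hounviatouin", 0.073, 0.929),
    ("Togbin", "Trois Rivières", 0.577, 0.561),
    ("Savè", "Agoua", 0.339, 0.713),
    ("Savè", "Pendjari", 0.140, 0.870),
    ("Savè", "Trois Rivières", 0.126, 0.882),
]


def identity_from_distance(distance):
    """Invert D = -ln(I) to recover the genetic identity I."""
    return math.exp(-distance)


for site_a, site_b, distance, identity in PAIRS:
    recomputed = identity_from_distance(distance)
    print(f"{site_a} vs {site_b}: D = {distance:.3f}, "
          f"I(table) = {identity:.3f}, exp(-D) = {recomputed:.3f}")
```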
A similar structure of genetic distances emerges from the analysis of pairwise location genetic differentiation (Fst) (Table 6), suggesting genetic differentiation according to geographic distances between collection sites, with the notable exception of the lower genetic differentiation between samples from Savè and those from either one of the forest reserves in the Northern region, namely Pendjari and Trois Rivières.
Table 6
Pairwise Fst values between sampling locations
| Togbin | Hounviatouin | Savè | Biguina | Agoua | Pendjari | Pingou | Malanville | Trois Rivières |
Togbin | 0.000 | | | | | | | | |
Hounviatouin | 0.072 | 0.000 | | | | | | | |
Savè | 0.233 | 0.221 | 0.000 | | | | | | |
Biguina | 0.168 | 0.086 | 0.145 | 0.000 | | | | | |
Agoua | 0.215 | 0.153 | 0.157 | 0.105 | 0.000 | | | | |
Pendjari | 0.247 | 0.212 | 0.077 | 0.120 | 0.188 | 0.000 | | | |
Pingou | 0.252 | 0.181 | 0.138 | 0.103 | 0.169 | 0.100 | 0.000 | | |
Malanville | 0.301 | 0.246 | 0.149 | 0.121 | 0.197 | 0.072 | 0.119 | 0.000 | |
Trois Rivières | 0.285 | 0.279 | 0.076 | 0.178 | 0.224 | 0.073 | 0.104 | 0.107 | 0.000 |
To assess the strength of the relationship between genetic and geographic distances, we plotted them as a linear regression and performed the Mantel permutation test. As shown in Fig. 1, the positive correlation between both variables is weak but significant (R² = 0.1139, P = 0.040).
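The logic of a Mantel permutation test can be sketched as follows: the correlation between the two distance matrices is compared with the correlations obtained after randomly relabelling the sites of one matrix. This is a generic one-sided implementation, not the software used in the study; the geographic coordinates and the proportionality factor in the example are hypothetical, since the source does not list the geographic distances. If the reported R² is the square of the Mantel correlation, it corresponds to r ≈ 0.34.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lower_triangle(matrix, order):
    """Flatten the lower triangle after relabelling rows and columns by `order`."""
    n = len(matrix)
    return [matrix[order[i]][order[j]] for i in range(1, n) for j in range(i)]


def mantel(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation P value)."""
    rng = random.Random(seed)
    n = len(genetic)
    identity = list(range(n))
    geo_vec = lower_triangle(geographic, identity)
    r_obs = pearson(lower_triangle(genetic, identity), geo_vec)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel the sites of the genetic matrix only
        if pearson(lower_triangle(genetic, order), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical example: six sites along a transect (positions in km), with
# genetic distance exactly proportional to geographic distance.
positions = [0, 5, 15, 40, 90, 200]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.002 * d for d in row] for row in geo]
r, p = mantel(gen, geo)
print(f"Mantel r = {r:.3f}, P = {p:.3f}")
```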
AMOVA (Table 7) shows that within-site variation accounts for the major part (53%) of total variance, whereas among-site and among-region variation explain genetic variance to a similar extent (23 and 24%, respectively). Accordingly, the average number of migrants between collection sites (Nm = 1.019) is low, indicating very limited gene flow.
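The source does not state how Nm was derived. A common approximation, assumed here, is Wright's island-model relation Nm = (1/Fst - 1)/4. Under that assumption, the sketch below shows which overall differentiation value the reported Nm would correspond to.

```python
def nm_from_fst(fst):
    """Number of migrants per generation under Wright's island model (assumed)."""
    return (1.0 / fst - 1.0) / 4.0


def fst_from_nm(nm):
    """Inverse of the same relation: Fst = 1 / (4 * Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)


reported_nm = 1.019  # value reported in the text
implied_fst = fst_from_nm(reported_nm)
print(f"Nm = {reported_nm} would correspond to Fst = {implied_fst:.3f}")
```

Values of Nm close to 1 are conventionally read as the threshold around which gene flow is just sufficient to counteract differentiation by drift, which is consistent with the interpretation given in the text.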
Table 7
Source | df | SS | MS | Est. var. | % total variance | P value |
Among Regions | 2 | 309.407 | 154.704 | 1.944 | 24% | < 0.001 |
Among Locations | 6 | 254.302 | 42.384 | 1.903 | 23% | < 0.001 |
Within Locations | 171 | 739.100 | 4.322 | 4.322 | 53% | < 0.001 |
Total | 179 | 1302.809 | | 8.169 | 100% | |
df = degrees of freedom, SS = sum of squares, MS = mean squares, Est. var. = estimated variance
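The internal arithmetic of Table 7 can be verified in a few lines: each mean square is SS/df, and each percentage is the corresponding estimated variance component divided by the total estimated variance. Values are transcribed from the table; the variable names are illustrative.

```python
# Rows of Table 7: source -> (df, SS, MS, estimated variance)
AMOVA = {
    "Among Regions": (2, 309.407, 154.704, 1.944),
    "Among Locations": (6, 254.302, 42.384, 1.903),
    "Within Locations": (171, 739.100, 4.322, 4.322),
}

total_df = sum(row[0] for row in AMOVA.values())
total_ss = sum(row[1] for row in AMOVA.values())
total_var = sum(row[3] for row in AMOVA.values())

# Share of total variance explained by each level, as whole percentages.
percent = {source: round(100 * row[3] / total_var) for source, row in AMOVA.items()}

for source, (df, ss, ms, var) in AMOVA.items():
    print(f"{source}: MS = SS/df = {ss / df:.3f} (table: {ms}), "
          f"{percent[source]}% of total variance")
print(f"Total: df = {total_df}, SS = {total_ss:.3f}, Est. var. = {total_var:.3f}")
```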
The Principal Coordinates Analysis (PCoA) of 180 B. aethiopum samples (Fig. 2A) shows that the first axis (accounting for 24% of total variation, out of a total of 33.90% for axes 1 and 2) roughly separates individual samples into two main groups, a result that is in agreement with the analysis of genetic distances. The sampling location-based PCoA (Fig. 2B) confirms the genetic separation along the first axis (accounting for 44.08% of total variation, out of a total of 61.06% for axes 1 and 2) between sites from the Guineo-Congolian (Southern) region, plus the sites of Agoua and Biguina (Center) vs. sites from the Sudanian (Northern) region, plus the site of Savè (Center). Although the distinction is not as clearly marked, the second axis (accounting for 16.98% of total variation) further distinguishes two subgroups within the first group, corresponding to sites belonging to the Southern region and to those from the Central one, respectively.
Likewise, the Bayesian analysis of our data indicates an optimal value of K = 2 for the clustering of the samples into two groups (Fig. 3A and Fig. 3B): one group that includes samples from Togbin and Hounviatouin in the Southern part of the country, as well as most samples from Biguina and Agoua at the Western (Togolese) border of the Center region; and one group composed of the majority of samples collected in Savè (Eastern part of the Center region) and from the Northern locations of Pendjari, Pingou, Malanville, and Trois Rivières. Since the ΔK method used for estimating K may lead to over- or under-estimated values, clustering with a value of K = 3 has also been tested (Fig. 3C). As previously observed with the location-based PCoA, under this hypothesis further clustering emerges within the first group, separating samples from Togbin and Hounviatouin (South) from those from Biguina and Agoua (Center).
The unweighted pair-group method with arithmetic mean (UPGMA) tree constructed from our data (Fig. 4) distinguishes two main groups that match those defined through the Bayesian analysis with K = 2 and are supported by bootstrap values above 50. Within each of these groups, subgroups that correspond to those observed with K = 3 clustering and globally match geo-climatic regions (Savè excepted) can be further defined. However, most bootstrap values attached to these secondary branches are not significant.
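To illustrate how UPGMA builds such groups, the sketch below applies average-linkage clustering to the location-level Nei distances of Table 5. This is only an illustration under stated assumptions: the tree in Fig. 4 may have been built from different input data, and no bootstrap support is computed here. The function and variable names are illustrative.

```python
from itertools import combinations

SITES = ["Togbin", "Hounviatouin", "Savè", "Biguina", "Agoua",
         "Pendjari", "Pingou", "Malanville", "Trois Rivières"]

# Upper triangle of Table 5 (Nei's genetic distance), row by row.
UPPER = [
    [0.073, 0.477, 0.253, 0.337, 0.517, 0.494, 0.487, 0.577],
    [0.419, 0.110, 0.215, 0.435, 0.317, 0.375, 0.535],
    [0.270, 0.339, 0.140, 0.265, 0.238, 0.126],
    [0.152, 0.241, 0.161, 0.186, 0.316],
    [0.408, 0.304, 0.359, 0.490],
    [0.167, 0.108, 0.103],
    [0.174, 0.175],
    [0.145],
]


def pair_distances():
    """Map each unordered pair of sites to its distance."""
    distances = {}
    for i, row in enumerate(UPPER):
        for k, value in enumerate(row):
            distances[frozenset((SITES[i], SITES[i + 1 + k]))] = value
    return distances


def upgma(sites, distances):
    """Return the list of merges (cluster A, cluster B, average distance)."""
    def average(a, b):
        # UPGMA distance: mean of all original distances between members.
        return sum(distances[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    clusters = [frozenset([site]) for site in sites]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((a, b, average(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


merges = upgma(SITES, pair_distances())
for a, b, height in merges:
    print(f"{height:.3f}: {sorted(a)} + {sorted(b)}")
```

With these inputs, the final merge joins Togbin, Hounviatouin, Biguina and Agoua on one side and Savè with the four Northern sites on the other, which is the same two-group partition as the one described for K = 2.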