Chloroplast genome features of six species of Wikstroemia
The total length of the chloroplast genomes of the six Wikstroemia species analyzed in this study ranged from 172,610 bp (W. micrantha) to 173,697 bp (W. alternifolia). All six chloroplast genomes exhibited the typical quadripartite structure (Figure 1), consisting of a pair of IR regions (41,850–42,073 bp) separated by an LSC region (86,111–86,7017 bp) and an SSC region (2,799–2,871 bp). All six chloroplast genomes had the same overall GC content of 36.7%, but the GC content was unevenly distributed within each genome. The IR regions had the highest GC content (38.8–38.9%), followed by the LSC region (34.7–34.9%), while the SSC region had the lowest GC content (26.9–29.5%).
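As an illustration of how the region-wise figures above can be derived, the following minimal sketch computes the total length of a quadripartite genome and the GC content of each region. The function names, the toy sequence, and the toy coordinates are hypothetical and are not taken from the study; the only values drawn from the text are the lower bounds of the reported region lengths, which sum to the smallest reported genome size.

```python
from __future__ import annotations


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total genome length: LSC + SSC + two copies of the IR."""
    return lsc + ssc + 2 * ir


def gc_content(seq: str) -> float:
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters do not dilute the value.
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def region_gc(genome: str, regions: dict[str, tuple[int, int]]) -> dict[str, float]:
    """GC content per region; coordinates are 0-based and end-exclusive."""
    return {name: gc_content(genome[start:end]) for name, (start, end) in regions.items()}


# Lower bounds of the region lengths reported in the text.
smallest_total = quadripartite_length(lsc=86_111, ssc=2_799, ir=41_850)

# Toy example with a made-up sequence and made-up coordinates (not real data).
toy_genome = "ATATATGCAT" + "GGCCGGCCAT" + "ATATATATAT" + "ATGGCCGGCC"
toy_regions = {"LSC": (0, 10), "IRb": (10, 20), "SSC": (20, 30), "IRa": (30, 40)}
toy_gc = region_gc(toy_genome, toy_regions)
```

In practice the region coordinates would come from the genome annotation rather than being typed in by hand.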
The six chloroplast genomes of Wikstroemia displayed identical gene content and gene order, with no structural rearrangements. A total of 139 genes were predicted in the six species used in this study, comprising 92 or 93 protein-coding genes, 38 tRNA genes, and 8 rRNA genes (Table 1). Of these, 28 genes were duplicated in the IR regions, including 16 protein-coding genes (ccsA, ndhA, ndhB, ndhD, ndhE, ndhH, ndhG, ndhI, psaC, rpl2, rpl23, rps7, rps12, rps15, ycf1, ycf2), eight tRNA genes (trnA-UGC, trnI-CAU, trnI-GAU, trnL-CAA, trnL-UAG, trnN-GUU, trnR-ACG and trnV-GAC) and four rRNA genes (rrn4.5, rrn5, rrn16, and rrn23) (Table 2). A total of 15 genes were found to contain a single intron, with five of them (ndhB, rpl2, trnA-UGC and trnI-GAU) located in the IR region and the remaining 10 genes (atpF, petB, petD, rpl16, rpoC1, rps16, trnG-UCC, trnL-UAA, trnK-UUU and trnV-UAC) located in the LSC region (Table S1). Only the ycf3 gene, located in the LSC region, was found to contain two introns. In all six genomes, the trnK-UUU gene had the longest intron, ranging from 2,498 to 2,508 bp.
Repetitive sequence analysis
The total numbers of SSRs in the chloroplast genome sequences of W. alternifolia, W. canescens, W. capitata, W. dolicantha, W. micrantha, and W. scytophylla were 127, 128, 109, 87, 90, and 110, respectively (Figure 2a). No hexanucleotide repeats were detected in the chloroplast genome sequences of W. alternifolia, W. canescens and W. scytophylla. The majority of SSRs (W. alternifolia: 70.79%; W. canescens: 70.31%; W. capitata: 68.81%; W. dolicantha: 63.22%; W. scytophylla: 63.64%; W. micrantha: 61.11%) were located in the LSC region rather than in the other two regions of the chloroplast genome (Figure 2b).
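For readers unfamiliar with how SSR counts of this kind are obtained, the sketch below shows one simple regular-expression approach to finding perfect mono- to hexanucleotide repeats. It is not the tool or the settings used in this study: the minimum repeat counts in `MIN_REPEATS` are assumed values chosen for illustration, and `find_ssrs` is a hypothetical helper. The sketch ignores compound and imperfect SSRs and may miss a repeat that overlaps a tract already consumed by the scan.

```python
import re
from collections import Counter

# Minimum number of repeat units per motif length.
# Assumed thresholds for illustration only; the study's settings are not stated here.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, copies) for each perfect SSR; 0-based, end-exclusive."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each tract is reported once, under its shortest motif.
            if any(
                unit_len % k == 0 and motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                continue
            copies = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


# Toy sequence (not real data): a 12-bp poly-A tract and six copies of AT.
toy_seq = "CG" + "A" * 12 + "CGC" + "AT" * 6 + "GGC"
toy_hits = find_ssrs(toy_seq)
# Tally SSRs by motif length (1 = mononucleotide, 2 = dinucleotide, ...).
toy_by_type = Counter(len(motif) for _, _, motif, _ in toy_hits)
```

Assigning each hit to the LSC, SSC, or IR region by its start coordinate would then give the regional proportions of the kind reported above.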
All six species of Wikstroemia contained the same number of long repeats (Figure 3a). In general, all of them contained 24 forward repeats and 25 palindromic repeats, except for W. canescens and W. capitata. Long forward repeats of 30 to 40 bp were most abundant in W. dolicantha and W. micrantha, while W. alternifolia, W. canescens, W. capitata, and W. scytophylla had a higher number of long forward repeats of 41 to 60 bp (Figure 3b). Long palindromic repeats were equally abundant in W. alternifolia and W. canescens, ranging from 40 to 60 bp and above 60 bp (Figure 3c), while in W. capitata, W. dolicantha, W. micrantha, and W. scytophylla they were abundant in the range of 30 to 60 bp. Long reverse repeats were detected only in W. canescens and W. capitata, and mostly occurred within the range of 30 to 40 bp (Figure 3d).
Analysis of codon usage
A total of 30 preferred codons (RSCU > 1.00) were recorded in each of W. alternifolia, W. canescens, W. capitata, W. dolicantha, W. micrantha, and W. scytophylla, of which 10, 10, 11, 11, 11, and 10 showed low preference (1.00 < RSCU < 1.30); 10, 10, 9, 9, 9, and 9 showed moderate preference (1.30 ≤ RSCU ≤ 1.50); and 12, 11, 13, 12, 12, and 12 showed strong preference (RSCU > 1.50), respectively (Table S2). The stop codon UAA was the most abundant and was preferred over the other two stop codons, UAG and UGA, in all six species. Preferred codons mostly ended with the nucleotides A or U, except for the leucine (Leu)-encoding codon UUG. The Leu-encoding codons accounted for the highest occurrence (9.38%), while the cysteine (Cys)-encoding codons had the lowest occurrence (3.13%) among all six Wikstroemia species.
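The RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The following sketch shows this calculation together with the three preference classes used above. It assumes the standard codon-to-amino-acid assignments; the function names and the toy counts are hypothetical and are not data from this study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with bases ordered U, C, A, G ("*" marks stop codons).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(cds):
    """Count codons in a coding sequence (DNA or RNA); trailing partial codons are ignored."""
    rna = cds.upper().replace("T", "U")
    codons = (rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(codon_counts):
    """RSCU = observed count / (total for the amino acid / number of synonymous codons)."""
    synonymous = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonymous[aa].append(codon)
    values = {}
    for aa, codons in synonymous.items():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = codon_counts.get(c, 0) / expected
    return values


def preference_class(value):
    """Classify an RSCU value using the thresholds given in the text."""
    if value > 1.50:
        return "strong"
    if value >= 1.30:
        return "moderate"
    if value > 1.00:
        return "low"
    return "not preferred"


# Toy leucine counts (made up): 60 codons over 6 synonymous codons, so 10 expected each.
toy_counts = {"UUA": 30, "UUG": 12, "CUU": 9, "CUC": 3, "CUA": 4, "CUG": 2}
toy_rscu = rscu(toy_counts)
```

With these toy counts, UUA has an RSCU of 3.0 (strong preference) and UUG an RSCU of 1.2 (low preference), while the four CUN codons fall below 1.00.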
Sequence divergence analysis
The chloroplast genome sequence alignment of eight species of Wikstroemia, using the W. chamaedaphne chloroplast genome as the reference, indicated high sequence conservation across the genus, except in the chloroplast genome of W. indica (Figure 4). As a whole, the size and gene order of the chloroplast genomes in Wikstroemia are well conserved, but in W. indica a distinct large gap was observed, extending from within the ycf1 gene sequence of the IRa to the 5’ region of trnL-UAG in the IRb. Both single-copy regions showed greater sequence divergence than the IR regions (Figure 5). With a Pi cut-off value of 0.025, eight highly variable regions (genes and intergenic spacers) were identified: ndhD-ndhF, ndhF-rpl32, ndhJ, petL-petG, psbI-trnS-GCU, trnG-UCC, trnK-UUU-rps16 and trnL-UAA-trnF-GAA. Six of these highly variable regions were located in the LSC region, while two were in the SSC region.
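Nucleotide diversity (Pi) is the average number of pairwise differences per site among the aligned sequences, usually computed in sliding windows along the alignment. The sketch below illustrates the idea and applies the 0.025 cut-off mentioned above. It is not the software used in this study: the window and step sizes are free parameters because the text does not state them, columns containing gaps or ambiguous bases are simply skipped, and treating the cut-off as a strict "greater than" is an assumption.

```python
from itertools import combinations


def window_pi(aligned, start, end):
    """Pi for alignment columns [start, end): mean pairwise differences per site."""
    # Keep only columns in which every sequence has an unambiguous base.
    columns = [
        col
        for col in zip(*(seq[start:end].upper() for seq in aligned))
        if all(base in "ACGT" for base in col)
    ]
    if len(aligned) < 2 or not columns:
        return 0.0
    pairs = list(combinations(range(len(aligned)), 2))
    differences = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return differences / (len(pairs) * len(columns))


def sliding_pi(aligned, window, step):
    """Return (start, end, Pi) for each window along an alignment of equal-length sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    starts = range(0, max(length - window, 0) + 1, step)
    return [(s, min(s + window, length), window_pi(aligned, s, s + window)) for s in starts]


def variable_windows(results, cutoff=0.025):
    """Windows whose Pi exceeds the cut-off (strict inequality is an assumption)."""
    return [r for r in results if r[2] > cutoff]


# Toy alignment (made up): three sequences differing at one site in the second window.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
toy_results = sliding_pi(toy_alignment, window=4, step=4)
toy_hotspots = variable_windows(toy_results)
```

In the toy alignment, the second window has two differing pairs out of three at one of four sites, giving Pi = 2 / (3 × 4) ≈ 0.167, so it is the only window above the cut-off.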
Contraction and expansion in IR region
The genes adjacent to the IR borders were consistent across members of Wikstroemia, except in W. indica, which differed in the genes adjacent to the IRb/SSC (JSB) and IRa/SSC (JSA) borders (Figure 6). Instead of the rpl32 and ndhF genes being present in the SSC region adjacent to JSB and JSA, respectively, the ycf1 gene was located across both JSA and JSB in the chloroplast genome of W. indica. The trnL-UAG gene was also located adjacent to JSA, in the SSC region of the W. indica chloroplast genome. In addition, six species (W. alternifolia, W. chamaedaphne, W. dolicantha, W. indica, W. micrantha, and W. scytophylla) had their rps19 gene crossing the IRa/LSC (JLA) border.
Phylogenetic analysis
The ML and BI trees based on the complete chloroplast genome sequences revealed that all the branch nodes for the eight species of Wikstroemia included in the phylogenetic tree were supported by high bootstrap values and Bayesian posterior probabilities (ML: ≥90%; BI: ≥95%) (Figure 7). In addition, the trees suggested that the genus Wikstroemia is paraphyletic. Two species, W. alternifolia and W. canescens, were clustered with Stellera chamaejasme, while six species of Wikstroemia (W. capitata, W. chamaedaphne, W. dolicantha, W. indica, W. micrantha, and W. scytophylla) formed a monophyletic group.
For the ITS sequences, the ML tree revealed a paraphyletic relationship between Wikstroemia and S. chamaejasme, while most of the branch nodes within the Wikstroemia clade were not highly supported (Figure 8a). Strong bootstrap support was recorded for the sister relationships between W. alternifolia and W. canescens, and between W. micrantha and W. stenophylla. Weakly supported sister relationships were present between W. dolicantha and W. scytophylla, and between W. capitata and W. ligustrina. In contrast, the BI analysis displayed a monophyletic relationship within the Wikstroemia clade (Figure 8b). As in the ML tree, sister relationships in the BI tree were strongly supported between W. alternifolia and W. canescens, and between W. micrantha and W. stenophylla, but not between W. dolicantha and W. scytophylla, or between W. capitata and W. ligustrina.