3.1 Chloroplast genome features
The complete cp genomes of T. rupestris and T. rupestris var. ciliata were 155,558 bp and 155,479 bp in length, respectively (Figure 2). Both genomes display a typical quadripartite circular structure comprising one large single-copy (LSC) region, one small single-copy (SSC) region, and two inverted repeat (IRA and IRB) regions. In T. rupestris, an LSC region of 85,857 bp and an SSC region of 18,543 bp were separated by a pair of IR regions of 25,579 bp each. The overall GC content of the T. rupestris cp genome was 36.79%, and the GC contents of the SSC and LSC regions were 30.80% and 34.51%, respectively. Because the IR regions contain the GC-rich ribosomal RNA (rRNA) and transfer RNA (tRNA) genes, their GC content (42.80%) was much higher than that of the LSC and SSC regions. In T. rupestris var. ciliata, the LSC region was 85,820 bp, the SSC region was 18,499 bp, and each IR region was 25,580 bp. The GC contents of these regions were 34.50%, 30.94%, and 42.79%, respectively, and the GC content of the complete cp genome was 36.80% (Table 1).
Table 1
Summary of T. rupestris and T. rupestris var. ciliata chloroplast genome features.
Species | Regions | T(U)/% | A/% | C/% | G/% | GC/% | Length (bp) | Number of protein-coding genes | Number of tRNA genes | Number of rRNA genes |
T. rupestris | Total | 31.95 | 31.25 | 18.78 | 18.02 | 36.79 | 155,558 | 84 | 36 | 8 |
| IRA | 28.57 | 28.63 | 22.16 | 20.64 | 42.80 | 25,579 | | | |
| IRB | 28.63 | 28.57 | 20.64 | 22.16 | 42.80 | 25,579 | | | |
| SSC | 33.38 | 32.11 | 17.79 | 16.72 | 30.80 | 18,543 | | | |
| LSC | 34.56 | 34.63 | 16.10 | 14.71 | 34.51 | 85,857 | | | |
T. rupestris var. ciliata | Total | 31.97 | 31.24 | 18.77 | 18.03 | 36.80 | 155,479 | 84 | 36 | 8 |
| IRA | 28.57 | 28.64 | 22.16 | 20.63 | 42.79 | 25,580 | | | |
| IRB | 28.64 | 28.57 | 20.63 | 22.16 | 42.79 | 25,580 | | | |
| SSC | 34.56 | 34.51 | 16.17 | 14.76 | 30.94 | 18,499 | | | |
| LSC | 33.39 | 32.11 | 17.77 | 16.73 | 34.50 | 85,820 | | | |
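The region lengths and GC percentages in Table 1 follow from simple arithmetic on the assembled sequence. The sketch below is illustrative only and is not the pipeline used in this study: the function names are hypothetical, and it assumes the quadripartite layout described above (one LSC, one SSC, two IR copies of equal length).

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a sequence, counting only A/C/G/T bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total genome length for one LSC, one SSC, and two IR copies."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) taken from Table 1.
print(quadripartite_length(85_857, 18_543, 25_579))  # T. rupestris: 155558
print(quadripartite_length(85_820, 18_499, 25_580))  # var. ciliata: 155479
```

Both totals agree with the genome lengths reported in Table 1.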
The complete chloroplast genomes of T. rupestris and T. rupestris var. ciliata each contained 112 different genes, of which 6 are duplicated in the IRA and IRB regions, for a total of 131 genes. The numbers of rRNA, tRNA, and protein-coding genes are 4, 29, and 79, respectively (Figure 2 and Table 2). Gene functions in both cp genomes were predicted on the basis of homology. The encoded proteins are mainly involved in photosynthesis and other metabolic processes. Among the photosynthesis-related genes, a subset encodes the large subunit of Rubisco and thylakoid proteins, and others encode subunits of a protein complex that mediates redox reactions to recycle electrons. Table 2 lists the genes and their functional groups in the T. rupestris and T. rupestris var. ciliata cp genomes.
Table 2
Genes present in the chloroplast genomes of T. rupestris and T. rupestris var. ciliata
Category | Group of genes | Name of genes |
rRNA | rRNA genes | rrn16S (×2), rrn23S (×2), rrn4.5S (×2), rrn5S (×2) |
tRNA | tRNA genes | trnA-UGC (×2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC, trnH-GUG, trnI-CAU (×2), trnI-GAU (×2), trnK-UUU, trnL-CAA (×2), trnL-UAA, trnL-UAG, trnM-CAU, trnN-GUU (×2), trnP-UGG, trnQ-UUG, trnR-ACG (×2), trnR-UCU, trnS-GCU (×2), trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, trnV-UAC, trnW-CCA, trnY-GUA |
Genes for photosynthesis | Subunits of ATP synthase | atpA, atpB, atpE, atpF, atpH, atpI |
| Subunits of photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ, ycf3 |
| Subunits of NADH-dehydrogenase | ndhA, ndhB (×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| Subunits of cytochrome b/f complex | petA, petB, petD, petG, petL, petN |
| Subunits of photosystem I | psaA, psaB, psaC, psaI, psaJ |
| Subunit of Rubisco | rbcL |
Self replication | Large subunit of ribosome | rpl14, rpl16, rpl2 (×2), rpl20, rpl22, rpl23 (×2), rpl32, rpl33, rpl36 |
| DNA-dependent RNA polymerase | rpoA, rpoB, rpoC1, rpoC2 |
| Small subunit of ribosome | rps11, rps12 (×2), rps14, rps15, rps16, rps18, rps19, rps2, rps3, rps4, rps7 (×2), rps8 |
Other genes | Subunit of acetyl-CoA carboxylase | accD |
| c-type cytochrome synthesis gene | ccsA |
| Envelope membrane protein | cemA |
| Protease | clpP |
| Translational initiation factor | infA |
| Maturase | matK |
Unknown | Conserved open reading frames | ycf1 (×2), ycf2 (×2), ycf4 |
Some genes in the chloroplast genomes of T. rupestris and T. rupestris var. ciliata contain introns. Of the 131 genes, 15 contain introns (Table 3): five tRNA genes (trnK-UUU, trnG-UCC, trnL-UAA, trnI-GAU, and trnA-UGC) and ten protein-coding genes (rps16, rpoC1, ycf3, clpP, petB, petD, rpl16, rpl2, ndhB, and ndhA). Four of the intron-containing genes (rpl2, ndhB, trnA-UGC, and trnI-GAU) are located in the inverted repeat regions, ten are in the large single-copy region (trnK-UUU, rps16, trnG-UCC, rpoC1, ycf3, trnL-UAA, clpP, petB, petD, and rpl16), and one (ndhA) is in the small single-copy region. ycf3 and clpP are the only genes with two introns; the other 13 genes have one intron each.
Table 3
Intron-containing genes in the T. rupestris and T. rupestris var. ciliata chloroplast genomes, with the lengths (bp) of their exons and introns
| Gene | Strand | Start | End | ExonI | IntronI | ExonII | IntronII | ExonIII |
T. rupestris | trnK-UUU | - | 1634 | 4310 | 37 | 2605 | 35 | / | / |
| rps16 | - | 5444 | 6610 | 39 | 915 | 213 | / | / |
| trnG-UCC | + | 9456 | 10258 | 32 | 711 | 60 | / | / |
| rpoC1 | - | 20984 | 23750 | 430 | 718 | 1619 | / | / |
| ycf3 | - | 43671 | 45654 | 129 | 721 | 228 | 753 | 153 |
| trnL-UAA | + | 48468 | 49099 | 35 | 547 | 50 | / | / |
| clpP | - | 71086 | 73219 | 71 | 852 | 291 | 694 | 226 |
| petB | + | 76159 | 77590 | 6 | 784 | 642 | / | / |
| petD | + | 77784 | 78983 | 9 | 717 | 474 | / | / |
| rpl16 | - | 82669 | 84175 | 9 | 1093 | 405 | / | / |
| rpl2 | - | 85922 | 87412 | 391 | 627 | 473 | / | / |
| ndhB | - | 95935 | 98147 | 775 | 680 | 758 | / | / |
| trnI-GAU | + | 103619 | 104642 | 32 | 952 | 40 | / | / |
| trnA-UGC | + | 104707 | 105592 | 37 | 813 | 36 | / | / |
| ndhA | - | 121204 | 123498 | 553 | 1203 | 539 | / | / |
| trnA-UGC | - | 135824 | 136709 | 37 | 813 | 36 | / | / |
| trnI-GAU | - | 136774 | 137797 | 32 | 952 | 40 | / | / |
| ndhB | + | 143269 | 145481 | 775 | 680 | 758 | / | / |
| rpl2 | + | 154004 | 155494 | 391 | 627 | 473 | / | / |
T. rupestris var. ciliata | trnK-UUU | - | 1628 | 4365 | 37 | 2666 | 35 | / | / |
| rps16 | - | 5494 | 6647 | 39 | 902 | 213 | / | / |
| trnG-UCC | + | 9465 | 10267 | 32 | 711 | 60 | / | / |
| rpoC1 | - | 20991 | 23757 | 430 | 718 | 1619 | / | / |
| ycf3 | - | 43656 | 45639 | 129 | 721 | 228 | 753 | 153 |
| trnL-UAA | + | 48446 | 49077 | 35 | 547 | 50 | / | / |
| clpP | - | 71077 | 73209 | 71 | 853 | 291 | 692 | 226 |
| petB | + | 76141 | 77573 | 6 | 785 | 642 | / | / |
| petD | + | 77768 | 78967 | 9 | 717 | 474 | / | / |
| rpl16 | - | 82656 | 84137 | 9 | 1068 | 405 | / | / |
| rpl2 | - | 85885 | 87375 | 391 | 627 | 473 | / | / |
| ndhB | - | 95898 | 98110 | 775 | 680 | 758 | / | / |
| trnI-GAU | + | 103592 | 104615 | 32 | 952 | 40 | / | / |
| trnA-UGC | + | 104680 | 105565 | 37 | 813 | 36 | / | / |
| ndhA | - | 121137 | 123436 | 553 | 1208 | 539 | / | / |
| ycf1 | - | 125368 | 131164 | 1058 | 151 | 4588 | / | / |
| trnA-UGC | - | 135901 | 136786 | 37 | 813 | 36 | / | / |
| trnI-GAU | - | 136851 | 137874 | 32 | 952 | 40 | / | / |
| ndhB | + | 143356 | 145568 | 775 | 680 | 758 | / | / |
| rpl2 | + | 154091 | 155581 | 391 | 627 | 473 | / | / |
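The exon and intron lengths in Table 3 can be cross-checked against the gene coordinates: the span from Start to End should equal the sum of the exon and intron lengths. The sketch below assumes 1-based, inclusive coordinates (an assumption, since the coordinate convention is not stated here) and uses three T. rupestris rows from Table 3; the function name is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length of a feature given 1-based inclusive coordinates (assumed)."""
    return end - start + 1


# (start, end, [exon I, intron I, exon II, intron II, exon III]) from Table 3.
rows = {
    "trnK-UUU": (1634, 4310, [37, 2605, 35]),
    "ycf3": (43671, 45654, [129, 721, 228, 753, 153]),
    "clpP": (71086, 73219, [71, 852, 291, 694, 226]),
}

for gene, (start, end, parts) in rows.items():
    # The two numbers printed for each gene should match.
    print(gene, span_length(start, end), sum(parts))
```

For these three rows the span and the summed parts agree (2677, 1984, and 2134 bp, respectively).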
3.2 Codon usage, RNA editing sites, and repeat sequences
3.2.1 Codon usage
Codon usage frequency was calculated from the sequences of the protein-coding genes; the relative synonymous codon usage (RSCU) values are shown in Figure 3 and Table S1. The T. rupestris and T. rupestris var. ciliata plastomes showed very similar codon usage frequencies despite the morphological and evolutionary divergence between them. All possible codons for each amino acid are used in the two plastomes (Table S1). The protein-coding genes comprise a total of 26,628 codons in the T. rupestris plastome and 26,632 in the T. rupestris var. ciliata plastome (Figure 3, Table S1). Leucine (10.55 in T. rupestris, 9.98 in T. rupestris var. ciliata), serine (7.65 in T. rupestris, 9.26 in T. rupestris var. ciliata), and arginine (5.98 in T. rupestris, 6.41 in T. rupestris var. ciliata) are the most abundant amino acids in both taxa. Methionine (2.34 in T. rupestris, 1.73 in T. rupestris var. ciliata) and tryptophan (1.71 in T. rupestris, 1.92 in T. rupestris var. ciliata) are the least abundant. Moreover, the distribution of codon types was consistent between T. rupestris and T. rupestris var. ciliata, and it also agrees with the patterns detected in Rubus [36], other angiosperms [37], and algal lineages [38].
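For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch under stated assumptions: it uses the standard genetic code, treats the three stop codons as one synonymous family, and ignores codons with ambiguous bases. The function name `rscu` is hypothetical, and this is not the software used to produce Table S1.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} computed over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
        for i in range(0, usable, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons containing N or gaps
                counts[codon] += 1

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU undefined
        expected = total / len(codons)
        for c in codons:
            result[c] = counts[c] / expected
    return result


# Toy example: phenylalanine encoded twice by TTT and once by TTC.
print(rscu(["TTTTTTTTC"]))
```

An RSCU above 1 means the codon is used more often than expected under equal usage, and a value below 1 means it is used less often.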
3.2.2 RNA editing analysis
RNA editing, a site-specific C-to-U conversion that takes place after transcription of chloroplast mRNA, commonly regulates gene expression and translation in the chloroplast [39]. The types and numbers of RNA editing sites are the same in T. rupestris and T. rupestris var. ciliata. A total of 124 potential RNA editing sites were predicted in 27 protein-coding genes of the T. rupestris and T. rupestris var. ciliata plastomes (Table S2). These genes include photosynthesis-related genes (atpA, atpB, atpF, atpI, ndhA, ndhB, ndhD, ndhF, ndhG, petB, petD, petG, and psbE), self-replication genes (rpl2, rpl23, rpoA, rpoB, rpoC1, rpoC2, and rps14), and others (matK and ycf3). The highest number of potential editing sites was found in the ndhB gene (14 sites), followed by the psbB gene (10 sites). As in the mitochondrial genome, no correlation was observed between gene length and the number of predicted RNA editing sites in the protein-coding genes (Table S2).
3.2.3 Long-repeat and SSR analysis
Repetitive sequences in the T. rupestris and T. rupestris var. ciliata chloroplast genomes were screened using the REPuter program with default settings. Only three types of repeats were detected, namely palindromic, forward, and reverse repeats; no complement repeats were found (Table S3). T. rupestris contained 17 palindromic, 28 forward, and 4 reverse repeats, whereas T. rupestris var. ciliata contained 17 palindromic, 29 forward, and 3 reverse repeats, for a total of 49 repeats in each genome. In both genomes, most repeats were 20-29 bp long (77.55%), followed by 30-39 bp (12.24%), 40-49 bp (8.16%), and 50-59 bp (2.04%).
There are 61 and 65 simple sequence repeats (SSRs) in T. rupestris and T. rupestris var. ciliata, respectively. Mononucleotide SSRs are the most abundant. Among all SSR types, A and T were the most common bases: T. rupestris has 28 A and 31 T repeats, and T. rupestris var. ciliata has 30 A and 31 T repeats (Table S4). These results show intraspecific variation in repeat number, repeat distribution, and repeat motifs; in line with their highly similar morphological characteristics, T. rupestris and T. rupestris var. ciliata differ only slightly in their SSRs.
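As an illustration of how SSRs of this kind can be located, the sketch below scans a sequence for perfect tandem repeats with a regular expression. It is not the tool used in this study, and the minimum repeat thresholds (10, 5, and 4 units for mono-, di-, and trinucleotide motifs) are assumptions chosen for the example; the thresholds actually applied are not stated in this section. The function name `find_ssrs` is hypothetical.

```python
import re

# Assumed minimum number of repeat units per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (1-based start, motif, repeat count) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. "AA" or "AAA": these are mononucleotide repeats.
            if unit_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start() + 1, motif, len(match.group(0)) // unit_len))
    return hits


toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GG"
print(find_ssrs(toy))  # [(3, 'A', 12), (17, 'AT', 6)]
```

Counting the hits by motif gives the kind of per-motif tallies summarized in Table S4.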
3.3 Phylogenetic analysis and time estimation
The cp genomes of 29 species were used to reconstruct the phylogenetic tree and to determine the phylogenetic relationships and tribal positions of the nine species of Taihangia. Using 95 protein-coding genes and the complete plastome sequences, we performed phylogenetic analyses of the 27 Rosoideae Focke plastomes (Figure 4A). G. macrophyllum, G. rupestre, T. rupestris, and T. rupestris var. ciliata clustered in one clade with strong support, which was divided into two major subclades. Subclade 1 contains G. macrophyllum, and subclade 2 contains G. rupestre, T. rupestris, and T. rupestris var. ciliata. Subclade 2 in turn splits into two branches: one is G. rupestre, and the other comprises T. rupestris and T. rupestris var. ciliata. Figure 4A thus indicates that G. rupestre is closely related to T. rupestris and T. rupestris var. ciliata. All four taxa belong to tribe Colurieae, and the tree is consistent with this classification, indicating that the four are closely related.
Divergence times were estimated for each internal node of the phylogenetic tree (Figure 4B). The genus Taihangia was inferred to have originated 0.2057 Mya, G. rupestre 1.4431 Mya, and G. macrophyllum 9.8532 Mya. The divergence time estimated for T. rupestris and T. rupestris var. ciliata may inform future studies of the genus Taihangia.
3.4 Comparative analysis and sequence divergence analysis
3.4.1 Sliding window analysis
Sliding window analysis using the DnaSP program revealed highly variable regions in the cp genomes of the two Taihangia taxa. The analysis (Figure 5) highlights hotspots of nucleotide divergence between T. rupestris and T. rupestris var. ciliata. These hotspots correspond to three intergenic regions (petA-psbJ, psbJ-psbL, and trnR-UCU-atpA) and two genes (psbA and ndhF).
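To make the sliding window idea concrete, the sketch below computes nucleotide diversity (Pi) in overlapping windows along a pairwise alignment. With only two sequences, Pi in a window reduces to the proportion of compared sites that differ. This is an illustration rather than a reimplementation of DnaSP: the window length and step size (600 bp and 200 bp) are assumed defaults not stated in this section, sites with gaps or ambiguous bases are skipped, and the function name is hypothetical.

```python
def sliding_window_pi(seq_a, seq_b, window=600, step=200):
    """Return (1-based start, end, Pi) per window for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    last_start = max(len(seq_a) - window, 0)
    for start in range(0, last_start + 1, step):
        a = seq_a[start:start + window].upper()
        b = seq_b[start:start + window].upper()
        compared = 0
        diffs = 0
        for x, y in zip(a, b):
            if x in "ACGT" and y in "ACGT":  # ignore gaps and ambiguity codes
                compared += 1
                diffs += (x != y)
        pi = diffs / compared if compared else 0.0
        results.append((start + 1, start + len(a), pi))
    return results


# Toy alignment with one substitution near the 3' end.
for row in sliding_window_pi("ACGTACGTAC", "ACGTACGAAC", window=4, step=2):
    print(row)
```

Windows with the highest Pi values are the candidate divergence hotspots plotted in a figure such as Figure 5.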
3.4.2 IR expansion and contraction
IR expansions and contractions are common in cp genomes and lead to changes in cp genome size [40]. Differences in the IRs may also reflect phylogenetic history. Here, we selected four species of tribe Colurieae and compared the sizes and junctions of their LSC, SSC, and IR regions. Although the lengths of the IR regions, ranging from 25,579 bp to 26,152 bp, varied little among the four species, some differences in IR expansion and contraction were observed.
As shown in Figure 6, rps19 is located mostly in the LSC region, near the LSC-IRB boundary (1-8 bp). The ycf1 gene has 1,107 bp, 1,098 bp, 1,107 bp, and 1,188 bp in the IRB region in T. rupestris, T. rupestris var. ciliata, G. rupestre, and G. macrophyllum, respectively, and 4,523 bp, 4,532 bp, 4,523 bp, and 4,424 bp in the SSC region in the same four taxa. Furthermore, the ycf1 gene extended into the SSC region except in the G. rupestre genome, and the longest overlap between the SSC region and ycf1 was observed in T. rupestris var. ciliata. The trnH gene is located mostly in the LSC region, near the IRB-LSC boundary (4-65 bp). In summary, the structure of the cp genome is conserved between T. rupestris and T. rupestris var. ciliata.
3.4.3 Genome comparison
We compared the cp genome sequences of T. rupestris, T. rupestris var. ciliata, G. macrophyllum, and G. rupestre using mVISTA, with the G. rupestre cp genome as the reference (Figure 7). Alignment of the four whole plastomes produced a multiple sequence alignment 155,558 bp in length. The analysis shows the overall sequence identity and the divergent regions in Taihangia and Geum. The T. rupestris and T. rupestris var. ciliata cp genome sequences were highly similar. The high degree of synteny and gene order conservation indicates evolutionary conservation at the plastome level (Figure 7). Notably, the LSC and SSC regions are more divergent than the IRs, and the non-coding regions are more divergent than the coding regions, whereas the exons, introns, and ncRNA generally varied little between genomes, which is similar to the results of previous studies [41].
3.4.4 Kimura's two-parameter (K2P) analysis
To identify hypervariable regions among T. rupestris, T. rupestris var. ciliata, G. rupestre, and G. macrophyllum, 95 intergenic regions were extracted from the chloroplast genomes of the four species, and the genetic distances of the intergenic regions were calculated using the Kimura 2-parameter (K2P) model. A total of 29 intergenic regions had K2P values ranging from 2.35 to 5.745. Among them, rps16-trnQ-UUG, trnH-GUG-psbA, and trnF-GAA-ndhJ had the highest K2P values (5.745, 4.752, and 4.607, respectively). These regions vary greatly among the chloroplast genomes of the four species and are candidate regions for molecular marker development (Figure 8).
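For reference, the K2P distance between two aligned sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The sketch below implements this standard formula for a single pair of sequences. It is an illustration and not the software used in this study; the function name is hypothetical, and sites with gaps or ambiguous bases are skipped by assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    term_a = 1 - 2 * p - q
    term_b = 1 - 2 * q
    if term_a <= 0 or term_b <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_a * math.sqrt(term_b))


# One transition (A<->G) in ten sites: P = 0.1, Q = 0.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116
```

Computing this distance for each intergenic region across the four genomes and ranking the results gives the kind of comparison summarized in Figure 8.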
3.5 Selective pressures analysis
Nonsynonymous (amino acid-replacing, Ka) and synonymous (Ks) substitution rates and their ratio (Ka/Ks) are used to assess the intensity of natural selection on DNA sequence evolution [42, 43]. Ka/Ks > 1 indicates positive selection, Ka/Ks < 1 indicates purifying (negative) selection, and Ka/Ks = 1 indicates neutral evolution. The Ka/Ks ratio was calculated and compared for 96 protein-coding genes in the T. rupestris, T. rupestris var. ciliata, G. rupestre, and G. macrophyllum chloroplast genomes to investigate genome evolution (Table 4). The Ka/Ks values of T. rupestris, T. rupestris var. ciliata, G. macrophyllum, and G. rupestre are 1.24455, 0.814338, 0.652981, and 0.732876, respectively. This result indicates that T. rupestris has been subject to positive selection, whereas T. rupestris var. ciliata, G. macrophyllum, and G. rupestre have been subject to negative selection.
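The interpretation rule above can be written as a small helper. The sketch below is illustrative only: the function names and the tolerance used to treat a ratio as equal to 1 are assumptions, and the ratios are the values reported in the text.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def classify_selection(ratio, tol=1e-6):
    """Map a Ka/Ks ratio to the selection regime it suggests."""
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


# Ka/Ks values reported in the text.
ratios = {
    "T. rupestris": 1.24455,
    "T. rupestris var. ciliata": 0.814338,
    "G. macrophyllum": 0.652981,
    "G. rupestre": 0.732876,
}

for taxon, ratio in ratios.items():
    print(f"{taxon}: Ka/Ks = {ratio} -> {classify_selection(ratio)}")
```

Only T. rupestris exceeds 1 and is classified as under positive selection; the other three taxa fall below 1.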