General features of the T. ramosissima chloroplast genome
The complete cp genome of T. ramosissima was 156,150 bp in length and displayed a typical quadripartite structure, in which a small single-copy region (SSC, 18,247 bp) and a large single-copy region (LSC, 84,795 bp) were separated by two identical inverted repeats (IR, 26,554 bp each) (Fig. 1). A comparison of the size and structure of the cp genomes of five Tamaricaceae species showed that plastome length varied from 154,533 bp to 156,167 bp; T. chinensis had the largest plastome and R. trigyna the smallest (Table 1). The overall GC content of the T. ramosissima plastome was 36.5%, similar to that of the other four Tamaricaceae species. As shown in Table 1, the T. ramosissima cp genome encoded 130 genes, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. The newly assembled T. ramosissima plastome sequence has been deposited in GenBank under accession number MN726883.
Table 1
Comparison of general features of five Tamaricaceae plastomes
Species | Total length (bp) | LSC (bp) | IR, both copies (bp) | SSC (bp) | Total genes | Protein-coding genes | tRNA genes | rRNA genes | GC content |
Tamarix ramosissima | 156150 | 84795 | 53108 | 18247 | 130 | 85 | 37 | 8 | 36.5% |
Tamarix chinensis | 156167 | 84768 | 53152 | 18247 | 130 | 85 | 37 | 8 | 36.5% |
Hololachna songarica | 155596 | 85903 | 52138 | 17555 | 130 | 85 | 37 | 8 | 36.8% |
Reaumuria trigyna | 154533 | 84811 | 52116 | 17607 | 130 | 85 | 37 | 8 | 37.0% |
Myricaria paniculata | 154651 | 84379 | 49588 | 20684 | 130 | 85 | 37 | 8 | 36.3% |
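The region lengths in Table 1 can be cross-checked against the quadripartite structure: the total length should equal LSC + IR + SSC, where the IR column holds both copies combined (53,108 bp = 2 × 26,554 bp for T. ramosissima). The following is a minimal Python sketch of that check, using the values as printed in Table 1; the variable and function names are illustrative only.

```python
# Lengths (bp) transcribed from Table 1: (total, LSC, IR both copies, SSC).
PLASTOMES = {
    "Tamarix ramosissima": (156150, 84795, 53108, 18247),
    "Tamarix chinensis": (156167, 84768, 53152, 18247),
    "Hololachna songarica": (155596, 85903, 52138, 17555),
    "Reaumuria trigyna": (154533, 84811, 52116, 17607),
    "Myricaria paniculata": (154651, 84379, 49588, 20684),
}


def length_discrepancy(total, lsc, ir_both, ssc):
    """Return (LSC + IR + SSC) - total; 0 means the row is self-consistent."""
    return lsc + ir_both + ssc - total


def check_table(plastomes):
    """Map each species to its length discrepancy in bp."""
    return {sp: length_discrepancy(*vals) for sp, vals in plastomes.items()}


if __name__ == "__main__":
    for species, diff in check_table(PLASTOMES).items():
        status = "consistent" if diff == 0 else f"off by {diff} bp"
        print(f"{species}: {status}")
```

With the values as printed, four rows sum exactly and the R. trigyna row differs by 1 bp. A nonzero result only flags a row worth re-checking against the GenBank record; it does not say which of the four numbers is off.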
All the genes annotated in the T. ramosissima cp genome are listed in Table 2. Of the 130 genes annotated, a total of 16 genes contained introns. Among these intron-containing genes, 14 genes contained one intron, including 8 tRNA genes (trna-UUU, trna-CGA, trna-UUC, trna-UAA, trna-ACA, trna-UGC) and 6 protein-coding genes (ndhA, ndhB, atpF, rpoC1, rpl2, rps12). Two genes contained two introns (clpP, ycf3). rps12 was the only trans-spliced gene in the T. ramosissima plastome.
Table 2
List of genes annotated in the chloroplast genome of T. ramosissima.
Function | Gene Names | Number |
Photosystem I | psaA; psaB; psaC; psaI; psaJ | 5 |
Photosystem II | psbA; psbB; psbC; psbD; psbE; psbF; psbH; psbI; psbJ; psbK; psbL; psbM; psbN; psbT; psbZ | 15 |
Cytochrome b/f complex | petA; petB; petD; petG; petL; petN | 6 |
ATP synthase | atpA; atpB; atpE; atpF*; atpH; atpI | 6 |
NADH dehydrogenase | ndhA*; ndhB*(× 2); ndhC; ndhD; ndhE; ndhF; ndhG; ndhH; ndhI; ndhJ; ndhK | 12 |
Rubisco Large subunit | rbcL | 1 |
Ribosomal RNAs | rrn4.5(× 2); rrn5(× 2); rrn16(× 2); rrn23(× 2) | 8 |
Transfer RNAs | trna-GUG; trna-UUU*; trna-UUG; trna-GCU; trna-CGA*; trna-UCU; trna-GCA; trna-GUC; trna-GUA; trna-UUC*(× 3); trna-GGU; trna-UGA; trna-GCC; trna-CAU(× 4); trna-GGA; trna-UGU; trna-UAA*; trna-GAA; trna-ACA*; trna-CCA; trna-UGG; trna-CAA(× 2); trna-GAC(× 2); trna-UGC*(× 2); trna-ACG(× 2); trna-GUU(× 2); trna-UAG | 37 |
DNA dependent RNA polymerase | rpoA; rpoB; rpoC1*; rpoC2 | 4 |
Small subunit of ribosome | rps2; rps3; rps4; rps7(× 2); rps8; rps11; rps14; rps12*T (× 2); rps16; rps15; rps18; rps19 | 14 |
Large subunit of ribosome | rpl2*(× 2); rpl14; rpl16; rpl20; rpl22; rpl23(× 2); rpl32; rpl33; rpl36 | 11 |
Proteins of unknown function | ycf1; ycf2(× 2); ycf3**; ycf4 | 5 |
Other genes | accD; ccsA; cemA; clpP**; matK; infA | 6 |
* indicates genes containing one intron; ** indicates genes containing two introns; T indicates trans-spliced genes; (× n) indicates that a gene is present in n copies
Codon usage
Codon usage of the protein-coding sequences in the T. ramosissima cp genome was analyzed with DAMBE. All 64 codons, encoding the 20 amino acids and the stop signals, were present in the T. ramosissima plastome. A total of 24,724 codons (including stop codons) were identified across all protein-coding sequences. Leucine (2651; 10.72%) was the most abundant amino acid, whereas cysteine (283; 1.14%) was the least abundant. Relative synonymous codon usage (RSCU) values, which were positively correlated with codon counts, were calculated for the five Tamaricaceae species. As shown in Table 3, 30 codons were preferred (RSCU > 1) in all five Tamaricaceae species, while 32 codons were used less often than expected (RSCU < 1). The codon usage of methionine and tryptophan was unbiased (RSCU = 1), as each is encoded by a single codon.
Table 3
Codon content of the 20 amino acids and stop codons in all protein-coding genes of the five Tamaricaceae cp genomes.
Amino acid | Codon | RSCUa: T. ramosissima | T. chinensis | R. trigyna | H. songarica | M. paniculata |
Stopb | UGA | 0.622 | 0.679 | 0.714 | 0.786 | 0.532 |
Stopb | UAG | 0.732 | 0.679 | 0.714 | 0.679 | 0.646 |
Stopb | UAA | 1.646 | 1.643 | 1.571 | 1.536 | 1.823 |
A | GCU | 1.792 | 1.799 | 1.765 | 1.766 | 1.810 |
A | GCG | 0.348 | 0.335 | 0.344 | 0.350 | 0.348 |
A | GCC | 0.643 | 0.637 | 0.635 | 0.621 | 0.609 |
A | GCA | 1.217 | 1.230 | 1.256 | 1.263 | 1.234 |
C | UGU | 1.534 | 1.547 | 1.572 | 1.553 | 1.598 |
C | UGC | 0.466 | 0.453 | 0.428 | 0.447 | 0.402 |
D | GAU | 1.577 | 1.578 | 1.577 | 1.571 | 1.566 |
D | GAC | 0.423 | 0.422 | 0.423 | 0.429 | 0.434 |
E | GAG | 0.460 | 0.447 | 0.473 | 0.471 | 0.457 |
E | GAA | 1.540 | 1.553 | 1.527 | 1.529 | 1.543 |
F | UUU | 1.329 | 1.346 | 1.305 | 1.305 | 1.377 |
F | UUC | 0.671 | 0.654 | 0.695 | 0.695 | 0.623 |
G | GGU | 1.346 | 1.342 | 1.285 | 1.302 | 1.371 |
G | GGG | 0.619 | 0.608 | 0.633 | 0.633 | 0.574 |
G | GGC | 0.372 | 0.374 | 0.399 | 0.382 | 0.374 |
G | GGA | 1.663 | 1.676 | 1.684 | 1.682 | 1.681 |
H | CAC | 0.470 | 0.479 | 0.430 | 0.432 | 0.458 |
H | CAU | 1.530 | 1.521 | 1.570 | 1.568 | 1.542 |
I | AUU | 1.507 | 1.501 | 1.487 | 1.491 | 1.555 |
I | AUA | 0.915 | 0.926 | 0.939 | 0.942 | 0.919 |
I | AUC | 0.578 | 0.574 | 0.573 | 0.568 | 0.526 |
K | AAA | 1.494 | 1.518 | 1.489 | 1.482 | 1.543 |
K | AAG | 0.506 | 0.482 | 0.511 | 0.518 | 0.457 |
L | CUA | 1.109 | 1.090 | 1.121 | 1.134 | 1.179 |
L | CUC | 0.600 | 0.597 | 0.600 | 0.606 | 0.529 |
L | CUG | 0.515 | 0.518 | 0.521 | 0.498 | 0.479 |
L | CUU | 1.775 | 1.796 | 1.758 | 1.761 | 1.814 |
L | UUA | 1.229 | 1.244 | 1.194 | 1.201 | 1.260 |
L | UUG | 0.771 | 0.756 | 0.806 | 0.799 | 0.740 |
M | AUG | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
N | AAC | 0.459 | 0.440 | 0.450 | 0.450 | 0.433 |
N | AAU | 1.541 | 1.560 | 1.550 | 1.550 | 1.567 |
P | CCA | 1.143 | 1.168 | 1.157 | 1.165 | 1.142 |
P | CCC | 0.666 | 0.656 | 0.702 | 0.703 | 0.697 |
P | CCU | 1.604 | 1.608 | 1.566 | 1.564 | 1.661 |
P | CCG | 0.587 | 0.569 | 0.575 | 0.568 | 0.500 |
Q | CAA | 1.553 | 1.567 | 1.569 | 1.563 | 1.560 |
Q | CAG | 0.447 | 0.433 | 0.431 | 0.437 | 0.440 |
R | AGA | 1.471 | 1.472 | 1.446 | 1.445 | 1.439 |
R | AGG | 0.529 | 0.528 | 0.554 | 0.555 | 0.561 |
R | CGA | 1.555 | 1.593 | 1.554 | 1.573 | 1.531 |
R | CGC | 0.365 | 0.354 | 0.407 | 0.379 | 0.370 |
R | CGG | 0.490 | 0.486 | 0.545 | 0.543 | 0.469 |
R | CGU | 1.590 | 1.568 | 1.494 | 1.504 | 1.630 |
S | AGC | 0.418 | 0.418 | 0.455 | 0.447 | 0.433 |
S | AGU | 1.582 | 1.582 | 1.545 | 1.553 | 1.567 |
S | UCA | 1.151 | 1.148 | 1.121 | 1.125 | 1.108 |
S | UCC | 0.768 | 0.772 | 0.818 | 0.814 | 0.793 |
S | UCG | 0.452 | 0.457 | 0.466 | 0.464 | 0.450 |
S | UCU | 1.628 | 1.623 | 1.595 | 1.597 | 1.648 |
T | ACC | 0.667 | 0.671 | 0.677 | 0.673 | 0.670 |
T | ACA | 1.240 | 1.242 | 1.250 | 1.253 | 1.210 |
T | ACG | 0.427 | 0.416 | 0.410 | 0.409 | 0.385 |
T | ACU | 1.666 | 1.670 | 1.663 | 1.665 | 1.735 |
V | GUU | 1.510 | 1.508 | 1.476 | 1.489 | 1.544 |
V | GUG | 0.522 | 0.507 | 0.532 | 0.517 | 0.488 |
V | GUC | 0.439 | 0.441 | 0.460 | 0.444 | 0.434 |
V | GUA | 1.529 | 1.544 | 1.533 | 1.550 | 1.534 |
W | UGG | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Y | UAC | 0.376 | 0.376 | 0.388 | 0.382 | 0.361 |
Y | UAU | 1.624 | 1.624 | 1.612 | 1.618 | 1.639 |
a Relative synonymous codon usage; b stop codon
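RSCU compares a codon's observed count with the count expected if all synonymous codons in its family were used equally, so the values within a family sum to the family size. The sketch below computes RSCU for a single family in Python. The counts are hypothetical and are not taken from the T. ramosissima annotation, and DAMBE's own implementation may differ in detail. The L, R, and S rows of Table 3 sum to 4 and 2 within their four-codon and two-codon blocks, which suggests that six-fold amino acids were split into separate families; the function takes one such family at a time on that assumption.

```python
def rscu(family_counts):
    """Relative synonymous codon usage for one synonymous codon family.

    family_counts maps codon -> observed count. Each RSCU value is the
    observed count divided by the mean count across the family.
    """
    total = sum(family_counts.values())
    if total == 0:
        raise ValueError("family has no observed codons")
    expected = total / len(family_counts)  # count under equal usage
    return {codon: n / expected for codon, n in family_counts.items()}


# Hypothetical counts for the two aspartate codons (illustration only).
asp = rscu({"GAU": 30, "GAC": 10})
print(asp)  # {'GAU': 1.5, 'GAC': 0.5}

# A single-codon family (e.g. AUG for methionine) always gives RSCU = 1.
print(rscu({"AUG": 57}))  # {'AUG': 1.0}
```

Because the values in a family must sum to the number of codons in that family, a column that does not sum to 2, 3, or 4 within a block is a quick indicator of a transcription error.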
Interspecific variation among Tamaricaceae cp genomes
The newly sequenced T. ramosissima cp genome was compared with those of the other four Tamaricaceae species using the mVISTA program (Fig. 2). The comparison revealed high nucleotide conservation between the cp genomes of T. ramosissima and T. chinensis. Furthermore, coding regions were more conserved than non-coding regions, and the SSC and LSC regions were more divergent than the IR regions.
To identify divergence hotspots in the five Tamaricaceae chloroplast genomes, nucleotide diversity (Pi) values were calculated using DnaSP. The Pi values for the five Tamaricaceae plastomes ranged from 0 to 0.195, with an average of 0.02769. As illustrated in Fig. 3, the LSC and SSC regions showed higher nucleotide diversity than the two IR regions. Six regions with high Pi values were identified as divergence hotspots (Fig. 3). The rpl32-tRNA-UAG region, with a Pi value of 0.195, was the most divergent region detected. The other five hotspots were four intergenic regions (tRNA-GCC-tRNA-CAU, psbK-psbI, tRNA-GAA-ndhJ, rps15-ycf1) and one gene region (rpl16). These divergence hotspots could be developed as potential markers for species delimitation in Tamaricaceae.
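Nucleotide diversity (Pi) is the average number of nucleotide differences per site between all pairs of sequences in an alignment. A minimal Python sketch for a single alignment window follows. The toy sequences are hypothetical, gap handling is simplified (any column containing a gap or ambiguous base is skipped), and the window and step sizes used in DnaSP are not stated here, so none are assumed.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise differences per site across aligned sequences.

    Columns containing anything other than A, C, G, or T in any sequence
    are excluded, a simplification that may not match DnaSP's gap options.
    """
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    # Keep only columns where every sequence has an unambiguous base.
    columns = [col for col in zip(*aligned_seqs)
               if all(base in "ACGT" for base in col)]
    if not columns:
        raise ValueError("no usable sites")

    pairs = list(combinations(range(len(aligned_seqs)), 2))
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


# Hypothetical 10-bp window from three aligned sequences.
window = ["ACGTACGTAC",
          "ACGTACGTAT",
          "ACGAACGTAT"]
print(round(nucleotide_diversity(window), 4))
```

Sliding this function along a whole-plastome alignment in fixed-size windows would give a per-window profile of the kind plotted in Fig. 3, with hotspots appearing as peaks.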
SSR and repeat structure analysis
The total number of SSRs identified in the five Tamaricaceae cp genomes ranged from 59 to 67 (Fig. 4). Mononucleotide repeats were the dominant type, and A/T repeats accounted for nearly 60% of all SSRs identified. Dinucleotide repeats were the second most abundant motif type, constituting 13.3–20.3% of the total SSRs, and most of them were also AT-rich. Tri-, tetra-, and pentanucleotide repeats made up a relatively small fraction of the SSRs detected (Fig. 4).
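To illustrate how SSRs of different motif lengths can be located, the Python sketch below scans a sequence with regular expressions. It is not the tool used for Fig. 4, which is not named in this section, and the minimum repeat thresholds (10 for mononucleotides, 5 for dinucleotides) are hypothetical values chosen for the example.

```python
import re


def find_ssrs(seq, min_repeats):
    """Find perfect SSRs in an uppercase DNA string.

    min_repeats maps motif length -> minimum number of tandem copies.
    Returns a list of (start, motif, copies) tuples.
    """
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA"),
            # so homopolymer runs are not counted as dinucleotide SSRs.
            if any(motif == motif[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


# Hypothetical test sequence: an A(10) run and an (AT)5 run.
example = "GC" + "A" * 10 + "GC" + "AT" * 5 + "GG"
thresholds = {1: 10, 2: 5}  # assumed thresholds, not from the source
for start, motif, copies in find_ssrs(example, thresholds):
    print(start, motif, copies)
```

Tallying the returned motifs by length and base composition gives the kind of summary reported above, such as the share of A/T mononucleotide repeats.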
Long repeats in the five cp genomes were analyzed with REPuter. As shown in Fig. 5, T. ramosissima had the fewest repeats in its plastome: 12 forward, 14 palindromic, and 6 reverse repeats (32 in total). More repeats were identified in the chloroplast genomes of the other four Tamaricaceae species (49 in each), but the types and sizes of the repeats varied among species. The majority of the repeats identified were shorter than 29 bp. Repeats longer than 45 bp were detected only in the plastomes of H. songarica, R. trigyna, and M. paniculata.
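The repeat categories reported above differ in how the second copy relates to the first. The Python sketch below classifies a pair of equal-length sequences into REPuter's four categories (forward, reverse, complement, palindromic) using exact matching only. The function name and example sequences are illustrative, and REPuter itself also allows mismatches, which this sketch does not.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first, second):
    """Classify how `second` repeats `first` (exact matches only).

    Checks are applied in a fixed order, so a pair that satisfies more
    than one definition is reported under the first matching category.
    """
    if first == second:
        return "forward"        # same sequence, same strand
    if first == second[::-1]:
        return "reverse"        # same strand, read backwards
    if first == second.translate(COMPLEMENT):
        return "complement"     # base-by-base complement
    if first == second.translate(COMPLEMENT)[::-1]:
        return "palindromic"    # reverse complement
    return None


# Hypothetical 4-bp examples, far shorter than real plastome repeats.
print(classify_repeat("AACG", "AACG"))  # forward
print(classify_repeat("AACG", "GCAA"))  # reverse
print(classify_repeat("AACG", "CGTT"))  # palindromic
```

Only forward, palindromic, and reverse repeats are reported for T. ramosissima, so the complement branch is included purely for completeness.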
Phylogenetic relationship
A plastome-based phylogenomic tree was constructed with MEGA X to analyze the phylogenetic relationships among the species (Fig. 6). Among the five Tamaricaceae species, the two Tamarix species, T. ramosissima and T. chinensis, clustered together. The two species assigned to Reaumuria, R. trigyna and H. songarica, also formed a monophyletic group. According to the phylogeny, M. paniculata was more closely related to the Tamarix species. The topology of the ML tree was consistent with that of the NJ tree.