Phylogenomic Analysis and Dynamic Evolution of Chloroplast Genomes of Clematis nannophylla

doi:10.21203/rs.3.rs-2943201/v1


Research Article


This work is licensed under a CC BY 4.0 License

Version 1


Background

Clematis nannophylla is a small perennial shrub of the genus Clematis with high ecological, ornamental, and medicinal value that is distributed in the arid and semi-arid areas of northwest China. In this study, we determined the complete chloroplast genome of C. nannophylla and reconstructed a phylogenetic tree of Clematis.

Results

The chloroplast genome of C. nannophylla was 159,801 bp in length, including a large single-copy region (LSC, 79,526 bp), a small single-copy region (SSC, 18,185 bp), and a pair of inverted repeats (IRa and IRb, 31,045 bp each). The C. nannophylla cp genome contained 133 genes, including 89 protein-coding genes, 36 tRNA genes, and eight rRNA genes. In addition, 61 codons and 66 simple sequence repeats (SSRs) were identified, together with 50 dispersed repeats (22 forward, 21 palindromic, and 7 reverse) and 24 tandem repeats. Most of the dispersed and tandem repeats were 20–30 bp and 10–20 bp long, respectively. The chloroplast genome of C. nannophylla was relatively conserved, especially in the IR region, where no inversion or rearrangement was observed. The six regions with the largest variations were trnF-ndhJ, ndhE-ndhG, ndhF-rpl32, ccsA-ndhD, ccsA, and ndhD (Pi > 0.008), which were distributed in the LSC and SSC regions. A comparison of gene selection pressures indicated that purifying selection was the main mode of selection for maintaining important biological functions in the chloroplast genome of C. nannophylla. However, ycf1 was under positive selection (between C. nannophylla and C. florida), possibly as an adaptation to the living environment. Phylogenetic analysis showed that C. nannophylla was most closely related to C. fruticosa and C. songorica.

Conclusions

Our analysis of the C. nannophylla cp genome provides reference data for molecular marker development, phylogenetic analysis, population studies, and chloroplast genome processes, as well as for better exploitation and utilisation of C. nannophylla.

C. nannophylla

Chloroplast genome

Evolution

SSR

Comparative analysis

Phylogenetic analysis

Clematis is a large genus of the family Ranunculaceae that consists mainly of herbaceous or woody vines, along with several erect shrubs and perennial herbs [1], and has high ornamental and medicinal value [2–4]. The genus is distributed worldwide and comprises approximately 300 species; China, the modern distribution centre of Clematis, has rich natural populations [4], with more than 100 species [1,5]. C. nannophylla is mainly distributed on arid and semi-arid mountain slopes in northwest China [6] and has good stress tolerance. In addition, it has important pharmaceutical, economic, and ecological value.

However, the complexity and high morphological diversity of Clematis make it difficult to classify this genus systematically [4]. Most studies to date have focused on morphology, physiology, ecology, and pharmacological activity [7–9], whereas there are few basic molecular studies on germplasm resource identification, genetic breeding, resource conservation, and phylogeny. Furthermore, previously sequenced Clematis chloroplast genomes were submitted directly without detailed analysis, which limits our overall understanding of their phylogeny and genome evolution. Studies on C. nannophylla, a plant endemic to China's northwestern arid and semi-arid areas, are even fewer, which limits the protection and development of this species. Therefore, it is imperative to better understand the taxonomic status of C. nannophylla and to predict its future populations in order to guide more efficient germplasm resource utilisation, conservation, and breeding strategies.

As the organelles of angiosperm photosynthesis, chloroplasts provide energy for plant metabolism. Chloroplasts are semi-autonomous genetic organelles containing a unique genome and gene expression system [10]. In most angiosperms, the cp genome is inherited maternally; however, in a few cases, it is inherited in a paternal or biparental mode [11]. Compared with the mitochondrial genome, the chloroplast genome has a more stable structure and a higher evolutionary rate. Angiosperm cp genomes are usually conserved in terms of structure and sequence [12]. The cp genome has a distinctive quadripartite structure consisting of a large single-copy (LSC) region and a small single-copy (SSC) region separated by a pair of long inverted repeat (IRa and IRb) regions [10,13]. Although the cp genome is conserved, many recent studies have identified genetic mutations in it, such as loss of gene or intron fragments, insertion and deletion of bases, changes in the length of the inverted repeat regions, insertion or deletion of partial fragments, expansion or deletion of entire inverted repeat regions, and gene rearrangement [14,15,16,17]. These mutations may lead to variations in plant structure and adaptation and can contribute to plant species identification and future selective breeding [18].

In addition, plant cp genomes contain a large amount of molecular information, which makes them a good resource for plant systematics, population genomics, and phylogenetic studies [19]; for example, they can be used for DNA barcoding, transplant studies, and evolutionary studies at the population level, and they provide useful genetic markers for resolving phylogenetic relationships [20–22]. However, the cp genome of C. nannophylla has not been determined, and a comprehensive analysis of the cp genome structure of the Clematis genus is still lacking.

Therefore, to establish taxonomic boundaries and phylogenetic relationships between C. nannophylla and other groups, we determined the cp genome characteristics of C. nannophylla. This study aimed to (1) obtain the complete sequence of the cp genome of C. nannophylla; (2) analyse the phylogenetic positions of the 78 coding genes in C. nannophylla; (3) compare the coding and non-coding regions of the cp genome between C. nannophylla and other Clematis species and determine the effective regions of the C. nannophylla cp genome; and (4) clarify the phylogenetic relationships and evolution of C. nannophylla through phylogenetic analysis of the Clematis genus based on complete cp genomes and protein sequences.

Features of the C. nannophylla Cp Genome

In total, 23,142,846 paired-end reads were obtained from the Illumina NovaSeq platform, with Q20 and Q30 values of 95.0% and 88.3%, respectively. The complete cp genome sequence of C. nannophylla was assembled de novo and uploaded to the National Center for Biotechnology Information (NCBI) database (GenBank accession number OQ581857). The circular cp genome of C. nannophylla was 159,801 bp in size (Fig. 1) and comprised a large single-copy (LSC, 79,526 bp) region, two inverted repeat (IR, 31,045 bp each) regions, and a small single-copy (SSC, 18,185 bp) region. The highest GC content was observed in the IR regions (42.1%), whereas the GC content in the SSC region was the lowest (31.3%); the average GC content of the whole genome was 38%.

There were 133 predicted functional genes in the C. nannophylla cp genome, including 89 protein-coding, 36 tRNA, and eight rRNA genes (Table 1 and Table 2). Protein-coding, tRNA, and rRNA genes accounted for 66.92%, 27.07%, and 6.02% of all annotated genes, respectively. Most genes and protein-coding genes were located in the LSC region, and only 9.02% were in the SSC region.

Table 1

Characteristics of C. nannophylla cp genome.
Category	Item	Value
Chloroplast genome structure (bp)	Cp genome	159801
	LSC	79526
	SSC	18185
	IRa/IRb (each)	31045
	CDS	80652
Gene composition (number of genes)	Cp genome	133
	CDS	89
	tRNA	36
	rRNA	8
GC content (%)	Cp genome	38
	LSC	36.3
	SSC	31.3
	IRa/IRb	42.1
	CDS	38.4
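
The figures in Table 1 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the region lengths and gene counts reported above; the variable names are illustrative and are not part of the original analysis.

```python
# Region lengths (bp) as reported in Table 1.
LSC, SSC, IR = 79_526, 18_185, 31_045

# A quadripartite cp genome holds one LSC, one SSC, and two IR copies.
total_length = LSC + SSC + 2 * IR

# Gene counts as reported in Table 1.
gene_counts = {"CDS": 89, "tRNA": 36, "rRNA": 8}
total_genes = sum(gene_counts.values())

# Share of each gene class as a percentage, rounded to two decimals.
gene_share = {
    name: round(100 * count / total_genes, 2)
    for name, count in gene_counts.items()
}

print(total_length)  # 159801
print(total_genes)   # 133
print(gene_share)    # {'CDS': 66.92, 'tRNA': 27.07, 'rRNA': 6.02}
```

The sum of the four regions reproduces the reported genome size, and the gene-class shares match the percentages given in the text.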

Subsequently, we annotated the functions of all assembled genes. These genes belong to four types: photosynthesis-related genes, self-replication-related genes, other genes (including the maturase gene matK and the protease gene clpP), and genes of unknown function. A total of 22 annotated genes were double-copy genes, including 11 protein-coding genes, seven tRNAs, and four rRNAs. Sixteen genes (atpF, ndhA, ndhB, petB, petD, rpl16, rpl2, rpoC1, rps12, rps16, trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA and trnV-UAC) had one intron each, whereas the protein-coding genes ycf3 and clpP had two introns each (Table 2). The longest intron (2554 bp) was found in trnK-UUU and completely encompassed matK, and the shortest intron (492 bp) was found in trnL-UAA.

Table 2

Genes in cp genome of C. nannophylla
Category	Gene Group	Gene Name
Photosynthesis	Subunits of photosystem I	psaA, psaB, psaC, psaI, psaJ
	Subunits of photosystem II	psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
	Subunits of NADH dehydrogenase	ndhA *, ndhB * (2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
	Subunits of cytochrome b/f complex	petA, petB *, petD *, petG, petL, petN
	Subunits of ATP synthase	atpA, atpB, atpE, atpF *, atpH, atpI
	Large subunit of rubisco	rbcL
	Subunits of protochlorophyllide reductase	-
Self-replication	Proteins of large ribosomal subunit	# rpl22, rpl14 (2), rpl16 * (2), rpl2 * (2), rpl20, rpl23 (2), rpl32, rpl33, rpl36
	Proteins of small ribosomal subunit	# rps16 *, rps11, rps12 * (2), rps14, rps15, rps18, rps19 (2), rps2, rps3 (2), rps4, rps7 (2), rps8 (2)
	Subunits of RNA polymerase	rpoA, rpoB, rpoC1 *, rpoC2
	Ribosomal RNAs	rrn16 (2), rrn23 (2), rrn4.5 (2), rrn5 (2)
	Transfer RNAs	trnA-UGC * (2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC *, trnH-GUG, trnI-CAU (2), trnI-GAU * (2), trnK-UUU *, trnL-CAA (2), trnL-UAA *, trnL-UAG, trnM-CAU, trnN-GUU (2), trnP-UGG, trnQ-UUG, trnR-ACG (2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnV-GAC (2), trnV-UAC *, trnW-CCA, trnY-GUA, trnfM-CAU
Other genes	Maturase	matK
	Protease	clpP **
	Envelope membrane protein	cemA
	Acetyl-CoA carboxylase	accD
	c-type cytochrome synthesis gene	ccsA
	Translation initiation factor	# infA
	Other	-
Genes of unknown function	Conserved hypothetical chloroplast ORF	# ycf1, ycf2 (2), ycf3 **, ycf4
Note: # Gene, pseudogene; Gene (2), multiple-copy gene, with the number of copies in parentheses; Gene *, gene with one intron; Gene **, gene with two introns.

A parity rule 2 (PR2) plot analysis was performed using the protein-coding gene sequences of C. nannophylla (Fig. 2). The plot shows the relationship between A3/(A3 + T3) and G3/(G3 + C3), with the data distributed across four quadrants of a scatter diagram. Most genes were located in the second quadrant, the ribosomal protein SSU genes were located in the first quadrant (G > C, A > T), and the photosystem II genes were located in the third quadrant (C > G, T > A).

A3, T3, C3, and G3 represent nucleotide A, T, C, and G content at the third position of synonymous codons.
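
The two PR2 coordinates can be computed directly from a coding sequence. The sketch below is a simplified illustration, not the pipeline used in this study: the function name `pr2_coordinates` is hypothetical, and for brevity it counts the third position of every codon, whereas a stricter PR2 analysis would restrict the counts to synonymous (typically four-fold degenerate) codon families.

```python
def pr2_coordinates(cds: str) -> tuple[float, float]:
    """Return (G3/(G3+C3), A3/(A3+T3)) for one in-frame coding sequence.

    Simplifying assumption: all codons are used, not only the
    four-fold degenerate families.
    """
    cds = cds.upper()
    third = cds[2::3]  # bases at codon position 3
    a3, t3 = third.count("A"), third.count("T")
    g3, c3 = third.count("G"), third.count("C")
    if a3 + t3 == 0 or g3 + c3 == 0:
        raise ValueError("sequence too short to compute PR2 ratios")
    return g3 / (g3 + c3), a3 / (a3 + t3)


# Toy sequence (hypothetical, not from C. nannophylla):
# codons ATG GCC GCA GGT TTC -> third positions G, C, A, T, C
x, y = pr2_coordinates("ATGGCCGCAGGTTTC")
print(x, y)
```

A point at (0.5, 0.5) indicates no bias between G and C or between A and T; values of x above 0.5 indicate G > C, and values of y above 0.5 indicate A > T at the third codon position.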

Codon usage bias

As each amino acid is encoded by at least one and at most six codons, codon usage varies widely among organisms and species [23], and this difference in synonymous codon usage is referred to as codon preference. Natural selection, mutation, and genetic drift may all cause codon usage bias. We analysed codon usage in the coding sequences (CDS) of the cp genome, and the results showed that 26,795 codons were detected in the cp genome of C. nannophylla (Fig. 3). Leucine was the most abundant amino acid, with 2744 codons (10.2%), followed by isoleucine with 2350 codons (8.8%) and serine and glycine with 2070 and 1851 codons (7.7% and 6.9%, respectively); cysteine was the least abundant, with 214 codons (1.2%). Of the 61 codons, 30 (49.18%) were preferred codons (RSCU > 1). Methionine and tryptophan had RSCU values equal to 1, and the most preferred codon was TTA, encoding leucine (Leu), with an RSCU value of 1.806.
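
Relative synonymous codon usage (RSCU) is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following minimal sketch illustrates the calculation under the standard genetic code; the function name `rscu` and the toy input are hypothetical and are not taken from the software used in this study.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group the 61 sense codons into synonymous families.
SENSE_FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SENSE_FAMILIES.setdefault(aa, []).append(codon)


def rscu(cds_list):
    """Return {codon: RSCU} for the sense codons of observed amino acids."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    values = {}
    for aa, codons in SENSE_FAMILIES.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed
        for c in codons:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(codons) / total
    return values


# Toy input (hypothetical): ATG TTA TTA TTA CTT
values = rscu(["ATGTTATTATTACTT"])
print(values["TTA"], values["CTT"], values["ATG"])  # 4.5 1.5 1.0
```

Because methionine and tryptophan are each encoded by a single codon, their RSCU is always 1 whenever they occur, which is why these two amino acids show no codon preference.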

Detection of Cp genome Repeat Sequences and SSRs

The abscissa represents SSR repeat units, and the ordinate represents the number of SSRs of each type.

To characterise the repeat sequences of the Clematis cp genomes, four categories of dispersed repeats (forward, reverse, complement, and palindromic) were detected and analysed, together with tandem repeats. No complement repeats were found in Clematis (Fig. 4). The total number of repeats was highest in C. songorica (75) and lowest in C. florida (71); C. nannophylla had 74 repeats, second only to C. songorica. Forward, palindromic, and tandem repeats were the most common repeat types. A total of 50 dispersed repeats were found in the C. nannophylla cp genome, including 22 forward, 21 palindromic, and seven reverse repeats, all of which were longer than 20 bp, unlike in the other Clematis species. Most dispersed and tandem repeats were 20–30 bp and 10–20 bp long, respectively.
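
The four dispersed-repeat categories differ only in how the second copy of a repeat relates to the first. As a minimal sketch (the function name `classify_repeat` is hypothetical, and real repeat finders additionally allow mismatches and enforce minimum lengths), the relationship can be tested as follows:

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first: str, second: str) -> str:
    """Classify the relationship between two exact repeat copies."""
    first, second = first.upper(), second.upper()
    complement = first.translate(COMPLEMENT)
    if second == first:
        return "forward"        # same sequence, same strand
    if second == first[::-1]:
        return "reverse"        # same strand, read backwards
    if second == complement:
        return "complement"     # complemented, same direction
    if second == complement[::-1]:
        return "palindromic"    # reverse complement
    return "none"


# Toy copies (hypothetical sequences):
print(classify_repeat("AACGT", "AACGT"))  # forward
print(classify_repeat("AACGT", "TGCAA"))  # reverse
print(classify_repeat("AACGT", "TTGCA"))  # complement
print(classify_repeat("AACGT", "ACGTT"))  # palindromic
```

The order of the checks sets the precedence for short or low-complexity sequences that satisfy more than one definition.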

We detected 66 SSRs in the C. nannophylla cp genome using the MISA Perl script (Fig. 5); the SSRs were mainly distributed in the LSC region (45, 68.18%), followed by the SSC region (15, 22.73%) and the IR regions. Additionally, 49 SSRs were located in intergenic spacers, and 17 SSRs were located within genes, such as matK, psbC, rpoB, rpoC2, clpP, petB, rps3, ndhA, trnV-UAC, rpl16, and ycf1. The SSRs consisted of 39 mononucleotide, eight dinucleotide, three trinucleotide, nine tetranucleotide, one hexanucleotide, and six compound repeats. Moreover, oligo A and oligo T repeats accounted for 21.21% and 36.36% of the total SSRs, respectively, whereas oligo C and G repeats were uncommon, with only one such mononucleotide repeat (G10) detected in C. nannophylla.
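
The idea behind SSR detection can be illustrated with a short regular-expression search. This is a simplified sketch and not the MISA script itself: the minimum repeat numbers in `MIN_REPEATS` are assumed values chosen for illustration (the thresholds used in this study are not stated here), and compound SSRs are not handled.

```python
import re

# Assumed thresholds: motif length -> minimum number of repeat units.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are repeats of a shorter motif (e.g. "AA").
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence (hypothetical): an A12 run and an (AT)6 run.
toy = "CCGT" + "A" * 12 + "GCTC" + "AT" * 6 + "GGCC"
print(find_ssrs(toy))  # [(4, 'A', 12), (20, 'AT', 6)]
```

Start positions are 0-based. Filtering out motifs that are themselves repeats of a shorter unit prevents an A12 run from also being reported as (AA)6 or (AAA)4.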

Comparison of Complete Cp Genomes

The cp genome sequence of C. nannophylla was analysed using the BLAST program on the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/blast). Clematis fruticosa, the species most similar to C. nannophylla, was selected as the reference (Fig. 6). The complete cp genomes of the five Clematis species were then compared using the mVISTA program, with C. fruticosa as the reference.

The results showed that the cp genome of Clematis was highly conserved and that the LSC and SSC regions were more divergent than the IR regions. Furthermore, the coding regions were more conserved than the non-coding regions in our alignment, and the differences between C. nannophylla and C. fruticosa were not significant; the only evident difference was in the trnE-UUC-trnT-GGU region. In contrast, there were many divergent regions in C. florida. These divergent regions mainly included psbA-atpA, atpI-rpoC2, rpoB-psbD, psbE-petG, clpP, and rpoC2, most of which were intergenic regions. The most divergent coding regions were clpP and rpoC2. Such regions are known as hotspot regions because they contain variations such as single-nucleotide polymorphisms and indels, which can be used as molecular markers in DNA barcoding and phylogenetic analysis of C. nannophylla.

IR Expansion and Contraction

The IR region is a highly conserved region of the cp genome, and its expansion and contraction are mainly responsible for changes in cp genome size and for rearrangement. Therefore, to compare IR expansion and contraction in C. nannophylla with that in other Clematis species, we analysed the border structure of the cp genomes of C. nannophylla and four reference Clematis species (Fig. 7).

The genes located in the junction regions of LSC/IRb, IRb/SSC, SSC/IRa, and IRa/LSC were rpl36, infA, ycf1, trnN, ndhF, ycf1, trnN, rps8, and rps16. The rpl36 and infA genes were located at the LSC/IRb junction. The rpl36 gene was located in the LSC region. The infA gene of C. florida was located entirely in the IR region, 20 bp from the LSC/IRb border, whereas the infA genes of the other Clematis species extended into the LSC region.

trnN was located entirely in the IRb region of C. nannophylla and C. florida, 72 bp from the IRb/SSC boundary. In contrast, the ycf1 gene was found at the IRb/SSC boundary of the other three Clematis species (C. fruticosa, C. tomentella, and C. songorica) and extended into the SSC region; the IRb/SSC boundary extended into the ndhF gene in all Clematis species except C. florida.

The distribution of ycf1 and trnN at the SSC/IRa boundary was the same in the five Clematis species. All ycf1 genes were embedded at the SSC/IRa border, with 3943 bp and 1697 bp located in the SSC and IRa regions, respectively. The trnN genes were all located in IRa regions, 72 bp away from the SSC/IRa boundary.

In all species except C. florida, the rps8 gene was located entirely in the IRa region, 311 bp away from the IRa/LSC boundary, whereas in C. florida the infA gene was located entirely in the IRa region, 20 bp away from the IRa/LSC boundary. The distance between rps16 and the IRa/LSC boundary in the five Clematis species was 1193–1200 bp. Based on these results, the IR, LSC, and SSC regions of C. nannophylla differed slightly from those of the other four Clematis species at the boundaries, whereas the numbers and order of the genes in these regions were conserved.

Adaptive Evolution Analysis

Using C. nannophylla as a reference, the selection patterns of protein-coding genes were examined by comparing synonymous and non-synonymous substitutions in the cp genomes of five Clematis species. The Ka/Ks ratios of 78 protein-coding genes were compared among the five cp genomes (Fig. 8). The Ka/Ks values of most coding genes were less than 1 or could not be calculated because Ka or Ks was 0, indicating that these genes were relatively conserved. In particular, all genes had Ka/Ks values less than 1 in the comparisons with every Clematis species except C. florida, for which the Ka/Ks value of ycf1 between C. nannophylla and C. florida was greater than 1. The Ka/Ks ratios of ndhB, rpoC1, and ycf1 in C. nannophylla were the same in the comparisons with C. fruticosa, C. songorica, and C. tomentella.
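
The interpretation of Ka/Ks can be summarised as a simple decision rule. The sketch below is illustrative only: the function name `classify_selection` and the example values are hypothetical, and estimating Ka and Ks themselves requires a codon-substitution model that is not shown here.

```python
def classify_selection(ka: float, ks: float) -> str:
    """Interpret a pair of Ka and Ks estimates for one gene comparison."""
    if ks == 0:
        # The ratio is undefined when there are no synonymous changes.
        return "undefined"
    ratio = ka / ks
    if ratio > 1:
        return "positive selection"
    if ratio < 1:
        return "purifying selection"
    return "neutral evolution"


# Hypothetical Ka and Ks values for illustration only.
print(classify_selection(0.002, 0.010))  # purifying selection
print(classify_selection(0.015, 0.010))  # positive selection
print(classify_selection(0.000, 0.000))  # undefined
```

Gene pairs with no synonymous substitutions are reported as undefined rather than forced into a category, which mirrors the genes above for which Ka/Ks could not be calculated.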

Nucleotide diversity (Pi) values of the cp genomes of C. nannophylla and four other Clematis species (C. fruticosa, C. tomentella, C. songorica, and C. florida) were calculated to identify divergence hotspots (Fig. 9). Pi values were calculated in windows of 600 bp across the five Clematis cp genomes. Across the entire genome, Pi ranged from 0 to 0.014, with an average of 0.001416.
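
Pi is the average number of pairwise nucleotide differences per site, and a sliding-window scan simply repeats this calculation along the alignment. The following minimal sketch assumes an already aligned set of equal-length sequences; the function names and the default step of 200 bp are assumptions for illustration, and gap columns are treated as ordinary characters, unlike dedicated software, which typically excludes them.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in pairs
    )
    return diffs / (len(pairs) * length)


def sliding_pi(alignment, window=600, step=200):
    """Return (window midpoint, Pi) tuples along the alignment."""
    length = len(alignment[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in alignment]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Toy alignment (hypothetical) scanned with a 4 bp window and 4 bp step.
toy = ["AAAAAAAA", "AAAAAAAT", "AAAATAAT"]
print(sliding_pi(toy, window=4, step=4))
```

In the toy alignment, the first window is invariant (Pi = 0), whereas the second window contains four pairwise differences over three sequence pairs and four sites (Pi = 1/3), showing how variable windows stand out as peaks.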

Several highly variable loci were identified, including trnF-ndhJ, ndhE-ndhG, ndhF-rpl32, ccsA-ndhD, ccsA, ndhD, trnS-trnL, rps15, and ndhE. All these regions had much higher Pi values than other regions (Pi > 0.007), and most of them were located in the SSC region. In the LSC region, few loci had Pi values greater than 0.007, whereas the IR regions had the lowest Pi values, all of which were less than 0.003, indicating that the IR regions were substantially more conserved. Based on these results, we believe that rpl32, ccsA, ndhD, rps15, and ndhE, which have relatively high sequence divergence, are good candidate markers for interspecific phylogenetic analyses.

Phylogenetic Analysis

Chloroplast genomes are of great significance in the study of plant phylogenetic relationships and evolutionary history. To determine the phylogenetic position of C. nannophylla within Ranunculaceae, a maximum-likelihood (ML) phylogenetic tree (Fig. 10) was reconstructed under the best-fit model GTR + G + I using the complete cp genomes of 17 Clematis species and five outgroups (four Aconitum and one Magnolia). There were 21 nodes in this phylogenetic tree, 17 of which had a support value of ≥ 80% and 16 of which had a support value of 90%, indicating that the clustering results were highly reliable. The 21 plant species were divided into two large and seven small groups in the phylogenetic tree. Magnolia denudata of Magnoliaceae was in one large group, and 19 species of Ranunculaceae were in the other group. The 20 Ranunculaceae species were divided into Clematis and Aconitum. This result showed that C. nannophylla is closely related to C. fruticosa and C. songorica but is more distantly related to the other species in the genus. Within the Ranunculaceae family, Clematis and Aconitum are well-supported sister groups.

Cp genome structure and size of C. nannophylla

In plants, chloroplasts are important organelles for photosynthesis and energy production and are essential for plant growth and development [10]. Chloroplasts have a unique genome and gene expression system that plays a crucial role in metabolism as a source of energy that supports plant life [24]. The complete C. nannophylla cp genome showed great similarities to the majority of angiosperms in terms of GC content and quadripartite architecture, including two inverted repeats (IRs), a large single-copy region (LSC), and a small single-copy region (SSC), which is common in plants [24–26].

Furthermore, the cp genome of C. nannophylla contains 133 genes (including 89 protein-coding genes, 36 tRNAs, and eight rRNAs), and the GC content of the genome is 38%. High GC content is often associated with early-diverging lineages (such as Nymphaeaceae and Magnoliaceae) [27]. In general, the complete cp genome of C. nannophylla is highly similar to other reported cp genomes of Clematis species in terms of length, structure, and gene composition [24,26,28]. There was no evidence of rearrangement, and good collinearity was observed. Alignment of the entire cp genomes revealed that the C. nannophylla cp genome was relatively well conserved; therefore, we concluded that C. nannophylla differentiated earlier among Ranunculaceae.

Cp genome repeat sequence of C. nannophylla

Plant genomes contain numerous repeats. However, the number, size, type, and location of repeats differ between plants [29], and repeats in the cp genome have been widely used to identify mutation hotspots and determine plant evolutionary relationships [30]. Fifty dispersed repeats were found in C. nannophylla, including 22 forward, seven reverse, and 21 palindromic repeats. The number of dispersed repeats was similar to that in other species of Clematis, and most of these dispersed repeats were located in the LSC region. Most dispersed repeats were 20–30 bp in length, indicating that short repeats occurred more frequently than long repeats among the dispersed repeats of C. nannophylla. Tandem repeats are generally considered the primary cause of genomic rearrangements and expansions [31]. Tandem repeats of C. nannophylla ranged from 10 bp to 20 bp, with most located in intergenic spacers or intron regions and a few located within the same gene, ycf2 [32].

Simple sequence repeats (SSRs) usually consist of repeating units of 1–6 nucleotides and have been recognised as important molecular markers in the study of population variation [33,34]. Since genetic information in the cp genome is inherited only from the maternal parent, SSRs in the cp genome are sensitive to population genetic effects [35] and have been widely used in the study of population evolution and polymorphism [36]. SSRs vary in number and type among species; 66 SSRs were identified in C. nannophylla, and they were mainly distributed in the LSC and SSC regions. Variation sites were fewer in the IR regions and occurred mainly in the single-copy regions [37]. Among the mononucleotide SSRs, A/T repeats were considerably more frequent than G/C repeats; this pattern also exists in other angiosperms [32,38]. The dispersed, tandem, and SSR repeats identified above are associated with cp genome rearrangement, gene duplication, and gene expression; play a vital role in genomic rearrangement and sequence variation in cp genomes; and are helpful in phylogenetic studies. Rearrangement or sequence variation in these repeat units may also lead to substitutions, insertions, and deletions in the cp genome [17,39,40]. Therefore, these repeat sequences are a source of information for the development of markers that play an important role in population and phylogenetic studies [32] and can be used in future studies of the genetic structure, differentiation, and species identification of C. nannophylla.

Codon usage bias in the cp genome of C. nannophylla

Codon usage bias is an important feature of genome evolution and is of great significance in the study of molecular evolution and heterologous gene expression [41]. The PR2 analysis further indicated that most genes in C. nannophylla favour T and G over A and C on the coding strand. The replication mechanism is a direct cause of this base asymmetry, and the asymmetry between coding and non-coding strands is another important cause of nucleotide skew [42]. However, the influence of replication on base bias differs between AT and GC asymmetries: replication effects are generally strong for GC skew, whereas AT skew is caused mainly by coding sequence-related mechanisms [42,43].

Codon usage patterns are evolutionary features of the genome. In plants, codon usage bias is related to gene expression and is mainly affected by natural selection and mutation pressure, with differences among species [44]. In the cp genome of C. nannophylla, there are 30 high-frequency codons (RSCU > 1); leucine is the most abundant amino acid and cysteine the least abundant, which is consistent with the codon usage observed in other higher plants [41,45,46]. The use of synonymous codons is not random, and analysis of codon preferences can provide valuable information for understanding species adaptation and molecular evolution.

Comparative genomic analysis of the cp genome of C. nannophylla

The IR regions of the cp genomes of angiosperms are highly conserved. The expansion and contraction of the IR region boundaries are common evolutionary events in most angiosperms and may lead to variations in cp genome length, gene duplication or loss, and the origin of pseudogenes [47,48]. This study found that the IR expansion and contraction of C. nannophylla were highly similar to those of other Clematis species, with similar gene types and positions in these regions [25]. Only minor differences were observed near the IRb/SSC boundaries. In C. nannophylla and C. florida, trnN rather than ycf1 was located at the IRb/SSC boundary, and infA was not observed near the IRa/LSC boundary, which may be the result of contraction and expansion of the IR region; this is also an important reason for the differences in cp genome length [49]. The infA gene is transcribed as part of a polycistronic mRNA from the ribosomal protein (rpl23) operon, while the ycf1 gene is a functional gene that encodes products essential for cell survival [50]. Therefore, the loss (or pseudogenisation) of infA and ycf1 may result from gene transfer to the nucleus. However, there is no evidence that infA and ycf1 have been transferred from the cp genome to the nuclear genome in Clematis. Further transcriptomic studies of these two genes are required to elucidate the effect of length variation in Clematis.

Owing to the highly conserved structure and nucleotide content of cp genomes, mutation hotspots can be quickly and accurately identified by comparative analysis. Mutation hotspots are therefore often used as a basis for highly variable markers (DNA barcodes) in population genetics and phylogenetic studies [51,52]. In this study, we compared the cp genome structure of five Clematis species using mVISTA (with Clematis fruticosa as the reference) and found that the non-coding regions were more prone to mutations than the coding regions. Furthermore, the variation in the single-copy regions was higher than that in the IR regions, which is similar to the results of previous plant studies [25,51,53]. psbA-atpA, atpI-rpoC2, rpoB-psbD, psbE-petG, clpP, and rpoC2 were the most highly variable regions detected in C. nannophylla. To quantify the degree of variation in these regions, nucleotide variability was calculated in DnaSP v6 to identify differences among the Clematis cp genomes and locate mutation hotspots. Nucleotide diversity (Pi) indicates the degree of variation in nucleic acid sequences, and sites with high variability can be selected as molecular markers for population genetics [49,54]. In the present study, the nucleotide diversity analysis showed that the gene sequences in the LSC and SSC regions were more variable than those in the IR regions, which is consistent with the results found in Asteraceae and Fagaceae plants [49,59].

By analysing the cp genome sequence variation of five Clematis species, we identified 13 hypervariable regions (Pi > 0.006) in the LSC and SSC regions, which is of great significance for the study of molecular barcodes; highly variable regions, such as ndhF, ccsA, and ndhD, have also been found in two Korean endemic Clematis species [25]. The same highly variable regions, ccsA and rpl32, were also found in Fagus longipetiolata of Fagaceae. The ccsA gene is also considered to be an informative locus for understanding cp genome evolution in Fagus longipetiolata [49], Litsea [54], Pterocarpus [51], and Prosopis [55]. Overall, these highly diverse regions provide a wealth of information for the development of molecular markers for the identification of Clematis species, as well as for the analysis of the phylogenetic relationships and population genetics of C. nannophylla.

Adaptive Evolution Analysis of the cp genome of C. nannophylla

By comparing C. nannophylla with four other species of Clematis, we identified the protein-coding genes of C. nannophylla that were under selection pressure. A base change that leads to an amino acid change is called a non-synonymous substitution (Ka); otherwise, it is called a synonymous substitution (Ks), and non-synonymous substitutions are usually subject to natural selection [56]. The Ka/Ks ratio is generally used to express the selection pressure on protein-coding genes. A Ka/Ks ratio greater than 1 indicates positive selection, whereas a Ka/Ks ratio less than 1 indicates purifying selection [57]. In this study, the Ka/Ks of most genes in C. nannophylla was less than 1 in comparisons with the other four species, indicating that purifying selection played an important role in the cp genomes of the five Clematis species. Only the Ka/Ks of ycf1 (between C. nannophylla and C. florida) was greater than 1, indicating that the ycf1 gene was positively selected, possibly as an adaptation to the living environment; ycf1 was also found to be positively selected in previous studies [33]. The ycf1 gene, the largest gene in the cp genome and the most promising cp DNA barcode, encodes the ATP-binding (ABC) protein in cp. ycf1 is characterised by species specificity [50,58], a rapid mutation rate, and rapid evolution [57] and has been verified to have classification potential at the subgenus level. In C. nannophylla, genes under strong purifying selection were mainly self-replication genes (proteins of large ribosomal subunits and subunits of RNA polymerase), photosynthesis genes (subunits of photosystems and NADH dehydrogenase), other genes, and genes of unknown function (ycf), similar to the evolution of cp genes in Pterocarpus, Artemisia maritima, and Artemisia absinthium [51,59], suggesting that strong purifying selection preserves specific gene residues and gene functions in these species.

Phylogenetic analysis of the cp genome of C. nannophylla

Cp genomes contain a large amount of genetic information and are a useful resource for inferring evolutionary and phylogenetic relationships [60]. Many researchers have used complete cp genome sequences to resolve phylogenetic relationships at various taxonomic levels, and a well-supported phylogenetic tree can intuitively represent the relatedness of species and their evolutionary relationships at various scales. The present study reconstructed a phylogenetic tree from the complete cp genomes of 23 species using the ML method, with four Aconitum species and one magnolia as outgroups. The results showed that C. nannophylla was more closely related to C. fruticosa and C. songorica but less closely related to C. florida, which is consistent with the classification based on morphological characteristics. C. nannophylla, C. fruticosa, C. tomentella, and C. songorica belong to sect. Fruticella, whereas C. florida belongs to sect. Viticella of Clematis [6]. The present study also showed that Clematis is monophyletic, is divided into two large subclades, and is sister to Aconitum [28].

In summary, the complete cp genome of C. nannophylla was sequenced and compared with those of related species, providing an important reference for the phylogeny of C. nannophylla. Although the cp genome of C. nannophylla closely resembled those of other Clematis species in genome structure, gene content, and GC content, there were some differences at the boundaries of the IR regions. Nucleotide diversity analysis identified hotspots in the LSC and SSC regions of the C. nannophylla cp genome, which could provide informative markers for phylogenetic analysis. Purifying selection played an important role in the cp genomes of the five Clematis species, whereas ycf1 was under positive selection (C. nannophylla and C. florida). Phylogenetic analysis showed that C. nannophylla is closely related to C. fruticosa, C. tomentella, and C. songorica, and the well-resolved phylogenetic tree supported the monophyly of Clematis, with Clematis and Aconitum as sister genera. The cp genome information in this study provides reference data for molecular marker development, phylogenetic analysis, population studies, and cp genome processing, as well as for the better exploitation and utilisation of C. nannophylla. These results can guide more efficient germplasm resource utilisation, conservation, and breeding strategies.

Plant Material, DNA Extraction, and Genome Sequencing

Healthy, mature leaves of C. nannophylla were sampled from Guide County, Qinghai Province, China, and preserved in liquid nitrogen for further study. The plant specimens were deposited at the College of Animal Science and Veterinary Science, Qinghai University, China. The leaves were shipped on dry ice to Genesky Biotechnologies Inc. for cp genome extraction and sequencing; assembly and further analysis were also performed by Genesky Biotechnologies Inc.

DNA extraction, sequencing, and assembly

Sample Quality Control. First, a NanoDrop spectrophotometer was used to measure the concentration and purity of the sample; the acceptance criteria were a concentration ≥ 20 ng/µL, a total amount ≥ 100 ng, and OD260/OD280 = 1.8–2.2. The integrity of the DNA sample was tested by agarose gel electrophoresis, which required the main band of genomic DNA to be clearly visible without evident degradation or smearing.
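The acceptance thresholds stated above can be written as a simple check. This is only a sketch: the function name, argument names, and example measurements are hypothetical, and the gel-based integrity assessment is not modelled.

```python
def passes_sample_qc(conc_ng_per_ul, total_ng, od260_280):
    """Return True if a DNA sample meets the stated thresholds:
    concentration >= 20 ng/uL, total amount >= 100 ng,
    and OD260/OD280 between 1.8 and 2.2 (inclusive)."""
    return (
        conc_ng_per_ul >= 20
        and total_ng >= 100
        and 1.8 <= od260_280 <= 2.2
    )


# Hypothetical measurements, for illustration only.
good_sample = passes_sample_qc(conc_ng_per_ul=35.0, total_ng=700.0, od260_280=1.9)
impure_sample = passes_sample_qc(conc_ng_per_ul=35.0, total_ng=700.0, od260_280=1.6)
```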

Random DNA Library Construction. A random sequencing library was constructed using a transposase-based library preparation kit. The transposase randomly fragments the DNA and attaches adapters to both ends of each fragment, allowing the library to be built quickly and efficiently.

PCR Amplification of DNA libraries. A high-fidelity polymerase was used to amplify the original library to ensure a sufficient library yield for sequencing. At the same time, PCR was used to introduce specific indexes and sequencing adapters at both ends of the library. The number of PCR amplification cycles was kept between 12 and 15 to ensure sufficient product while limiting the bias introduced by excessive amplification.

Size Selection of library fragments. For amplified libraries, fragment size selection was performed with the Agencourt SPRIselect kit while purifying the library. A double-sided size selection method was adopted: SPRI magnetic beads were first used to remove fragments on the left side of the target range, large fragments were then removed by right-side size selection, and a sequencing library with a fragment-size peak of 300 bp was obtained.

Library Quality Check. The sequencing library was inspected and quantified. Qubit was used to accurately quantify the library concentration so that samples could be pooled accurately and each sample would yield a proper and balanced data volume. An Agilent 2100 Bioanalyzer was used to determine the size distribution of the library fragments and to evaluate their suitability for sequencing.

Library Pooling and Sequencing. The qualified libraries were diluted, pooled in equimolar amounts, and loaded onto the sequencer. The library was sequenced on an Illumina HiSeq platform with a 2 × 150 bp paired-end sequencing strategy.

Data quality assessment and assembly. First, FastQC and R were used to evaluate the quality of the raw sequencing data. Kraken2 was then used to identify chloroplast sequences in the sequencing data. Finally, the reads were assembled into contigs using metaSPAdes. The assembly was further analysed against the reference genome to determine whether the contig was circular, to correct the orientation of the contig, and to determine the position of the starting base.
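One common way to test whether an assembled contig is circular is to look for an identical stretch of sequence at both ends. The following Python sketch illustrates that idea and the subsequent choice of a starting base; it is an assumption about how such a check could be done, not the procedure actually used by the sequencing provider, and all names and parameters are hypothetical.

```python
def terminal_overlap(contig, min_overlap=20, max_overlap=500):
    """Length of the longest suffix of the contig that equals its prefix,
    searched between min_overlap and max_overlap; 0 if none is found."""
    upper = min(max_overlap, len(contig) // 2)
    for k in range(upper, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def circularize(contig, min_overlap=20, max_overlap=500):
    """Trim the duplicated end of a contig with a terminal overlap,
    leaving one copy of the circular sequence."""
    k = terminal_overlap(contig, min_overlap, max_overlap)
    return contig[:-k] if k else contig


def rotate_to_start(circular_seq, start_index):
    """Rotate a circular sequence so that start_index becomes position 0."""
    return circular_seq[start_index:] + circular_seq[:start_index]


# Toy example: a 17-bp "genome" whose first 6 bases are repeated at the end.
toy_contig = "GATTACAGGCTTCCAGT" + "GATTAC"
toy_circle = circularize(toy_contig, min_overlap=5)
```

Exact string matching is used here for simplicity; real assemblies may need a tolerance for sequencing errors in the overlapping ends.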

Annotation and analysis of the Cp genome sequences

Using four reference species (C. florida: NC_058885, Clematis fruticosa: NC_065273, Clematis tomentella: NC_065291, Clematis songorica: NC_065290), the chloroplast genome was annotated with CPGAVAS2, and genome maps were drawn from the GenBank files with ogdraw software (http://www.1kmpg.cn/cpgview/). Collinearity between the sample and the corresponding reference genomes was analysed using BLAST+, and the collinearity results were visualised using Circos.

SSRs were analysed using the Perl script MISA v1.0 (https://webblast.ipk-gatersleben.de/misa/index.php), with the minimum number of repeats for mononucleotides, dinucleotides, trinucleotides, tetranucleotides, pentanucleotides, and hexanucleotides set to 10, 5, 4, 3, 3, and 3, respectively [61,62]. Tandem repeats were identified using Tandem Repeats Finder v4.09 (https://tandem.bu.edu/trf/submit_options) [49]. Dispersed repeats, including forward (F), reverse (R), complement (C), and palindromic (P) repeats, were identified using REPuter with a minimal length of 8 bp and a Hamming distance of 3 (https://bibiserv.cebitec.uni-bielefeld.de/reputer) [29,62].
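To make the SSR thresholds concrete, the following Python sketch finds perfect repeats using the same minimum repeat numbers (10, 5, 4, 3, 3, 3). It is a simplified regular-expression illustration and not the MISA script itself; the function name and the toy sequence are hypothetical, and compound or interrupted SSRs are not handled.

```python
import re

# Minimum number of repeats for each motif length (1-6 nt), as stated above.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" (reported as mono-A) or "ATAT" (reported as di-AT).
            if any(
                motif == motif[:d] * (unit_len // d)
                for d in range(1, unit_len)
                if unit_len % d == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(1)) // unit_len))
    return sorted(hits)


# Toy sequence: a 12-bp poly-A run and six AT repeats.
toy_seq = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "CCG"
toy_ssrs = find_ssrs(toy_seq)
```

Start positions in this sketch are 0-based, whereas MISA reports 1-based coordinates.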

The frequencies of A, T, C, and G at the third position of synonymous codons were obtained using CodonW (version 1.4.2, https://sourceforge.net/projects/codonw/) [49]. Parity rule 2 (PR2) analysis was used to examine nucleotide usage bias in the coding genes of C. nannophylla, and Origin 2021 was used for plotting [29].
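A PR2 plot places each gene at the coordinates G3/(G3 + C3) and A3/(A3 + T3), where the subscript 3 denotes the third codon position; a gene with no bias lies at (0.5, 0.5). The Python sketch below computes these coordinates for one coding sequence. It is a simplification that counts the third position of every codon, whereas PR2 analyses are often restricted to particular synonymous codon families; the function name and the example sequence are hypothetical.

```python
from collections import Counter


def pr2_point(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) for one in-frame coding sequence.

    A coordinate is None if its denominator is zero.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third = Counter(cds[i + 2] for i in range(0, usable, 3))
    a3, t3, g3, c3 = (third[base] for base in "ATGC")
    x = g3 / (g3 + c3) if (g3 + c3) else None
    y = a3 / (a3 + t3) if (a3 + t3) else None
    return x, y


# Toy sequence with third positions A, T, G, C (one of each).
balanced = pr2_point("GCAGCTGCGGCC")
```

Points with y above 0.5 indicate that A is used more often than T at the third position, and points with x above 0.5 indicate that G is used more often than C.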

Phylogenetic analysis

We constructed a phylogenetic tree from the newly sequenced C. nannophylla cp genome together with 23 previously reported cp genomes downloaded from the NCBI database, including the outgroups. MAFFT (v7, auto mode) was used for multiple sequence alignment [63]. The alignment was then used to select the best-fitting substitution model with the Find Best DNA/Protein Models (ML) function in MEGA 11; the best model was GTR + I + G. Phylogenetic relationships were inferred in MEGA 11 using the Maximum Likelihood (ML) method with 1000 bootstrap replicates [29,49].

Genome structure comparison

Based on the results of the phylogenetic analysis above, mVISTA-format files of the four Clematis species were submitted to the online tool for comparative cp genome analysis (http://genome.lbl.gov/vista/mvista/submit.shtml) in Shuffle-LAGAN mode, using the annotation of C. fruticosa as the reference [29,64]. Expansion and contraction of the IR boundaries of the four Clematis cp genomes were analysed using the IRscope tool, which visualises the genes at the junction sites (https://irscope.shinyapps.io/IRapp/).

Adaptive Evolution and Phylogenetic Analyses

Based on the cp genomes of C. nannophylla and the four other Clematis species in this study, Ka/Ks values for each functional protein-coding gene and nucleotide diversity (Pi) values of the four Clematis cp genomes were calculated using DnaSP 6 with default settings [65]. Origin 2021 was used to plot the data.
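Nucleotide diversity (Pi) is the average number of nucleotide differences per site between all pairs of aligned sequences. The Python sketch below computes it for a small alignment after removing columns that contain gaps. It is meant only to illustrate the statistic and is not a reimplementation of DnaSP; the function name and the toy alignment are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site for equal-length aligned sequences.

    Columns containing a gap ('-') in any sequence are excluded.
    """
    columns = [col for col in zip(*aligned) if "-" not in col]
    if len(aligned) < 2 or not columns:
        return 0.0
    pairs = list(combinations(range(len(aligned)), 2))
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


# Toy alignment of three 8-bp sequences: 1 + 2 + 1 pairwise differences.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
toy_pi = nucleotide_diversity(toy_alignment)
```

Applied to successive windows along a whole-genome alignment, the same function would give a Pi profile in which regions above a chosen cut-off can be flagged as hypervariable.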

IR: Inverted repeat

SSR: Simple Sequence Repeat

ML: Maximum likelihood

Ka: Non-synonymous substitutions

Ks: Synonymous substitutions

Ethics approval and consent to participate

The sampling of the three newly sequenced Clematis nannophylla specimens was approved by Qinghai Province, China, and met local policy requirements. Our experimental research, including the collection of plant materials, complies with institutional, national, or international guidelines.

Consent for publication

Not applicable.

Availability of data and materials

All annotated chloroplast genomes have been deposited in GenBank (https://www.ncbi.nlm.nih.gov/genbank/); accession numbers are provided in Additional file 1.

Competing Interest

The authors declare that they have no competing interests.

Authors' Contributions

JinPing Qin conceived and designed the study, performed the experiments, contributed materials and data analysis, and wrote the paper. Ying Liu and YanLong Wang revised the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the “A Demonstration Study on Domestication and Application of Native Ecological Grass Species in Alpine Region”, Scientific Research Project of “Qinghai Scholars” in Qinghai Province.

Acknowledgments

We sincerely thank Genesky Biotechnologies Inc., Shanghai for performing the high throughput sequencing.

Authors' information

Qinghai University, Qinghai Academy of Animal and Veterinary Sciences, Qinghai Provincial Key Laboratory of Adaptive Management on Alpine Grassland, Key Laboratory of Superior Forage Germplasm in the Qinghai-Tibetan Plateau, 810016 Xining, China.

Hu Q, Qian R, Zhang Y, Zhang X, Ma X, Zheng J. Physiological and Gene Expression Changes of Clematis crassifolia and Clematis cadmia in Response to Heat Stress. Front. Plant Sci. 2021, 12:624875. doi: 10.3389/fpls.2021.624875
Hao DC, Gu XJ, Xiao PG, Peng Y. Chemical and biological research of Clematis medicinal resources. Chinese Sci. Bull. 2013, 58, 1120–1129. doi: 10.1007/s11434-012-5628-7
Li R, Guo LX, Li Y, Chang WQ, Liu JQ, Liu LF, et al. Dose-response characteristics of Clematis triterpenoid saponins and clematichinenoside AR in rheumatoid arthritis rats by liquid chromatography/mass spectrometry-based serum and urine metabolomics. J. Pharm. Biomed. Anal. 2017, 136, 81–91. doi: 10.1016/j.jpba.2016.12.037
Liu D, Qu K, Yuan Y, Zhao Z, Chen Y, Han B, et al. Complete sequence and comparative analysis of the mitochondrial genome of the rare and endangered Clematis acerifolia, the first clematis mitogenome to provide new insights into the phylogenetic evolutionary status of the genus. Front. Genet. 2023, 13:1050040. doi: 10.3389/fgene.2022.1050040
Qian R, Ye Y, Hu Q, Ma X, Zhang X, Zheng J. Metabolomic and Transcriptomic Analyses Reveal New Insights into the Role of Metabolites and Genes in Modulating Flower Colour of Clematis tientaiensis. Horticulturae 2023, 9,14. https://doi.org/10.3390/horticulturae9010014.
Committee of the Flora of China, Chinese Academy of Sciences. Flora of China. Vol. 28. Ranunculaceae (2), Dicotyledonous Plant Class, Angiosperma Phylum. Science Press. 1980
Lyu R, Xiao J, Li M, Luo Y, He J, Cheng J, et al. Phylogeny and Historical Biogeography of the East Asian Clematis Group, Sect. Tubulosae, Inferred from Phylogenomic Data. Int. J. Mol. Sci. 2023, 24, 3056. https://doi.org/10.3390/ijms24033056
Zhao X, Hou Q, Su X, Qu B, Fan BL, Zhang H, et al. Variation of the floral traits and sexual allocation patterns of Clematis tangutica to the altitudinal gradient of the eastern Qinghai-Tibet Plateau. Biologia 2023,78,55–65. https://doi.org/10.1007/s11756-022-01178-5
Teshome N, Degu A, Ashenafi E, Ayele E, Abebe A. Evaluation of Wound Healing and Anti-Inflammatory Activity of Hydroalcoholic Leaf Extract of Clematis simensis Fresen (Ranunculaceae). Clin Cosmet Investig Dermatol 2022,15,1883-1897. https://doi.org/10.2147/CCID.S384419
Daniell H, Lin CS, Yu M, Chang WJ. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 2016, 17:134. https://doi.org/10.1186/s13059-016-1004-2
Hu YC, Zhang Q, Rao GY, Sodmergen. Occurrence of plastids in the sperm cells of Caprifoliaceae: Biparental plastid inheritance in angiosperms is unilaterally derived from maternal inheritance. Plant Cell Physiol. 2008,49, 958–968. https://doi.org/10.1093/pcp/pcn069
Huang Y, Wang J, Yang YP, Fan CZ, Chen JH. Phylogenomic Analysis and Dynamic Evolution of Chloroplast Genomes in Salicaceae. Front. Plant Sci. 2017, 8:1050. doi: 10.3389/fpls.2017.01050
He XY, Dong SJ, Gao CS, Wang QR, Zhou MJ, Cheng RB. The complete chloroplast genome of Carpesium abrotanoides L. (Asteraceae): structural organization, comparative analysis, mutational hotspots and phylogenetic implications within the tribe Inuleae. Biologia 2022,77, 1861–1876. https://doi.org/10.1007/s11756-022-01038-2
Blazier JC, Ruhlman TA, Weng ML, Rehman SK, Sabir JSM, Jansen RK. Divergence of RNA polymerase α subunits in angiosperm plastid genomes is mediated by genomic rearrangement. Sci. Rep. 2016,6, 24595. doi: 10.1038/srep24595
Guisinger MM, Kuehl JV, Boore JL, Jansen RK. Extreme reconfiguration of plastid genomes in the angiosperm family Geraniaceae: Rearrangements, repeats, and codon usage. Mol. Biol. Evol. 2011, 28, 583–600. doi: 10.1093/molbev/msq229
Chumley TW, Palmer JD, Mower JP, Matthew FH, Calie PJ, Boore JL, et al. The complete chloroplast genome sequence of Pelargonium × hortorum: Organization and evolution of the largest and most highly rearranged chloroplast genome of land plants. Mol. Biol. Evol. 2006, 23, 2175–2190. doi: 10.1093/molbev/msl089
Abdullah, Mehmood F, Shahzadi I, Ali Z, Islam M, Naeem M, et al. Correlations among oligonucleotide repeats, nucleotide substitutions, and insertion-deletion mutations in chloroplast genomes of plant family Malvaceae. Journal of Systematics and Evolution 2021, (2):388–402. doi: 10.1111/jse.12585
Shen XF, Wu ML, Liao BS, Liu ZX, Bai R, Xiao SM, et al. Complete Chloroplast Genome Sequence and Phylogenetic Analysis of the Medicinal Plant Artemisia annua. Molecules 2017, 22:1330. doi: 10.3390/molecules22081330
Yu XL, Tan W, Zhang HY, Gao H, Wang WX, Tian XX. Complete chloroplast genomes of Ampelopsis humulifolia and Ampelopsis japonica: Molecular Structure, Comparative Analysis, and Phylogenetic Analysis. Plants 2019, 8(10):410. https://doi.org/10.3390/plants8100410
Choi KS, Chung MG, Park S. The Complete Chloroplast Genome Sequences of Three Veroniceae Species (Plantaginaceae): Comparative Analysis and Highly Divergent Regions. Front Plant Sci. 2016, Mar 23;7:355. doi: 10.3389/fpls.2016.00355
Li B, Li YD, Cai QF, Lin FR, Huang P, Zheng YQ. Development of chloroplast genomic resources for Akebia quinata (Lardizabalaceae). Conserv Genet Resour. 2016,8:447–449. https://doi.org/10.1007/s12686-016-0593-0
Wang L, Wuyun TN, Du HY, Wang DP, Cao DM. Complete chloroplast genome sequences of Eucommia ulmoides: genome structure and evolution. Tree Genetics & Genomes 2016,12:12. https://doi.org/10.1007/s11295-016-0970-6
Sun CQ, Chen FD, Teng NJ, Xu YC, Dai ZL. Comparative analysis of the complete chloroplast genome of seven Nymphaea species. Aquatic Botany.2021, 170(1):103353. doi:10.1016/j.aquabot.2021.103353
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, et al. Versatile and open software for comparing large genomes. Genome Biology. 2004, 5: R12. doi: 10.1186/gb-2004-5-2-r12
Cui GX, Wang CM, Wei XX, Wang HB, Wang XL, Zhu XQ, et al. Complete chloroplast genome of Hordeum brevisubulatum: Genome organization, synonymous codon usage, phylogenetic relationships, and comparative structure analysis. PLoS ONE. 2021,16(12): e0261196. doi: 10.1371/journal.pone.0261196
Liang DQ, Wang HY, Zhang J, Zhao YX, Wu F. Complete Chloroplast Genome Sequence of Fagus longipetiolata Seemen (Fagaceae): Genome Structure, Adaptive Evolution, and Phylogenetic Relationships. Life 2022, 12, 92. https://doi.org/10.3390/life12010092
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780. doi: 10.1093/molbev/mst010
Yan C, Du JC, Gao L, Li Y, Hou XL. The complete chloroplast genome sequence of watercress (Nasturtium officinale R. Br.): genome organization, adaptive evolution and phylogenetic relationships in Cardamineae. Gene 2019, 699: 24–36. https://doi.org/10.1016/j.gene.2019.02.075
Asaf S, Khan AL, Khan AR, Waqas M, Kang S-M, Khan MA, et al. Complete Chloroplast Genome of Nicotiana otophora and its Comparison with Related Species. Front. Plant Sci. 2016, 7:843. doi: 10.3389/fpls.2016.00843
Kovalenko SP. On the Origin of Genetically Coded Protein Synthesis. Russ J Bioorg Chem. 2021, 47, 1201–1219. https://doi.org/10.1134/S1068162021060121
Dobrogojski J, Adamiec M, Luciński R. The chloroplast genome: a review. Acta Physiol Plant. 2020,42, 98. https://doi.org/10.1007/s11738-020-03089-x
Choi KS, Ha YH, Gil HY, Choi K, Kim DK, Oh SH. Two Korean Endemic Clematis Chloroplast Genomes: Inversion, Reposition, Expansion of the Inverted Repeat Region, Phylogenetic Analysis, and Nucleotide Substitution Rates. Plants 2021, 10, 397. https://doi.org/10.3390/plants10020397
Park BK, Ghimire B, Ha YH, Son DC, Kim DK. Complete chloroplast genome of Clematis taeguensis (Ranunculaceae), an endemic species from South Korea, Mitochondrial DNA Part B 2021, 6:4, 1496-1497. doi: 10.1080/23802359.2021.1910080
Cai ZQ, Penaflor C, Kuehl JV, Leebens-Mack J, Carlson JE, dePamphilis CW, et al. Complete Plastid Genome Sequences of Drimys, Liriodendron, and Piper: Implications for the Phylogenetic Relationships of Magnoliids. BMC Evol. Biol. 2006, 6, 77. doi: 10.1186/1471-2148-6-77
Park I, Kim WJ, Yang SY, Yeo SM, Li HL, Moon BC. The complete chloroplast genome sequence of Aconitum coreanum and Aconitum carmichaelii and comparative analysis with other Aconitum species. PLoS ONE 2017, 12(9): e0184257. https://doi.org/10.1371/journal.pone.0184257
Powell W, Morgante M, McDevitt R, Vendramin GG, Rafalski JA. Polymorphic simple sequence repeat regions in chloroplast genomes: applications to the population genetics of pines. Proc Natl Acad Sci U S A. 1995,15;92(17):7759-63. doi: 10.1073/pnas.92.17.7759
Zhao YM, Zhang X, Zhou T, Chen XD, Bing B. Complete chloroplast genome sequence of Gynostemma guangxiense: genome structure, codon usage bias, and phylogenetic relationships in Gynostemma (Cucurbitaceae). Braz. J. Bot. 2023. https://doi.org/10.1007/s40415-023-00874-z
Zhai YY, Zhang TY, Guo YB, Gao CX, Zhou LP, Feng L, et al. Phylogenomics, phylogeography and germplasms authentication of the Rheum palmatum complex based on complete chloroplast genomes. J Plant Res. 2023. https://doi.org/10.1007/s10265-023-01440-0
Niu YF, Su T, Wu CH, Deng J, Yang FZ. Complete chloroplast genome sequences of the medicinal plant Aconitum transsectum (Ranunculaceae): comparative analysis and phylogenetic relationships. BMC Genomics 2023,24,90. https://doi.org/10.1186/s12864-023-09180-0
Huang SN, Ge XJ, Cano A, Salazar BGM, Deng YF. Comparative analysis of chloroplast genomes for five Dicliptera species (Acanthaceae): molecular structure, phylogenetic relationships, and adaptive evolution. Peer J. 2020, 8:e8450. https://doi.org/10.7717/peerj.8450.
Jiang M, Chen HM, He SB, Wang LQ, Chen AJ, Liu C. Sequencing, characterization, and comparative analyses of the plastome of Caragana rosea var. rosea. International Journal of Molecular Science 2018,19: 1419. https://doi.org/10.3390/ijms19051419
Jeong YM, Chung WH, Mun JH, Kim N, Yu HJ. De novo assembly and characterization of the complete chloroplast genome of radish (Raphanus sativus L.). Gene 2014, 551: 39–48. https://doi.org/10.1016/j.gene.2014.08.038
Luo YK, He J, Lyu R, Xiao JM, Li WH, Yao M, et al. Comparative Analysis of Complete Chloroplast Genomes of 13 Species in Epilobium, Circaea, and Chamaenerion and Insights Into Phylogenetic Relationships of Onagraceae. Front. Genet. 2021, 12, 730495. https://doi.org/10.3389/fgene.2021.730495
Wu ML, Yan RR, Xu X, Gou GQ, Dai ZX. Characterization of the Plastid Genome of the Vulnerable Endemic Indosasa lipoensis and Phylogenetic Analysis. Diversity 2023, 15(2):197. https://doi.org/10.3390/d15020197
Do HDK, Kim JH. A dynamic tandem repeat in monocotyledons inferred from a comparative analysis of chloroplast genomes in Melanthiaceae. Front Plant Sci. 2017, 8:693. doi: 10.3389/fpls.2017.00693
Wang ML, Wang X, Sun JH, Wang YH, Ge Y, Dong WP, et al. Phylogenomic and evolutionary dynamics of inverted repeats across Angelica plastomes. BMC Plant Biol. 2021, 21(1):26. doi: 10.1186/s12870-020-02801-w
Wang YZ, Jiang DC, Guo K, Zhao L, Meng FF, Xiao JL, et al. Comparative analysis of codon usage patterns in chloroplast genomes of ten Epimedium species. BMC Genom Data. 2023, 24, 3. https://doi.org/10.1186/s12863-023-01104-x
Mrázek J, Karlin S. Strand compositional asymmetry in bacterial and large viral genomes. Proc. Natl. Acad. Sci. USA. 1998, 95:3720–3725. https://doi.org/10.1073/pnas.95.7.3720
Romiguier J, Roux C. Analytical biases associated with GC-content in molecular evolution. Frontiers in Genetics. 2017, 8: 16. https://doi.org/10.3389/fgene.2017.00016
Sheng JJ, She X, Liu XY, Wang J, Hu ZL. Comparative analysis of codon usage patterns in chloroplast genomes of five Miscanthus species and related species. PeerJ. 2021, 9:e12173. doi: 10.7717/peerj.12173
Wang ZJ, Cai QW, Wang Y, Li MH, Wang CC, Wang ZX, et al. Comparative analysis of codon Bias in the chloroplast genomes of Theaceae species. Front Genet. 2022, 13:824610. doi:10.3389/fgene.2022.824610
Li G, Zhang L, Xue P. Codon usage pattern and genetic diversity in chloroplast genomes of Panicum species. Gene. 2021, 802:145866. doi:10.1016/j.gene.2021.145866
Jiang DZ, Cai XD, Gong M, Xia MQ, Xing HT, Dong SS, et al. Complete chloroplast genomes provide insights into evolution and phylogeny of Zingiber (Zingiberaceae). BMC Genomics 2023, 24, 30. https://doi.org/10.1186/s12864-023-09115-9
Bai XJ, Wang G, Ren Y, Su YY, Han JP. Insights into taxonomy and phylogenetic relationships of eleven Aristolochia species based on chloroplast genome. Front. Plant Sci. 2023, 14:1119041. doi: 10.3389/fpls.2023.1119041
Drescher A, Ruf S, Calsa T Jr, Carrer H, Bock R. The two largest chloroplast genome-encoded open reading frames of higher plants are essential genes. Plant J. 2000, 22(2), 97–104. doi: 10.1046/j.1365-313x.2000.00722.x
Hong Z, Wu ZQ, Zhao KK, Yang ZJ, Zhang NN, Guo JY, et al. Comparative Analyses of Five Complete Chloroplast Genomes from the Genus Pterocarpus (Fabacaeae). Int. J. Mol. Sci. 2020, 21, 3758. https://doi.org/10.3390/ijms21113758
Abdullah, Mehmood F, Shahzadi I, Waseem S, Mirza B, Ahmed I, et al. Chloroplast genome of Hibiscus rosa-sinensis (Malvaceae): Comparative analyses and identification of mutational hotspots. Genomics 2020, 112, 581–591. doi:10.1016/j.ygeno.2019.04.010
Liu HY, Yu Y, Deng YQ, Li J, Huang ZX, Zhou SD. The Chloroplast Genome of Lilium henrici: Genome Structure and Comparative Analysis. Molecules 2018, 23(6):1276. doi:10.3390/molecules23061276
Zhang YY, Tian YJ, Tng DYP, Zhou JB, Zhang YT, Wang ZW, et al. Comparative chloroplast genomics of Litsea Lam. (Lauraceae) and its phylogenetic implications. Forests 2021, 12, 744. doi: 10.3390/f12060744
Asaf S, Khan AL, Khan A, Al-Harrasi A. Unraveling the chloroplast genomes of two prosopis species to identify its genomic information, comparative analyses and phylogenetic relationship. Int. J. Mol. Sci. 2020, 21, 3280. doi:10.3390/ijms21093280
Lohmueller KE, Albrechtsen A, Li YR, Kim SY, Korneliussen T, Vinckenbosch N, et al. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome. PLoS Genet. 2011;7(10): e1002326. doi: 10.1371/journal.pgen.1002326
Nekrutenko A, Makova KD, Li WH. The Ka/Ks ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res. 2002,12(1):198–202. doi: 10.1101/gr.200901
Dong WP, Liu J, Yu J, Wang L, Zhou SL. Highly variable chloroplast markers for evaluating plant phylogeny at low taxonomic levels and for DNA barcoding. PLoS One 2012;7(4): e35071. doi: 10.1371/journal.pone.0035071
Shahzadi I, Abdullah, Mehmood F, Ali Z, Ahmed I, Mirza B. Chloroplast genome sequences of Artemisia maritima and Artemisia absinthium: comparative analyses, mutational hotspots in genus Artemisia and phylogeny in family Asteraceae. Genomics 2020, 112(2), 1454–1463. https://doi.org/10.1016/j.ygeno.2019.08.016
Firetti F, Zuntini AR, Gaiarsa JW, Oliveira RS, Lohmann LG, Van, et al. Complete chloroplast genome sequences contribute to plant species delimitation: a case study of the Anemopaegma species complex. Am J Bot. 2017;104(10):1493–509. doi: 10.3732/ajb.1700302

No competing interests reported.

Additionalfile1.xlsx
