Basic characteristics of the chloroplast genomes of S. albonervius and three other Sinosenecio species
We assembled a closed circular chloroplast genome of 151,224 bp from the S. albonervius sequencing data. It has the typical quadripartite structure, in which a pair of inverted repeat regions (IRs, 24,848 bp each) is separated by a large single-copy region (LSC, 83,355 bp) and a small single-copy region (SSC, 18,173 bp) (Fig. 1). The genome encodes 133 genes: 88 protein-coding genes, 8 ribosomal RNA (rRNA) genes, and 37 transfer RNA (tRNA) genes (Table 1). Eighteen genes are duplicated in the IR regions: 7 protein-coding genes (rps7, rps12, rpl2, rpl23, ycf2, ycf15, ndhB), 4 rRNA genes (rrn16s, rrn23s, rrn4.5s, rrn5s), and 7 tRNA genes (trnN-GUU, trnR-ACG, trnA-UGC, trnI-GAU, trnI-CAU, trnV-GAC, trnL-CAA). Seventeen genes (atpF, ndhA, ndhB, petB, petD, rps12, rps16, rpl16, rpl2, rpoC1, rrn23s, trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC) contain a single intron, and two genes (pafI and clpP1) contain two introns (Table 2). The overall GC content of the genome is 37.4%, and the GC contents of the LSC, SSC, and IR regions are 35.5%, 30.6%, and 43.0%, respectively. We also compared the S. albonervius chloroplast genome with those of three other Sinosenecio species (Table 3). Genome size ranges from 150,926 bp in S. oldhamianus (the smallest) to 151,315 bp in S. baojingensis (the largest). S. jishouensis and S. oldhamianus each contain 134 genes, and S. albonervius and S. baojingensis each contain 133. S. albonervius has 88 protein-coding genes, one more than S. baojingensis, S. jishouensis, and S. oldhamianus (87 each), whereas the numbers of tRNA and rRNA genes are identical in all four species. GC content is nearly identical among the four genomes (Table 3).
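The region lengths quoted above obey the quadripartite bookkeeping LSC + SSC + 2 × IR = 83,355 + 18,173 + 2 × 24,848 = 151,224 bp, and each regional GC value is the G + C fraction of that slice of the sequence. The sketch below illustrates this under stated assumptions: the genome string is linearized so that it starts at the first LSC base, GC is computed over unambiguous bases only, and the helper names (split_quadripartite, gc_content, reverse_complement) are hypothetical, not the pipeline used in this study.

```python
# Minimal sketch: split a plastome into LSC/IRb/SSC/IRa and report GC content.
# Assumption: the sequence is linearized so that it starts at the first LSC base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """GC percentage computed over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def split_quadripartite(genome: str, lsc: int, ir: int, ssc: int) -> dict:
    """Slice the genome into its four regions using the region lengths."""
    genome = genome.upper()
    if len(genome) != lsc + 2 * ir + ssc:
        raise ValueError("LSC + SSC + 2 x IR must equal the genome length")
    irb_end = lsc + ir
    ssc_end = irb_end + ssc
    regions = {
        "LSC": genome[:lsc],
        "IRb": genome[lsc:irb_end],
        "SSC": genome[irb_end:ssc_end],
        "IRa": genome[ssc_end:],
    }
    # The two inverted repeats should be reverse complements of each other.
    if regions["IRa"] != reverse_complement(regions["IRb"]):
        raise ValueError("IRa is not the reverse complement of IRb")
    return regions


# Toy example (not real data): 6 bp LSC, 6 bp IRs, 4 bp SSC.
toy = "ATATAT" + "GGCATT" + "AAAT" + "AATGCC"
for name, seq in split_quadripartite(toy, lsc=6, ir=6, ssc=4).items():
    print(name, len(seq), round(gc_content(seq), 1))
```

Applied to the assembled genome (for example the OL678114 record, provided it actually starts at the LSC), the call would be split_quadripartite(seq, lsc=83_355, ir=24_848, ssc=18_173). The strict IR check is a design choice: if the two copies differ, the boundaries or the linearization may be off.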
Table 1
The gene composition of the S. albonervius chloroplast genome. Genes marked with superscript "a" contain introns.
| Group of genes | Name of genes |
| ATP synthase | atpA, atpB, atpE, atpFᵃ, atpH, atpI |
| Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbT, psbZ |
| NADPH dehydrogenase | ndhAᵃ, ndhBᵃ, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| Cytochrome b/f complex | petA, petBᵃ, petDᵃ, petG, petL, petN |
| C-type cytochrome synthesis | ccsA |
| Photosystem I | psaA, psaB, psaC, psaI, psaJ |
| Photosystem assembly factor | pafII, pafIᵃ |
| Photosystem biogenesis factor | pbf1 |
| Large subunit of rubisco | rbcL |
| Small ribosomal subunit | rps11, rps12ᵃ, rps14, rps15, rps16ᵃ, rps18, rps19, rps2, rps3, rps4, rps7, rps8 |
| Large ribosomal subunit | rpl14, rpl16ᵃ, rpl2ᵃ, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36 |
| RNA polymerase subunits | rpoA, rpoB, rpoC1ᵃ, rpoC2 |
| Translation initiation factor | infA |
| Ribosomal RNA | rrn16s, rrn23sᵃ, rrn5s, rrn4.5s |
| Transfer RNA | trnA-UGCᵃ, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCCᵃ, trnH-GUG, trnI-CAU, trnI-GAUᵃ, trnK-UUUᵃ, trnL-CAA, trnL-UAAᵃ, trnL-UAG, trnM-CAU, trnN-GUU, trnP-UGG, trnQ-UUG, trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, trnV-UACᵃ, trnW-CCA, trnY-GUA, trnfM-CAU |
| Acetyl-CoA carboxylase subunit | accD |
| Envelope membrane protein | cemA |
| Protease | clpP1ᵃ |
| Maturase | matK |
| Component of TIC complex | ycf1 |
| Hypothetical chloroplast reading frames | ycf2, ycf15 |
Table 2
Genes with introns in the chloroplast genome of S. albonervius, with the lengths of their exons and introns.
| Gene | Location | Exon 1 (bp) | Intron 1 (bp) | Exon 2 (bp) | Intron 2 (bp) | Exon 3 (bp) |
| trnK-UUU | LSC | 37 | 2560 | 35 | | |
| rps16 | LSC | 40 | 841 | 227 | | |
| rpoC1 | LSC | 432 | 722 | 1635 | | |
| atpF | LSC | 145 | 704 | 411 | | |
| trnG-UCC | LSC | 23 | 725 | 48 | | |
| pafI | LSC | 124 | 696 | 230 | 740 | 155 |
| trnL-UAA | LSC | 37 | 454 | 50 | | |
| trnV-UAC | LSC | 38 | 575 | 37 | | |
| rps12 | LSC / IR | 114 | 530 | 232 | | 26 |
| clpP1 | LSC | 71 | 806 | 292 | 606 | 229 |
| petB | LSC | 6 | 772 | 642 | | |
| petD | LSC | 8 | 718 | 475 | | |
| rpl16 | LSC | 9 | 1061 | 399 | | |
| rpl2 | IR | 393 | 670 | 435 | | |
| ndhB | IR | 777 | 671 | 756 | | |
| trnI-GAU | IR | 42 | 777 | 35 | | |
| trnA-UGC | IR | 38 | 821 | 35 | | |
| rrn23s | IR | 2611 | 199 | | | |
| ndhA | SSC | 553 | 1072 | 540 | | |
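The intron counts stated in the text (17 genes with one intron, two with two) can be cross-checked by transcribing Table 2 and tallying the non-empty intron cells. This is a minimal sketch; the TABLE2 dictionary and intron_summary helper are hypothetical names, and empty table cells are simply omitted (so rps12 carries three exons but one intron, exactly as listed).

```python
# (exon lengths, intron lengths) in bp, transcribed from Table 2; empty cells omitted.
TABLE2 = {
    "trnK-UUU": ((37, 35), (2560,)),
    "rps16": ((40, 227), (841,)),
    "rpoC1": ((432, 1635), (722,)),
    "atpF": ((145, 411), (704,)),
    "trnG-UCC": ((23, 48), (725,)),
    "pafI": ((124, 230, 155), (696, 740)),
    "trnL-UAA": ((37, 50), (454,)),
    "trnV-UAC": ((38, 37), (575,)),
    "rps12": ((114, 232, 26), (530,)),
    "clpP1": ((71, 292, 229), (806, 606)),
    "petB": ((6, 642), (772,)),
    "petD": ((8, 475), (718,)),
    "rpl16": ((9, 399), (1061,)),
    "rpl2": ((393, 435), (670,)),
    "ndhB": ((777, 756), (671,)),
    "trnI-GAU": ((42, 35), (777,)),
    "trnA-UGC": ((38, 35), (821,)),
    "rrn23s": ((2611,), (199,)),
    "ndhA": ((553, 540), (1072,)),
}


def intron_summary(table):
    """Group gene names by how many introns they contain."""
    groups = {}
    for gene, (_exons, introns) in table.items():
        groups.setdefault(len(introns), []).append(gene)
    return groups


summary = intron_summary(TABLE2)
for n_introns in sorted(summary):
    print(n_introns, "intron(s):", len(summary[n_introns]), "genes", sorted(summary[n_introns]))

# Longest single intron in the table.
longest_gene, longest_bp = max(
    ((gene, max(introns)) for gene, (_exons, introns) in TABLE2.items()),
    key=lambda item: item[1],
)
print("longest intron:", longest_gene, longest_bp, "bp")
```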
Table 3
Comparison of the chloroplast genomes of four Sinosenecio species.
| Characteristics | S. albonervius | S. jishouensis | S. baojingensis | S. oldhamianus |
| Accession number | OL678114 | NC_057061 | MZ325394 | NC_057622 |
| Total length (bp) | 151,224 | 151,257 | 151,315 | 150,926 |
| LSC length (bp) | 83,355 | 83,373 | 83,445 | 83,092 |
| SSC length (bp) | 18,173 | 18,178 | 18,172 | 18,130 |
| IR length (bp) | 24,848 | 24,853 | 24,849 | 24,852 |
| Total number of genes | 133 | 134 | 133 | 134 |
| Protein-coding genes | 88 | 87 | 87 | 87 |
| tRNA genes | 37 | 37 | 37 | 37 |
| rRNA genes | 8 | 8 | 8 | 8 |
| Total GC content | 37.4% | 37.4% | 37.4% | 37.3% |
| GC content in IRs | 43.0% | 43.0% | 43.0% | 43.0% |
| GC content in LSC | 35.5% | 35.5% | 35.5% | 35.4% |
| GC content in SSC | 30.6% | 30.6% | 30.6% | 30.6% |
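Table 3 can be checked with the same bookkeeping: for every species, LSC + SSC + 2 × IR should equal the total length, and cumulative sums give the positions of the four junctions (JLB, JSB, JSA, JLA). The sketch below assumes numbering starts at the first LSC base, which is an assumption about how each record is linearized; TABLE3 and junction_positions are hypothetical names.

```python
# Region lengths (bp) transcribed from Table 3: (total, LSC, SSC, IR).
TABLE3 = {
    "S. albonervius": (151_224, 83_355, 18_173, 24_848),
    "S. jishouensis": (151_257, 83_373, 18_178, 24_853),
    "S. baojingensis": (151_315, 83_445, 18_172, 24_849),
    "S. oldhamianus": (150_926, 83_092, 18_130, 24_852),
}


def junction_positions(lsc: int, ssc: int, ir: int) -> dict:
    """Last base (1-based) before each junction, assuming numbering starts at the LSC."""
    jlb = lsc          # LSC/IRb junction
    jsb = jlb + ir     # IRb/SSC junction
    jsa = jsb + ssc    # SSC/IRa junction
    jla = jsa + ir     # IRa/LSC junction, equal to the genome length
    return {"JLB": jlb, "JSB": jsb, "JSA": jsa, "JLA": jla}


for species, (total, lsc, ssc, ir) in TABLE3.items():
    junctions = junction_positions(lsc, ssc, ir)
    # Quadripartite bookkeeping: LSC + SSC + 2 x IR must equal the reported length.
    status = "OK" if junctions["JLA"] == total else "MISMATCH"
    print(species, junctions, status)

smallest = min(TABLE3, key=lambda sp: TABLE3[sp][0])
largest = max(TABLE3, key=lambda sp: TABLE3[sp][0])
print("smallest:", smallest, "| largest:", largest)
```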
Simple sequence repeats (SSRs) and larger repeat sequences
The S. albonervius chloroplast genome contains 53 simple sequence repeats (SSRs): 26 mononucleotide, seven dinucleotide, eight trinucleotide, and 12 tetranucleotide repeats (Fig. 2A). We also counted the SSRs in the SC and IR regions (Fig. 2B) and the SSRs of each type in each chloroplast genome (Fig. 2C, Table S1). SSRs occur mainly in the LSC, and none were detected in the IR regions of S. baojingensis and S. albonervius. The numbers of SSRs in S. albonervius, S. jishouensis, S. baojingensis, and S. oldhamianus are 53, 55, 49, and 56, respectively. Notably, in S. baojingensis and S. oldhamianus, mononucleotide repeats outnumber all other SSR types combined. The most common SSRs are mononucleotide repeats composed of A or T (Fig. 2D); S. oldhamianus has the most (35 mononucleotide repeats), whereas S. albonervius, S. jishouensis, and S. baojingensis each have 26. Furthermore, we identified larger repeats (> 10 bp) in the chloroplast genomes (Fig. 3, Table S2). Palindromic and forward repeats are more common than the other repeat types. In S. albonervius, 99 larger repeats were identified: 37 forward (F), 21 reverse (R), 37 palindromic (P), and four complement (C) repeats; the largest is a 48 bp palindromic repeat.
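The SSR and larger-repeat counts above come from dedicated tools that this section does not name. As an illustration only, the sketch below finds perfect SSRs using assumed minimum copy numbers (MIN_REPEATS is hypothetical, not the study's thresholds) and classifies fixed-length exact repeat pairs into the four types used here: forward (F, identical copy), reverse (R, reversed), complement (C, complemented), and palindromic (P, reverse complement). It ignores compound SSRs and mismatches, so it would not reproduce Fig. 2 or Fig. 3; all function names are hypothetical.

```python
from collections import defaultdict

# Assumed minimum copy numbers per motif length (illustrative, not the study's settings).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repeat of a shorter unit ('AT' yes, 'ATAT' no)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return (1-based start, motif, copies) for perfect SSRs, one motif length at a time."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in sorted(min_repeats.items()):
        i = 0
        while i + unit <= len(seq):
            motif = seq[i:i + unit]
            copies = 1
            while seq[i + copies * unit:i + (copies + 1) * unit] == motif:
                copies += 1
            if copies >= min_rep and is_primitive(motif):
                hits.append((i + 1, motif, copies))
                i += copies * unit  # skip past the whole repeat
            else:
                i += 1
    return hits


def repeat_variants(kmer: str) -> dict:
    """The four relationships used to classify dispersed repeats."""
    comp = kmer.translate(COMPLEMENT)
    return {"F": kmer, "R": kmer[::-1], "C": comp, "P": comp[::-1]}


def find_exact_repeats(seq: str, k: int):
    """Exact repeat pairs of length k as (i, j, type) with 0-based i < j."""
    seq = seq.upper()
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    pairs = set()
    for i in range(len(seq) - k + 1):
        for rtype, target in repeat_variants(seq[i:i + k]).items():
            for j in index.get(target, []):
                if j != i:
                    # Each transform is its own inverse, so (i, j) and (j, i) coincide.
                    pairs.add((min(i, j), max(i, j), rtype))
    return pairs


toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GC" + "AAT" * 5 + "G"
print(find_ssrs(toy))
print(sorted(find_exact_repeats("GATCCA" + "TTTTT" + "TGGATC", 6)))
```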
Codon usage and RNA editing sites
The codon usage frequency and relative synonymous codon usage (RSCU) were calculated from 54 protein-coding sequences of the S. albonervius chloroplast genome (Table 4). These sequences contain 21,301 codons in total. Leu (2,281 codons) and Cys (238 codons) are the most and least frequently encoded amino acids, respectively. RSCU analysis (Fig. 4) showed that 30 codons have RSCU values greater than 1, indicating biased usage of these codons, whereas Met and Trp, each encoded by a single codon, have RSCU = 1 and therefore no bias. Among the 30 codons with RSCU > 1, only the Leu codon UUG ends in G; the other 29 end in A or U. A total of 46 potential RNA editing sites were predicted in 19 protein-coding genes of the S. albonervius chloroplast genome (Table 5). The ndhB gene contains the most editing sites (9), while several genes (atpI, psbF, rpl20, rpoA, rpoB, and rps2) contain only one. All predicted edits are C-to-U conversions at the first (21.7%) or second (78.3%) codon position; no editing was predicted at the third codon position. Furthermore, serine codons were edited more frequently than codons of any other amino acid, and serine-to-leucine was the most frequent conversion.
Table 4
Codon usage in the S. albonervius chloroplast genome, based on 54 CDSs. RSCU, relative synonymous codon usage; TER, stop codon.
| Amino acid | Codon | Number | RSCU | Amino acid | Codon | Number | RSCU |
| Phe | UUU | 828 | 1.37 | Ser | UCU | 478 | 1.81 |
| | UUC | 382 | 0.63 | | UCC | 231 | 0.87 |
| Leu | UUA | 738 | 1.94 | | UCA | 324 | 1.22 |
| | UUG | 472 | 1.24 | | UCG | 126 | 0.48 |
| | CUU | 490 | 1.29 | Pro | CCU | 342 | 1.55 |
| | CUC | 136 | 0.36 | | CCC | 159 | 0.72 |
| | CUA | 301 | 0.79 | | CCA | 262 | 1.19 |
| | CUG | 144 | 0.38 | | CCG | 120 | 0.54 |
| Ile | AUU | 897 | 1.47 | Thr | ACU | 427 | 1.63 |
| | AUC | 328 | 0.54 | | ACC | 197 | 0.75 |
| | AUA | 601 | 0.99 | | ACA | 330 | 1.26 |
| Met | AUG | 518 | 1 | | ACG | 92 | 0.35 |
| Val | GUU | 424 | 1.49 | Ala | GCU | 533 | 1.77 |
| | GUC | 123 | 0.43 | | GCC | 189 | 0.63 |
| | GUA | 433 | 1.53 | | GCA | 343 | 1.14 |
| | GUG | 155 | 0.55 | | GCG | 139 | 0.46 |
| Tyr | UAU | 670 | 1.64 | Cys | UGU | 166 | 1.39 |
| | UAC | 148 | 0.36 | | UGC | 72 | 0.61 |
| TER | UAA | 32 | 1.78 | TER | UGA | 12 | 0.67 |
| | UAG | 10 | 0.56 | Trp | UGG | 383 | 1 |
| His | CAU | 373 | 1.49 | Arg | CGU | 285 | 1.36 |
| | CAC | 128 | 0.51 | | CGC | 85 | 0.41 |
| Gln | CAA | 594 | 1.53 | | CGA | 277 | 1.33 |
| | CAG | 180 | 0.47 | | CGG | 84 | 0.4 |
| Asn | AAU | 830 | 1.59 | Ser | AGU | 340 | 1.28 |
| | AAC | 217 | 0.41 | | AGC | 89 | 0.34 |
| Lys | AAA | 836 | 1.51 | Arg | AGA | 389 | 1.86 |
| | AAG | 273 | 0.49 | | AGG | 134 | 0.64 |
| Asp | GAU | 671 | 1.58 | Gly | GGU | 490 | 1.33 |
| | GAC | 177 | 0.42 | | GGC | 178 | 0.48 |
| Glu | GAA | 834 | 1.50 | | GGA | 565 | 1.53 |
| | GAG | 275 | 0.50 | | GGG | 242 | 0.66 |
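RSCU for a codon is its observed count divided by the mean count of all synonymous codons for that amino acid, RSCU = x_ij / ((1/n_i) Σ_j x_ij), so values above 1 mean the codon is used more often than uniform usage would predict, and the RSCU values within a family always sum to the number of synonymous codons. The software used for Table 4 is not named in this section, so the sketch below is an independent illustration; count_codons, rscu, and TABLE4_SUBSET are hypothetical names, and the counts are transcribed from Table 4.

```python
from collections import Counter


def count_codons(cds_list):
    """Count codons across CDSs (DNA or RNA alphabet), reading in frame from position 0."""
    counts = Counter()
    for cds in cds_list:
        rna = cds.upper().replace("T", "U")
        usable = len(rna) - len(rna) % 3  # drop a trailing partial codon
        counts.update(rna[i:i + 3] for i in range(0, usable, 3))
    return counts


def rscu(family_counts):
    """RSCU for one synonymous family: observed count / mean count of the family."""
    mean = sum(family_counts.values()) / len(family_counts)
    if mean == 0:
        return {codon: 0.0 for codon in family_counts}
    return {codon: n / mean for codon, n in family_counts.items()}


# Counts for two amino acids and the stop codons, transcribed from Table 4.
TABLE4_SUBSET = {
    "Phe": {"UUU": 828, "UUC": 382},
    "Leu": {"UUA": 738, "UUG": 472, "CUU": 490, "CUC": 136, "CUA": 301, "CUG": 144},
    "TER": {"UAA": 32, "UGA": 12, "UAG": 10},
}

for aa, family in TABLE4_SUBSET.items():
    print(aa, {codon: round(value, 2) for codon, value in rscu(family).items()})
```

In practice count_codons would be run on the 54 CDSs and its output grouped into synonymous families with a genetic-code table before calling rscu.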
Table 5
RNA editing sites in the S. albonervius chloroplast genome.
| Gene | Nt pos | AA pos | Align col | Effect | Score |
| accD | 451 | 151 | 162 | CAC (H) => UAC (Y) | 1 |
| accD | 824 | 275 | 304 | UCG (S) => UUG (L) | 0.8 |
| accD | 1225 | 409 | 450 | CCA (P) => UCA (S) | 1 |
| accD | 1433 | 478 | 519 | CCU (P) => CUU (L) | 1 |
| atpA | 773 | 258 | 258 | UCA (S) => UUA (L) | 1 |
| atpA | 791 | 264 | 264 | CCC (P) => CUC (L) | 1 |
| atpI | 629 | 210 | 213 | UCA (S) => UUA (L) | 1 |
| ccsA | 110 | 37 | 39 | CCA (P) => CUA (L) | 0.86 |
| ccsA | 370 | 124 | 127 | CCC (P) => UCC (S) | 0.86 |
| matK | 284 | 95 | 108 | UCU (S) => UUU (F) | 0.86 |
| matK | 637 | 213 | 229 | CAU (H) => UAU (Y) | 1 |
| matK | 1240 | 414 | 430 | CAU (H) => UAU (Y) | 1 |
| ndhA | 566 | 189 | 189 | UCA (S) => UUA (L) | 1 |
| ndhA | 1073 | 358 | 358 | UCC (S) => UUC (F) | 1 |
| ndhB | 149 | 50 | 50 | UCA (S) => UUA (L) | 1 |
| ndhB | 467 | 156 | 156 | CCA (P) => CUA (L) | 1 |
| ndhB | 586 | 196 | 196 | CAU (H) => UAU (Y) | 1 |
| ndhB | 611 | 204 | 204 | UCA (S) => UUA (L) | 0.8 |
| ndhB | 737 | 246 | 246 | CCA (P) => CUA (L) | 1 |
| ndhB | 746 | 249 | 249 | UCU (S) => UUU (F) | 1 |
| ndhB | 830 | 277 | 277 | UCA (S) => UUA (L) | 1 |
| ndhB | 836 | 279 | 279 | UCA (S) => UUA (L) | 1 |
| ndhB | 1481 | 494 | 494 | CCA (P) => CUA (L) | 1 |
| ndhD | 359 | 120 | 128 | UCA (S) => UUA (L) | 1 |
| ndhD | 575 | 192 | 200 | UCA (S) => UUA (L) | 1 |
| ndhD | 854 | 285 | 293 | UCA (S) => UUA (L) | 1 |
| ndhD | 863 | 288 | 296 | CCC (P) => CUC (L) | 1 |
| ndhD | 1286 | 429 | 437 | UCA (S) => UUA (L) | 0.8 |
| ndhF | 290 | 97 | 97 | UCA (S) => UUA (L) | 1 |
| ndhF | 1340 | 447 | 447 | UCU (S) => UUU (F) | 1 |
| ndhG | 166 | 56 | 56 | CAU (H) => UAU (Y) | 0.8 |
| ndhG | 314 | 105 | 105 | ACA (T) => AUA (I) | 0.8 |
| petB | 418 | 140 | 140 | CGG (R) => UGG (W) | 1 |
| petB | 611 | 204 | 204 | CCA (P) => CUA (L) | 1 |
| psbF | 77 | 26 | 26 | UCU (S) => UUU (F) | 1 |
| rpl20 | 308 | 103 | 103 | UCA (S) => UUA (L) | 0.86 |
| rpoA | 824 | 275 | 279 | UCA (S) => UUA (L) | 1 |
| rpoB | 983 | 328 | 345 | GCG (A) => GUG (V) | 1 |
| rpoC1 | 511 | 171 | 171 | CCC (P) => UCC (S) | 1 |
| rpoC1 | 1592 | 531 | 548 | GCA (A) => GUA (V) | 0.86 |
| rpoC1 | 2039 | 680 | 710 | CCC (P) => CUC (L) | 1 |
| rpoC2 | 2701 | 901 | 1101 | CAU (H) => UAU (Y) | 1 |
| rpoC2 | 3695 | 1232 | 1452 | UCG (S) => UUG (L) | 0.86 |
| rps2 | 248 | 83 | 83 | UCA (S) => UUA (L) | 1 |
| rps14 | 80 | 27 | 27 | UCA (S) => UUA (L) | 1 |
| rps14 | 149 | 50 | 53 | CCA (P) => CUA (L) | 1 |
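The codon-position percentages reported for RNA editing follow directly from the nucleotide positions in Table 5: for a 1-based CDS position nt, the codon number is floor((nt − 1) / 3) + 1 (which reproduces the AA pos column) and the position within the codon is ((nt − 1) mod 3) + 1. The sketch below tallies the transcribed positions; the editing-prediction tool itself is not named in this section, and EDIT_SITES and codon_coordinates are hypothetical names.

```python
from collections import Counter

# Nucleotide positions (1-based, within each CDS) of the predicted sites in Table 5.
EDIT_SITES = {
    "accD": [451, 824, 1225, 1433],
    "atpA": [773, 791],
    "atpI": [629],
    "ccsA": [110, 370],
    "matK": [284, 637, 1240],
    "ndhA": [566, 1073],
    "ndhB": [149, 467, 586, 611, 737, 746, 830, 836, 1481],
    "ndhD": [359, 575, 854, 863, 1286],
    "ndhF": [290, 1340],
    "ndhG": [166, 314],
    "petB": [418, 611],
    "psbF": [77],
    "rpl20": [308],
    "rpoA": [824],
    "rpoB": [983],
    "rpoC1": [511, 1592, 2039],
    "rpoC2": [2701, 3695],
    "rps2": [248],
    "rps14": [80, 149],
}


def codon_coordinates(nt_pos: int) -> tuple:
    """Map a 1-based CDS position to (1-based codon number, position 1-3 within the codon)."""
    codon_index, offset = divmod(nt_pos - 1, 3)
    return codon_index + 1, offset + 1


position_counts = Counter(
    codon_coordinates(pos)[1] for positions in EDIT_SITES.values() for pos in positions
)
total = sum(position_counts.values())
print(f"{total} sites in {len(EDIT_SITES)} genes")
for codon_pos in (1, 2, 3):
    share = 100 * position_counts[codon_pos] / total
    print(f"codon position {codon_pos}: {position_counts[codon_pos]} sites ({share:.1f}%)")
```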
Comparative genomes and nucleotide diversity analysis
The chloroplast genomes of the Sinosenecio species were compared, with S. oldhamianus as the reference, to assess their level of divergence (Fig. 5). The IR regions (84.094–108.955 kb and 127.224–152.085 kb) and the coding regions are more conserved than the SC regions and the non-coding regions. By contrast, the coding region of ycf1 (122.727–127.817 kb) is the most divergent, showing greater diversity than the coding regions of all other genes. We also compared the IR/SC junctions of the Sinosenecio species (Fig. 6). The IR regions of the different chloroplast genomes range from 24,848 to 24,853 bp in length. The rpl2 gene lies in the IR regions, and psbA and rpl22 lie in the LSC region. The SSC/IRa border falls within the coding region of ycf1, while rps19 is located at the LSC/IRb junction. At JSB, the ycf1 gene of S. albonervius and S. baojingensis extends 2 bp into the SSC region, and in each chloroplast genome ndhF extends 1 bp into the IRb region. At JLA, the trnH gene of S. albonervius and S. baojingensis extends 10 bp into IRa. The rps19 gene extends into the SSC region by 3 bp in S. jishouensis and by 1 bp in S. oldhamianus. Nucleotide diversity (Pi) was calculated with DnaSP to identify mutation hotspots in the chloroplast genomes (Fig. 7). Pi values range from 0.00083 to 0.02611. The highest value (0.02611) occurs in the accD–psaI region, and other high peaks (Pi > 0.013) occur in trnK-UUU–rps16 (0.01583), ycf1 (0.01444), ccsA–ndhD (0.01333), and trnT-UGU–trnL-UAA (0.01306). Three of these five hotspots (accD–psaI, trnK-UUU–rps16, and trnT-UGU–trnL-UAA) lie in the LSC, suggesting that the LSC harbors most of the highly variable regions.
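Nucleotide diversity (Pi) is the average number of nucleotide differences per site between pairs of aligned sequences, and hotspot plots such as Fig. 7 evaluate it in sliding windows along the alignment. The sketch below is an independent illustration, not DnaSP: it skips every column containing a gap or ambiguous base (complete deletion, one common convention whose match to the settings used here is not stated), and the window and step arguments are placeholders because the values used for Fig. 7 are not given in this section. Function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nei's Pi: mean pairwise differences per site over columns free of gaps/ambiguities."""
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    seqs = [s.upper() for s in seqs]
    sites = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if not sites:
        return 0.0
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a[i] != b[i] for i in sites) for a, b in pairs)
    return differences / (len(pairs) * len(sites))


def sliding_window_pi(seqs, window, step):
    """Pi in successive alignment windows, returned as (1-based window start, Pi)."""
    length = len(seqs[0])
    return [
        (start + 1, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment (not real data): four sequences, two variable columns.
aln = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACGTTC"]
print(nucleotide_diversity(aln))
print(sliding_window_pi(aln, window=5, step=5))
```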
Phylogenetic analysis
A maximum-likelihood (ML) phylogenetic tree was constructed from whole chloroplast genome sequence alignments of 14 Asteraceae species (Fig. 8). All nodes received high support, and the tribe Senecioneae (Asteraceae) comprises three major clades: the first contains four Sinosenecio species of subtribe Tephroseridinae, and the other two contain eight species of subtribe Senecioninae. Within Sinosenecio, S. oldhamianus diverged first, followed by S. albonervius, with S. baojingensis and S. jishouensis diverging last. Based on whole chloroplast genomes, Sinosenecio is phylogenetically close to Farfugium and Ligularia.