The plastid genome features of these species
The complete plastid genome of Angelica, Ostericum and related species was a single and typical quadripartite circular structure (Figure 1). The length of the nine species’ plastid genomes ranged from 146765 bp in Angelica biserrata to 164329 bp in Melanosciadium pimpinelloideum. The typical complete plastid genome quadripartite structure of these species consisted of two identical IR regions ranged from 17797 in Coelopleurum saxatile to 35157 bp in M. pimpinelloideum, small single-copy regions (SSC) ranged from 17432 in Ostericum grosseserratum to 17636 bp in A. tianmuensis, and large single-copy regions (LSC) ranged from 76450 in M. pimpinelloideum to 93901 bp in Coe. saxatile. The genomes total A content was between 30.8–31.1%, the G content was between 18.4–18.5%, the C content was between 19.0–19.2%, the T content was between 31.6–31.7% and the total GC content was between 37.5%–37.8%. The plastid genomes contained 130–143 genes, including 86–98 protein-coding genes (PCGs), 36–37 transfer RNA genes (tRNAs) and 8 ribosomal RNA genes (rRNAs) (Table 1).
Melanosciadium pimpinelloideum had the most genes (143 genes), with 14 more genes than Angelica, Coe. saxatile and Conioselinum chinense (petD, rpoA, rps11, rpl36, infA, rps8, rpl14, rps3, rpl22, rps19, rpl2, rpl23, ycf2, trnI-CAU), and contained 10 more genes than Ostericum (petD, rpoA, rps11, rpl36, infA, rps8, rpl14, rps3, rpl22, rps19). While Ostericum contained 4 more genes (rpl2, rpl23, ycf2, trnI-CAU) than Angelica, Coe. saxatile and Con. chinense. Moreover, M. pimpinelloideum had the most protein-coding genes (99 protein-coding genes), with 13 more protein-coding genes than Angelica, Coe. saxatile and Con. chinense (infA, petD, rpl2, rpl14, rpl22, rpl23, rpl36, rpoA, rps3, rps8, rps11, rps19, ycf2), and had 10 more protein-coding genes than Ostericum (infA, petD, rpl14, rpl22, rpl36, rpoA, rps3, rps8, rps11, rps19). Meanwhile, Ostericum contained 3 more protein-coding genes (rpl2, rpl23, ycf2, trnI-CAU) than Angelica, Coe. saxatile and Con. chinense. M. pimpinelloideum and Ostericum had the same number of tRNAs (37 tRNAs), and they had one trnI-CAU more than the other species. Referring to previous research, we considered infA, ycf15, ycf68, and the short copy of ψycf1 and ψrps19 as pseudogenes [34, 39, 40]. To distinguish pseudogene ycf1 from the functional ycf1 gene, we considered the pseudogene as ψycf1 in this study, and the ψrps19 is the same case.
Analysis of inverted repeat contraction and expansion
To assess the Angelica, Ostericum and related species’ (including Glehnia littoralis) expansion and contraction of the IR regions, we identified and focused on the junctions of IR/LSC and IR/SSC (Figure 2). Gene ycf2 of 277–766 bp in the junction of the LSC and the IRb region was located in the IRb region in Angelica sp., Coelopleurum saxatile, Conioselinum chinense, and G. littoralis. In Melanosciadium pimpinelloideum the junction of the IRb and LSC coincided with the petB gene, with 118 bp in the LSC region and 1293 bp located in the IRb region. In Ostericum sp. the rps19 gene extended 81 bp into the IRb region. The ψycf1 gene was located in the junction of the SSC and IRb region in all ten species. Similarly, the ndhF gene was located in the SSC region 25–126 bp away from the IRb/SSC border except for G. littoralis. In G. littoralis, the ndhF gene extended 9 bp into the IRb region at the border of IRb/SSC. The ycf1 gene crossed the SSC and IRa border in all ten species, with 3439–3592 bp in the SSC region and 1856–1975 bp located in IRa region. In all species, the trnH gene was closest to the IRa/LSC border and was located in the LSC region being 3–1155 bp away from the IRa region. However, there were differences in IRa region away from the IRa/LSC border. The trnL gene was in the IRa region in Angelica sp., Coe. saxatile, Con. chinense, and G. littoralis, but was closest to the IRa/LSC border being 619–1329 bp away. Whereas the petD gene was closest to the IRa/LSC border and in the IRa region in M. pimpinelloideum, being 1472 bp away from the border. In Ostericum sp., it was the ψrps19 gene that was closest to the IRa/LSC border and located in the IRa region with 81 bp.
Analysis of codon usage and amino acids frequency
The 20 amino acids were encoded by 64 codons in the ten complete plastid genomes (i.e. Angelica amurensis, Angelica biserrata, Angelica tianmuensis, Coelopleurum saxatile, Conioselinum chinense, Melanosciadium pimpinelloideum, Glehnia littoralis, Ostericum grosseserratum, Ostericum huadongense, Ostericum sieboldii) (Figure 3). Methionine (Met) and tryptophan (Trp) only had the minimum type of codons with one codon, while leucine (Leu) and serine (Ser) had the maximum type of codons with six. The total number of codons of these species ranged from 22489 in G. littoralis to 26569 in M. pimpinelloideum. Among the amino acids, Leu had the maximum codon numbers ranging between 2399–2826, and cysteine (Cys) had the minimum codon numbers ranging from 240–285. The most used codon was AUU ranging from 927 to 10744, with the second being AAA ranging from 905 to 1105. The three least used were the termination codons UAA, UGA and UAG, with UGA using the least and ranging from 12 to14, while UAA was the most used and ranged from 33–43. Excluding the termination codons, the least used codon was UGC and ranged from 58–60. The codon with the highest RSCU value was UUA, then AGA, and GCU (deep red). The codon with the lowest RSCU value was AGC, followed by (in order) CGC, CUG, GAC, UAC, and CUC (deep blue) (Figure 3, Table S2). Furthermore, RSCU values of third position codons A or U were mostly greater than 1, whereas the third position codons C or G were mostly less than 1 (Table S2).
Analyses of repeats sequences and single sequence repeats (SSRs)
We detected forward, palindromic, reverse, and complementary repeats in these ten species. After filtering out duplicate data, we found that forward and palindromic repeats were typical, while reverse and complementary repeats were rare. Most repeat sequence lengths were 30–45 bp (Figure 4). Among these species, Conioselinum chinense had the smallest total number of repeats with 24, whereas Melanosciadium pimpinelloideum and Ostericum sieboldii shared the largest number of repeats 40.
We detected six types of SSRs (mononucleotide, dinucleotide, trinucleotide tetranucleotide, pentanucleotide and hexanucleotide) in these ten species. The SSR analysis showed Ostericum huadongense had the smallest number of SSRs with 66, and Angelica biserrate had the largest with 86. The number of mononucleotide SSRs was the largest, followed by dinucleotide, then tetranucleotide rather than trinucleotide. Pentanucleotide and hexanucleotide repeats were very rare, especially hexanucleotide. Only three species (A. biserrate, Con. chinense and O. sieboldii) had hexanucleotide SSRs (Figure 4). This is consistent with other Apiaceae species [34, 37, 40, 41] and Allium [35, 36, 42], but dinucleotide repeats are most numerous in Forthysia [43], and trinucleotide repeats are most abundant in Nitotiana [44].
Analysis of sequence diversity nucleotide
Nucleotide diversity (Pi) of the ten plastid genomes was calculated to estimate the sequence divergence level of different regions. Among all species, the SSC and LSC regions exhibited high divergence levels and IR regions exhibited low divergence (Table S3. Figure 5). The results indicated that the IR region was relatively conserved. The sequences with high Pi values were predominantly in intergenic spacers in all species. There were two exceptions, gene regions of ycf2 and ycf1, which were in the boundaries of IR/SSC and IR/LSC.
In addition to the top ten Pi values, we found several more important peaks in Figure 5. Figure5A illustrates the Angelica sp. nucleotide diversity (Pi) level. We found that the regions with comparatively higher Pi values were the intergenic regions including psbZ-trnG (GCC)-trnfM(CAU)-rps14, rps4-trnL(UAA), atpB-rbcL, petA-psbJ-psbL-psbF-psbE, ψycf1-ndhF, ndhF-rpl32-trnL(UAG) and additional gene region ycf2. The petA-psbJ showed the highest variability, with a Pi value of 0.02778. Figure5B, which shows the nucleotide diversity (Pi) level of Ostericum, demonstrates that the intergenic regions had higher values including atpI-atpH, trnV (UAC)-ndhC, petA-psbJ-psbL-psbF, psbH-petB, infA-rpl36-rps11, ψycf1-ndhF and gene region of ycf1. The atpI-atpH showed the highest variability, with a Pi value of 0.17333. Figure5C illustrates the nucleotide diversity (Pi) of the ten subtribe Angelicinae species. Of note in Figure 5C are the intergenic regions including trnK(UUU)-rps16-trnQ(UUC), atpI-atpH, trnC(CGA)-rpoB, trnE(UUC)-trnT(GGU), petA-psbJ-psbL, rps4-trnT(UGU)-trnL(UAA), ndhF-rpl32-trnL(UAG)-ccsA, an gene regions of ycf1 and ycf2. In these regions, petA-psbJ had the highest variability, with a Pi value of 0.04726, then rpl32-trnL with a similarly high Pi value of 0.04693. There were several higher diversity regions that were different in Angelica and Ostericum sp., such as the intergenic region atpI-atpH in Ostericum, and the intergenic region atpB-rbcL in Angelica.
Phylogenetic analysis
We used 39 complete plastid genomes and 44 nrITS sequences to construct ML and BI trees to investigate the phylogenetic relationships in subtribe Angelicinae (Figure 6). The plastid genome and ITS trees produced incongruent topology trees. The plastid tree indicated that Angelica is monophyletic but subtribe Angelicinae is non-monophyletic. Angelicasp. were closely related to Coelopleurum saxatile and Melanosciadium pimpinelloideum. Ostericum sp. were closely related to Pternopetalum, and belonged to the Acronrma Clade [21]. Conioselinum chinense was in the Sinodielsia Clade [21]. In addition, Angelica sinensis was closely related to Con. chinense and also belonged to the Sinodielsia Clade [21]. In particular, Glehnia littoralis was embedded in Angelica (Figure6B, 6D, BS=69, PP=1).
However, these species and genera were in parallel branches in the nrITS tree, including Melanosciadium, Coelopleurum, Glehnia, Peucedanum, Angelica acutiloba and Angelica. The parallel branches and low support in BI analysis (Figure6A, 6C, BS=100, PP=0.5591) indicated that nrITS provided insufficient information to determine phylogenetic relationships and more nuclear gene and cpDNA sequences are needed to increase certainty in models. The ITS tree results from ML and BI analysis were a little different regarding the placement of Peucedanum japonicum where it was not one of the parallel branches of Angelica in BI analysis, but in ML analysis is one of the parallel branches of Angelica. Nevertheless, all trees indicated that the subtribe Angelicinae is non-monophyletic, and relatedness between Angelica and Ostericum is more distant.