In this study, no significant differences in gene number, gene order, or genome organization were found among Convallarioideae species (Table 1). Each of the seven Convallarioideae chloroplast genomes had a typical quadripartite structure and contained the same set of 137 distinct genes, including 87 protein-coding genes, 38 tRNAs, and 8 rRNAs (Table 1). GC content was similar across the subfamily, ranging from 37.5% to 37.8% (Table 1); this indicates a high level of similarity among species and falls within the typical range observed in chloroplast genomes of seed plants (34–40%) [36]. In addition, the GC contents of the LSC and SSC regions in Convallarioideae were much lower than those of the IR regions, which may be related to the presence of the four rRNA genes (rrn16, rrn23, rrn4.5, and rrn5) in the IR regions.
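As an illustration of the regional GC comparison described above, the following minimal sketch computes GC percentage for a whole sequence and for named regions. The function names and the region coordinates in the usage note are hypothetical and are not taken from the genomes analyzed here.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


def region_gc(genome: str, regions: dict[str, tuple[int, int]]) -> dict[str, float]:
    """GC percentage per region; coordinates are 0-based and end-exclusive.

    The region boundaries are supplied by the caller (assumed to come from
    an annotation); nothing here infers LSC/SSC/IR borders.
    """
    return {name: gc_content(genome[start:end])
            for name, (start, end) in regions.items()}
```

For example, `region_gc(genome, {"LSC": (0, 85000), "IRb": (85000, 111000)})` would return one percentage per region, where the coordinates shown are placeholders rather than measured boundaries.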
In Convallarioideae, chloroplast genome length ranges from 153 to 162 kb, and the lengths of the LSC, SSC, and IR regions are generally conserved (Table 1). However, length variation has been observed in the SSC and IR regions of certain species. For example, the expansion of the SSC region in Rohdea chinensis is attributed to the shift of the ycf1 gene from the IR/SSC boundary into the SSC region [37]. Additionally, in Convallaria alone, the presence of mitochondrial DNA sequences in the plastome led to the expansion of the IR regions [38].
Long repeats are important for studying genome reorganization, rearrangement, and phylogeny, and they can also induce substitutions and insertions in the chloroplast genome [39]. We detected 39–57 long repeats in the seven Convallarioideae species, and the repeat types and the number of each type differed among species (Fig. 2A). SSRs are tandem repeats of DNA motifs one to six nucleotides long. Their high variability, multi-allelic nature, codominant inheritance, reproducibility, and relative abundance make them promising markers for evolutionary and population genetics studies [40]. Our study identified a total of 311 SSRs in the seven Convallarioideae species (Fig. 2B-D). Consistent with previous reports in Liliaceae species, mononucleotide repeats were the most common SSRs, and the largest number of SSRs was located in the LSC region [41].
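To make the SSR definition above concrete, here is a minimal regex-based sketch that scans a sequence for tandem repeats of 1 to 6 nt motifs. The minimum repeat thresholds (10, 5, 4, 3, 3, 3) are an assumption chosen for illustration; they are not stated in this study, and this sketch is not the tool used to produce Fig. 2.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k)
                   for k in range(1, n))


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples; start is 0-based."""
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        # Lookahead lets every start position be tested, so a skipped
        # non-primitive match cannot hide a real repeat that overlaps it.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (k, min_rep - 1))
        last_end = 0
        for m in pattern.finditer(seq):
            start, run, motif = m.start(), m.group(1), m.group(2)
            if start < last_end or not is_primitive(motif):
                continue  # inside a reported repeat, or e.g. "AA" as a dimer
            hits.append((start, motif, len(run) // k))
            last_end = start + len(run)
    return sorted(hits)
```

Counting hits by motif length, or by the region in which `start` falls, would give tallies of the kind summarized for mononucleotide repeats and the LSC region above.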
Because plant chloroplast genomes are structurally highly conserved, comparative analyses make it easy to identify mutational hotspots. Mutational hotspots flanked by conserved sequences serve as the foundation for the DNA barcodes commonly used in population genetic and phylogenetic research [42]. Comparison of the whole chloroplast genomes using mVISTA revealed that the IR regions exhibited lower variability than the LSC and SSC regions (Fig. 3), possibly due to copy correction through more frequent gene conversion between the two IR copies [43]. Non-coding regions had higher nucleotide diversity than coding regions, which agrees with reports on other angiosperms [44].
The complete sequences of the seven chloroplast genomes generated here lack any striking inversions or rearrangements and were therefore resolved as three locally collinear blocks in our analyses. Nucleotide diversity analysis identified five intergenic regions (rpoB-trnC_GCA, trnE_UUC-trnT_GGU, ndhC-trnV_UAC, ccsA-ndhD, and trnR_ACG-rrn4.5) and six genic regions (trnS_GCU, ndhF, ndhD, ndhH, ycf1, and trnN_GUU) with elevated variability (Fig. 5). Recently, ycf1 has been proposed as a core DNA barcode for land plants [45]. Based on our study, these regions are candidate DNA barcodes for species identification in the subfamily.
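The nucleotide diversity scan referred to above can be sketched as follows, using the standard definition of π as the average number of pairwise differences per site. The window and step sizes (600 bp and 200 bp) are assumptions for illustration and are not stated in this study; the input is assumed to be a list of equal-length, pre-aligned sequences.

```python
from itertools import combinations


def nucleotide_diversity(aligned: list[str]) -> float:
    """Average pairwise differences per site (pi) for aligned sequences.

    Columns containing gaps or ambiguous bases in any sequence are skipped.
    """
    n = len(aligned)
    if n < 2:
        return 0.0
    seqs = [s.upper() for s in aligned]
    sites = [i for i in range(len(seqs[0]))
             if all(s[i] in "ACGT" for s in seqs)]
    if not sites:
        return 0.0
    diffs = sum(sum(a[i] != b[i] for i in sites)
                for a, b in combinations(seqs, 2))
    pairs = n * (n - 1) // 2
    return diffs / (pairs * len(sites))


def sliding_pi(aligned: list[str], window: int = 600, step: int = 200):
    """Return (window_start, pi) pairs along the alignment."""
    length = len(aligned[0])
    return [(start, nucleotide_diversity([s[start:start + window] for s in aligned]))
            for start in range(0, max(length - window, 0) + 1, step)]
```

Windows whose π stands well above the genome-wide background are the kind of candidate hotspots listed above; the cutoff for "well above" is a separate choice that this sketch does not make.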
Although the IR and coding regions are highly conserved, the contraction and expansion of IR borders can cause genome size variation, which is considered an important pattern of genome evolution among plant lineages [46]. Our results showed that the trnH and rps19 gene cluster was included in the IR region of the Convallarioideae chloroplast genomes, while JLA was located downstream of the psbA gene (Fig. 6), consistent with its location in most monocot genomes [47]. At the IRb/LSC junction, a truncated copy of the ycf1 gene was observed in all species (Fig. 6). In addition, overlap between ycf1 and ndhF was found in Maianthemum henryi, Speirantha gardenii, Nolina atopocarpa, and Dracaena fragrans, which may be related to the contraction and expansion of the IR region.
Previous studies have shown that analysis of codon bias in the chloroplast genome is helpful for understanding the origin, variation patterns, and evolution of species or genes [48]. Most amino acids in the seven species show codon usage bias, with preferred codons having RSCU > 1; the exceptions are methionine and tryptophan, each encoded by a single codon (RSCU = 1) (Table S2). RSCU values of codons ending with A or U were larger than those of codons ending with G or C, which may have been caused by a compositional bias toward high A/T content [49]. These results are similar to the chloroplast codon usage biases reported for other species, and studying codon preference can help us better understand gene expression and cellular function in Convallarioideae.
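For reference, RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is a minimal illustration under the assumption that the input is a list of in-frame coding sequences and that the standard codon assignments apply; the function name is hypothetical, and this is not the software used to produce Table S2.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard codon assignments, bases ordered T, C, A, G ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list: list[str]) -> dict[str, float]:
    """Return RSCU per codon, pooled over all supplied coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for c in codons:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values
```

Because methionine and tryptophan each have a single codon, their RSCU is 1 by construction whenever they occur, which is why they appear as the exceptions noted above. Stop codons are treated here as one synonymous family.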
The phylogenetic tree of the subfamily reconstructed from chloroplast genome data supports the classification of most tribes proposed by previous studies, except for Polygonateae (Fig. 7). In our tree, Polygonateae did not form a monophyletic group because Maianthemum was placed close to Nolineae and Ophiopogoneae, although with poor bootstrap (BS) support (Fig. 7). Traditionally, Maianthemum has been placed within Polygonateae based on both morphological characteristics and genetic markers from multiple loci [50]. Nevertheless, our plastome data place Maianthemum close to Ophiopogoneae, which agrees with a previous study [10].
Deep relationships among the main lineages within the subfamily are poorly resolved in our phylogenetic tree, except for a robust relationship between Rusceae and Dracaeneae, which is consistent with previous investigations using different sequence datasets [6, 9]. The phylogenetic position of the monotypic Theropogon remains uncertain (Fig. 7). Meng et al. [6] proposed a close relationship between Theropogon and Ophiopogoneae based on transcriptome sequencing and coalescent analyses, but this relationship did not receive strong support. In contrast, the plastid phylogeny of Ji et al. [11] supported a close relationship between Maianthemum and Theropogon, both of which have free stamens. Considering that the chloroplast genome is maternally inherited, the low resolution and conflicting relationships may be attributed to incomplete lineage sorting, convergent evolution, different evolutionary rates, and hybrid introgression [6]. Therefore, further work based on extensive sampling and multidisciplinary data is needed to clarify the phylogenetic positions of these lineages and their relationships within Convallarioideae.