Genome size and gene identification
The cp genome of E. ophiuroides is 139,107 bp in size, similar to other cp genomes in the subfamily Panicoideae, which range from 138 kb in Setaria viridis [19] to 141 kb in Saccharum officinarum [20], but larger than the sequenced cp genomes in the subfamilies Chloridoideae, Pooideae and Oryzoideae, which are no more than 137 kb in length (Supplementary Table 5). Since the average size of publicly available Poaceae cp genomes is 137,091 bp [21], the E. ophiuroides cp genome is of average size within Panicoideae and relatively large within Poaceae (Gramineae). The cp DNA of E. ophiuroides, like that of most angiosperms, is circular with a typical quadripartite structure containing a pair of IRs separated by the LSC and SSC regions. The overall AT content of the E. ophiuroides cp genome is 61.6%, similar to that of most Gramineae plants (~61%, Supplementary Table 5).
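As an illustration of how an overall AT content such as the 61.6% reported above can be derived from a sequence, the following minimal Python sketch counts A and T bases among the unambiguous bases. The function name `at_content` and the toy sequence are hypothetical and are not taken from the analysis pipeline used in this study.

```python
def at_content(seq: str) -> float:
    """Return the percentage of A and T among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * at / total


# Toy sequence for illustration only (not real cp genome data).
toy = "ATTAGCATTA"
print(at_content(toy))
```

Applied to a complete cp genome sequence read from a FASTA file, the same function would give the genome-wide AT percentage.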
The gene and intron contents of the E. ophiuroides cp DNA are essentially identical to those of rice [22, 23], wheat [24], maize [25], sorghum [26] and other grasses [21, 27–29], with 77 protein-coding genes, 30 tRNA genes and four rRNA genes. Among the 111 unique genes, 14 contain one intron (six tRNA and eight protein-coding genes) and two (rps12 and ycf3) contain two introns. Of the identified genes, 59 are related to self-replication and 44 are associated with photosynthesis. Of the 44 photosynthesis-related genes, five encode photosystem I components (psaA, B, C, I, J), 15 are related to photosystem II, six (atpA, B, E, F, H, I) encode ATP synthase subunits, and the remaining 18 encode electron transport chain components. A similar pattern of protein-coding genes is also present in Oryza sativa [30], Oryza glaberrima [31] and Oryza minuta [23].
Repeat sequence
The genomes of most organisms contain many different types of repetitive sequences, such as short tandem repeats, interspersed repeats and spaced repeats. These repeat elements are either dispersed throughout the genome or clustered within a short region of it [32]. Slipped-strand mispairing and inappropriate recombination of repetitive sequences may lead to sequence variation and DNA rearrangement [33, 34]. Interspersed repetitive sequences (IRSs) are repeats scattered throughout genomic DNA and are a potential resource for revealing gene rearrangements and losses during evolution [35, 36]. They usually include forward, palindromic, reverse and complement repeats. In the present study, many forward and palindromic repeats and a few reverse repeats were detected in the E. ophiuroides cp genome, and most of them were located in the LSC region. Similar findings have been reported in other plant species, such as Swertia mussotii [37] and Oryza minuta [23]. This pattern therefore appears to be a common characteristic of IRSs in most plant cp genomes.
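The four repeat classes named above can be made concrete with a short Python sketch. It follows the convention commonly used by repeat-finding tools, in which a palindromic repeat is one whose second copy is the reverse complement of the first; that convention and the function name `classify_repeat` are assumptions made for illustration, not a description of the software used in this study.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first: str, second: str) -> str:
    """Classify how the second copy of a repeat relates to the first copy."""
    a, b = first.upper(), second.upper()
    if a == b:
        return "forward"      # identical copy in the same orientation
    if a == b[::-1]:
        return "reverse"      # same strand, read backwards
    if a == b.translate(_COMPLEMENT):
        return "complement"   # complemented but not reversed
    if a == b.translate(_COMPLEMENT)[::-1]:
        return "palindromic"  # reverse complement
    return "none"


# Toy 4-bp examples for illustration only.
for second in ("AAGT", "TGAA", "TTCA", "ACTT"):
    print(second, classify_repeat("AAGT", second))
```

Real repeat detection also has to search the genome for candidate pairs and usually tolerates a few mismatches; the sketch only shows how a given pair would be labelled.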
SSRs, also called microsatellites, are highly informative, abundant and evenly distributed in angiosperm plastomes [38]. Because of their abundance, high rate of polymorphism, ubiquitous distribution throughout the genome and high allelic diversity, SSRs have been used extensively as versatile DNA-based markers in plant genetic and genomic research [39]. Motif type, length and abundance are the main characteristics of microsatellites [40]. Besides complex SSRs, five types of perfect SSRs (mono-, di-, tri-, tetra- and pentanucleotide repeats) were detected in the E. ophiuroides cp genome. Mononucleotide repeats were the most abundant SSR type, followed by trinucleotide and tetranucleotide repeats. This result is not fully consistent with other findings that mono- and dinucleotide repeats are the most frequent SSR types in plant cp genomes [41–43], but it is consistent with reports in Lythraceae [44] and Magnolia polytepala [12], and it also agrees with the results of SSR mining from E. ophiuroides RNA-seq data, although mononucleotide repeats were omitted in that study [45]. Most of the SSRs detected in the present study, whether mononucleotide or polynucleotide, were rich in A/T. This is consistent with existing reports on chloroplast SSRs [46–48].
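To illustrate what detecting perfect SSRs of one to five nucleotides involves, the following Python sketch scans a sequence with regular-expression backreferences. The minimum repeat counts in `MIN_REPEATS` are hypothetical values chosen for the example, since the thresholds used in this study are not stated in this section, and the function name `find_perfect_ssrs` is likewise illustrative. Because each scan is non-overlapping, directly adjacent repeats may be under-reported in this simplified form.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed for illustration).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" or "ATAT"),
            # because those runs belong to the shorter motif class.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence for illustration only: a 12-bp poly-A run and six AT repeats.
toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GCC"
for start, motif, count in find_perfect_ssrs(toy):
    print(start, motif, count)
```

Counting the resulting motifs by length and by base composition would give the kind of summary discussed above, such as the relative abundance of mononucleotide repeats and the proportion of A/T-rich motifs.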
IR contraction and expansion
IRs are a prominent feature of most angiosperm cp genomes. Expansion and contraction of the IR boundaries is the main cause of size variation in cp genomes and plays an important role in species evolution [49]. In the present study, the four junctions (Fig. 3) between the two IRs (IRa and IRb) and the two single-copy regions (LSC and SSC), i.e., JLA (junction between LSC and IRa), JLB (junction between LSC and IRb), JSA (junction between SSC and IRa) and JSB (junction between SSC and IRb), were compared in detail between E. ophiuroides and E. eriopoda, E. ciliaris, S. bicolor, Z. mays, S. italica, O. sativa and B. distachyon by carefully analyzing the exact IR border positions and the adjacent genes. The IR region of E. ophiuroides is 22,230 bp in length, which is intermediate among the eight compared species (20,804 bp to 22,783 bp). This implies that some IR expansion and contraction may have occurred in the E. ophiuroides cp genome. JLA is located between rps19 and rpl22, and JLB between rps19 and psbA, in all eight Gramineae species. The distances between rps19 and JLA and between rps19 and JLB are both 35 bp in the three Eremochloa species, S. bicolor and Z. mays, which is shorter than in the other three Gramineae species. The distance between rpl22 and JLA in the three Eremochloa species is shorter than in S. bicolor and Z. mays but longer than in the other species, while the distance between psbA and JLB in the three Eremochloa species is longer than in the other Gramineae species. The ndhF gene spans the SSC and IRa regions, with 29 bp located in the IRa region, in all the C4 plants, including the three Eremochloa species, S. bicolor, Z. mays and S. italica, but it is located within the SSC region in the C3 plants O. sativa and B. distachyon examined in the present study, and in O. minuta as reported by Asaf et al. (2017) [23].
The ndhH gene spans the SSC and IRb regions, with approximately 1,181 bp located in the SSC region and only 1 bp in the IRb region, in all species except O. sativa and B. distachyon. This is in accord with most reported findings in Gramineae plants [23]. It suggests that variation at the JSA border caused by IR expansion or contraction might contribute to the difference between the cp genomes of C3 and C4 plants. Our results also demonstrate that size variation of cp genomes resulting from IR contraction and expansion is a common feature of the evolution of Gramineae plants, although the structural organization and gene order of Gramineae cp genomes are highly conserved [50].
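The junction comparison above rests on a simple piece of interval arithmetic: given the coordinates of a gene and of a region boundary, how many bases of the gene lie on each side. The following Python sketch shows that calculation. The function name `bases_per_side` and all coordinates are hypothetical, chosen only so that the example reproduces a 29-bp overhang like the one described for ndhF; they are not the annotated positions from any of the compared genomes.

```python
def bases_per_side(gene_start: int, gene_end: int, boundary: int) -> tuple:
    """Split a gene's length at a region boundary.

    Coordinates are 1-based and inclusive; `boundary` is the last base of the
    upstream region. Returns (bases_upstream, bases_downstream).
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    gene_length = gene_end - gene_start + 1
    # Bases of the gene at or before the boundary; zero if the gene starts after it.
    upstream = max(0, min(gene_end, boundary) - gene_start + 1)
    return upstream, gene_length - upstream


# Hypothetical coordinates: a gene at 972..3200 and a region boundary at 1000.
print(bases_per_side(972, 3200, 1000))
```

A result in which one of the two values is zero indicates that the gene lies entirely within a single region, as described above for ndhF in the C3 species.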
Phylogenetic analysis
The tribe Andropogoneae includes over 1,200 species in ca. 90 genera and is a primary component of the grasslands and savannahs that dominate tropical and subtropical regions throughout the world [51, 52]. Recently, a number of phylogenetic and evolutionary studies of the tribe Andropogoneae have been carried out using complete chloroplast genomes [52–55]. Although E. ophiuroides is an important member of the genus Eremochloa in the tribe Andropogoneae, it has not been included in these studies, which has limited the understanding of its evolutionary relationships with other Andropogoneae species. Our molecular phylogenetic tree based on complete cp genome sequences revealed that E. ophiuroides is closely related to E. ciliaris and E. eriopoda, and their placement in a clade with Mnesithea helferi within the subtribe Rottboelliinae is highly supported, with a bootstrap value of 100% (Fig. 5). This is congruent with the traditional morphology-based taxonomy of Rottboelliinae, indicating that the classification of the subtribe Rottboelliinae is generally reasonable.
In addition, in our results the Rottboelliinae, Saccharinae, Sorghinae and Andropogoninae are typically monophyletic groups, reflecting agreement between the molecular phylogeny and traditional morphology-based taxonomy. However, some subtribes were recovered as non-monophyletic in the current molecular phylogeny. In the present study, Germainia capitata (Germainiinae) was placed as sister to Pogonatherum paniceum (Incertae sedis), Dimeria ornithopoda (Dimeriinae) as sister to Eulaliopsis binata (Saccharinae), and Rottboellia cochinchinensis (Rottboelliinae) as sister to Coix lacryma-jobi (Coicinae), which is congruent with previous results for these species [52–55]. Another typical case of non-monophyly in the tree is the placement of Heteropogon triticeus and Cymbopogon flexuosus (two species in Anthistiriinae) in a clade with Andropogoninae species, and a similar result has been reported previously [52]. However, it is worth mentioning that Sorghastrum nutans and Eulalia aurea were not clustered as sister clades in the current study, which is incongruent with previously reported results [53–55]. This is likely because a broader sampling of the tribe Andropogoneae (50 complete cp genomes from 47 different species) was used for phylogenetic analysis in the present study. Considering that Rottboellia cochinchinensis and Coix lacryma-jobi, Germainia capitata and Pogonatherum paniceum, and Dimeria ornithopoda and Eulaliopsis binata each clustered together as previously reported, whereas Eulalia aurea and Eulaliopsis binata fell into different clades in this study, and taking into account the previously reported phylogenetic relationships among these species [53–55], future work should include more extensive sampling, better balanced across the ingroup taxa Rottboelliinae, Coicinae, Germainiinae, Incertae sedis, Dimeriinae and Saccharinae, to better address questions of subtribal monophyly.