P. davidiana mitogenome features
The P. davidiana mitogenome was sequenced, assembled, and annotated (Table S1). The graph-based assembly comprised 15 linear segments (molecules 1-15) with a total length of 375,671 bp and a GC content of 45.5% (Figure 1). The segments ranged in size from 680 bp to 65,093 bp and had similar depths of coverage (Table S2 and Figure 2). The P. davidiana genome sequence was submitted to the GenBank database (SRR25468515).
A total of 64 genes were annotated in the P. davidiana mitogenome, including 39 protein-coding genes (PCGs), 22 tRNA genes, and 3 rRNA genes (Figure 2 and Table 1). Among the 39 PCGs, six contained introns (nad1, nad2, nad4, nad7, ccmFc, and rps3). Interestingly, three genes were present in two copies (nad1, nad2, and cox1) and one gene was present in three copies (nad5). In addition, five tRNA genes and one rRNA gene were located in repeat sequences (trnN-GTT, trnM-CAT, trnP-TGG, trnH-GTG, trnW-CCA, and rrn5) (Fig. 1).
Codon usage analysis of PCGs
Codon usage analysis was performed on the 39 PCGs, and the codon usage of each amino acid is shown in Figure 3. The PCGs of P. davidiana had a total length of 30,234 nucleotides. Most PCGs used the typical ATG start codon, whereas nad4L and nad1 started with ACG, which may be converted to a functional start codon by C-to-U RNA editing (Table 2). As in other mitogenomes, TAA, TGA, and TAG served as stop codons.
The relative synonymous codon usage (RSCU) of the 39 PCGs was also analyzed. The 39 PCGs comprised 30,234 bp encoding 10,078 codons, excluding termination codons. Codons with RSCU > 1 were considered to be used preferentially for their amino acid. All NNT and NNA codons, except Ile (ATA), Leu (CTA), and Thr (ACA), had RSCU values higher than 1.0. This result indicates a strong bias toward A or T at the third codon position in the P. davidiana mitogenome, a phenomenon also observed in the mitogenomes of other plant species [15].
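The RSCU calculation described above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and the usual definition of RSCU (the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid); the function name rscu and its input format are hypothetical and are not taken from the software used in this study.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds_sequences):
    """Return {codon: RSCU} for a list of in-frame coding sequences.

    Hypothetical helper: stop codons and codons containing ambiguous
    bases are ignored.
    """
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1

    # Group the sense codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for c in codons:
            # Observed count relative to the count expected under equal usage.
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    # Toy input, not data from this study: four Ala codons.
    for codon, value in sorted(rscu(["GCTGCTGCAGCC"]).items()):
        print(codon, round(value, 2))
```

In this toy input, GCT accounts for two of the four Ala codons, so its RSCU is 2.0, whereas the unused GCG has an RSCU of 0.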
Analysis of synonymous and nonsynonymous substitution ratios
Nonsynonymous to synonymous substitution ratios (Ka/Ks) were calculated for every pair of six species (Malus domestica, Sorbus aucuparia, Rhaphiolepis bibas, Pyrus betulifolia, Fragaria orientalis, and P. davidiana) based on the 32 genes shared in Rosaceae (Table S5). As shown in Figure 4, the mean pairwise Ka/Ks values of ccmFn, rps4, and adh4 were higher than those of the other genes. This result suggests that these genes may have been under positive selection in the six plant species during evolution.
Prediction of RNA editing sites in PCGs
RNA editing events, which are posttranscriptional processes, are enriched in mitogenomes [32]. A total of 502 potential RNA editing sites were identified in 32 mitochondrial PCGs using the online PREP suite (http://prep.unl.edu/) with a cutoff value of 0.2. All of them were C-to-U edits. As shown in Figure 5, the nad4 gene contained the most RNA editing sites (40 sites), whereas nad4L and rps7 each contained only one. Among the 502 predicted sites, the resulting amino acid changes were hydrophilic to hydrophobic (13.15%; 66 sites), hydrophobic to hydrophilic (47.01%; 236 sites), hydrophilic to hydrophilic (7.97%; 40 sites), hydrophobic to hydrophobic (31.27%; 157 sites), and hydrophilic to stop (0.60%; 3 sites).
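The category percentages reported above follow directly from the site counts. The short Python sketch below recomputes them from the five counts given in the text; the dictionary keys and the variable names are illustrative labels only and are not PREP output fields.

```python
# Counts of predicted editing sites per amino acid change category,
# copied from the text above (the labels are illustrative).
site_counts = {
    "hydrophilic to hydrophobic": 66,
    "hydrophobic to hydrophilic": 236,
    "hydrophilic to hydrophilic": 40,
    "hydrophobic to hydrophobic": 157,
    "hydrophilic to stop": 3,
}

total_sites = sum(site_counts.values())

# Percentage of all predicted sites in each category, to two decimals.
percentages = {
    category: round(100 * count / total_sites, 2)
    for category, count in site_counts.items()
}

if __name__ == "__main__":
    print("total sites:", total_sites)
    for category, pct in percentages.items():
        print(f"{category}: {pct:.2f}%")
```

The five counts sum to the 502 predicted sites, and each recomputed percentage matches the value reported in the text.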
Analysis of repeats in the P. davidiana mitogenome
Repeated sequences are widespread in mitochondrial genomes and mainly include SSRs, tandem repeats, and dispersed repeats [33]. SSRs are tandemly repeated motifs of one to six bases that are commonly used as molecular markers for studying and identifying species and for analyzing genetic diversity [34]. In the present study, a total of 132 SSRs were identified in the P. davidiana mitogenome, including 49 (37.12%) mono-, 20 (15.15%) di-, 11 (8.33%) tri-, 42 (31.82%) tetra-, 7 (5.30%) penta-, and 3 (2.27%) hexanucleotide repeats (Table 2). Monomers and tetramers together accounted for 68.94% of the 132 SSRs. In addition, 69.39% of the monomers consisted of A/T. In previous studies, the high proportion of A/T in SSRs has been attributed to the A/T content of the whole mitogenome [33, 35].
In addition to the SSRs, 187 forward, 219 palindromic, and 20 tandem repeats were detected in the P. davidiana mitogenome. The total length of these repeats was 11,574 bp, which accounted for 3.08% of the whole P. davidiana mitogenome. Most of the forward and palindromic repeats ranged from 31 to 50 bp, and the longest was 458 bp, whereas all tandem repeats were shorter than 39 bp (Figure 6, Table S3, and Table S4).
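To illustrate how SSRs with motifs of one to six bases can be located, the following Python sketch scans a sequence with regular expressions. It is not the tool used in this study: the minimum repeat numbers (10, 5, 4, 3, 3, and 3 for mono- to hexanucleotide motifs) are assumed values chosen for illustration, and the names MIN_REPEATS, is_primitive, and find_ssrs are hypothetical.

```python
import re

# Assumed minimum number of motif repeats per motif length (illustrative;
# the thresholds used in the study are not stated in this section).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif_length, motif, repeat_count) tuples."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in sorted(min_repeats.items()):
        # A motif of `unit` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), unit, motif, len(m.group(0)) // unit))
                pos = m.end()
            else:
                # e.g. "AA" is a mononucleotide run, so step forward by one
                # base to avoid skipping an SSR that starts inside the run.
                pos = m.start() + 1
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not data from this study.
    toy = "G" + "A" * 10 + "C" + "AT" * 5 + "G" + "AGC" * 4 + "T"
    for hit in find_ssrs(toy):
        print(hit)
```

Motifs that are repeats of a shorter unit (for example, ATAT) are skipped so that each SSR is reported only once, under its shortest motif.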
Phylogenetic and synteny analysis
The largest co-linear blocks identified in the dot plots with P. avium and P. salicina were 35.0 and 34.9 kb, respectively (Figure 7). The co-linear blocks were not arranged in the same order, and a large number of homologous co-linear blocks were detected between P. davidiana and the closely related species (Figure 8). A total of 74 homologous co-linear blocks (>500 bp) were identified between P. davidiana and P. avium, the largest of which was 29,172 bp. Similarly, a total of 85 homologous co-linear blocks (>500 bp) were identified between P. davidiana and P. salicina, with the largest block being 8,728 bp in length. These results also indicate that the mitogenomes are very poorly conserved in structure.
Mitogenomes provide an opportunity to confirm plant phylogenetic positions. In the present study, to further explore the evolutionary relationships of the P. davidiana mitogenome, 31 plant mitogenomes were downloaded from the GenBank database (https://www.ncbi.nlm.nih.gov/genome/browse/#!/overview/). The 31 conserved single-copy orthologous genes present in all 32 mitogenomes were selected to construct a phylogenetic tree. Five monocotyledonous species were designated as the outgroup. As shown in Figure 9, 25 of the 27 nodes in the generated tree had bootstrap support values > 70%, and 17 nodes had 100% support. The phylogenetic tree strongly supports (bootstrap support = 100%) a close phylogenetic relationship of P. davidiana with P. avium and P. salicina. Overall, the results provide a valuable foundation for future analyses of the phylogenetic affinities of Rosaceae species.
Sequence transfer from the chloroplast and nuclear genomes to the mitogenome
DNA fragment transfers among nuclear and organellar genomes are common events during plant evolution. To characterize sequence transfer events, the nuclear and chloroplast (cp) genomes of P. davidiana were searched using its mitogenome sequences as queries. Sequences totaling 303.0 kb were found to have been transferred from the nuclear genome into the mitogenome (Fig. 10A). The longest fragment was 19,266 bp, and most fragments were between 200 bp and 400 bp in length (Fig. 10B). Nineteen complete genes (rps13, trnP-TGG, trnF-GAA, trnS-GCT, cob, rps14, rpl5, trnH-GTG, trnK-TTT, ccmB, trnN-ATT, trnK-TTT, ccmC, trnD-GTC, rps7, nad6, trnN-GTT, rps1, and rps4) were contained in the shared sequences.
The P. davidiana mitogenome sequence (375,671 bp) was approximately 2.38 times longer than the cp genome sequence (158,055 bp). The results showed that 16 fragments with a total length of 4,238 bp had migrated from the cp genome to the mitogenome in P. davidiana (Table 3). Three of these fragments were more than 500 bp in length, and the longest was 863 bp (Figure 11). Eight intact cp genes (psaJ, petN, psbJ, trnD-GUC, trnN-GUU, trnW-CCA, trnP-UGG, and trnH-GUG) were identified in the fragments. The others were partial sequences of transferred genes (Table S6). The transferred genes may have aided the movement of genetic material throughout Prunus.