Plastomes of Paphiopedilum
We obtained 66 complete plastome sequences. The other 11 plastome sequences had one or two gaps located in regions of high AT content within three intergenic regions (trnS-trnG, trnE-trnT, and trnP-psaJ). The plastid genome sequences were deposited in GenBank (accession Nos. MN587749 – MN587825) (Table S1). The mean coverage depth of the sequenced plastomes was over 3000-fold (Table S1). We included four published plastome sequences, namely, P. armeniacum [8], P. dianthum [10], P. malipoense [18], and P. niveum [9], in subsequent analyses, yielding a total of 81 genomes of the genus Paphiopedilum. The downloaded plastome sequence of Phragmipedium longifolium [8] was used as the outgroup.
The plastome size of Paphiopedilum ranged from 152,130 bp in P. tigrinum to 164,092 bp in P. emersonii (Table S1). P. emersonii had the largest number of genes and one of the shortest SSC regions (660 bp). The plastid genome of the genus shows a typical quadripartite structure, with two identical copies of the IR separated by an LSC region and an SSC region (Fig. S1). Compared with the plastomes of other angiosperms, Paphiopedilum has markedly expanded IRs. The IR region was enlarged to 31,743–37,043 bp and exceeded 35 kb in eight samples (Table S1), while the SSC region contracted to 524–5916 bp (Fig. 1, Table S1). In addition, the SSC regions of Paphiopedilum are hotspots for gene transfer, loss, and rearrangement (Figs. 2–4). The size variation in the SSC region mainly results from the transfer of typical SSC genes to the IR regions and the loss or pseudogenization of ndh genes. Species of subg. Parvisepalum have larger SSC regions than the other species in the genus (Figs. 1–2, Table S1).
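The quadripartite layout implies a simple length identity, plastome length = LSC + SSC + 2 × IR, which can be used to check region annotations against the totals in Table S1. The following is a minimal Python sketch of that check. The function names and the numbers in the usage lines are illustrative assumptions, not values from this study.

```python
def lsc_length(total_bp, ir_bp, ssc_bp):
    """Return the LSC length implied by total = LSC + SSC + 2 * IR."""
    lsc = total_bp - ssc_bp - 2 * ir_bp
    if lsc < 0:
        raise ValueError("region lengths exceed the total plastome length")
    return lsc


def ir_fraction(total_bp, ir_bp):
    """Return the share of the plastome occupied by the two IR copies."""
    return 2 * ir_bp / total_bp


# Hypothetical lengths chosen only to illustrate the arithmetic.
example_lsc = lsc_length(total_bp=160_000, ir_bp=35_000, ssc_bp=1_000)
example_share = ir_fraction(total_bp=160_000, ir_bp=35_000)
```

Because the IR is counted twice, each base pair gained by the IR at the expense of the SSC increases the total length by one base pair, which is consistent with the largest plastomes having short SSC regions.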
Gene order was conserved, and the plastomes comprised 127 to 134 genes, including 76 to 81 protein coding genes, 38 to 39 tRNA genes, eight rRNA genes, three to eight pseudogenes, and 20 to 25 genes duplicated in the IR region (Tables S1–S2). Beyond the genes duplicated in the IR regions, trnG-GCC was duplicated in P. exul and P. aff. exul, and trnQ-UUG was duplicated in P. charlesworthii, P. tigrinum, and P. barbigerum var. lockianum. The two copies of trnG-GCC differ at one nucleotide, and the two copies of trnQ-UUG differ at eight nucleotides. Gene density ranged from 0.78 to 0.84, and P. tigrinum had the smallest plastome and the highest gene density (Table S1). The GC content of the plastomes ranged from 34.7% to 36.3%, and the GC content of the protein-coding genes ranged from 29.7% to 46.1%.
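The two summary statistics above can be sketched in a few lines of Python. The text does not define gene density; the sketch assumes it means genes per kilobase, a reading that is consistent with the reported range (for example, 134 genes over 164,092 bp gives roughly 0.82). The function names are hypothetical.

```python
def gc_content(seq):
    """Return GC content in percent, ignoring non-ACGT symbols such as N."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


def gene_density(n_genes, length_bp):
    """Return genes per kilobase (assumed definition, see lead-in)."""
    return n_genes / (length_bp / 1000.0)


# Upper bound of the gene count with the largest plastome size from the text.
density_largest = gene_density(134, 164_092)
```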
In addition, we found 17 genes containing introns, including six tRNA genes (trnA-UGC, trnG-UCC, trnL-UAA, trnI-GAU, trnK-UUU, and trnV-UAC) and 11 protein coding genes (atpF, clpP, ndhB, petB, petD, rpl2, rpl16, rpoC1, rps12, rps16, and ycf3). Eight of the protein coding genes contain one intron, while three of them (clpP, rps12, and ycf3) contain two introns (Table S2).
The LSC/IRb boundary is relatively stable. While the LSC/IRb junction is on rpl22 in most species (76 of 81 samples), it is on rps19 in P. concolor and P. wenshanense × P. bellatulum, between rpl22 and rps19 in P. rhizomatosum, and between rps19 and trnH-GUG in P. hirsutissimum (Fig. 4). In contrast to the LSC/IR boundaries, the IR/SSC boundaries of Paphiopedilum varied among species (Fig. 4). Substantial variation in the SSC/IR boundary occurred mainly in subg. Parvisepalum. In most other samples (56 of 81 samples), one SSC/IR junction was located in the intergenic spacer region trnL-ccsA, near trnL-UAG, whereas the other SSC/IR junction was located on the ccsA gene (Fig. 2).
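Describing a junction as "on" a gene or "between" two genes amounts to an interval lookup against the annotation. The sketch below illustrates that lookup in Python. The function name and all coordinates are hypothetical and are not taken from the annotated plastomes.

```python
def locate_junction(pos, genes):
    """Describe where a junction coordinate falls relative to annotated genes.

    genes: iterable of (name, start, end) with inclusive, non-overlapping
    coordinates on the same axis as pos.
    """
    genes = sorted(genes, key=lambda g: g[1])
    for name, start, end in genes:
        if start <= pos <= end:
            return f"on {name}"
    upstream = [g for g in genes if g[2] < pos]
    downstream = [g for g in genes if g[1] > pos]
    left = upstream[-1][0] if upstream else "sequence start"
    right = downstream[0][0] if downstream else "sequence end"
    return f"between {left} and {right}"


# Hypothetical coordinates for three genes near an LSC/IRb junction.
toy_genes = [("rpl22", 100, 450), ("rps19", 500, 780), ("trnH-GUG", 900, 975)]
print(locate_junction(300, toy_genes))
print(locate_junction(470, toy_genes))
print(locate_junction(850, toy_genes))
```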
The contraction of the SSC region resulted in typical SSC genes being transferred to the IR region. One to six genes from the SSC region were transferred to the IR region. For example, ycf1 was transferred to the IR region in all the sequenced samples, while ΨndhD, psaC, and rps15 were incorporated into the IR region in most species. The gene ccsA occasionally extended into the IR region, and trnL-UAG was transferred to the IR region in P. delenatii, P. dianthum, and P. parishii (Fig. 2).
The genomic comparison demonstrates that the SSC region of Paphiopedilum differs greatly in gene content, gene order, and gene orientation (Figs. 2, 4, S2). The SSC regions of most species contain trnL, rpl32, and partial ccsA, while the SSC regions of five species (P. appletonianum, P. barbigerum, P. emersonii, P. hirsutissimum, and P. villosum) are on the verge of being lost and contain only trnL-UAG (Fig. 2). In addition, the genes psaC and ΨndhD were preserved in the SSC region in six samples of subg. Parvisepalum (Fig. 2). The SSC region may also occur in two orientations within the same species [19]. Wang and Lanfear [20] used long-read sequencing to test for structural heteroplasmy in land plants and found it in most land plant individuals, so we did not consider the direction of the SSC region. Based on gene content and gene orientation, the SSC regions were classified into twelve types (Fig. S2), of which type Ⅷ is dominant (56 of 81 samples) (Fig. 2). Type Ⅰ and type Ⅱ are identical in gene content but differ in the direction of rpl32 and trnL-UAG (Fig. S2). Type Ⅸ and type Ⅹ are also identical in gene content, but in type Ⅸ the two genes run in opposite directions, while in type Ⅹ they run in the same direction (Fig. S2). Type Ⅺ and type Ⅻ both have trnL-UAG, but one nucleotide in type Ⅻ has shifted to the IR region. Subg. Parvisepalum has six types, whereas sect. Cochlopetalum has only one type (type Ⅷ) (Fig. 2). Plotting the SSC types on the phylogenetic tree shows that they are not lineage-specific and that even closely related species have distinct SSC types, such as species in subg. Brachypetalum and subg. Parvisepalum (Fig. 2). Surprisingly, the gene content of the SSC region also varies within species. For example, one sample of P. barbigerum contains trnL-UAG, while the other sample contains trnL-UAG and rpl32. The two samples of P. appletonianum also have different SSC types (Fig. 2).
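One way to group SSC regions by gene content and gene orientation while ignoring the direction of the whole region is to treat an arrangement and its inverted form (gene order reversed, every strand flipped) as the same type. The Python sketch below illustrates this idea only; it is not the classification procedure used in this study, and the sample names and arrangements are hypothetical.

```python
from collections import defaultdict


def flip(arrangement):
    """Return the arrangement as read after inverting the whole SSC region."""
    return tuple(
        (gene, "-" if strand == "+" else "+")
        for gene, strand in reversed(arrangement)
    )


def canonical(arrangement):
    """Pick one representative so both SSC orientations map to the same key."""
    arrangement = tuple(arrangement)
    return min(arrangement, flip(arrangement))


def group_by_ssc_type(samples):
    """Group sample names by orientation-independent SSC arrangement."""
    groups = defaultdict(list)
    for name, arrangement in samples.items():
        groups[canonical(arrangement)].append(name)
    return dict(groups)


# Hypothetical samples: A and B differ only by inversion of the whole SSC,
# while C has the two genes on the same strand and forms its own type.
toy_samples = {
    "sample_A": [("trnL-UAG", "+"), ("rpl32", "-")],
    "sample_B": [("rpl32", "+"), ("trnL-UAG", "-")],
    "sample_C": [("trnL-UAG", "+"), ("rpl32", "+")],
}
toy_groups = group_by_ssc_type(toy_samples)
```

Under this representation, two genes that run in opposite directions remain in opposite directions after the whole region is inverted, so a distinction like the one between type Ⅸ and type Ⅹ is preserved.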
Gene gain and loss in the Paphiopedilum samples were also analysed. Some of the gene losses are shared throughout the genus (e.g., some ndh genes), while other gene losses are lineage specific (Table S3). Most of the ndh genes were pseudogenized (ΨndhD, ΨndhJ, and ΨndhK) or lost (ndhA, ndhC, ndhE, ndhF, ndhG, ndhH, and ndhI) from the Paphiopedilum plastomes, except for ndhB. Most of the samples (76 of 81) sequenced in this study retained an intact copy of ndhB. In addition, the complete open reading frame of ndhJ was preserved in 23 samples, including four samples of sect. Cochlopetalum and 19 samples of sect. Paphiopedilum (Table S3). The genes ndhC and ndhK were preserved as pseudogenes in subg. Parvisepalum and sect. Concoloria but lost in the other species (Table S3). In particular, in five of the sequenced species (P. barbatum, P. dayanum, P. platyphyllum, P. sugiyamanum, and P. tigrinum), all 11 ndh genes were lost or pseudogenized (Fig. 2, Table S3).
In addition to the pseudogenes found among the ndh genes, we found that premature termination caused pseudogenization of two other protein coding genes (cemA and ycf15). Most of the annotated copies (38 of 48) of cemA are preserved as pseudogenes, while all the annotated copies of ycf15 were retained as pseudogenes (Table S3). The pseudogenization of cemA is mainly due to the slippage of poly structures and the appearance of premature stop codons, while the pseudogenization of ycf15 is due to the appearance of more than one premature stop codon.
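Premature stop codons of the kind described here can be flagged by scanning a coding sequence in frame. The following Python sketch assumes the three standard stop codons (TAA, TAG, TGA), which also apply under the bacterial and plant plastid genetic code, and a sequence that begins at the annotated start codon. The function name is hypothetical, and the sketch does not reproduce the annotation pipeline of this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds):
    """Return 0-based codon indices of in-frame stops before the last codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def is_frameshift_candidate(cds):
    """A length that is not a multiple of three hints at an indel or slippage."""
    return len(cds) % 3 != 0


# Toy sequences, not real cemA or ycf15 data.
print(premature_stops("ATGAAATAA"))
print(premature_stops("ATGTAGAAATAA"))
```

A copy with one or more indices reported, or with a length that is not a multiple of three, would be a candidate pseudogene and would still need manual inspection.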
The plastomes of Paphiopedilum also show multiple structural rearrangements. We found widespread structural variation in the SSC regions, especially in subg. Parvisepalum, including inversion and recombination of the SSC genes and shifts of the IR/SSC boundary (Figs. 2–4). In particular, we found a 47 kb inversion spanning from petN to clpP in P. fairrieanum, which is absent in the other species of Paphiopedilum (Figs. 3, S1).