Severe shrinkage of the A. indica plastid genome and evolutionary fates of the lost plastid genes
The complete plastid genome of A. indica is 86,212 bp in length, highly reduced relative to the plastid genomes of most other angiosperms. It has a typical quadripartite structure, comprising an LSC region of 22,301 bp, an SSC region of 529 bp, and two IR regions of 31,691 bp each (Figure 1). The AT content of this plastid genome is 65.64%. Based on the DOGMA and GeSeq annotations, the plastid genome of A. indica contains 48 putative intact genes and three pseudogenes. The intact genes comprise 18 tRNA genes, 4 rRNA genes, 8 rpl genes, 12 rps genes and 6 other genes, namely ycf1, ycf2, accD, matK, infA and clpP (Table 1). The three pseudogenes are ψatpA, ψatpI and ψndhB. In the LSC region, ψatpA is truncated at the 88th codon and ψatpI has a premature stop codon at the 32nd codon. ψndhB has an internal stop codon at the 53rd codon.
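The reported region lengths and gene counts can be cross-checked with simple arithmetic. The following is a minimal sketch using only the figures quoted above; the function and variable names are illustrative and not part of the original analysis.

```python
# Region lengths (bp) and intact-gene counts as quoted in the text.
LSC = 22_301
SSC = 529
IR = 31_691  # length of ONE inverted repeat copy

def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir

total_length = plastome_length(LSC, SSC, IR)

# Intact genes by class, as listed in the text.
intact_genes = {"tRNA": 18, "rRNA": 4, "rpl": 8, "rps": 12, "other": 6}
n_intact = sum(intact_genes.values())

# Share of the genome occupied by the two IR copies.
ir_fraction = 2 * IR / total_length

print(total_length)               # 86212
print(n_intact)                   # 48
print(f"{ir_fraction:.1%}")       # 73.5%
```

Under these figures the two IR copies account for roughly three quarters of the plastome, which is consistent with the IR expansion described in the following paragraph.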
The SSC region in the plastome of A. indica is severely reduced in size, and only two genes, rpl15 and trnL-UAG, were found in this region (Figure 1). The two IR regions have expanded towards both the LSC and SSC regions. In the chloroplast genomes of the autotrophic relative Lindenbergia philippensis and other autotrophic plants, an intact ycf1 gene usually spans the IR and SSC regions, and the rps8, rpl14, rpl16, rps3, rpl22 and rps19 genes are located in the LSC region. In the A. indica plastome, by contrast, there is an intact ycf1 gene in each of the IR regions, and the rps8, rpl14, rpl16, rps3, rpl22 and rps19 genes have all shifted into the IR regions.
Gene contents were compared among the plastomes of A. indica, four holoparasitic species (Cistanche deserticola, Orobanche austrohispanica and Epifagus virginiana from Clade III, and Lathraea squamaria from Clade V), and the autotrophic relative L. philippensis in Orobanchaceae (Table S1). Compared with L. philippensis, there is substantial loss of genes in the A. indica plastid genome. Ten ndh genes (ndhA, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ and ndhK) were lost, and the ndhB gene became a pseudogene. All five psa genes (psaA, psaB, psaC, psaI and psaJ) and 15 psb genes (psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT and psbZ), involved in photosystem I and photosystem II, respectively, were lost. All six pet genes (petA, petB, petE, petF, petH and petI), which encode cytochrome b6/f complex subunits that function in photosynthetic electron transport, were also missing. In addition, four atp genes (atpB, atpE, atpF and atpH) encoding F-type ATPase subunits, as well as cemA, rbcL, ycf3 and ycf4, were lost. Most of these genes were also lost in the plastomes of Orobanche austrohispanica and Epifagus virginiana from Clade III. Although most of these genes were not lost in the plastomes of Lathraea squamaria from Clade V and Cistanche deserticola from Clade III, many of them became pseudogenes (Table S1).
To study the evolutionary fates of the plastid genes lost in A. indica, we identified their genomic positions by extracting genome skimming reads that matched the reference plastomes of 64 plant species, together with other analyses (see Methods). A total of 339 contigs of length ≥ 150 bp were assembled from the extracted reads, of which 76 were annotated as fragments of plastid genes lost from the A. indica plastid genome. These 76 fragments, ranging from 150 to 3,086 bp in length, represent 27 lost plastid genes: four atp genes (atpA, atpB, atpE and atpF), seven ndh genes (ndhA, ndhB, ndhD, ndhE, ndhF, ndhH and ndhJ), four psa genes (psaA, psaB, psaC and psaI), six psb genes (psbA, psbB, psbC, psbD, psbE and psbJ), three rpo genes (rpoB, rpoC1 and rpoC2), and petB, rbcL and ycf3 (Table S2). In addition, we found that 15 fragments representing nine genes still present in the A. indica plastid genome were also transferred to the mitochondrial and/or nuclear genomes, where they exist as small fragments. Based on their sequencing depth, seven of these 36 genes (atpA, ndhA, ndhB, ndhF, petB, rpoB and rpl23) were transferred to both the mitochondrial and nuclear genomes, 11 (ndhD, ndhE, ndhH, ndhJ, rpl14, rpl16, rpl2, rps12, rps14, rps3 and rps4) were transferred only to the mitochondrial genome, and 18 (atpB, atpE, atpF, psaA, psaB, psaC, psaI, psbA, psbB, psbC, psbD, psbE, psbJ, rbcL, rpoC1, rpoC2, ycf3 and ycf2) were transferred only to the nuclear genome. None of these fragments was detected in the transcriptomes of multiple tissues, suggesting that they are non-functional.
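The gene tallies in this paragraph can be checked with set operations. The sketch below only restates the gene lists given in the text; the variable names are illustrative, and the classification itself (based on sequencing depth in the original analysis) is not reproduced here.

```python
# Destination of transferred fragments, as listed in the text.
both = {"atpA", "ndhA", "ndhB", "ndhF", "petB", "rpoB", "rpl23"}
mito_only = {"ndhD", "ndhE", "ndhH", "ndhJ", "rpl14", "rpl16", "rpl2",
             "rps12", "rps14", "rps3", "rps4"}
nuclear_only = {"atpB", "atpE", "atpF", "psaA", "psaB", "psaC", "psaI",
                "psbA", "psbB", "psbC", "psbD", "psbE", "psbJ", "rbcL",
                "rpoC1", "rpoC2", "ycf3", "ycf2"}

# The 27 genes lost from the plastome that were recovered as fragments.
lost = {"atpA", "atpB", "atpE", "atpF",
        "ndhA", "ndhB", "ndhD", "ndhE", "ndhF", "ndhH", "ndhJ",
        "psaA", "psaB", "psaC", "psaI",
        "psbA", "psbB", "psbC", "psbD", "psbE", "psbJ",
        "rpoB", "rpoC1", "rpoC2", "petB", "rbcL", "ycf3"}

transferred = both | mito_only | nuclear_only

# Genes with transferred fragments that are still present in the plastome.
retained = transferred - lost

print(len(transferred))   # 36
print(len(lost))          # 27
print(sorted(retained))
```

The three destination categories are mutually exclusive, and the nine genes left after removing the 27 lost genes are the ribosomal protein genes and ycf2 that remain intact in the plastome.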
Multiple structural rearrangements in the plastid genome of A. indica relative to its autotrophic relative
The plastomes of A. indica (Clade VI) and the five other Orobanchaceae species mentioned above were aligned with Mauve 2.4.0 (Figure 2). We identified nine locally collinear blocks (LCBs) among these six species. The A. indica plastid genome has undergone two inversions relative to the chloroplast genome of L. philippensis. One inversion is located in the LSC region and contains an intact accD gene, and the other comprises the entire SSC region and the IRB region. Relative to the L. philippensis chloroplast genome, no inversions were found in the plastomes of Lathraea squamaria, Cistanche deserticola and Epifagus virginiana, whereas two distinct inversions were found in that of Orobanche austrohispanica (Figure 2).
Relaxed purifying selection of A. indica plastid genes
A total of 20 protein-coding genes shared among the seven Orobanchaceae species, comprising 10 rps genes, 7 rpl genes, and the accD, infA and matK genes, were used for phylogenetic analysis. The maximum likelihood tree was strongly supported, with bootstrap values of 100 for all branches (Figure S1). The three Striga species clustered into one clade, with Buchnera americana as its sister. Aeginetia indica was sister to the clade consisting of these four species. The ratio of non-synonymous (dN) to synonymous (dS) substitution rates (ω) can be used as an indicator of selection pressure. The two-ratio model (M2) was first compared with the one-ratio model (M0). The ω values of all genes except rpl20 and rps18 were larger in the parasitic plant branch than in the nonparasitic plant branch (Table S3), and the likelihood ratio test showed that M2 fits significantly better than M0 for nine genes (accD, infA, rpl22, rps11, rps14, rps19, rps2, rps3 and rps7), suggesting that these genes are under relaxed purifying selection in parasitic plants. Using the three-ratio branch model (M3), we found that hemiparasitic species had higher or much higher ω than holoparasitic species at 13 of 18 genes (ω values for the remaining two genes were not available), whereas holoparasitic species had slightly higher ω than hemiparasitic species at only five genes (Table S3). This suggests that the protein-coding genes retained in the plastome of A. indica still play important functional roles and are not under more relaxed selective pressure than those of hemiparasitic species.
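The comparison of the two-ratio and one-ratio models rests on a likelihood ratio test. The sketch below illustrates the calculation under the assumption that the two models differ by one free parameter (one extra ω), so that twice the log-likelihood difference is compared with a chi-square distribution with one degree of freedom. The log-likelihood values are hypothetical and are not taken from Table S3.

```python
import math

def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its p-value
    under a chi-square distribution with 1 degree of freedom.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods for one gene under M0 and M2.
lnl_m0 = -2510.40   # one-ratio model (assumed value)
lnl_m2 = -2505.10   # two-ratio model (assumed value)

stat, p = lrt_one_df(lnl_m0, lnl_m2)
print(f"2*deltaLnL = {stat:.2f}, p = {p:.4f}")
if p < 0.05:
    print("M2 fits significantly better than M0 at the 0.05 level")
```

The closed-form p-value used here is valid only for one degree of freedom; comparisons involving the three-ratio model would need the general chi-square survival function.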
Degradation of the photosynthesis pathway in A. indica revealed by transcriptome analysis
We obtained 21.05, 19.04, 18.34 and 18.02 Gb of clean reads for four tissues, i.e. flower, sepal, fruit and stem, respectively. De novo assembly of the read data from the four tissues yielded a total of 205,380 transcripts, from which 153,986 unigenes were extracted. The average length and N50 of these unigenes were 623.18 bp and 880 bp, respectively. TransDecoder predicted 47,480 open reading frames (ORFs) from all unigenes, of which 42,007 could be annotated against the Swiss-Prot database. Of these 42,007 Swiss-Prot annotations, 8,466 could be assigned to 131 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.
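For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows how it is commonly computed from a list of sequence lengths. The input lengths are a toy example, not the unigene lengths from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length

def mean_length(lengths):
    """Arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)

# Toy example (hypothetical lengths in bp).
toy = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy))          # 8
print(mean_length(toy))  # 4.0
```

As in the toy example, N50 typically exceeds the mean length because it is weighted towards longer sequences, which matches the relationship between the two values reported for the unigenes.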
To confirm the reliability of the gene expression levels obtained from transcriptome sequencing, the expression of 10 genes was also examined with qRT-PCR. The expression levels obtained from the two approaches were relatively highly correlated (Pearson correlation, R² = 0.71; Figure S2), suggesting that the gene expression levels obtained from transcriptome sequencing were reliable.
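The agreement between the two expression measurements is summarised by a Pearson correlation. The sketch below shows the calculation on hypothetical paired values; the numbers are illustrative only and do not come from Figure S2.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical log-scale expression values for 10 genes.
rnaseq = [1.2, 3.4, 2.2, 5.1, 4.0, 0.8, 2.9, 3.7, 4.6, 1.9]
qpcr = [1.0, 3.0, 2.8, 4.5, 4.4, 1.5, 2.1, 3.9, 4.0, 2.6]

r = pearson_r(rnaseq, qpcr)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

If the reported value of 0.71 is the squared coefficient, the corresponding Pearson r would be approximately 0.84.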
The photosynthesis pathway (ko00195) in the KEGG pathway database contains 63 genes (30 plastid genes and 33 nuclear genes). In the A. indica plastome, the genes that encode proteins involved in photosystem I, photosystem II, the cytochrome b6/f complex and photosynthetic electron transport have been completely lost. The only two F-type ATPase-related genes in its plastome (atpA and atpI) are pseudogenes. Based on the transcriptome analysis, only 14 nuclear unigenes assigned to the photosynthesis pathway were expressed (Table S4). These 14 genes include one gene encoding the PSII 6.1 kDa protein, seven genes encoding proteins implicated in photosynthetic electron transport and six genes encoding components of F-type ATPase (Figure S3). Expression of the other genes in this pathway was not detected, indicating that these genes were either lost or not expressed. Together, the results of the plastome and transcriptome analyses indicate that the photosynthesis pathway in A. indica has been completely lost.
The porphyrin and chlorophyll metabolism pathway (ko00860) is complicated in plants. Porphyrins are intermediates in the biosynthesis of heme and chlorophyll, and heme is required for chlorophyll biosynthesis [20]. In the pathway from glutamate to protoporphyrin IX, expression of eight genes (HemA, HemB, HemC, HemD, HemE, HemF, HemL and HemY) was observed in the transcriptome of A. indica (Figure S4). However, because expression of divinyl chlorophyllide a 8-vinyl-reductase [EC:1.3.1.75], which catalyzes the conversion of divinyl protochlorophyllide to protochlorophyllide [21], was absent, the chlorophyll synthesis pathway appears to end at divinyl protochlorophyllide production in A. indica (Figure S4). Thus, the later stage of the chlorophyll synthesis pathway is incomplete, and chlorophyll cannot be synthesized in A. indica.