Sequencing data quality control and statistics
Next-generation sequencing (NGS) produced 9037.3 Mb of raw data, and 9025 Mb of clean data remained after quality trimming. Bases with a Phred quality score above 20 (Q20) accounted for 99.26% of all bases, and those above 30 (Q30) accounted for 97.17%. The GC content was 36.35%.
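The Q20, Q30 and GC figures above are simple per-base tallies over the FASTQ reads. The following is a minimal sketch of how they can be computed. It assumes Phred+33 quality encoding and the common "score ≥ 20" and "score ≥ 30" cutoffs, and it computes GC over all bases, including N. The function names are illustrative and do not come from the QC software used here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class QcStats:
    total_bases: int
    q20_pct: float
    q30_pct: float
    gc_pct: float


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield seq, qual


def fastq_qc(records: Iterable[Tuple[str, str]], phred_offset: int = 33) -> QcStats:
    """Percentage of bases at Q20 and Q30, plus overall GC content.

    Assumes Phred+33 encoding (pass phred_offset=64 for old Illumina data)
    and ">= 20" / ">= 30" cutoffs. GC is counted over all bases, including N.
    """
    total = q20 = q30 = gc = 0
    for seq, qual in records:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        for base, qchar in zip(seq.upper(), qual):
            score = ord(qchar) - phred_offset
            total += 1
            q20 += score >= 20  # bools add as 0/1
            q30 += score >= 30
            gc += base in "GC"
    if total == 0:
        raise ValueError("no bases to summarize")
    return QcStats(total, 100.0 * q20 / total, 100.0 * q30 / total, 100.0 * gc / total)


fq = ["@r1", "ACGT", "+", "IIII", "@r2", "GGNN", "+", "I5!!"]
print(fastq_qc(read_fastq(fq)))
```

In practice, dedicated QC tools report these values directly. The sketch only makes explicit what each percentage counts.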
After filtering, SMRT sequencing of Jin A yielded 685,714 subreads totaling 967,209,861 bp. The longest subread was 38,029 bp, the subread N50 was 1327 bp, the N90 was 983 bp, and the mean subread length was 1411 bp. These results indicated that the constructed library and the sequencing data were suitable for subsequent chloroplast genome assembly and bioinformatic analysis.
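N50 and N90 are length-weighted statistics. Reads are sorted from longest to shortest, and Nx is the length of the read at which the running total first reaches x% of all bases. The following is a minimal sketch of these statistics and the other summary values. The function names are illustrative, and the five-read example is hypothetical.

```python
from typing import Dict, Sequence


def nx(lengths: Sequence[int], x: float) -> int:
    """Nx statistic: sort reads longest first and return the length of the read
    at which the running total first reaches x% of all bases (x=50 gives N50)."""
    if not lengths:
        raise ValueError("no reads")
    threshold = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return min(lengths)  # only reached if x > 100


def subread_summary(lengths: Sequence[int]) -> Dict[str, float]:
    """Count, total, maximum, mean, N50 and N90 of a set of read lengths."""
    total = sum(lengths)
    return {
        "count": len(lengths),
        "total_bp": total,
        "max_bp": max(lengths),
        "mean_bp": total / len(lengths),
        "n50_bp": nx(lengths, 50),
        "n90_bp": nx(lengths, 90),
    }


print(subread_summary([1000, 2000, 3000, 4000, 5000]))

# Mean subread length implied by the reported totals (bp / subread count).
print(round(967_209_861 / 685_714))
```

Dividing the reported total (967,209,861 bp) by the subread count (685,714) gives about 1410.5 bp. This rounds to the stated mean of 1411 bp. Because N50 is length-weighted, it need not exceed the mean when a few very long reads are present.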
Assembly and characteristics of the CMS Jin A chloroplast genome
Chloroplast DNA in higher plants is a double-stranded, covalently closed circular molecule whose length varies among species. The assembled chloroplast genome of CMS Jin A was 160,042 bp long (Fig. 1). It contained 131 genes: 112 unique genes (79 protein-coding genes, 29 tRNA genes and 4 rRNA genes) and 19 duplicated genes (Table 1).
The base composition and gene distribution of each region of the chloroplast genome (LSC, SSC and IR) are summarized in Table 2. The genome showed the typical quadripartite structure: a large single-copy (LSC) region (55.37% of the genome), a small single-copy (SSC) region (12.63%) and two inverted repeat (IR) regions (15.99% each).
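Because the genome is quadripartite, LSC + SSC + 2 × IR should cover the whole 160,042 bp. The following sketch converts the reported shares back into approximate lengths. These are estimates from rounded percentages, which together cover 99.98% of the genome. They are not the region coordinates given in Table 2. The helper name is illustrative.

```python
GENOME_BP = 160_042
# Region shares reported in the text; each IR copy is 15.99% of the genome.
REGION_PCT = {"LSC": 55.37, "SSC": 12.63, "IRa": 15.99, "IRb": 15.99}


def approximate_region_lengths(genome_bp: int, pct: dict) -> dict:
    """Convert percentage shares of the genome into approximate lengths in bp."""
    return {name: round(genome_bp * share / 100) for name, share in pct.items()}


lengths = approximate_region_lengths(GENOME_BP, REGION_PCT)
print(round(sum(REGION_PCT.values()), 2))  # share of the genome covered by the four regions
print(lengths, GENOME_BP - sum(lengths.values()))  # estimates and the rounding shortfall
```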
Functional analysis showed that most genes were associated with the photosystems and ATP synthesis (Table 3). Five genes encoded PSI subunits, 15 encoded PSII subunits, 12 encoded NADH dehydrogenase subunits, 6 encoded cytochrome b6/f complex subunits, 6 encoded ATP synthase subunits, and 1 encoded the Rubisco large subunit. In addition, 9 genes encoded large ribosomal subunit proteins, 12 encoded small ribosomal subunit proteins, 4 encoded DNA-dependent RNA polymerase subunits, 4 encoded ribosomal RNAs, and 28 encoded transfer RNAs. Other identified genes included the maturase gene matK, the protease gene clpP1, the envelope membrane protein gene cemA, the acetyl-CoA carboxylase subunit gene accD and the cytochrome c biogenesis gene ccsA. Five genes of unknown function were also identified. Five reference databases were used for gene annotation, of which Swiss-Prot annotated the largest number of genes (Table 4). In total, 86 protein-coding genes (79 unique genes plus 7 duplicated copies) were annotated.
Comparative analysis with the Gossypium hirsutum chloroplast genome
The cytoplasmic background of CMS Jin A is Gossypium hirsutum, so the Gossypium hirsutum chloroplast genome was chosen as the reference. We compared the chloroplast protein-coding genes of CMS Jin A and Gossypium hirsutum and identified 29 genes carrying single nucleotide polymorphisms (SNPs) [34]. These genes mainly encoded ATP synthase subunits, NAD(P)H-quinone oxidoreductase subunits and photosystem complex subunits. The amino acid differences in the encoded proteins are listed in Table S1. The trnfM-CAU gene was absent from the chloroplast genome of the sterile line Jin A. trnfM-CAU is a variation hotspot in chloroplast genomes and plays an important role in the phylogenetic evolution of different species [35]. In higher plants, trnfM-CAU is located in the LSC region of the chloroplast genome. Its deletion indicates a change in chloroplast genome composition in Jin A, which may affect the normal function of chloroplast genes.
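The comparison pipeline is not described here. The following is a minimal sketch of how SNPs and the resulting amino acid changes, the kind of differences listed in Table S1, can be called codon by codon. It assumes that each Jin A coding sequence has already been aligned to its G. hirsutum counterpart without indels. The codon table is the standard genetic code. Plastid translation table 11 has the same amino acid assignments and differs only in its start codons. The function name `coding_snps` and the short test sequences are hypothetical.

```python
from itertools import product
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def coding_snps(ref_cds: str, alt_cds: str) -> List[Tuple[int, str, str, str, str, str]]:
    """List SNPs between two aligned, gap-free CDSs of equal length.

    Returns (1-based position, ref base, alt base, ref aa, alt aa, effect).
    Codons containing ambiguous bases translate to 'X'.
    """
    ref, alt = ref_cds.upper(), alt_cds.upper()
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("expected aligned CDSs of equal length, a multiple of 3")
    snps = []
    for i, (r, a) in enumerate(zip(ref, alt)):
        if r == a:
            continue
        start = i - i % 3  # first base of the codon containing position i
        ref_aa = CODON_TABLE.get(ref[start:start + 3], "X")
        alt_aa = CODON_TABLE.get(alt[start:start + 3], "X")
        effect = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
        snps.append((i + 1, r, a, ref_aa, alt_aa, effect))
    return snps


print(coding_snps("ATGGCTTGA", "ATGGCCTGA"))  # GCT -> GCC, both alanine
print(coding_snps("ATGGCTTGA", "ATGACTTGA"))  # GCT -> ACT, alanine to threonine
```

Only nonsynonymous SNPs change the protein sequence. Those are the sites that a domain search such as NCBI CDD would then need to examine.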
Analysis of ATP synthase subunit gene expression during anther development in CMS Jin A plants
ATP synthase plays an important role in cellular energy conversion and transfer. To explore the role of changes in chloroplast ATP synthase subunits in CMS Jin A, we first used NCBI CDD to analyze the domains of the proteins encoded by three ATP synthase subunit genes, atpB, atpE and atpF. The SNPs did not alter any protein domain. Next, we measured the expression of atpB, atpE and atpF in CMS Jin A, the maintainer line Jin B and the three-line hybrid F1 (Fig. 2). At the microspore abortion stage, the expression of all three genes in the sterile line Jin A was significantly lower than in the maintainer line. Their expression in the fertility-restored F1 was significantly higher than in Jin A. We therefore hypothesized that the reduced transcription of these genes inhibits ATP synthesis in Jin A.
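The text does not specify how the expression levels in Fig. 2 were quantified. If they were measured by qRT-PCR against a reference gene, the widely used 2^-ΔΔCt method (Livak and Schmittgen) gives fold change relative to a calibrator such as the maintainer line Jin B. The following sketch assumes that design and near-100% amplification efficiency for both genes. The Ct values and the function name are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(
    target_ct: Sequence[float],
    reference_ct: Sequence[float],
    calibrator_target_ct: Sequence[float],
    calibrator_reference_ct: Sequence[float],
) -> float:
    """Relative expression of a target gene by the 2^-ΔΔCt method.

    ΔCt = mean Ct(target) - mean Ct(reference gene) for each sample;
    ΔΔCt = ΔCt(sample) - ΔCt(calibrator); fold change = 2^-ΔΔCt.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)
    delta_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: a target gene in Jin A (sample) vs. Jin B (calibrator).
jin_a = fold_change_ddct([26.0, 26.0], [20.0, 20.0], [24.0, 24.0], [20.0, 20.0])
print(jin_a)  # 0.25, i.e. four-fold lower than in Jin B
```

A fold change below 1 corresponds to lower expression than in the calibrator. Significance would still need to be tested across biological replicates.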
ROS detection in atpB-, atpE- and atpF-silenced cotton plants
To determine the functions of atpB, atpE and atpF, we constructed recombinant gene-silencing vectors and transformed them into Agrobacterium tumefaciens GV3101. The Agrobacterium was then injected into cotton cotyledons to generate atpB-, atpE- and atpF-silenced plants. The plants with the greatest reduction in target gene expression were selected (Fig. 3).
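Selecting the most strongly silenced plants amounts to ranking plants by target expression relative to the empty-vector control. The following is a minimal sketch, assuming expression has already been normalized so that the control equals 1.0. The plant IDs, expression values and function names are hypothetical.

```python
from typing import Dict, List


def silencing_efficiency(relative_expression: float) -> float:
    """Percent reduction of the target transcript relative to the empty-vector control (= 1.0)."""
    return (1.0 - relative_expression) * 100


def most_silenced(plants: Dict[str, float], n: int = 1) -> List[str]:
    """IDs of the n plants with the lowest relative target expression."""
    return sorted(plants, key=plants.get)[:n]


# Hypothetical atpE expression, normalized to empty pTRV2 control plants.
atpe_plants = {"TRV:atpE-1": 0.42, "TRV:atpE-2": 0.18, "TRV:atpE-3": 0.65}
best = most_silenced(atpe_plants)
print(best, round(silencing_efficiency(atpe_plants[best[0]]), 1))
```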
Fifteen days after silencing, we stained leaves of the silenced plants and of negative control plants carrying the empty pTRV2 vector. Nitrotetrazolium blue chloride (NBT) was used to detect superoxide (O2−•) and 3,3'-diaminobenzidine (DAB) to detect H2O2, which together allowed us to assess ROS accumulation (Fig. 3a-c). O2−• accumulation was significantly higher in the leaves of atpE- and atpF-silenced plants than in the negative control plants (Fig. 3b). H2O2 content did not differ significantly between the silenced and control groups (Fig. 3c).
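The text does not state how the staining in Fig. 3 was quantified. One common approach is to photograph each leaf and measure the fraction of leaf area darker than a chosen intensity threshold, as is often done in ImageJ. The following is a minimal pure-Python sketch of that idea. The threshold, the leaf mask and the function name are assumptions.

```python
from typing import Sequence


def stained_fraction(
    gray: Sequence[Sequence[int]],
    leaf_mask: Sequence[Sequence[bool]],
    threshold: int = 100,
) -> float:
    """Fraction of leaf pixels darker than `threshold` on a 0-255 grayscale image.

    NBT (blue formazan) and DAB (brown polymer) deposits appear dark, so lower
    intensity is treated as stain. The threshold is an assumption and would need
    calibrating against control leaves imaged under the same conditions.
    """
    leaf = stained = 0
    for px_row, mask_row in zip(gray, leaf_mask):
        for px, in_leaf in zip(px_row, mask_row):
            if in_leaf:
                leaf += 1
                stained += px < threshold  # bools add as 0/1
    if leaf == 0:
        raise ValueError("mask selects no leaf pixels")
    return stained / leaf


print(stained_fraction([[50, 200], [90, 255]], [[True, True], [True, False]]))
```

Comparing this fraction between silenced and pTRV2 control leaves, across replicate plants, would give a number that can be tested statistically.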
Singlet oxygen (1O2) measurements showed that leaves of atpE- and atpF-silenced plants fluoresced more strongly than those of the negative control, indicating significant 1O2 accumulation (Fig. 4). ROS content in atpB-silenced plants did not differ from that in the control plants.