Genome-wide identification and characterization of lncRNAs during anther development of three-line hybrid cotton
To identify the lncRNAs involved in cytoplasmic male sterility (CMS) and restoration of fertility during anther development, high-throughput sequencing was performed for the CMS line (A), maintainer line (B), and fertility restoration line (R), each with three biological replicates. In total, 910 million clean reads (284 million in A, 315 million in B, and 311 million in R) passed the quality filters and were retained for further analysis. Almost 86%, 87%, and 88% of the reads were aligned to the TM-1 reference genome for A, B, and R, respectively (Table 1). On the basis of the transcripts assembly, a total of 940,696 transcripts were identified. To identify putative lncRNAs, we first filtered out the single-exon transcripts located within 500 bp of other transcripts, then excluded short transcripts (length < 200 bp), and screened the transcripts based on expression level (FPKM ≥ 0.5 for multi-exon transcripts, or FPKM ≥ 2 for single-exon transcripts), and finally filtered out the known non-lncRNAs. After these basic filtering processes, 93,546 transcripts were retained as lncRNA candidates. We evaluated the coding potential of the remaining transcripts by means of CPC (coding potential calculator) and Pfam protein domain analyses (Fig. 1a). After the five-step filter process, 80,695 lncRNAs were identified during anther development (Table S2).
To describe explicitly the characteristics of lncRNAs, we compared the characteristics of lncRNAs and mRNAs in the following aspects (Fig. 1b). In transcript length, the lncRNAs length ranged from 200 to 18,412 bp, and more than 97% of the lncRNAs were smaller than 2000 bp in length, whereas the length of mRNAs ranged from 150 to 21,501 bp, but almost 85% of mRNAs were between 150 and 2000 bp in length. In exon number, lncRNAs had fewer exons than mRNAs on average, almost 83% of the lncRNAs contained one exon and 17% had multiple exons, whereas almost 26% of the mRNAs contained one exon and 74% had multiple exons. The overall open reading frame (ORF) length of lncRNAs was typically smaller than that of mRNAs. Comparison of FPKM distribution between lncRNAs and mRNAs showed that the FPKM of lncRNAs was lower than that of mRNAs during anther development (Table S2).
Identification and functional analysis of differentially expressed lncRNAs in A, B, and R lines
The following three comparisons of lncRNA expression levels were performed: A–B, which had the isogenic nuclear genomes (containing the recessive non-functional rf1 allele) but different cytoplasm and fertility; A–R, both of which had the same CMS-D2 cytoplasm but differed in fertility and Rf1 alleles; and B–R, both of which were isogenic and fertile but differed in cytoplasm and Rf1 alleles. A total of 18,755, 20,837, and 21,346 unique lncRNAs were specifically expressed in the A, B, and R lines, respectively, and 15,692 lncRNAs were expressed in common among the A, B, and R lines (Fig. 2a). On the basis of the expression level, lncRNAs with a greater than 2-fold change and p-value < 0.01 were considered to be differentially expressed in different samples. A total of 43,347, 44,739, and 46,431 differentially expressed lncRNAs were identified in the A–B, A–R, and B–R comparisons, respectively, and 1713 lncRNAs were differentially expressed in common among the A–B, A–R, and B–R comparisons (Fig. 2b) (Table S3). The differentially expressed lncRNAs represented a total of 67,021 non-redundant lncRNAs that were distributed among the 26 chromosomes of G. hirsutum, of which 33,142, 29,689, and 4190 differentially expressed lncRNAs were located in the A and D subgenomes and different scaffolds, respectively (Table S3). The distribution of differentially expressed lncRNAs for the three comparisons is shown in Fig. 2c.
Previous studies indicate that the Rf1 gene is located on chromosome Gh_D05, and the nearest flanking simple sequence repeat (SSR) markers to Rf1 are BNL3535 at a genetic distance of 0.049 cM and NAU3652 with a genetic distance of 0.078 cM [26, 38]. In the present study, a total of 2452 differentially expressed lncRNAs were identified on chromosome Gh_D05, of which 806 lncRNAs were differentially expressed in the A–R group. Furthermore, 86 lncRNAs were identified in the Rf1 region flanked by the two SSR markers, of which 65 lncRNAs were down-regulated and 21 lncRNAs were up-regulated in the R line compared with the A and B lines. For example, the RNA-seq data showed that TCONS_00779116, TCONS_00797453, TCONS_00796983, and TCONS_00797056 were up-regulated in the fertile R line, compared with the A and B lines, and the qRT-PCR results were approximately consistent (Fig. 2d).
It has been shown that lncRNAs are preferentially located close to genes that they regulate, and that lncRNAs might overlap with the promoter region and may regulate the expression profile of their target genes at the transcriptional or post-transcriptional level [6, 17, 39-41]. In the present study, to evaluate the function of these differentially expressed lncRNAs, we selected the genes located less than 10 kb from the differentially expressed lncRNAs as corresponding target genes and performed GO analyses among the A–B, A–R, and B–R comparisons (Fig. 3a) (Table S4). The GO terms oxidation–reduction process, regulation of hormone levels, and hormone metabolic process were the three most highly enriched terms in the A–B comparisons, whereas cell morphogenesis, cellular component morphogenesis, and oxidation–reduction process were the three most enriched terms in the A–R comparisons. Similarly, in B-R comparisons, the three most enriched terms were cell morphogenesis, cellular component morphogenesis, and single-organism biosynthetic process.
To identify lncRNAs associated with CMS or fertility restoration, differentially expressed lncRNAs between A–B and A–R comparisons were analyzed. In total, 43,347 and 44,739 differentially expressed lncRNAs were identified in the A–B and A–R comparison groups, respectively. Of these differentially expressed lncRNAs, 328 and 431 were unique in the A–B and A–R comparisons and thus might be involved in CMS or fertility restoration during anther development of cotton. In addition, we detected significant enrichment of GO terms involved in CMS or fertility restoration (p < 0.05). In the A–B comparison, we observed GO term enrichment for biological processes, including oxidation–reduction process (GO: 0055114), photosynthetic electron transport chain (GO: 0009767), regulation of hormone levels (GO: 0010817), cellular hormone metabolic process (GO: 0034754), and cytokinin metabolic process (GO: 0009690) (Fig. 3b). These results showed that the greatest difference between normal Upland cotton cytoplasm and sterile cytoplasm was enrichment in the cellular hormone metabolic process and oxidation–reduction reaction process. In the A–R comparison, the most significantly enriched GO terms among biological processes were cellular component morphogenesis (GO: 0032989) and small molecular biosynthetic process (GO: 0044283), of which cellular amino acid biosynthetic process (GO: 0008652) and tricarboxylic acid biosynthetic process showed significant enrichment (GO: 0072351) (Fig. 3c) (Table S4). These results indicated that differentially expressed lncRNAs may regulate functional genes involved in cellular component morphogenesis for fertility restoration in cotton.
Predicted interactions between lncRNAs and miRNAs during anther development
MiRNAs regulate gene expression at the post-transcriptional level by interacting with the complementary binding sites on target sequences, resulting in mRNA cleavage, decoy activity, and translation repression [15, 42]. A previous study indicated that lncRNAs may act as targets or eTMs by binding with miRNAs and thus inhibit the interaction between miRNAs and the target genes [18]. In the current study, we predicted the potential of lncRNAs as miRNA eTMs by integration of previous miRNA sequence data during anther development [25]. In total, two lncRNAs (TCONS_00342368 and TCONS_00148576) were predicted to be potential eTMs for five miRNAs, of which TCONS_00342368 was a putative eTM for ath-miR171c-5p, osa-miR171c-5p, and stu-miR171d-5p, and TCONS_00148576 was a putative eTM for ath-miR399b and cme-miR399d (Fig. 4a). The predicted miRNA binding sites and the bulge region in lncRNAs were identical among the different miRNAs in the same miRNA family. To analyze the sequence evolutionary conservation of these eTM lncRNAs, we aligned the sequences of the predicted eTM-binding sites for miR171 and miR399 from Arabidopsis and rice (Fig. 4b). The miRNA binding sequences were well conserved and the bulge region frequently varied among different species, consistent with previous studies of Arabidopsis and rice [18] and Brassica [19]. Further analysis is necessary to understand the role of the two lncRNAs predicted to be eTMs.
In addition, lncRNAs might have functions associated with a role as miRNA precursors [43]. Those lncRNAs that act as precursors of miRNAs might perform an indirect regulatory function through corresponding miRNAs. Moreover, differential expression of lncRNAs might result in the differential expression of corresponding mature miRNAs [15, 43]. In the present study, 63 lncRNAs were predicted to be putative precursors of 35 miRNAs belonging to 26 miRNA families (Table S5). Of these lncRNAs, 13 lncRNAs were identified as the putative precursor of multiple miRNAs, with 1 or 2 nt difference in the mature miRNA sequence. Interestingly, 19 of the precursor lncRNAs showed differential expression among the A, B, and R lines, of which five lncRNAs (TCONS_00600850, TCONS_00807084, TCONS_01123999, TCONS_01109996, and TCONS_01148734) showed a consistent expression pattern with the corresponding mature miRNA (gra-miR8753, gma-miR160b, ghr-miR7506, ghr-miR7511, and gra-miR8638). The results of qRT-PCR analysis showed that two lncRNAs (TCONS_00807084 and TCONS_01148734) and the corresponding miRNAs (gma-miR160b and gra-miR8638) showed a similar expression pattern in three-line hybrid cotton (Fig. 5a). Interestingly, gma-miR160b, which is derived from TCONS_00807084, was up-regulated in the A and B lines compared with the R line and may regulate an auxin response factor GhARF17 (Gh_D06G0360) (Fig. 5b). Thus, TCONS_00807084 and gma-miR160b might play critical roles in anther development by influencing the auxin regulatory pathway. These findings indicate the complex regulatory mechanisms by which lncRNAs and the corresponding mature miRNAs function during anther development, although the underlying regulatory network remains unclear.
The lncRNA–miRNA–mRNA regulatory networks between A, B, and R lines
To explore the regulatory networks of lncRNAs involved in CMS and fertility restoration, we selected the target genes of the differentially expressed lncRNAs and miRNAs based on the RNA-seq data and constructed a putative lncRNA–miRNA–mRNA regulatory network using Cytoscape software (Fig. 6) (Table S6). The networks were composed of 66 miRNAs, 161 lncRNAs, and 658 mRNAs (mRNAs regulated by miRNAs and lncRNAs), of which 58 lncRNAs regulated by 15 miRNAs and 77 lncRNAs regulated by 35 miRNAs were specifically differentially expressed in the A–B and A–R comparisons, respectively; in addition, 26 lncRNAs regulated by 16 miRNAs were in common among the A–B and A–R comparisons. These predicted target genes regulated by miRNAs and lncRNAs were divided into multiple groups. First, several genes showed critical roles in oxidation–reduction processes, of which a glutamyl-tRNA reductase (Gh_A08G0634), cupredoxin superfamily protein (Gh_D03G0412), and cinnamyl alcohol dehydrogenase (Gh_D11G3399) show oxidoreductase activity during anther development. Second, several genes were transcription factors and involved in cellular hormone response and metabolic processes. For example, bZIP (Gh_A01G1768), ARF (Gh_D06G0360), and SPL (Gh_A11G0706) transcription factors and SAUR-like auxin-responsive protein (Gh_A12G2237) and a gibberellin-regulated family protein (Gh_D02G0666) were regulated by miRNAs and lncRNAs involved in hormone response and metabolic processes. Third, several genes were functional proteins, including an ABORTED MICROSPORES (AMS) protein (Gh_D12G0328), nucleotide/sugar transporter family protein (Gh_A04G0407), and NB-ARC domain-containing disease resistance protein (Gh_A08G1378). Several functionally unknown genes regulated by miRNAs and lncRNAs were differentially expressed in the A, B, and R lines. These above-mentioned genes may play critical roles in CMS and fertility restoration in cotton.
The expression levels of several lncRNA–miRNA–mRNA regulatory networks were validated by qRT-PCR and were concordant with the transcriptome data (Fig. 7). For example, a bHLH transcription factor, ABORTED MICROSPORES (AMS) gene (Gh_D12G0328), was a specific phytochrome-interacting factor regulated by ath-miR414 and TCONS_01118841. The gene Gh_A06G1391, which was regulated by ghr-miR2950 and TCONS_00233548, shows oxidoreductase activity and may participate in fatty acid biosynthesis. The qRT-PCR results showed that both genes were down-regulated in the A line compared with the B line. These genes might be involved in CMS during anther development. In the A–R comparison, a glutamyl-tRNA reductase (Gh_A08G0634) functions in oxidation–reduction processes, as the target gene of gra-miR166d and TCONS_00327889, and was down-regulated in the A line compared with the R line. The gene Gh_D11G3015, which encodes a calcium-dependent lipid-binding (CaLB domain) protein and was the target gene of zma-miR171b-3p and TCONS_01086083, may participate in the calcium signaling pathway and was down-regulated in the A line compared with the R line. These genes might be involved in pollen development and fertility restoration during anther development. In addition, in our previous study, the PPR (Gh_D05G3392) gene located near the region of Rf1 was regulated by gra-miR7505b [25]. Interestingly, we detected the lncRNA TCONS_00797453 located in the promoter region of the Gh_D05G3392 gene and may strongly regulate the expression of that gene in the R line. This result showed that PPR (Gh_D05G3392) may be regulated by both gra-miR7505b and TCONS_00797453 to participate in fertility restoration.