PacBio sequencing and error correction of long reads
To obtain comprehensive transcriptome profiles of golden and brown scallops and to compare the same tissues between them, 10 libraries covering five tissues were constructed from golden and brown scallops. In total, 53.97 Gb of NGS clean reads were produced, with an average of 5.4 Gb per sample (Additional file 1: Table S1). The sequencing data from this study were deposited in the NCBI (SRA: SRP171083).
RNA was extracted from five tissues of golden scallops, and a single library was constructed by pooling equal amounts of RNA from each tissue and was then sequenced. In total, 679,907 polymerase reads were obtained with full passes ≥ 0 and consensus accuracy > 0.8; their average length was 2,677 bp, with an average quality of 0.89 and 12 passes (Additional file 1: Table S2). These polymerase reads were filtered using the standard protocol of the SMRT Analysis software suite, yielding 182,010 ROIs, which included 90,727 FLNC and 80,556 non-full-length reads (Table 1). Due to the high error rate of SMRT sequencing reads, error correction using high-quality NGS short reads is indispensable. After error correction, low-quality and redundant transcripts were removed. Finally, 26,237 non-redundant transcripts were obtained from the scallops.
Functional annotation and enrichment analysis
Transcripts obtained by combining PacBio and NGS data increase the accuracy and efficiency of functional gene prediction and annotation, especially when reference genome information is lacking. The non-redundant transcripts were functionally annotated by BLAST searches against public databases, including NR, GO, Swiss-Prot and KEGG. In total, 4,227 transcripts were annotated in GO, 6,424 in COG, 9,977 in KEGG, 13,320 in KOG, 15,759 in Pfam, 11,729 in Swiss-Prot, 17,251 in eggNOG and 20,961 in NR. A total of 21,030 transcripts were annotated in at least one of the eight databases (Fig. 1a). Based on the NR database, species homologous to C. nobilis were identified by sequence alignment. Approximately 91.11% of the sequences aligned to Mizuhopecten yessoensis, followed by Crassostrea gigas (1.16%) (Fig. 1b).
Alternative splicing analysis and SSR detection
A total of 227 alternative splicing events were identified (Additional file 1: Table S3). Because no reference genome is available for this species, the types of alternative splicing events could not be identified. A total of 26,135 transcripts (78,152,940 bp) were subjected to SSR analysis, which detected 23,758 SSRs in 12,442 SSR-containing sequences; most of them were mono-, di-, or tri-nucleotide repeats (Fig. 2; Additional file 1: Table S4). Given the high quality of the transcriptome sequences, the detected SSRs should be useful for marker-assisted breeding and genetic analysis in C. nobilis.
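As an illustration of how SSRs of this kind can be located in transcript sequences, the following is a minimal Python sketch based on regular expressions. It is not the tool used in this study, which is not named in this section. The function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen only for the example.

```python
import re

# Hypothetical thresholds: motif length (bp) -> minimum number of tandem repeats.
# These values are illustrative assumptions, not the settings used in the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a list of (start, motif, repeat_count) for perfect SSRs in seq."""
    seq = seq.upper()
    found = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example "AA" is a mononucleotide repeat, not a dinucleotide one).
            is_compound = any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if is_compound:
                continue
            found.append((match.start(), motif, len(match.group(0)) // unit_len))
    return found


# Toy sequence with one mononucleotide and one dinucleotide repeat.
example = "GC" + "A" * 12 + "C" + "AG" * 7 + "T"
ssrs = find_ssrs(example)
```

Each tuple gives the 0-based start position, the repeat motif and the number of tandem copies, so counts per motif length (mono-, di-, tri-nucleotide and so on) can be tallied from the second field.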
Long non-coding RNAs (lncRNAs)
As an emerging hot topic in biology, lncRNAs have been found to act as crucial regulators in a variety of biological processes. In the present study, lncRNA transcripts were predicted by four methods (CPC, CNCI, CPAT and Pfam protein domain analysis), which identified a total of 6,032 lncRNAs in the noble scallop (Fig. 3; Additional file 1: Table S5).
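One common way to combine several coding-potential predictors is to keep only the transcripts that every method calls non-coding. The sketch below assumes that this consensus (intersection) rule applies here, which the text does not state explicitly. The function name `consensus_noncoding` and the transcript identifiers are hypothetical.

```python
def consensus_noncoding(calls_by_method):
    """Return transcript IDs called non-coding by every method.

    calls_by_method maps a method name to an iterable of transcript IDs
    that the method classified as non-coding.
    """
    sets = [set(ids) for ids in calls_by_method.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Toy input: IDs are made up for illustration only.
calls = {
    "CPC": {"t1", "t2", "t3"},
    "CNCI": {"t1", "t3", "t4"},
    "CPAT": {"t1", "t3"},
    "Pfam": {"t1", "t2", "t3", "t5"},
}
lncrna_candidates = consensus_noncoding(calls)
```

Using an intersection is the conservative choice: a transcript is dropped as soon as any one method suggests coding potential.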
Identification of differentially expressed genes (DEGs)
Based on the RNA-seq reads, FPKM values were used to investigate gene expression patterns in the different tissues of C. nobilis, and gene expression was compared between golden and brown scallops. DEGs were analyzed using the edgeR software. FDR < 0.05 and |log2(fold change)| ≥ 1 were set as the criteria for significant differential expression. The DEGs between golden and brown scallops in the five tissues are listed in Table 2. A total of 9263 DEGs were identified among the five tissues, and the numbers of up-regulated and down-regulated genes are shown in Additional file 1: Table S6. Venn diagrams show the numbers of genes uniquely expressed in each tissue and of genes shared between two or more tissues (Fig. 4). GO and KEGG pathway enrichment analyses of DEGs in the same tissue are shown in Fig. 5 and Fig. 6, and the top 20 KEGG pathways in each tissue are listed in Additional file 1: Table S7.
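The FPKM definition and the DEG criteria above can be written out as a short Python sketch. This is only an illustration of the arithmetic and of the thresholds stated in the text, not the edgeR workflow itself. The function names `fpkm` and `classify_deg` are hypothetical, and the sign convention (positive log2 fold change meaning higher expression in golden scallops) is an assumption.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def classify_deg(log2_fc, fdr, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Apply the criteria FDR < 0.05 and |log2(fold change)| >= 1.

    Returns "up", "down", or "ns" (not significant). The direction assumes
    log2_fc was computed as golden relative to brown, which is an assumption.
    """
    if fdr < fdr_cutoff and abs(log2_fc) >= lfc_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "ns"


# Toy values for illustration only.
example_fpkm = fpkm(fragments=500, gene_length_bp=2000, total_mapped_fragments=10_000_000)
labels = [classify_deg(lfc, q) for lfc, q in [(1.0, 0.01), (-2.3, 0.04), (3.0, 0.05), (0.9, 0.001)]]
```

Note that both conditions must hold: a large fold change with FDR equal to 0.05 is not counted, and neither is a highly significant change smaller than two-fold.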
Functional annotation of DEGs and candidate genes involved in carotenoid accumulation
Of the 9263 DEGs between golden and brown scallops, 3361 were up-regulated and 4980 were down-regulated. Of the 9263 DEGs, 8422 were annotated in at least one of the following databases: Nr (8406), Swiss-Prot (4766), KEGG (3702), KOG (4460) and GO (1697).
Based on the GO enrichment analysis, a total of 47 significantly enriched GO terms were observed (corrected P-value < 0.05). The top five GO terms were macromolecular complex, protein complex, oxoacid metabolic process, organic acid metabolic process and lyase activity (Fig. 7). The enriched terms also include a number related to carotenoids, such as lipid transport and lysosome. A total of 229 KEGG pathways were identified (Additional file 1: Table S8). These include a number of pathways related to carotenoid accumulation, such as lysosome, fat digestion and absorption, and ABC transporters (Fig. 8).
To explore the genetic mechanisms underlying the golden scallops, the DEGs of each of the five tissues were screened for genes believed to be involved in carotenoid accumulation. Several such genes are shown in Fig. 6, including CD36, ABC transporter G family member 5 (ABCG5), beta,beta-carotene 15,15'-dioxygenase (BCMO1), glutathione S-transferases (GSTs), intestine-specific homeobox (ISX), very low density lipoprotein receptor (VLDLR), low-density lipoprotein receptor (LDLR) and carotene oxygenase. CD36, GST-theta and VLDLR were expressed at significantly higher levels in the hemolymph of golden scallops than in that of brown scallops. ABCG5 was highly expressed in the mantle of golden scallops. In both golden and brown scallops, BCMO1 was more highly expressed in the intestine than in other tissues, GST-Mu was expressed at significantly higher levels in the gonad than in other tissues, and carotene oxygenase was expressed at significantly higher levels in the intestine than in other tissues. LDLR expression differed significantly in the mantle and adductor muscle. Overall, the high expression levels of these carotenoid accumulation-related genes are consistent with the higher carotenoid content of golden scallops, suggesting that these genes play important roles in carotenoid accumulation.
Functional analysis of carotenoid-related DEGs in early scallop development
To explore the function of the carotenoid-related DEGs in early scallop development, we selected six genes and analyzed their spatial and temporal expression characteristics (Fig. 9). All of these genes except BCMO1 were strongly expressed in the fertilized egg, and all except BCMO1 and GST-Pi were expressed at higher levels in golden scallops than in brown scallops at the S-stage and J-stage. The expression level of BCMO1 showed a decreasing trend, and brown scallops had higher levels than golden scallops at most stages.