F. taiwana transcriptome assembly and annotation
To study F. taiwana gene expression differences between black melanizing (Fig. 1A) and pink non-melanizing (Fig. 1B) embryos, we sequenced RNA from two biological replicates of each sample type, generating 12 to 41 million raw paired-end reads each (Table 1). Because there is no genome sequence available yet for this species, we first produced a de novo transcriptome assembly. After Trinity assembly and duplicate removal, we obtained 112,959 non-redundant transcripts (Ft-nr) with an N50 of 1,044 (Table 1). Of these, 28,455 (24.8%) transcripts were predicted to have long open reading frames (ORFs, defined here as ≥300 nt). The remaining transcripts presumably consist of untranslated regions, non-coding RNAs, and short or incomplete proteins.
As an initial assessment of the assembly, we examined the representation of Benchmarking Universal Single-Copy Orthologs (BUSCO) found in Ft-nr. This analysis revealed complete single or duplicate copies for 63.9% and 52.4% of the orthologs at the Insecta and Diptera taxa levels, respectively. Inclusion of the fragmented orthologs increased the corresponding coverages to 85.1% and 69.1%. Because only 1 life stage was examined, we consider both the Ft-nr transcriptome quality and coverage to be reasonable.
Table 1 Summary of RNA-seq metrics for F. taiwana egg transcriptomes under NaCl stress
† Number of paired-end reads
‡ After CD-HIT-EST clustering at 95% similarity
Next, we annotated all Ft-nr transcripts using three approaches (annotations are in Additional files 1, 2, 3, and 4). First, functional prediction software revealed that 112,959 transcripts had at least one predicted functional domain including 17,791 (15.8%) with a pfam annotation (Table 2 and Additional file 2). Second, we used blastx to compare all Ft-nr transcripts to five protein datasets. This analysis revealed that 24.9% (28,109) and 39.6% (44,727) of the transcripts had similarity to a protein in the high quality curated Swiss-Prot and the unreviewed Uniref50 databases (Additional file 2), respectively. We also observed comparable percentages of transcripts with putative homologs in the Drosophila (24.6%), Diptera (25.2%), and Culicidae (24.3%) ortholog datasets (Table 2 and Additional files 3 and 4). Third, we conducted GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) Pathway analyses and found that 26.0 % (29,425) and 20.1% (22,726) of the transcripts were annotated, respectively. When we restrict our analysis to the 28,455 long ORF encoding transcripts, annotation rates were higher (Swiss-Prot, 54.6%; GO, 64.7%), as expected.
Table 2 Summary of gene annotations for the F. taiwana egg transcriptomes
† Combined blastx, blastp and pfam results, see Additional file 1.
Differentially expressed genes (DEGs)
We next conducted differential gene expression analysis on the 28,455 (25.2%) transcripts with long ORFs by mapping the RNAseq reads to the transcriptome using HiSat2 and then calling DEGs with DESeq2. This analysis revealed 15,996 (56.2%) transcripts with log2 fold change ≥1 (i.e., 2-fold difference) between the black melanized (control) and the pink unmelanized (NaCl treated) eggs (Fig. 2A). Most DEGs were more highly expressed (12,515 transcripts, 78.2%) in black melanized eggs while 3,481 (21.8%) were more highly expressed in the NaCl treated pink unmelanized eggs (Fig. 2A).
Principal Component Analysis (PCA)
To provide an overview of the gene expression data, we conducted a PCA on all DEGs. The first and second components (PC1 and PC2) could explain 66.2% and 22.1% of the DEG variance, respectively. As shown in Fig. 2B, PC1 clearly separated the two sample types; however, PC2 was not clearly correlated with known sample properties (date sample, sequencing run date, etc). This result indicates that there were obvious differences in the transcriptome of the two sample conditions, consistent with the DEG analysis above.
Gene Ontology gene set enrichment analysis (GSEA) on PCA loading values
Of all 15,996 DEGs, 9,158 were annotated with GO assignments (Table 2), and for these we next conducted a GSEA. For each gene, the PC1 loading values were provided as the ranking basis for the GO GSEA. Because we were interested in the biological differences between the black melanized and pink unmelanized eggs, we decided to focus only on the genes with Biological Process GO term annotations; GSEA results for the molecular function and cellular compartment are in Additional files 5 and 6. To permit easier understanding of the GSEA results, we clustered the GO enriched DEGs hierarchically based on the distance of the PC1 loading values and GO assignments of each transcript. A dendrogram plot shows that these 8,865 DEGs with Biological Process GO term annotations could be separated into 5 cluster (Fig. 3). The DEGs belonging to the first cluster (red, inner circle) were related mainly to metabolic and cellular processes and were highly separated from the other cluster. The second and the third clusters (yellow and green) group near each other and have more complicated GO term compositions. The fourth one (blue) contains many genes associated with biological regulation, cellular process, and localization. Genes in the fifth cluster (purple) were mainly associated with cellular process, multicellular organismal process, biological adhesion and reproduction; additionally, the DEGs with pigmentation GO assignments belong to this group.
To obtain more insight into the biological differences between the black and pink eggs, we examined the more specific "next-level" terms for the 3 enriched GO categories with the most assigned transcripts. For metabolic process, the enriched terms were associated with organic substance, nitrogen compound, small molecule and hormone metabolic process, and their regulation (Fig. 4A). It is worth noting that the term “pigment metabolic process” was enriched in this analysis. Another major GO term, cellular process, was mainly composed of cellular response to stimulus, signal transduction, cell communication, cell cycle process, and regulation of cellular process (Fig. 4B). The third major GO term, multicellular organismal process, was associated with multicellular organism development, post-embryonic development and regulation of multicellular organismal process (Fig. 4C). Finally, the black melanized and pink unmelanized sample types clearly differ by pigmentation and this analysis also revealed the enrichment for the pigmentation GO term (Fig. 4D). Pigmentation subterms included developmental pigmentation and negative regulation of developmental pigmentation, which is critically important in the melanization pathway. However, the majority of pigmentation GO term were surprisingly up-regulated in pink eggs when laid in milieu containing high concentrations of NaCl (Fig. 4D).
KEGG pathway analysis
We also conducted a GSEA, as above, for the DEGs with KEGG assignments (6,888 transcripts, Table 3). This analysis revealed 102 significantly enriched KEGG pathways in 3 main categories (metabolism, genetic information processing, environmental information processing) as well as a few others (Table 3, see also Additional file 7 for more details).
In the metabolism category, there were 58 enriched pathways with 36.2% (n=21) associated with carbohydrate and glycan metabolism. This might indicate that the carbohydrate metabolism in eggs was dramatically changed under salt stress [12]. To a lesser extent, gene expression for other nutrition metabolism categories were altered, including lipid (n=8), amino acid (n=12) and cofactors and vitamins (n=8). In addition, 4 pathways associated with xenobiotic degradation were enriched in the high osmolarity condition (NaCl treatment). In the genetic information processing category, 20 pathways were enriched including DNA repair (n=3), regulation of transcription (n=3), translation (n=5), and degradation of RNA and proteins (n=4). All these were upregulated except the spliceosome pathway (within transcription). In the environmental information processing category, 12 pathways were enriched, including MAPK signaling, which plays a major role during egg development (Fig. 5). Interestingly, there were 3 pathways (retinol metabolism, phototransduction, and circadian rhythm) associated with light signal processing.
Table 3 Gene Ontology gene set enrichment analysis (GSEA) for 6888 DEGs with KEGG assignment
†The full list is in Additional file 7.
‡ There are no categories above these two pathways in the KEGG database.
§ This category is associates with human disease (subcategory of Human Diseases: Endocrine and metabolic disease). However, we still include this category because some organisms have the AGE-RAGE signaling pathway, including Drosophila melanogaster.
Two key terminal genes in the melanization pathway are downregulated in unmelanized eggs
An obvious phenotypic difference in pink eggs is that they are unmelanized. There are 6 enzymes important for melanin synthesis as shown in Fig. 5 [14]. We hypothesized that the pink embryos would have lower expression of some or all of these enzyme encoding genes relative to the control melanized eggs. Our RNAseq data showed that, indeed, the expression levels of the two terminal genes in the pathway, Laccase2 (-4.68 log2FC) and DCE (-8.87 log2FC), matched the expectation. However, TH (2.24 - 2.56 log2FC) and DDC (2.45 - 2.57 log2FC), which both function earlier in the pathway, were more highly expressed in pink eggs. The other gene (PO / PPO2) was more complicated with both highly and lowly expressed isoforms.
qRT-PCR validation
To confirm the RNA-seq results for several key genes associated with melanization or stress, we conducted qRT-PCR on independent biological samples. These samples were isolated with a slightly different feeding protocol. Of the 8 genes verified 6 were differentially expressed and 7 matched the direction of the RNA-seq results (Fig. 6). Among these, 5 (GST, TH, DDC, p38(MAPK) and PPO2) were higher expressed and the 2 key melanin producing genes (laccase2 and DCE/Yellow) were down-regulated in pink eggs. The expression of the duox gene did not match the RNA-seq results.