De novo transcriptome assembly
Sequencing of four libraries, corresponding to the four developmental stages of the red king crab P. camtschaticus—zoea I, zoea IV, glaucothoe, and juvenile—resulted in a total of 112.2 million raw paired-end reads. For further assembling, paired-end reads from 112 libraries from previous studies10 were added to these paired-end reads. After trimming adapters, filtering low quality and contaminated reads, 94.35% of paired-end reads were retained, with an average Phred quality score of 35.6. As a result, 1028 millions filtered paired-end reads with average read length, 94.8 nucleotides were participated in assembling (see Supplementary Table S1).
The first stage of assembling in SPAdes resulted in a total of 839,551 contigs. This was unsatisfactory, as the percentage of redundancy of almost identical contigs was high. Apparently, this situation arose due to the variability came from 5` and 3` untranslated regions (UTRs) as mentioned Boyko et al. For this reason we used previously described tool HomoloCAP11 to achieve the final assembly. As a result we obtained a total of 162,557 contigs and 149,603 genes.
Since all the transcriptome assemblies have not only protein-coding sequences (CDSs) in their composition, only CDSs with a length of more than 200 nucleotides with significant BLAST hits in the Drosophila melanogaster proteome were used in order to achieve a standardized comparison genome5 and our transcriptome assembly. Our assembly includes 85% of all protein-coding sequences of genomic assembly. At the same time, genomic assembly covers of only 37% of our assembly. This is also supported by BUSCO assessment results showing numerous missing and fragmented core arthropod genes in the genome (Fig. 1A). Compared to the genome, our assembly has better values of the basic statistic parameters such as average length, N50, N30, and N10 (Fig. 1B, see values at brackets). However, comparison of the basic parameters of all transcripts in the genome and transcriptome assemblies inverts the results. Hence, this assembly can serve as a reference assembly to search for polymorphisms, analyze gene families, etc.
Differential expression analysis
A search for DEGs revealed a total of 8057 genes with significant changes in expression level relative to the zoea I stage (see Supplementary Data S1). Of these, 5990 sequences had significant hits against the NCBI NR database. The number of DEGs at the zoea IV, glaucothoe, and juvenile stages is 2804, 2688, and 6926, respectively. The number of DEGs common to all three stages is 1250. The both negatively and positively regulated genes have the jagged dynamics of average LogFC (logarithm of fold change) values during the transition from the zoea IV to juvenile stages. In the glaucothoe stage genes show lowest level of expression (Fig. 2A). These observations are correlated well with morphological peculiarities of crab development. Zoea I–IV are swimming and planktotrophic larva with high feeding rate4. There is about 10 duration days of each of the four zoea stages12. Glaucothoe is a transition to the benthic juvenile stage. Unlike other stages, it is a nonfeeding and sedentary organisms. Given these facts, it is obvious that this lead to a decrease of gene expression.
This observation also converges with the number of unique DEGs and the correlation map (Fig. 2B, C). There is strong difference between the larval and juvenile stages on the correlation map, with the glaucothoe being in an intermediate position. At result of this analysis, the number of positively regulated genes decreases during the transition to glaucothoe and then increases again at juveniles. Based on DEGs, glaucothoe is closest to the juvenile stage, rather than zoea (Fig. 2C). This dynamics is probably the result of global changes in the work of genome during the transition to benthic form, as well as the gradual complication of structure, and the establishing of new tissues and organs.
Annotation
Annotation of 162,557 contigs by BLAST searching against the NCBI protein non-redundant database resulted in identification of 81,178 contigs and 73,241 genes with significant hits, with 8,654 contigs having only unnamed hits (see Supplementary Table S2). Hits belong to 3,984 organisms; over a half of contigs matched arthropods proteins and only 37% of genes matched to decapods proteins (see Supplementary Table S3). This indicates a rather low representation of decapods in the protein database and, accordingly, scanty knowledge. In addition, there is a high probability of the presence of contaminant sequences. We deleted 11,021 contaminant sequence mostly from Ciliophora, Streptophyta and Chordata phylums. And even after that in the final version of the assembly, about 11% of sequences bear resemblance to proteins of Ciliophora (see Supplementary Table S3). However, the level of similarity and the lack of the verified and well-annotated genome in P. camtschaticus do not allow us to unambiguously define these contigs as a consequence of contamination.
Based on data of P. platypus and P. camtschaticus genomes5,6, it can be assumed that the number of king crab genes varies about 28,000. The number of genes in our assembly can be estimated at several times more. The difference can be explained by the fact that, genome assembly of P. camtschaticus is incomplete. This is evidenced by the results of the BUSCO assessment with 27% of the arthropods core genes missing and 48% of genes incomplete (Fig. 1A).
During GO annotation, the number of contigs with hits significantly dropped as compared to BLAST searching against the NCBI database. Of the 7964 D. melanogaster orthologues (see Supplementary Table S4), only 2952 have notable expression at least at one developmental stage. The significantly enrichment processes at different stages of the development of P. camtschaticus are shown at Fig. 3 (also see Supplementary Data S2). As mentioned our earlier13, since the GO annotation is based on the functions of proteins of model organisms, GO enrichment analysis provides only a superficial understanding of the processes prevailing at one or another stage of development, the sites of their progress, and the functions in P. camtschaticus. Nevertheless, this analysis is useful for an overall assessment of gene activity and crab physiology.
It can be assumed, based on the comparison of significantly enrichment processes, that in the early larvae of P. camtschaticus some adaptation mechanisms do not function, in particular adaptation to elevated temperature and hypoxia. This conclusion has great practical significance. It is important to maintain optimal temperature and high aeration of water for the normal development of P. camtschaticus larvae and reducing mortality during cultivation at the zoea I–IV stages4.
Interestingly, the glaucothoe stage differ by activation of processes related to mitotic cell division. Despite the fact that the larvae do not feed during this period12, apparently, there is active development and restructuring of tissue in their body in connection with preparation for settlement and the transition to a benthic lifestyle.
For juvenile specimens, it is necessary to note the activation of such processes as “olfactory behavior” and “gluconeogenesis”. After settling, crabs switch to an active lifestyle, which is characterized by great physical activity, searching for food and avoiding predators. In this regard, the need to generation additional energy (gluconeogenesis) and enhance the sense of smell (olfactory behavior) increases.
TFs searching and clustering of expression profiles
A search for orthologues of D. melanogaster transcription factors resulted in identification of 353 TFs (see Supplementary Table S5). Of them, 27 TFs have significant changes in expression level relative to the zoea I stage and 21 of them have a “transcription factor” molecular function as specified in the FlyBase. All homologues were also verified against the NR NCBI database. Clustering of expression profiles resulted in 71 groups, many of which had already contained fewer than 50 sequences. In this regard, clusters having a similar expression pattern were manually combined. As a result, 15 clusters were obtained, each of which contained at least 100 sequences (Fig. 4). Of these clusters, 9 included TFs.
Apparently, the zoea I stage in P. camtschaticus is largely characterized by cluster 1, which is formed by 675 genes with maximum expression at this developmental stage (Fig. 4A). The cluster contains such GO terms as “chitin-based cuticle development” and “motor neuron axon guidance”. This cluster contained 3 significantly differential expressed TF genes (Hnf4, Eip75B, E2f1). These homologues play an important role in the early development of D. melanogaster. The hepatocyte nuclear factor 4 (Hnf4) control the expression of genes involved in glucose and lipid metabolism14,15. It is known, embryonal development of the Kamchatka crab is lecithotrophic and the prezoea larvae contain a lot of yolk16. It is likely that the utilization of stored nutrients continues in zoea I, since at this stage the number of lipid inclusions in tissues decreases compared to prezoea. Moreover, zoea I is the first actively feeding larva of the red king crab12. Hunting requires energy, which is provided by increasing glucose and lipid metabolism.
The ecdysone-induced protein 75B (Eip75B) regulates key aspects of the D. melanogaster larval development and metamorphosis17,18. The E2F transcription factor 1 (E2f1) essential for organ growth, development, and metamorphosis via cell cycle regulation19. After hatching, crab larvae, like many other arthropods, enter into interdependent cycles of molting and growth. Probably, it is associated with the activation of Eip75B and E2f1 expression in zoea I, which are involved in the regulation of the molting and organ growth mechanisms.
A less obvious peak at the zoea I stage has the cluster 2, containing a total 960 genes and among them 3 genes of TFs (zld, opa, and D). The cluster reflects the processes of this stage, including tracheal system, neuromuscular junction and midgut development, cell differentiation, segment polarity determination, dorsal/ventral pattern formation, and much more. The TFs included in the cluster perform are key for these processes. The zelda (zld) plays a pivotal role during early embryogenesis in D. melanogaster and later during wing growth and patterning in larvae20,21. The odd paired (opa) protein is important during embryonic segmentation, midgut formation and adult head morphogenesis22. The Dichaete (D) has important role in many developmental processes like neurogenesis, hindgut development, segmentation, and proper adult structure formation23–25 . The cluster 11 with negative peak at the zoea I stage do not have any significantly expressed TFs or enrichment processes.
The cluster 22 has a declining expression profile and no significantly enriched processes, but one transcription factor Optix (Fig. 4B). The Optix required for formation head skeletal elements and eyes of D. melanogaster26. Obviously, in P. camtschaticus, it is involved in the formation of eyes at the prezoea and zoea I stages and in the process of further larval development. The cluster 3 has an extending expression profile and various processes associated with fatty acid elongation, but has not any TFs.
The cluster 20 (Fig. 4C) has peak at the zoea I and IV stages and Hr3 TF. The Hormone receptor 3 (Hr3) required for developmental progression through early metamorphosis at response to ecdysone27. At the time, inductive function of the Hr3 can be negatively regulated by heterodimerization with Eip75B, whose expressed at the zoea I stage (see cluster 1)28. However, the cluster has no of significant enriched GO terms.
The clusters 10 and 31 with positive and negative peak at the zoea IV stage, respectively, do not have any significantly expressed TFs (Fig. 4D). There are clusters 6 and 7 with peak at the zoea IV stage, as well as glaucothoe (7) and juvenile (6) stages (Fig. 4E-F). The cluster 7 has peak at the zoea IV and glaucothoe stages. It includes 418 genes and one significant process of “apposition of dorsal and ventral imaginal disc-derived wing surfaces”. Probably, this GO term in crabs is associated with the processes of organogenesis and appendages formation. It is known, the development of pereiopods and abdominal pleopods occurs at zoea II–glaucothoe stages12. The cluster includes two TFs: hunchback (hb) and Heat-shock factor (Hsf). The hb regulates neural progenitor competence and body plan formation in D. melanogaster29,30. The Hsf activates antimicrobial/antifungal reactions in D. melanogaster and expression of heat shock proteins that prevent the developmental defects during heat stress31,32. These TFs probably perform similar functions in the development of P. camtschaticus.
The cluster 6 includes processes of Malpighian tubule morphogenesis and development, glial cell migration and immune response. The cluster has numerous genes (1294) with different expression profiles, but the clustering method could not divide it into separate groups with a more precise expression profile. Thus, in the cluster there are 4 genes of TFs (cabut (cbt), Jun-related antigen (Jra), kayak (kay), vrille (vri)) with a gradual increase of expression or a peak at the glaucothoe stage (cbt, vri). These TFs are associated with the development of various systems, including the nervous and respiratory, the immune response, metamorphosis, and circadian behavior33–40. Thereby, these genes are likely necessary for proper development and timely passage of metamorphosis, and therefore it should pay intent attention when designing markers of healthy development. Interestingly, the kay can inhibit vri function via dimerization41.
Cluster 32 with a positive peak of expression at the glaucothoe stage do not have TFs with significant changes of expression (Fig. 4G). This cluster contains various processes of activation of the immune response, apparently especially important at this stage7. For cluster 8, a decrease in expression was recorded at the glaucothoe stage. The GO enrichment analysis revealed only two terms: “fatty acid elongation, saturated fatty acid” and “fatty acid elongation, monounsaturated fatty acid”, which describe metabolic processes. A decrease in gene expression of this cluster at glaucothoe probably correlates with reduced feeding at the stage. The cluster has a total of 161 sequences and Mnt TF. The Mnt antagonizes cell growth promoting functions of the product encoded by Myc42.
Partly, the glaucothoe stage, together with juvenile, is characterized by cluster 13 (Fig. 4H), inclusive 278 genes and 2 TFs (Rel and Kr-h1). There are 3 enrichment processes here: “response to starvation”, “metamorphosis”, and “response to ecdysone”. These GO terms correlate with the processes occurring at the glaucothoe stage. During this period, the larva does not feed and transforms from a planktonic form to a benthic one12. The Relish (Rel) regulates the antibacterial and antiviral response, that reflected by not significantly enriched GO terms43. The Kruppel homolog 1 (Kr-h1) plays as ecdysone-induced metamorphosis regulator as well as repressor of neurogenesis in D. melanogaster44. The latter function of Kr-h1 in P. camtschaticus correlates with the inhibition of axonogenesis processes at the glaucothoe stage (Fig. 3).
The juvenile stage is characterized by clusters 4 and 9, but the latter has no TFs (Fig. 4I). Cluster 4 was the largest one by number of genes and contained 1903 contigs, a quarter of all detected DEGs. It also contains the most TFs (slbo, Ets21C, Ssb-c31a, aop, and drm). GO terms are associated with various aspects of the neural system development and morphogenesis, the heart, eye, and tracheal system development, and also immune response. This corresponds to the activation of organogenesis processes occurring after metamorphosis in crabs.
The slow border cells (slbo) protein involved at the epithelial cell migration during D. melanogaster ovarian development45. The Ets at 21C (Ets21C) protein possibly controls gene expression in response to infection or oncogene activation, but certain functions are unknown46. There are responses to bacterium processes here, but these are not significantly enriched. The Single stranded-binding protein c31A (Ssb-c31a) has unknown functions47. The anterior open (aop) protein acts downstream of the receptor tyrosine kinase signaling to regulate cell fate transitions critical to the development of many tissues including the nervous system, heart, trachea, and eye48–50. The drumstick (drm) protein involved in developmental patterning, cell fate specification, and gut morphogenesis51,52. Apparently, aop and drm in P. camtschaticus juveniles regulate similar processes.