A mature individual of 42 cm was euthanized with MS-222 (1 g L−1) before brain and blood tissues were extracted. The tissues were immediately placed in ALLProtect buffer and EDTA-stabilized anticoagulant tubes, respectively, and later stored in a −20 °C freezer until further use[20]. Total RNA was extracted from each sample with TRIzol, and 1 μg was used to prepare a cDNA library (~400 bp) for bridge amplification following the manufacturer's instructions. Finally, the purified libraries were loaded onto an Illumina NovaSeq with a 2 × 150 bp paired-end configuration. Raw sequencing reads were trimmed so that every retained base had a 99.99% probability of being correct, equivalent to a Phred quality score of 40 (Data file 5). For de novo assembly, the processed reads were passed to the Trinity v2.11.0[21, 22] assembler, which constructed 195,742 and 158,817 transcripts from the blood and brain samples, respectively (Data file 9). These initial transcript counts were reduced to 160,481 and 129,040 after filtering and clustering into non-redundant transcripts at a 98% threshold. Quantitative analysis identified the longest transcripts as 41,572 and 17,242 bp in the brain and blood transcriptomes, with N50 values of 2,039 and 2,096 bp, respectively (Data file 10). In both assemblies, the transcript length distribution was consistent and comparable (Data file 6). In addition to the quantitative assessment, BUSCO searches against 3,354 groups from the vertebrate lineage found 82.3% and 71.5% complete universal single-copy orthologues in the brain and blood transcriptomes, respectively (Data file 7).
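The N50 values reported above summarise assembly contiguity: N50 is the length L such that transcripts of length L or longer together cover at least half of the total assembled bases. The following is a minimal sketch of that calculation, not the script used in this study; the helper names `fasta_lengths` and `n50` and the toy sequences are hypothetical and serve only as illustration.

```python
def fasta_lengths(lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a non-empty collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the total assembly length is the N50.
        if running >= half_total:
            return length


# Toy input (hypothetical sequences, not data from this study).
toy = [">t1", "ACGTACGT", ">t2", "ACGT", "AC", ">t3", "ACG"]
print(fasta_lengths(toy))       # [8, 6, 3]
print(n50(fasta_lengths(toy)))  # 6
```

In practice the lines would come from an open assembly FASTA file, for example by passing the file handle directly to `fasta_lengths`.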
Applying TransDecoder v5.5.0[22] predicted that around 80% of the assembled transcripts contained an ORF, of which 48,579 and 40,948 transcripts were predicted to encode functional proteins (Data file 11). Using Blastx, Blastp and a series of HMM-based tools, we annotated coding and non-coding transcripts with an E-value cut-off of 10^-5. GO analysis assigned at least one term from the Molecular function, Cellular component or Biological process categories to 39,015 and 33,475 proteins. A search against the Pfam database revealed that 70% of proteins in both datasets had a functional domain. According to the SQLite database loaded from Trinotate[23], 83% of predicted proteins were functionally annotated. The complete workflow and representative datasets are listed in Table 1 (Data file 1, Data file 4 and Data files 14-19). To infer homologous relationships, we retrieved RefSeq proteins of seven other species,
Table 1
Overview of all data files/data sets
Label | Name of data file/data set | File types (file extensions) | Data repository and identifier (DOI or accession number) |
Data file 1 | Method and Code availability | Document file (.docx) | Figshare https://doi.org/10.6084/m9.figshare.17056328 |
Data file 2 | RNAseq-Brain | SRA file (.sra) | NCBI Sequence Read Archive https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR16474177 |
Data file 3 | RNAseq-Blood | SRA file (.sra) | NCBI Sequence Read Archive https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR16474180 |
Data file 4 | FigS1 Complete work flow | Image file (.jpg) | Figshare https://doi.org/10.6084/m9.figshare.17054852 |
Data file 5 | FigS2 Post trimming quality assessment | Image file (.jpg) | Figshare https://doi.org/10.6084/m9.figshare.17054852 |
Data file 6 | FigS3 Transcript length distribution | Image file (.jpg) | Figshare https://doi.org/10.6084/m9.figshare.17054852 |
Data file 7 | FigS4 BUSCO assessment | Image file (.jpg) | Figshare https://doi.org/10.6084/m9.figshare.17054852 |
Data file 8 | FigS5 Phylogenetic relationship | Image file (.jpg) | Figshare https://doi.org/10.6084/m9.figshare.17054852 |
Data file 9 | Table S1 Preliminary assembly statistics | Document file (.docx) | Figshare https://doi.org/10.6084/m9.figshare.17054948 |
Data file 10 | Table S2 Final non-redundant assembly statistics | Document file (.docx) | Figshare https://doi.org/10.6084/m9.figshare.17054948 |
Data file 11 | Table S3 Annotation summary | Document file (.docx) | Figshare https://doi.org/10.6084/m9.figshare.17054948 |
Data file 12 | Table S4 Species Description | Document file (.docx) | Figshare https://doi.org/10.6084/m9.figshare.17054948 |
Data file 13 | Table S5 Homologue information | Document file (.docx) | Figshare https://doi.org/10.6084/m9.figshare.17054948 |
Data file 14 | brain.Trinotate.filtered.xls | Spreadsheet (.xls) | Figshare https://doi.org/10.6084/m9.figshare.16834564.v2 |
Data file 15 | brain.Trinity.RSEM.retained.clustered.fasta | Fasta file (.fasta) | Figshare https://doi.org/10.6084/m9.figshare.16834564.v2 |
Data file 16 | brain.Trinity.RSEM.retained.clustered.fasta.transdecoder.pep | Fasta file (.pep) | Figshare https://doi.org/10.6084/m9.figshare.16834564.v2 |
Data file 17 | blood.Trinotate.filtered.xls | Spreadsheet (.xls) | Figshare https://doi.org/10.6084/m9.figshare.16834546.v2 |
Data file 18 | blood.Trinity.RSEM.retained.clustered.fasta | Fasta file (.fasta) | Figshare https://doi.org/10.6084/m9.figshare.16834546.v2 |
Data file 19 | blood.Trinity.RSEM.retained.clustered.fasta.transdecoder.pep | Fasta file (.pep) | Figshare https://doi.org/10.6084/m9.figshare.16834546.v2 |
including both clupeiform and non-clupeiform representatives, from the NCBI repository (Data file 12). For blood and brain, we found that 40,304 and 34,301 proteins, respectively, had orthologous relationships with other species, accounting for >82% of total proteins (Data file 13). Finally, to evaluate the phylogenetic relationship, one-to-one orthologous proteins were retrieved. Because the brain dataset yielded more groups of homologous proteins, we used 204 one-to-one orthologous proteins from brain to reconstruct a phylogenetic tree. A. sapidissima clustered with the clupeiform clade, with maximum bootstrap support (Data file 8). The reconstructed phylogeny agrees with existing phylogenetic studies regarding its position[24–26]. This resource will support whole-genome studies of A. sapidissima and provide a solid foundation for comparing its impressive physiological and behavioural capabilities with those of related species.
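The retrieval of one-to-one orthologues described above can be illustrated with a short sketch. It assumes a hypothetical input layout in which each orthogroup is a list of (species, protein ID) pairs. The text does not name the orthology tool or its output format, so the function name `one_to_one_orthogroups`, the species labels and the protein IDs below are all placeholders rather than the procedure used in this study.

```python
from collections import Counter


def one_to_one_orthogroups(orthogroups, species):
    """Return the IDs of orthogroups with exactly one protein per species.

    orthogroups: dict mapping a group ID to a list of (species, protein_id)
                 pairs (hypothetical layout, assumed for this sketch).
    species:     iterable of all species that must be represented.
    """
    required = set(species)
    selected = []
    for group_id, members in orthogroups.items():
        # Count how many proteins each species contributes to this group.
        counts = Counter(sp for sp, _protein in members)
        every_species_present = set(counts) == required
        single_copy = all(n == 1 for n in counts.values())
        if every_species_present and single_copy:
            selected.append(group_id)
    return selected


# Toy orthogroups (placeholder species labels and protein IDs).
toy_groups = {
    "OG1": [("A_sapidissima", "p1"), ("species_B", "p2"), ("species_C", "p3")],
    "OG2": [("A_sapidissima", "p4"), ("A_sapidissima", "p5"),
            ("species_B", "p6"), ("species_C", "p7")],   # duplicated in one species
    "OG3": [("A_sapidissima", "p8"), ("species_B", "p9")],  # species_C missing
}
toy_species = ["A_sapidissima", "species_B", "species_C"]
print(one_to_one_orthogroups(toy_groups, toy_species))  # ['OG1']
```

Only groups that pass both checks are suitable for concatenated phylogenetic reconstruction, because each species then contributes exactly one sequence per alignment.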