First report of de novo assembly and annotation from brain and blood transcriptome of an anadromous shad, Alosa sapidissima

doi:10.21203/rs.3.rs-1106703/v1


Research Article


This work is licensed under a CC BY 4.0 License

Journal Publication: published 28 Mar, 2022 in BMC Genomic Data


Abstract

Objectives: American shad (Alosa sapidissima) is an important migratory alosine fish and has long been valued for its economic, nutritional and cultural attributes. Overfishing and barriers along its migratory routes have made it vulnerable. To protect this valuable species, aquaculture action plans have been implemented, although no genetic resources for it have yet been published. Here, we report a de novo transcriptome assembly and annotation for A. sapidissima from blood and brain tissues for the first time.

Data description: We generated 160,481 and 129,040 non-redundant transcripts from brain and blood tissues. The workflow comprised RNA extraction, library preparation, sequencing, de novo assembly, filtering, annotation and validation. Both coding and non-coding transcripts were annotated against the SwissProt and Pfam databases. Nearly 83% of coding transcripts were functionally assigned. Protein clustering with clupeiform and non-clupeiform taxa revealed that ~82% of coding transcripts had orthologues, which increases confidence in the annotation. We hope this resource will help the research community elucidate the molecular mechanisms underlying key traits of clupeiform shads, such as their remarkable migration.

Keywords: Alosa sapidissima; De novo transcriptome; Brain & Blood; Annotation

Objective

Alosa sapidissima is well studied among the alosines for its biological, nutritional, and commercial value[1–4]. Its native range along the North Atlantic coast extends into several freshwater tributaries, to which adults migrate to reproduce, sometimes travelling up to 1,800 km upstream[5–7]. Owing to its high fecundity, marketable weight, and popularity in sport fishing, this anadromous fish is in high demand, which intensifies its exploitation. Numerous obstructions along its migratory routes limit free movement and fragment populations into isolated patches[8–12]. Because shads are sensitive to environmental change, several reports have anticipated the extinction of shad species, namely Tenualosa reevesii, T. thibaudeaui, and Alosa killarnensis[13, 14]. Given this risk, an American shad restoration project and captive rearing have been undertaken in the USA and China, respectively. Despite these efforts, no large-scale molecular data have been published to explain key traits that could strengthen a recovery program, even though current omics technologies can generate large volumes of accurate genetic data. Here, we therefore report transcriptomic data from A. sapidissima for the first time. For a migratory species, maintaining a steady-state ionic balance in body fluids is challenging, because it requires rhythmic adjustment of water and solute content. A well-developed signalling system is also required to switch between salt and fresh water and to feed on live prey[15–18]. The present transcriptomic resource from blood and brain should therefore help to explain key biological features of this valuable species at the molecular level. The resource was originally produced for comparison with other shads, but that effort was halted because biological material could not be transferred during the COVID-19 surge. In addition, a whole-genome study (WGS) of A. sapidissima is under consideration by the G10K consortium[19]. We therefore consider it useful to share these data with the scientific community so that they can be put to better use.

Data description

A mature individual (42 cm) was euthanized with MS-222 (1 g L⁻¹) before brain and blood tissues were collected; the tissues were immediately placed in ALLProtect buffer and EDTA anticoagulant tubes, respectively, and stored at −20 °C until use[20]. Total RNA was extracted from each sample with TRIzol, and 1 µg was used to prepare a cDNA library (~400 bp) for bridge amplification following the manufacturer's instructions. The purified libraries were sequenced on an Illumina NovaSeq platform in 2 × 150 bp paired-end mode. Raw reads were trimmed so that every retained base had a base-call accuracy of at least 99.99% (Data file 5). The processed reads were assembled de novo with Trinity v2.11.0[21, 22], which produced 195,742 and 158,817 transcripts from the blood and brain samples, respectively (Data file 9). After filtering and clustering at a 98% sequence identity threshold, the number of non-redundant transcripts was reduced to 160,481 and 129,040. The longest transcripts in the brain and blood assemblies were 41,572 and 17,242 bp, with N50 values of 2,039 and 2,096 bp, respectively (Data file 10). The transcript length distributions of the two assemblies were consistent and comparable (Data file 6). In addition to these length statistics, BUSCO searches against 3,354 vertebrate single-copy orthologue groups found 82.3% and 71.5% complete universal single-copy genes in the brain and blood transcriptomes, respectively (Data file 7).
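Two of the quality figures above can be made concrete with a small sketch. A 99.99% per-base accuracy corresponds to an error probability of 10⁻⁴, i.e. Phred Q40 under Q = −10 log₁₀ P_err, and N50 is the transcript length at which the longest transcripts together cover half of all assembled bases. The helper functions below are illustrative assumptions, not the scripts used to produce Data files 5 and 10; assembly statistics tools may compute N50 over all transcripts or only over the longest isoform per gene, so values from this sketch need not match the reported ones exactly.

```python
import math
from typing import Iterable


def phred_to_accuracy(q: float) -> float:
    """Probability that a base call is correct for Phred score Q (P_err = 10**(-Q/10))."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred score corresponding to a per-base accuracy (0 < accuracy < 1)."""
    return -10 * math.log10(1.0 - accuracy)


def n50(lengths: Iterable[int]) -> int:
    """Length L such that transcripts of length >= L hold at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no transcript lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable: the cumulative sum always reaches half the total


if __name__ == "__main__":
    # A 99.99% per-base accuracy threshold corresponds to Q40.
    print(round(accuracy_to_phred(0.9999)))  # 40
    # Toy lengths, not data from this study.
    print(n50([100, 200, 300, 400, 500]))  # 400
```

In practice the lengths would be read from an assembly FASTA such as Data file 15 or 18.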

TransDecoder v5.5.0[22] predicted that around 80% of assembled transcripts contained an open reading frame (ORF); of these, 48,579 and 40,948 transcripts were predicted to encode functional proteins (Data file 11). Using BLASTX, BLASTP, and a series of HMM-based tools, we annotated coding and non-coding transcripts with an E-value cut-off of 10⁻⁵. GO analysis assigned at least one Molecular Function, Cellular Component, or Biological Process term to 39,015 and 33,475 proteins. Searches against the Pfam database showed that 70% of proteins in both assemblies contained a functional domain. According to the SQLite database populated by Trinotate[23], 83% of predicted proteins were functionally annotated. The full workflow and representative datasets are listed in Table 1 (Data files 1, 4 and 14–19). To infer homologous relationships, we retrieved RefSeq proteins of seven other species, including clupeiform and non-clupeiform taxa, from the NCBI repository (Data file 12). For blood and brain, 40,304 and 34,301 proteins had orthologues in other species, accounting for >82% of total proteins (Data file 13). Finally, to evaluate phylogenetic relationships, we retrieved one-to-one orthologous proteins. Because the brain dataset yielded more homologous groups, we used 204 one-to-one orthologous proteins from the brain to reconstruct a phylogenetic tree. A. sapidissima clustered within the clupeiform clade with maximum bootstrap support (Data file 8), consistent with existing phylogenetic studies of its position[24–26]. This resource should also support a future whole-genome study of A. sapidissima and provide a solid foundation for comparing its remarkable physiological and behavioural capabilities with those of related species.

Table 1

Overview of all data files/data sets
Label	Name of data file/data set	File types (file extensions)	Data repository and identifier (DOI or accession number)
Data file 1	Method and Code availability	Document file (.docx)	Figshare https://doi.org/10.6084/m9.figshare.17056328
Data file 2	RNAseq-Brain	SRA file (.sra)	NCBI Sequence Read Archive https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR16474177
Data file 3	RNAseq-Blood	SRA file (.sra)	NCBI Sequence Read Archive https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR16474180
Data file 4	FigS1 Complete work flow	Image file (.jpg)	Figshare https://doi.org/10.6084/m9.figshare.17054852
Data file 5	FigS2 Post trimming quality assessment	Image file (.jpg)	Figshare https://doi.org/10.6084/m9.figshare.17054852
Data file 6	FigS3 Transcript length distribution	Image file (.jpg)	Figshare https://doi.org/10.6084/m9.figshare.17054852
Data file 7	FigS4 BUSCO assessment	Image file (.jpg)	Figshare https://doi.org/10.6084/m9.figshare.17054852
Data file 8	FigS5 Phylogenetic relationship	Image file (.jpg)	Figshare https://doi.org/10.6084/m9.figshare.17054852
Data file 9	Table S1 Preliminary assembly statistics	Document file (.docx)	Figshare https://doi.org/10.6084/m9.figshare.17054948
Data file 10	Table S2 Final non-redundant assembly statistics	Document file (.docx)	Figshare https://doi.org/10.6084/m9.figshare.17054948
Data file 11	Table S3 Annotation summary	Document file (.docx)	Figshare https://doi.org/10.6084/m9.figshare.17054948
Data file 12	Table S4 Species Description	Document file (.docx)	Figshare https://doi.org/10.6084/m9.figshare.17054948
Data file 13	Table S5 Homologue information	Document file (.docx)	Figshare https://doi.org/10.6084/m9.figshare.17054948
Data file 14	brain.Trinotate.filtered.xls	Spreadsheet (.xls)	Figshare https://doi.org/10.6084/m9.figshare.16834564.v2
Data file 15	brain.Trinity.RSEM.retained.clustered.fasta	FASTA file (.fasta)	Figshare https://doi.org/10.6084/m9.figshare.16834564.v2
Data file 16	brain.Trinity.RSEM.retained.clustered.fasta.transdecoder.pep	FASTA file (.pep)	Figshare https://doi.org/10.6084/m9.figshare.16834564.v2
Data file 17	blood.Trinotate.filtered.xls	Spreadsheet (.xls)	Figshare https://doi.org/10.6084/m9.figshare.16834546.v2
Data file 18	blood.Trinity.RSEM.retained.clustered.fasta	FASTA file (.fasta)	Figshare https://doi.org/10.6084/m9.figshare.16834546.v2
Data file 19	blood.Trinity.RSEM.retained.clustered.fasta.transdecoder.pep	FASTA file (.pep)	Figshare https://doi.org/10.6084/m9.figshare.16834546.v2
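The annotation and orthology steps described above rest on two simple filters: retaining database hits with an E-value of at most 10⁻⁵, and keeping only one-to-one (single-copy) orthologue groups for tree building. The Python sketch below illustrates both under stated assumptions: hits are in standard 12-column BLAST tabular format (`-outfmt 6`), the best hit per query is chosen by bit score, and orthogroups are supplied as a dictionary of (species, protein ID) pairs. The function names, species labels and data layout are hypothetical; the annotation here was run through Trinotate, and the orthology clustering tool is not named in the text.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

EVALUE_CUTOFF = 1e-5  # E-value cut-off stated in the text


def best_hits(outfmt6_lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Dict[str, Tuple[str, float]]:
    """Best subject per query (highest bit score) among hits with E-value <= cutoff.

    Assumes 12-column BLAST tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in outfmt6_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > cutoff:
            continue  # hit not significant at the chosen cut-off
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return {q: (s, e) for q, (s, e, _) in best.items()}


def single_copy_groups(groups: Dict[str, List[Tuple[str, str]]], species: Iterable[str]) -> List[str]:
    """IDs of orthogroups with exactly one protein from each listed species and none from others.

    `groups` (hypothetical layout) maps an orthogroup ID to (species, protein_id) pairs.
    """
    wanted = set(species)
    selected = []
    for group_id, members in groups.items():
        counts = Counter(sp for sp, _ in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected.append(group_id)
    return selected


if __name__ == "__main__":
    # Toy records with hypothetical IDs; not results from this study.
    hits = [
        "TRINITY_DN1_c0_g1_i1\tsp|P1\t90.0\t300\t30\t0\t1\t900\t1\t300\t1e-50\t500",
        "TRINITY_DN1_c0_g1_i1\tsp|P2\t80.0\t300\t60\t0\t1\t900\t1\t300\t1e-40\t400",
        "TRINITY_DN2_c0_g1_i1\tsp|P3\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.01\t40",
    ]
    print(best_hits(hits))
    groups = {
        "OG1": [("sp_A", "a1"), ("sp_B", "b1")],
        "OG2": [("sp_A", "a2"), ("sp_A", "a3"), ("sp_B", "b2")],
    }
    print(single_copy_groups(groups, ["sp_A", "sp_B"]))
```

Choosing the best hit by bit score rather than by E-value is a design choice: the two usually agree, but bit scores do not depend on database size.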

Limitations

The sample was collected from a freshwater captive facility in Songjiang District, Shanghai. In the wild, anadromous fish migrating into fresh water must swim against strong currents and are exposed to particular abiotic factors. In captivity, such conditions may be absent, so genes that are specifically expressed during migration in the wild may be expressed at lower levels, or not at all, in this sample.

Abbreviations

SL: Standard Length; BUSCO: Benchmarking Universal Single-Copy Orthologs; ORF: Open Reading Frame; HMM: Hidden Markov Model; GO: Gene Ontology; NCBI: National Center for Biotechnology Information; WGS: Whole Genome Study; G10K: the international Genome 10K consortium

Declarations

Ethics approval and consent to participate

All experimental procedures including specimen handling were approved by the Animal Ethics Committee of Shanghai Ocean University, China.

Consent for publication

Not applicable

Availability of data and materials

The sequencing data have been deposited in the NCBI Sequence Read Archive with open access (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR16474177 and https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR16474180). The methods, code and references, together with all final analysis products, have been deposited in Figshare for public use. File types and access links are listed in Table 1.

Competing interests

The authors declare no competing interests.

Funding

This work was supported by the Science and Technology Commission of Shanghai Municipality (19410740500) and the Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding project. Apart from providing funding, the funders had no role in study design, sample collection, data analysis and interpretation, or manuscript writing.

Author contributions

C.L. and K.K.S. designed the project and wrote the first draft of the manuscript. L.J., L.W. and Y.H. collected and prepared the samples. K.K.S., L.L., J.H. and T.Z. performed the data analysis. All authors contributed to editing and revising the manuscript.

Acknowledgements

We thank the management team of the Laboratory of Molecular Systematics and Ecology for maintaining the High Performance Computation Server (HPCS) and supporting our data analysis. We also thank Mr. Roland Nathan Mandal and Miss Irin Sultana for their technical support.

References

1. Limburg KE. American Shad in Its Native Range. American Fisheries Society Symposium. 2003;35:125–40.
2. Bi YH, Chen XW. Mitochondrial genome of the American shad Alosa sapidissima. Mitochondrial DNA. 2011;22(1–2):9–11.
3. Wang J, Yu ZS, Wang X, Yang SS, Zhang DG, Zhang Y. The next-generation sequencing reveals the complete mitochondrial genome of Alosa sapidissima (Perciformes: Clupeidae) with phylogenetic consideration. Mitochondrial DNA B. 2017;2(1):304–6.
4. Guo YJ, Xing ZK, Yang G, Liu JL, Chen CX, Xu DW. American shad muscle nutrition composition determination and analysis. China Feed. 2010;8:39–40.
5. Brown BL, Smouse PE, Epifanio JM, Kobak CJ. Mitochondrial DNA Mixed-Stock Analysis of American Shad: Coastal Harvests Are Dynamic and Variable. Trans Am Fish Soc. 1999;128(6):977–94.
6. Rasmussen JL, Regier HA, Sparks RE, Taylor WW. Dividing the waters: The case for hydrologic separation of the North American Great Lakes and Mississippi River Basins. J Great Lakes Res. 2011;37(3):588–92.
7. Pearcy WG, Fisher JP. Ocean distribution of the American shad (Alosa sapidissima) along the Pacific coast of North America. Fish Bull. 2011;109(4):440–53.
8. Harris JE, Hightower JE. Movement Patterns of American Shad Transported Upstream of Dams on The Roanoke River, North Carolina and Virginia. North Am J Fish Manag. 2011;31(2):240–56.
9. Haro A, Castro-Santos T. Passage of American Shad: Paradigms and Realities. Mar Coast Fish. 2012;4(1):252–61.
10. Grote AB, Bailey MM, Zydlewski JD. Movements and Demography of Spawning American Shad in the Penobscot River, Maine, prior to Dam Removal. Trans Am Fish Soc. 2014;143(2):552–63.
11. Mulligan KB, Haro A, Noreika J. Effect of backwatering a streamgage weir on the passage performance of adult American Shad (Alosa sapidissima). Journal of Ecohydraulics. 2021:1–13.
12. Hasselman DJ, Bentzen P, Narum SR, Quinn TP. Formation of population genetic structure following the introduction and establishment of non-native American shad (Alosa sapidissima) along the Pacific Coast of North America. Biol Invasions. 2018;20(11):3123–43.
13. Guo Q, Liu XJ, Ao XF, Qin JJ, Wu XP, Ouyang S. Fish diversity in the middle and lower reaches of the Ganjiang River of China: Threats and conservation. PLoS One. 2018;13(11).
14. IUCN. The IUCN Red List of Threatened Species. Version 2021-2. 2021.
15. Cao QQ, Gu J, Wang D, Liang FF, Zhang HY, Li XR, Yin SW. Physiological mechanism of osmoregulatory adaptation in anguillid eels. Fish Physiol Biochem. 2018;44(2):423–33.
16. Mohindra V, Dangi T, Tripathi RK, Kumar R, Singh RK, Jena JK, Mohapatra T. Draft genome assembly of Tenualosa ilisha, Hilsa shad, provides resource for osmoregulation studies. Sci Rep. 2019;9.
17. Xu GC, Bian C, Nie ZJ, Li J, Wang YY, Xu DP, You XX, Liu HB, Gao JC, Li HX, et al. Genome and population sequencing of a chromosome-level genome assembly of the Chinese tapertail anchovy (Coilia nasus) provides novel insights into migratory adaptation. GigaScience. 2020;9(1).
18. Finlay RW, Poole R, Rogan G, Dillane E, Cotter D, Reed TE. Hyper- and Hypo-Osmoregulatory Performance of Atlantic Salmon (Salmo salar) Smolts Infected With Pomphorhynchus tereticollis (Acanthocephala). Front Ecol Evol. 2021;9:529.
19. Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021;592(7856):737–46.
20. Beekman JM, Reischl J, Henderson D, Bauer D, Ternes R, Peña C, Lathia C, Heubach JF. Recovery of microarray-quality RNA from frozen EDTA blood samples. J Pharmacol Toxicol Methods. 2009;59(1):44–9.
21. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng QD, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52.
22. Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8(8):1494–512.
23. Bryant DM, Johnson K, DiTommaso T, Tickle T, Couger MB, Payzin-Dogru D, Lee TJ, Leigh ND, Kuo TH, Davis FG, et al. A Tissue-Mapped Axolotl De Novo Transcriptome Enables Identification of Limb Regeneration Factors. Cell Rep. 2017;18(3):762–76.
24. Bloom DD, Lovejoy NR. The evolutionary origins of diadromy inferred from a time-calibrated phylogeny for Clupeiformes (herring and allies). Proc Biol Sci. 2014;281(1778).
25. Bloom DD, Burns MD, Schriever TA. Evolution of body size and trophic position in migratory fishes: a phylogenetic comparative analysis of Clupeiformes (anchovies, herring, shad and allies). Biol J Linn Soc. 2018;125(2):302–14.
26. Hughes LC, Orti G, Huang Y, Sun Y, Baldwin CC, Thompson AW, Arcila D, Betancur-R R, Li CH, Becker L, et al. Comprehensive phylogeny of ray-finned fishes (Actinopterygii) based on transcriptomic and genomic data. Proc Natl Acad Sci USA. 2018;115(24):6249–54.
27. Sarker KK, Lu L, Huang J, Zhou T, Wang L, Hu Y, et al. First report of de novo assembly and annotation from brain and blood transcriptome of an anadromous shad, Alosa sapidissima. Figshare. 2021. https://doi.org/10.6084/m9.figshare.17056328.
28. Sarker KK, Lu L, Huang J, Zhou T, Wang L, Hu Y, et al. First report of de novo assembly and annotation from brain and blood transcriptome of an anadromous shad, Alosa sapidissima. Figshare. 2021. https://doi.org/10.6084/m9.figshare.17054948.
29. Sarker KK, Lu L, Huang J, Zhou T, Wang L, Hu Y, et al. First report of de novo assembly and annotation from brain and blood transcriptome of an anadromous shad, Alosa sapidissima. Figshare. 2021. https://doi.org/10.6084/m9.figshare.17054852.
30. Sarker KK, Lu L, Huang J, Zhou T, Wang L, Hu Y, et al. First report of de novo assembly and annotation from brain and blood transcriptome of an anadromous shad, Alosa sapidissima. Figshare. 2021. https://doi.org/10.6084/m9.figshare.16834564.v1.
31. Sarker KK, Lu L, Huang J, Zhou T, Wang L, Hu Y, et al. First report of de novo assembly and annotation from brain and blood transcriptome of an anadromous shad, Alosa sapidissima. Figshare. 2021. https://doi.org/10.6084/m9.figshare.16834546.v1.


First submitted to journal: 16 Jan, 2022; reviewers invited by journal: 19 Feb, 2022; reviews received at journal: 19 Feb, 2022.
