Identification of fucoidan biosynthetic genes
A total of 103 genes related to fucoidan synthesis were annotated based on the genome and transcriptome databases of S. japonica (Table S1). Of these, 81 had complete ORFs and the remaining 22 had incomplete ORFs. MPI3 (GENE_013986), PMM1 (GENE_007314), GM46D1 (GENE_026041) and ST12 (GENE_011842) showed relatively high transcriptional levels within their respective catalytic steps, and were therefore considered essential for fucoidan synthesis during kelp growth and development.
Identification and characterization of the ST genes
The S. japonica genome contains 73 genes automatically annotated as ST genes, of which 27 were further confirmed as S. japonica ST genes. The S. japonica genome is 2.5, 3.58 and 3.83 times larger than those of E. siliculosus, N. decipiens and C. okamuranus, respectively, whereas the number of annotated ST genes in S. japonica is approximately 3.17, 7.31 and 8.1 times the number in E. siliculosus, N. decipiens and C. okamuranus, respectively (Table 1).
These 27 ST genes were named ST1 to ST27 in descending order of their average FPKM values. The name, gene ID, scaffold location, ORF length, exon number, amino acid number, molecular weight and pI of the 27 genes and their corresponding proteins are summarized in Table 2. The ST proteins ranged from 117 (ST17) to 473 (ST27) amino acids in length and from 13.48 kDa (ST17) to 51.37 kDa (ST27) in molecular weight. The predicted pI values of the ST proteins ranged from 4.77 (ST17) to 9.4 (ST5).
Amino acid sequence analysis showed that most of the proteins were non-secretory. Only five proteins (ST2, ST3, ST13, ST19 and ST25) had transmembrane helices, and these were predicted to carry a signal peptide or signal anchor. In addition, five proteins without a transmembrane domain (ST1, ST4, ST6, ST7 and ST14) contained signal peptides. Only ST2 and ST16 were predicted to target the chloroplast, whereas ST3, ST7, ST10, ST17, ST19 and ST22 were predicted to be located in the mitochondria (Table S2).
Sequence analysis of the ST genes
The 27 S. japonica ST proteins could be classified into five groups (I–V) (Fig. 1A). A total of 20 conserved motifs were identified (Fig. 1B and Table S3). Groups I, II and V contained the Sulfotransfer_1 domain (PF00685), group IV contained the Sulfotransfer_2 domain (PF03567), and group III contained the Gal-3-O_sulfotr domain (PF06990), which was only present in algae [28]. ST sequences in the same group contained similar conserved motifs (Fig. 1A, B).
Gene structure analysis showed that most ST genes (26 of 27, 96.3%) had more than three introns. Only one gene, ST17, had fewer than three introns (two introns). The longest intron identified in the ST genes was nearly 15 kb (Fig. 1C).
We analyzed the types and numbers of all alternative splicing sites in S. japonica ST genes in different tissues and developmental stages. A total of 368 sites were identified across all samples. The most abundant alternative splicing site type was the ES type (127 sites). Some types were concentrated in a few specific genes, for example p5_splice (ST1), p3_splice (ST3 and ST22), ALS (ST11 and ST22) and AFS (ST10 and ST20). In addition, ST genes expressed in the basal blade contained more alternative splicing sites than those expressed in the distal blade. Details of these sites are listed in Table S4.
Of the four selected STs, the PCR products of the cDNA sequences of three appeared as a single band on 1.5% agarose gel electrophoresis; the exception was GENE_014314 (Fig. S1). For all four selected STs, the plasmid sequencing data of the recombinant vectors were consistent with the RNA-Seq analysis. The coding sequences of the four STs are provided in Table S5.
Scaffold location and gene duplication of the STs
The 27 ST gene loci were distributed randomly on 12 scaffolds and 7 contigs in S. japonica (Fig. 2). Only scaffolds 3, 4, 6, 14 and 23 contained two or more ST loci. Tandem duplication was the only type of duplication found in the ST family, and it involved 33.3% of the whole ST gene family. Duplicate ST gene pairs were found on scaffold 6 (ST11 and ST15), scaffold 14 (ST9 and ST17) and scaffold 23 (ST25 and ST19). A group of three tandem duplicates (ST1, ST4 and ST7) was identified on scaffold 4. No collinearity among ST family members was observed.
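The tandem duplication percentage reported above can be checked from the duplicate groups listed in the text. The following is a minimal sketch of that arithmetic only; the dictionary layout and the function name `tandem_fraction` are illustrative assumptions and are not part of the original analysis pipeline.

```python
# Tandem duplicate groups as listed in the text, keyed by scaffold number.
# The data structure is an illustrative assumption, not the authors' format.
tandem_groups = {
    4: ["ST1", "ST4", "ST7"],
    6: ["ST11", "ST15"],
    14: ["ST9", "ST17"],
    23: ["ST25", "ST19"],
}
TOTAL_ST_GENES = 27  # confirmed S. japonica ST genes


def tandem_fraction(groups, total):
    """Return (number of tandemly duplicated genes, fraction of the family)."""
    genes = {gene for members in groups.values() for gene in members}
    return len(genes), len(genes) / total


n_dup, frac = tandem_fraction(tandem_groups, TOTAL_ST_GENES)
print(f"{n_dup} of {TOTAL_ST_GENES} ST genes ({frac:.1%}) are tandem duplicates")
```

Counting the four groups gives nine tandemly duplicated genes, which matches the stated 33.3% of the 27-member family.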
Sequence alignment and phylogenetic analysis
As shown in Fig. 3, the secondary structures of the S. japonica ST proteins were dominated by alpha-helix (56.38% on average), followed by beta-bridge (27.44% on average). Although the amino acid sequences differed to some extent, sequences in the same group displayed similar secondary structures. In group I, we found two highly conserved motifs (region I, TxPKSGTxW, and region IV, RKGxxGDWKxxFT) and two conserved active sites (Lys59 and Arg276) (Fig. 4).
A phylogenetic tree was constructed based on the ST amino acid sequences of 27 S. japonica, 21 E. siliculosus, six C. okamuranus and nine P. tricornutum proteins (Fig. 5). Five ST clades in our phylogenetic tree were determined based on the classification result of 15 E. siliculosus ST sequences [21]. Fifty-four of the 63 ST proteins were grouped into four subfamilies: the E. siliculosus STs previously clustered into two different clades (clade A and clade B) fell into one group, which contained 15 STs, and seven, 13 and 19 STs clustered into clades C, D and E, respectively. The remaining nine STs were unclassified.
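The sequence and clade counts above can be cross-checked with a short tally. This is only a bookkeeping sketch using the numbers stated in the text; the variable names and the "A+B" label for the merged clade are illustrative assumptions.

```python
# Number of ST sequences per species used for the tree (from the text).
species_counts = {
    "S. japonica": 27,
    "E. siliculosus": 21,
    "C. okamuranus": 6,
    "P. tricornutum": 9,
}

# Number of STs per subfamily; "A+B" is a hypothetical label for the group
# in which the previously separate clades A and B fell together.
clade_counts = {"A+B": 15, "C": 7, "D": 13, "E": 19}
unclassified = 9

total_sequences = sum(species_counts.values())
classified = sum(clade_counts.values())

# The classified and unclassified STs should account for every sequence.
assert classified + unclassified == total_sequences
print(f"{classified} of {total_sequences} STs classified, {unclassified} unclassified")
```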
Transcript profiles of STs in different tissues and developmental stages
A heatmap of the transcript levels (FPKM values) of the 27 ST genes in different tissues and developmental stages was generated with TBtools from the RNA-Seq data (Fig. 6). In the tested samples, ST1 had a relatively high expression level, whereas ST27 was barely detected. Most STs were expressed more strongly in the basal blade than in the distal blade.
The trends in the transcriptional levels of the 27 ST genes in the S. japonica basal blade across all developmental stages are shown in Table 3. The STs exhibited several major expression patterns: profile 0 (ST8, ST20, ST21 and ST24), profile 2 (ST6 and ST26), profile 3 (ST1, ST10 and ST27), profile 6 (ST15 and ST23), profile 25 (ST14), profile 28 (ST25) and profile 29 (ST5, ST12 and ST18). Profiles 0 and 29 are the two representative profiles: the former contains genes whose expression was down-regulated from January to June, whereas the latter contains genes with an up-regulated trend. The most enriched pathways in profile 0 included oxidative phosphorylation, photosynthesis - antenna proteins, photosynthesis, carbon fixation and metabolic pathways. Genes involved in ribosome, nitrogen metabolism, sulfur metabolism and inositol phosphate metabolism were enriched in profile 29 (Table S6).
We analyzed the transcriptional levels of the STs in the distal blade, 1/3 blade, 2/3 blade and basal blade of S. japonica collected in April (Table 4). According to our data, the STs exhibited five major expression patterns: profile 0 (ST17, ST18 and ST25), profile 1 (ST8), profile 4 (ST12 and ST19), profile 9 (ST2, ST11, ST14, ST22 and ST24) and profile 17 (ST23). The transcriptional levels of most STs decreased, as observed for profiles 0, 1, 4 and 9, and genes related to basal metabolism and photosynthesis were enriched in these profiles (Table S7).