Identification of exosomes
In order to directly observe the morphology of exosomes, we employed transmission electron microscopy to observe negatively stained exosomes samples. As shown in Fig. 1A, it had a very obvious single-layer membrane structure under the electron microscope, which appeared as a saucer-shaped or disc-shaped vesicle with one side recessed. NTA analysis was performed to further determine the size of exosomes. The main peak of the particle size of the exosomes we separated was 70.55 nm, and the 80.3% particle size was between 40–150 nm (Fig. 1B). Flow cytometry was used to detect the surface characteristic markers of exosomes. The results revealed that the positive rates of CD81 and CD63 were 92.7% and 85.9%, respectively (Fig. 1C).
Sequencing data statistics of exosomes derived from MAC-T cells
In the current study, a total of 4 cDNA libraries were constructed by isolating total RNA from exosomes derived from S. aureus-infected and non-infected MAC-T cells. The raw reads in the cDNA library of different samples were cont1, 67572136; cont2, 66474707; infect1, 69033293; infect2, 70292999. GC content (%) percentages were found to be cont1, 48.61; cont2, 47.63; infect1, 49.75; infect2, 49.98 (Table 1). We conducted quality control analysis on raw data, and the results are as follows. The quality of the bases in raw reads was counted in each sequencing cycle. The base composition of each cycle is shown in Fig. 2A. As shown in Fig. 2B, the blue line represented the average base quality of the cycle in the base quality distribution box plot. Base error rate is limited to 0.11–0.13 in raw reads (Fig. 2C). We filtered raw data to get clean data, and clean reads, GC%, Q20, Q30, clean bases were counted in Table 2, and the quality control results of clean reads were shown in Fig. 2D, E and F. In addition, when the clean reads were aligned with the bovine reference genome using HISAT2 software by an improved BWT algorithm (FM index), it was determined that the Total Mapped Reads or Fragments belonging to all the samples was above 80% and was mapped in the reference genome (Table 3).
Table 1 Statistics of raw data quality
Sample name
|
Total reads
|
Total reads (bp)
|
GC%
|
Error rate(%)
|
Q20(%)
|
Q30(%)
|
cont1
|
67572136
|
10135820400
|
48.61
|
0.11
|
98.64
|
96.29
|
cont2
|
66474707
|
9971206050
|
47.63
|
0.11
|
98.75
|
96.62
|
infect1
|
69033293
|
9854993950
|
49.75
|
0.13
|
98.26
|
95.53
|
infect2
|
70292999
|
10543949850
|
49.98
|
0.13
|
98.33
|
95.57
|
Note: Total reads: Statistics of total reads, each adjacent four lines contains the information of one read, and the total read number of each file is calculated; Total base: The number of all bases in total data; Clean reads: Filter the original data, remove the linker sequence, remove the contaminated part, and remove the sequence containing more low-quality bases to obtain clean reads; GC%: The percentage of G and C in all bases; Error rate: The error rate of sequencing; Q20, Q30: The percentage of total number of bases where the Phred score is greater than 20 and 30 which indicates base call accuracy.
Table 2 Summary of filter data
Sample name
|
Raw reads
|
Raw bases
|
clean reads
|
GC%
|
Q20(%)
|
Q30(%)
|
Clean bases
|
Clean bases%
|
cont1
|
67572136
|
10135820400
|
65650056
|
48.57
|
99.16
|
97.32
|
9689017234
|
95.59
|
cont2
|
66474707
|
9971206050
|
65006370
|
47.54
|
99.22
|
97.52
|
9514596643
|
95.42
|
infect1
|
69033293
|
9854993950
|
65789887
|
49.67
|
98.95
|
96.74
|
9383925239
|
95.22
|
infect2
|
70292999
|
10543949850
|
68826092
|
49.91
|
98.94
|
96.69
|
10102208096
|
95.81
|
Note: Raw reads: Statistics of raw reads, each adjacent four lines contains the information of one read, and the total read number of each file is calculated; Raw base: The number of all bases in raw data; Clean reads: Filter the original data, remove the linker sequence, remove the contaminated part, and remove the sequence containing more low-quality bases to obtain clean reads; GC%: The percentage of G and C in all bases; Q20, Q30: The percentage of total number of bases where the Phred score is greater than 20 and 30 which indicates base call accuracy; Clean bases: The number and length of sequences in clean reads calculate the number of bases; Clean bases%: Percentage of Clean bases/Raw base.
Table 3 Summary of reads mapped to reference genome
Sample name
|
cont1
|
cont2
|
infect1
|
infect2
|
The effective reads
|
112941906
(100%)
|
123138246
(100%)
|
86025798
(100%)
|
107088594
(100%)
|
Total mapped
|
5194592
(4.60%)
|
3683147
(2.99%)
|
4047510
(4.70%)
|
3269932
(3.05%)
|
Multiple mapped
|
550374
(0.49%)
|
253166
(0.21%)
|
492395
(0.57%)
|
411581
(0.38%)
|
Uniquely mapped
|
4644218
(4.11%)
|
3429981
(2.79%)
|
3555115
(4.13%)
|
2858351
(2.67%)
|
Read1 mapped
|
2600145
(2.30%)
|
1843109
(1.50%)
|
2022180
(2.35%)
|
1633528
(1.53%)
|
Read2 mapped
|
2594447
(2.30%)
|
1840038
(1.49%)
|
2,025,330
(2.35%)
|
1636404
(1.53%)
|
Reads map to '+'
|
2,599,952
(2.30%)
|
1842749
(1.50%)
|
2025784
(2.35%)
|
1636402
(1.53%)
|
Reads map to '-'
|
2594640
(2.30%)
|
1840398
(1.49%)
|
2,021,726
(2.35%)
|
1633530
(1.53%)
|
Reads mapped in proper pairs
|
4815452
(4.26%)
|
3470648
(2.82%)
|
3800530
(4.42%)
|
3059418
(2.86%)
|
Note: The effective reads: The remaining number of clean reads after removing rRNA reads will be used for subsequent genome alignment; Total mapped: The number of sequencing sequences that can be aligned on the genome; Multiple mapped: The number of sequencing sequences with multiple alignment positions on the sequence of reference; Uniquely mapped: the number of sequencing sequences with unique alignment positions on the reference sequence; Read-1, Read-2 mapped: Pair end sequence, whose two parts that can be located on the number of genome respectively, the statistical ratio of the two parts should be roughly the same; Reads map to'+': The number of reads aligned to the positive strand of the genome; Reads map to'-': The number of reads aligned to the genome on the negative strand; Reads mapped in proper pairs: The relative distance of the pair end sequence mapping to the genome conforms to the length distribution of the sequenced fragments.
Distribution of reads in the chromosomes and known types of RNAs
After mapped in the reference genome, we counted the position information of the genomes corresponding to all reads to evaluate the depth of sequencing data coverage. The distribution of reads on each chromosome was displayed in Fig. 3A. Subsequently, the reads aligned on the chromosome were annotated to exonic, intronic, and intergenic. Reads compared to the genome was counted the distribution (Fig. 3B). And the reads distributions in the known RNA types were shown in Fig. 3C.
Differentially expressed genes analysis in exosomes derived from S. aureus-infected and non-infected MAC-T cells
DESeq2 was used to screen differentially expressed genes. We selected the differentially expressed genes between samples through the two levels of multiple of difference (|log2(Fold Change)|>1) and significance level (Q value < 0.05) [16–18]. The statistics of the number of differentially expressed genes in exosomes derived from S. aureus-infected and non-infected MAC-T cells were shown in Fig. 4A. There were a total of 186 genes expression disorders, of which 31 genes were significantly up-regulated and 155 genes were significantly down-regulated. We used the RPKM/COUNT of the differential gene as the expression level. Hierarchical Clustering analysis was performed to cluster genes with the same or similar expression patterns into clusters, and used different colored regions to represent different clustering grouping information to determine clustering mode of different sample control modes [15, 19]. In this study, the hierarchical clustering analysis of differentially expressed genes is shown in Fig. 4B.
GO enrichment analyses were conducted to search for the biological processes, cellular components, and molecular functions of differentially expressed genes in more detail [15, 20]. GO analysis was performed on the up-regulated and down-regulated genes. According to the enrichment point, all disordered genes are classified according to GO terms (Fig. 4C and D). Moreover, KEGG pathway analysis results were calculated according to the enrichment points, and the best gene pathways related to the up-regulated and down-regulated genes were listed in Fig. 4E and F.
Differentially expressed mRNAs and lncRNAs analysis in exosomes derived from S. aureus-infected and non-infected MAC-T cells
The expression level of the transcript was calculated using the RPKM value. We found that there were 78 up-regulated and 353 down-regulated known mRNAs, and 8 up-regulated and 11 down-regulated lncRNAs (Fig. 5A and B). Up-regulated lncRNAs were XR_003034773.1, XR_810477.3, XR_003031139.1, XR_003038102.1, XR_003034201.1, XR_236621.4, XR_003031133.1 and XR_003031134.1; down-regulated lncRNAs were XR_003034734.1, XR_003033312.1, XR_003033314.1, XR_003034774.1, XR_003035599.1, XR_003033311.1, XR_001500758.2, XR_003029422.1, XR_003034775.1, XR_003031994.1 and XR_003036684.1. In addition, the differentially expressed mRNAs and lncRNAs were also clustered in the hierarchical cluster analysis map (Fig. 5C, D).
Prediction of lncRNA target genes and mRNAs
The mode of action of lncRNA regulating target genes could be divided into two categories: cis regulation (In Cis) and trans regulation (In Trans). We made predictions about how the differentially expressed lncRNAs regulated the target genes (Additional file 1 & 2). Differentially expressed lncRNAs were selected to draw a network diagram of the interaction between these lncRNAs and their target genes (Fig. 6), so as to provide reference and help for the overall analysis of the functions of lncRNAs in samples.
GO enrichment and KEGG enrichment analysis of lncRNA target genes and mRNAs
GO analysis was performed on the target gene, and take the P < 0.05 of GO annotation enrichment as the significance threshold. GO terms of biological processes, cellular components and molecular functions were shown 10 GO terms, respectively (Fig. 7A). The Additional file for detailed biological processes, cell components and molecular functions were seen in Additional file 3, 4 and 5. KEGG analysis was performed on the target gene, and use the P < 0.05 of the enrichment degree of KEGG pathway as the significance threshold to obtain the analysis result. KEGG enriched target gene pathways were classified according to environmental information processing, human diseases, metabolism, and biological systems. It was found that lncRNA target genes were most closely related to human diseases (Fig. 7B). KEGG pathway analysis results are shown in the bubble diagram of the KEGG pathway (Fig. 7C). Multiple target genes are involved in the TNF signaling pathway, Notch signaling pathway, Phosphatidylinositol signaling system and p53 signaling pathway in the KEGG target gene enrichment pathway.
Results of RT-qPCR validation was in accordance with RNA-Seq
6 differentially expressed mRNAs (BRSK2, PRKDC, FAT3, SFMBT1, SLF2 and LCOR) and 6 differentially expressed lncRNAs (XR_001500758.2, XR_003029422.1, XR_003036684.1, XR_003034201.1, XR_003034734.1 and XR_003038102.1) were chosen to verify the lncRNA-seq results. The results of qPCR were found to be in accordance with RNA-seq (Fig. 8A, B), suggesting that RNA-seq results were reliable.