The gene expression datasets considered comprise 63 samples in total, of which 39 are from ischemic stroke patients and 24 are from neurologically healthy (non-stroke) controls. To show the distribution of gene expression data within these two groups, a boxplot is presented in figure 1A. The boxplot shows that the log2 gene expression values lie between -2.0 and 1.5, while the second quartile (median) fluctuates between -0.5 and 0.5, suggesting that the expression values are comparably distributed across samples.
Figure 1B depicts the boxplot of the gene expression profile grouped into four categories: male stroke patients, female stroke patients, male non-stroke controls and female non-stroke controls. The female stroke group appears to be predominantly down-regulated compared with the female control group.
To find key diagnostic biomarkers, i.e. differentially expressed genes (DEGs), we computed the fold change statistics between the two groups of samples for all genes. We retained as DEGs those genes that were significant at the 5% level (p-value <= 0.05) and showed at least a two-fold change in expression between the two groups. This yielded 20 DEGs, of which only one had a three-fold change in expression level. The list of identified DEGs, along with their statistics (adjusted p-value, moderated t-statistic, B-statistic, log fold change and fold change), is given in Table 1.
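The filtering criteria described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: the function names, the input layout (gene, log2 fold change, p-value) and the example values are all assumptions, and whether the raw or the adjusted p-value is thresholded is left to the caller.

```python
def fold_change(log2_fc):
    """Convert a log2 fold change into a signed linear fold change."""
    fc = 2.0 ** abs(log2_fc)
    return fc if log2_fc >= 0 else -fc


def filter_degs(rows, p_cutoff=0.05, min_fold=2.0):
    """Keep genes with p <= p_cutoff and |fold change| >= min_fold.

    rows: iterable of (gene, log2_fold_change, p_value) tuples.
    """
    kept = []
    for gene, log2_fc, p_value in rows:
        if p_value <= p_cutoff and abs(fold_change(log2_fc)) >= min_fold:
            kept.append(gene)
    return kept


# Hypothetical values for illustration only; not taken from Table 1.
example = [
    ("GENE_A", 1.2, 0.001),   # more than two-fold up, significant
    ("GENE_B", -1.6, 0.010),  # more than two-fold down, significant
    ("GENE_C", 0.4, 0.001),   # significant, but less than two-fold
    ("GENE_D", -2.0, 0.200),  # four-fold down, but not significant
]
print(filter_degs(example))  # ['GENE_A', 'GENE_B']
```

A two-fold change corresponds to an absolute log2 fold change of at least 1, so the same filter could equivalently be applied directly on the log scale.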
The scatter plots of fold change scores and log fold change scores are shown in figure 2. It is observed from both Table 1 and figure 2A that, with the exception of the gene C-C motif chemokine receptor 7 (CCR7), all the identified DEGs are down-regulated. The profile graph of the top five differentially expressed genes, namely ARG1, MMP9, S100A12, ORM1 and FCGR3B, is shown in figure 3.
Gene-Disease Association Studies
We performed the DAVID analysis of the identified DEGs. The gene-disease associations of these DEGs, along with their scores, are presented in Table 2A. The disease association study using the Genetic Association Database (GAD) [32] shows that four genes, with GenBank IDs NM_004994, NM_001995, NM_006418, and NM_000570, are associated with the disease term “Stroke”, while a few other genes are associated with the disease terms “brain hemorrhage”, “Guillain-Barre syndrome”, or “Multiple Sclerosis”.
Functional categories and GO analysis
Functional category analysis helps group related genes based on their protein domain families, as most co-functioning genes belong to the same protein families. Functional categories are usually derived from the Gene Ontology (GO) and Pfam databases. In the GO database, every GO term is represented as a node in a directed acyclic graph, and functional categories are defined as genes annotated either directly to a node or to any descendant node in the ontology. Results of the functional category enrichment and GO term enrichment analyses of all the identified DEGs are presented in Table 2B and Table 2C, respectively. The identified DEGs associated with stroke, namely MMP9 (NM_004994), ACSL1 (NM_001995), OLFM4 (NM_006418) and FCGR3B (NM_000570), are enriched for different terms. For instance, MMP9, OLFM4 and FCGR3B are more than four-fold enriched for the term “Secreted”, more than two-fold enriched for the term “Signal”, and three-fold enriched for the term “Disulfide bond” (Tables 2B and 2C). Hence, these genes are responsible for the controlled release of substances by cells or tissues, the transmission of information in the biological system, and the catalysis of the rearrangement of intrachain and interchain disulfide bonds in proteins. Any perturbation of these genes may lead to the corresponding biological dysfunctions.
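To make the "fold enriched" figures above concrete, the following sketch computes fold enrichment in the way enrichment tools such as DAVID commonly define it: the proportion of genes in the input list carrying a term, divided by the proportion of background genes carrying that term. The function name and the counts in the example are hypothetical and are not taken from Tables 2B or 2C.

```python
def fold_enrichment(list_hits, list_total, pop_hits, pop_total):
    """Ratio of the term's frequency in the gene list to its frequency
    in the background population."""
    if list_total <= 0 or pop_total <= 0 or pop_hits <= 0:
        raise ValueError("totals and population hits must be positive")
    return (list_hits / list_total) / (pop_hits / pop_total)


# Hypothetical counts for illustration only: 10 of 20 listed genes carry
# a term that annotates 2000 of 20000 background genes.
print(round(fold_enrichment(10, 20, 2000, 20000), 2))
```

A value above 1 means the term is more frequent in the gene list than in the background; the statistical significance of that excess is assessed separately.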
Tissue expression analysis
The DAVID tool integrates tissue expression data from GNF-Affy, CGAP-SAGE, CGAP-EST, and Unigene-EST, making it possible to find the most enriched gene expression patterns across thousands of normal and disease tissues for any given gene list. Tissue expression analysis allows the identification of biomarkers and the discovery of gene expression patterns. Tissue expression analysis was performed with the DAVID tool using a threshold of p < 0.05 (Table 2D). The term “Whole Brain_3rd” is enriched with a count of 18, covering 90% of the identified DEGs; the “3rd” suffix indicates that a gene is expressed in that tissue above the third quartile of its expression across all tissues. Thus, of the 20 DEGs identified in this study, 18 are enriched among genes expressed in brain tissue, including the four genes associated with stroke (Table 2D). These brain-expressed genes suggest that gene expression profiles in peripheral blood may be relevant for quantitative metabolic phenotypes in stroke.
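The quartile criterion described above can be illustrated with a short sketch. It only mirrors the definition as stated in the text (expression in a tissue above the third quartile of the gene's expression across all tissues); it is not DAVID's implementation, and the function name, the quartile interpolation method and the expression values are assumptions.

```python
from statistics import quantiles


def above_third_quartile(expression_by_tissue, tissue):
    """Return True if the gene's expression in `tissue` exceeds the third
    quartile of its expression across all tissues in the mapping."""
    values = list(expression_by_tissue.values())
    _q1, _q2, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return expression_by_tissue[tissue] > q3


# Hypothetical expression values for one gene; for illustration only.
profile = {
    "Whole Brain": 9.0,
    "Liver": 2.0,
    "Lung": 3.0,
    "Heart": 4.0,
    "Kidney": 5.0,
}
print(above_third_quartile(profile, "Whole Brain"))  # True
print(above_third_quartile(profile, "Liver"))        # False
```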
Protein-Protein Interaction Studies
The PPI studies were performed using the STRING database (https://string-db.org/). We restricted the PPI analysis to those DEGs found to be associated with “stroke” in the Genetic Association Database (see Table 2): MMP9 (GenBank ID: NM_004994), ACSL1 (GenBank ID: NM_001995), OLFM4 (GenBank ID: NM_006418), and FCGR3B (GenBank ID: NM_000570). We considered three interaction sources, namely Experiments, Co-expression, and Textmining, with a minimum interaction score of 0.90 (highest confidence) in STRING. The networks of these genes are shown in figure 4. These PPI networks are also annotated with KEGG and Reactome pathways. For instance, MMP9 interacts with IL6, LCN2, IL1B, TNF, and CXCL8, which are involved in hsa04657 (IL-17 signaling pathway), hsa168256 (immune system), and hsa04060 (cytokine-cytokine receptor interaction) (figure 4a). Similarly, ACSL1 interacts with several other genes that are involved in hsa03320 (PPAR signaling pathway), hsa00071 (Fatty acid degradation), and hsa01212 (Fatty acid metabolism) (figure 4b). The results suggest that these genes are involved in important biological processes that can be disrupted by their differential expression.
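For readers who wish to retrieve comparable networks programmatically, the sketch below builds a request URL for the STRING REST API `network` endpoint with a required score of 900 (that is, 0.90) for human (NCBI taxon 9606). It is an assumption that the networks in this study were obtained through the web interface rather than the API; the helper name is hypothetical, and restricting the evidence channels to Experiments, Co-expression, and Textmining is not covered by this sketch.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"


def string_network_url(genes, species=9606, required_score=900, fmt="tsv"):
    """Build a STRING 'network' API URL for the given gene symbols.

    required_score is on STRING's 0-1000 scale (900 = highest confidence).
    """
    params = {
        # STRING expects identifiers separated by a carriage return,
        # which urlencode percent-encodes as %0D.
        "identifiers": "\r".join(genes),
        "species": species,
        "required_score": required_score,
    }
    return f"{STRING_API}/{fmt}/network?{urlencode(params)}"


# The four stroke-associated DEGs named in the text.
url = string_network_url(["MMP9", "ACSL1", "OLFM4", "FCGR3B"])
print(url)
```

The resulting URL can be fetched with any HTTP client; the sketch deliberately stops short of issuing the request so that it runs without network access.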
MicroRNA target studies
We performed an interaction study of the miRNAs that target the identified biomarker genes using miRTargetLink for Human (https://ccb-web.cs.uni-saarland.de/mirtargetlink/). miRNA-gene interaction studies help to understand the underlying role of microRNAs in the pathway. miRTargetLink contains both experimentally known interactions from miRTarBase and predicted interactions. The microRNAs that target MMP9, ACSL1, OLFM4, and FCGR3B are shown in figure 5, where only experimentally validated interactions (both weak and strong) were considered. Among these miRNAs, a few have been validated in the literature as being involved in ischemic stroke pathogenesis, including hsa-miR-93-3p (upregulation of SOD enzymes) targeting ACSL1 [33, 34], hsa-miR-491-5p (inhibition of cellular invasion) targeting MMP9 [35], hsa-miR-29 (induction of Fas receptors) targeting MMP9 [36], and hsa-miR-34a-5p (NPC regulation) targeting ACSL1 [37].
Validation of identified DEGs with the PubMed Literature
The identified DEGs were validated against the PubMed literature through publication enrichment analysis using the STRING database. The important results of the publication enrichment analysis are shown in Table 3. We observed that most of the identified DEGs are markers for early or post-ischemic stroke. For instance, peripheral blood AKAP7 expression has been detected as an early marker for lymphocyte-mediated post-stroke blood-brain barrier disruption [38]. Similarly, several immune-related genes, including ARG1, have been implicated in post-stroke immunosuppression and ischemic stroke severity [39]. MMP9 and different isoforms of S100 (e.g. S100A12) have been implicated at the protein level as stroke predictors. Further, baseline serum MMP9 is reported to help predict the occurrence of blood-brain barrier disruption, and serum S100 protein is associated with worse clinical outcomes. Thus, MMP9 and S100 can be used as prognostic markers in ischemic stroke [23]. By neutralizing the effect of MMPs, tissue inhibitors of matrix metalloproteinases (TIMPs) are responsible for keeping tissue proteolysis in balance [40]. Thus, TIMPs define the mechanism of regulation by neuroinflammatory stimuli. Chemokine receptor 7 (CCR7) is reported to be increased in peripheral blood leukocytes in mild to moderate ischemic stroke [41]. Orosomucoid 1 (ORM1) is a glycoprotein that suppresses the lymphocyte response to lipopolysaccharides, decreases platelet aggregation, and enhances cytokine secretion (see Table 3). ACSL1, FCGR3B and MMP9 have been reported to show stroke-specific differential regulation in peripheral whole blood [38]. OLFM4, a gene associated with apoptosis, was also found to be differentially expressed in stroke by Fernandez-Cadenas et al. [42].