Identification of DEGs and NRDEGs in AIS
The gene expression profiling dataset of peripheral whole blood specimens (GSE16561) was retrieved from the Gene Expression Omnibus (GEO) (Barrett et al. 2007) database, and divided into 39 ischemic stroke patients (Stroke) and 24 healthy control subjects (Control). Another GEO dataset of peripheral blood specimens (GSE22255) (Krug et al. 2012) was obtained and divided into 20 ischemic stroke patients and 20 normal patients. The two datasets were merged and batch effects were mitigated using the combat (package sva) function in R (Leek et al. 2012) and verified by principal component analysis (PCA). The third GEO dataset of peripheral blood specimens of Homo sapiens from GPL570 (GSE58294) (Stamova et al. 2014) was downloaded and divided into 69 ischemic stroke patients (group name: Stroke) and 23 normal patients (group name: Normal). As the validation dataset, it was used to validate the hub genes and the comparison map between the two groups was drawn.
To assess the differential impact of gene expression values on AIS, group comparisons were conducted using the R package limma (Ritchie et al. 2015). For differential expression analysis, the following additional thresholds were used: logFC > 0.3 and adjP-value < 0.05 for upregulated genes (up_regulated_genes) and logFC<-0.3 and adjP-value < 0.05 for downregulated genes (down_regulated_genes).
We identified necroptosis-associated genes using GeneCards (Stelzer et al. 2016), which is an integrative and comprehensive database of human genetic information. A search using the term "necroptosis" yielded 630 necroptosis-related genes (Table S1).
A Venn diagram was drawn using the DEGs and necroptosis-related genes and the intersecting genes were identified as NRDEGs in AIS. These genes were then subjected to PPI network analysis.
DEG Function and Pathway Enrichment analyses
GO (Ashburner et al. 2000), an international standard for gene functional classification, is widely used as a tool for functional annotation and enrichment analyses. GO terms are classified as cellular component (CC), molecular function (MF), and biological process (BP). KEGG database is comprised of the pathways of experimentally validated metabolic processes and gene sets of human diseases, and it stores an extensive collection of data on genomes, biological pathways, drugs, chemicals, and diseases. The DEGs underwent GO term and KEGG pathway analyses using the clusterProfiler (Yu et al. 2012) in R package. P < 0.05 was deemed statistically significant.
GSEA evaluation
GSEA, a widely recognized computational method, is frequently employed to assess variations in pathway activities and biological processes within expression datasets. It determines whether a predefined set of genes exhibits notable differences between two biological states (Subramanian et al. 2005). GSEA was conducted on the gene expression data in the combined dataset of GSE16561 and GSE22255 using the clusterProfiler R package in order to study the biological differences between patients with the AIS and Control groups. The "c2.all.v7.5.2.entrez.gmt" gene set was obtained from the Molecular signatures database (MSigDB) (Liberzon et al. 2015) for GSEA of the combined dataset of GSE16561 and GSE22255. Adjusted P < 0.05 was deemed statistically significant. Both p-value and normalized enrichment score (NES) values are presented.
Establishment of Co-expression Modules of DEGs through WGCNA
WGCNA (Langfelder and Horvath 2008) is a systematic biological approach used to characterize the patterns of inter-gene correlations across samples in order to discover modules of highly correlated genes. Therapeutic targets or candidate biomarker genes were identified according to the intensity of the gene set as well as the relationship between the gene set and the phenotype. We analyzed the GSE16561 and GSE22255 datasets using the WGCNA package in R software. For the GSE16561 dataset, we set the minimum gene number, cut height and optimal soft-thresholding power to 50, 135 and 5, respectively, merged modules at a cut height of 0.95, and established a minimum distance of 0.2. For the GSE22255 dataset, the minimum gene number, cut height and optimal soft-thresholding power were set to 50, 80 and 7, respectively, modules were merged at a cut height of 0.4, and the minimum distance was fixed at 0.2. Using this approach, we successfully obtained co-expression modules for the DEGs within the two groups of both datasets.
Development of PPI network
STRING (Szklarczyk et al. 2019) was used to search for known and predicted PPIs and was employed to construct the PPI networks of both DEGs and NRDEGs.
Cytoscape v3.6.1 (Smoot et al. 2011), a well-known open-source software platform that integrates interaction networks, was utilized for the visualization of PPI networks. Cytoscape CytoHumba plugin (Chin et al. 2014) was used to study the network’s hub genes and identified the top 10 genes in the maximum correlation coefficient (MCC). Functional correlations for the key genes were calculated using the R package GOSemSim (Yu 2020).
Construction of Diagnostic Model via LASSO Regression Model
LASSO regression involves the simultaneous screening of variables and complexity adjustment to fit a generalized linear model. Through regularization, a shrinkage penalty is introduced to limit the coefficients. The regularization process uses the sum of the absolute values of all the feature weights, which improves the interpretability of the model to a certain extent. A LASSO regression model was built for the genes in the co-expression modules of the DEGs using the glmnet package in R software (Engebretsen and Bohlin 2019; Mazumder and Hastie 2012). During the model construction process, we carefully screened the selected features and identified the best model for building a diagnostic model for cerebral infarction. Subsequently, we determined that the genes included in the model were the distinctive genes associated with cerebral infarction. To validate the models, ROC curves were drawn using the pROC package (Robin et al. 2011). A box plot was generated to display the characteristic genes of peripheral blood samples from patients with cerebral infarction and normal control patients. The genes with p < 0.05 were selected for visualization.
Molecular Subtype Analysis of Cerebral Infarction
Consensus clustering is an algorithm based on resampling techniques. Its purpose is to identify individual members and their respective subgroup numbers, while also assessing the reliability and validity of the clustering data. The previously identified NRDEGs based on the combined dataset of GSE16561 and GSE22255 were defined as key necroptosis-associated genes. The ConsensusClusterPlus package (Wilkerson and Hayes 2010) in R software was employed to determine the gene expression profiles of the previously screened NRDEGs, and the cluster with the best clustering was selected. Based on these results, different necroptosis patterns were identified.
Immune Infiltration Analysis
The CIBERSORT package (Chen et al. 2018a) was employed, utilizing a deconvolution algorithm with linear support vector regression, to predict the expression matrix associated with immune cell subtypes. This approach was applied to evaluate immune cell infiltration in patients with ischemic stroke and the control group using RNA-seq data. The differential enrichment of immune cells between patients with AIS and the control group was identified in the combined dataset of GSE16561 and GSE22255. Furthermore, we determined the Pearson correlation coefficients between immune cells and nine NRDEGs and examined the relationship between NRDEGs and the levels of immune infiltration.
Network Construction for the Interactions among Hub-mRNA, Hub-microRNA (miRNA), and Hub-long non-coding RNA (lncRNA)
We conducted an analysis of lncRNA and miRNA expression, focusing on their interaction with hub genes at the post-transcriptional level. To identify miRNA-mRNA targeting relationships, we utilized the miRTarBase database, which contains experimentally-validated miRNA-target interactions (MTIs) from over 8,500 articles (Huang et al. 2020). The database has been augmented with the newly released CLIP-seq dataset, resulting in over 500,000 MTIs. Leveraging improved natural language processing (NLP) technology, we collected additional target relationship pairs along with their network functions and annotation information. For identification of miRNAs potentially binding to hub genes, we utilized the miRTarBase 2020 database (https://mirtarbase.cuhk.edu.cn/) (Huang et al. 2020). Additionally, the TarBase database (version 8), which compiles experimentally-validated miRNA targets across multiple species, was used to predict miRNAs interacting with hub genes (Karagkouni et al. 2018). A list of miRNAs predicted by both databases was generated.
To search for miRNA targets, we employed the starBase database, which incorporates high-throughput sequencing datasets (CLIP-seq and Degradome-seq) and offers various visualization tools to explore microRNA targets (Cai et al. 2014). The database includes extensive data on miRNA-mRNA, miRNA-ncRNA, RNA-RNA, and RBP-RNA interactions. Using the starBase database, we predicted lncRNAs interacting with miRNAs. An interaction network comprising hub lncRNAs, hub miRNAs, and hub mRNAs was constructed, and the Cytoscape software was employed to visualize the network as a Sankey diagram.
HT22 Cell Culture and treatment
The HT22 cells (Zhejiang Ruyao Biotechnology Co. Ltd., Zhejiang, China), which are an immortalized mouse hippocampal neuronal cell line, were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Corning, NY, United States) containing 10% foetal bovine serum (FBS, BI, Israel) and 1% double antibodies (37℃, 5% CO2 cell culture incubator). When the cells attached to the wall and reached 80–90% confluence, the culture medium were discarded, washed once with PBS, and digested with 3ml of trypsin for 2 minutes. The process of digestion was terminated by adding medium immediately after the cells became round and detached. The cells were transferred to a centrifuge tube, centrifuged at 1000rpm for 5 minutes, re-suspended in fresh culture medium, and distributed at a 1:2 ratio. The original cells were transferred into a new cell culture bottle for continued culture, with medium changed every 2 to 3 days.
We divided the cells into both a control group and a model group, seeded them into 96-well plates at a density of 5×104, with 3 replicate wells for each group, and then cultured overnight. After 24 hours, we removed the original culture medium in the model group, washed the cells twice with PBS, and then added 300µl of glucose-free, serum-free DMEM medium to each well before being incubated in an anaerobic culture box for 4 hours. After oxygen-glucose deprivation treatment, the cells in the model group were restored to complete DMEM medium, and returned to e cell incubator at 37℃and 5% CO2 for 2 hours. The cells in the normal control group were continuously cultured in complete medium and a cell incubator at 37℃, 5% CO2.
Quantitative polymerase chain reaction (qPCR analysis)
The mRNAs were extracted from HT22 cells by adding Trizol reagent. The Fast Plus RT Master mix reverse transcription reagent kit (Supersmart® 6-minute Heat-resistant first-strand cDNA Synthesis Kit) was used to synthesise the extracted mRNAs into cDNAs, called the reverse transcription, RT reaction. PCR was carried out using the SYBR Green qPCR Mix pre-mixed qPCR reagent kit (Superbrilliant® Third generation ZAPA SYBR Green qPCR premix) to amplify the cDNA in a two-step process and perform fluorescence quantification, with the reaction conditions as follows: 95℃ for 300s; 95℃ for 10s and 60℃/65℃ for 20s, for 40 cycles. The final primer concentration was 0.4µM, with GAPDH as the reference gene. The relative expression levels were calculated using the 2^(−△△Ct) method, and the melting curve of the products were analysed to ensure reaction specificity. See Supplemental Files for primer sequences.
Statistical Analysis
All statistical tests were conducted with R v4.0.2 (https://www.r-project.org). For normally distributed continuous variables, the statistical significance was assessed using an independent Student's t-test. Non-normally distributed continuous variables were compared using the Mann-Whitney U test and the Wilcoxon rank sum test. All p-values were two-sided, and p < 0.05 was deemed statistically significant.