2.1. Technology Roadmap
2.2. ESTIMATE score distribution in patients with and without sepsis
In our analysis, we first used the ESTIMATE algorithm to calculate four immune-related metrics for the sepsis dataset, namely the immune score, stromal score, ESTIMATE score, and tumor purity 21. We then compared these scores between the sepsis and non-sepsis groups. As shown in the heatmap in Fig. 2A, the distribution of the four scores differed markedly between sepsis patients and normal samples. Sepsis patients exhibited higher stromal scores (ANOVA, p < 2.2e-16; Fig. 2B) and higher tumor purity (ANOVA, p = 7.9e-09; Fig. 2C), but lower ESTIMATE scores (ANOVA, p = 7.9e-09; Fig. 2D) and lower immune scores (ANOVA, p < 2.2e-16; Fig. 2E). These results indicate that the immune and stromal components differ significantly between sepsis and non-sepsis conditions.
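The ESTIMATE score is the sum of the stromal and immune scores, and tumor purity is derived from it as a decreasing function over the usual score range, which is why purity and the ESTIMATE score move in opposite directions here. The Python sketch below illustrates these relationships and a two-group ANOVA. The purity constants are those published with the ESTIMATE method; the function names and the use of scipy's `f_oneway` are illustrative choices, not the authors' code.

```python
import numpy as np
from scipy.stats import f_oneway

def estimate_score(stromal, immune):
    # ESTIMATE score = stromal score + immune score
    return np.asarray(stromal, dtype=float) + np.asarray(immune, dtype=float)

def tumor_purity(estimate):
    # Cosine conversion published with ESTIMATE; purity falls as the
    # ESTIMATE score rises over the usual range of scores
    return np.cos(0.6049872018 + 0.0001467884 * np.asarray(estimate, dtype=float))

def compare_groups(values, is_sepsis):
    # One-way ANOVA between the sepsis and non-sepsis groups
    values = np.asarray(values, dtype=float)
    is_sepsis = np.asarray(is_sepsis, dtype=bool)
    return f_oneway(values[is_sepsis], values[~is_sepsis]).pvalue
```

With only two groups, one-way ANOVA is equivalent to a two-sample t-test with pooled variance.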
2.3. Identification and correlation analysis of immune-related gene modules in sepsis
We performed WGCNA on the sepsis dataset to identify gene modules associated with sepsis immunity. A soft-thresholding power of 5 was selected as optimal based on the scale-free topology fit index (Fig. 3a-A) and the mean connectivity (Fig. 3a-B). Figure 3a-C shows the gene clustering dendrogram, with different modules distinguished by colors. We then related these gene modules to the four immune-related scores; the correlation heatmap is shown in Fig. 3a-D. The stromal score showed the strongest association with the darkgreen module, whereas the immune score was most closely linked to the grey module. Both the ESTIMATE score and tumor purity showed the highest correlation with the black module. These findings clarify how specific gene modules relate to key immune-related scores in sepsis.
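In WGCNA, the soft-thresholding power is usually chosen as the lowest power at which the scale-free fit index passes a cutoff. Mean connectivity always falls as the power rises, so it is checked for adequacy rather than minimized. The following Python sketch illustrates that logic under simplifying assumptions: an unsigned adjacency, connectivity binned at bin midpoints, and the 0.85 cutoff that WGCNA's `pickSoftThreshold` uses by default. It is not the WGCNA implementation.

```python
import numpy as np

def scale_free_fit(k, n_bins=10):
    # Signed R^2 of log10 p(k) regressed on log10 k (simplified: bin midpoints)
    counts, edges = np.histogram(k, bins=n_bins)
    mids = (edges[:-1] + edges[1:]) / 2.0
    keep = counts > 0
    x = np.log10(mids[keep])
    y = np.log10(counts[keep] / counts.sum())
    slope = np.polyfit(x, y, 1)[0]
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return -np.sign(slope) * r2  # positive only when p(k) decays with k

def pick_soft_threshold(expr, powers, r2_cutoff=0.85):
    # expr: samples x genes; returns (chosen power, [(power, signed R^2, mean k)])
    cor = np.corrcoef(expr, rowvar=False)
    table = []
    for beta in powers:
        adj = np.abs(cor) ** beta  # unsigned adjacency
        np.fill_diagonal(adj, 0.0)
        k = adj.sum(axis=1)  # whole-network connectivity of each gene
        table.append((beta, scale_free_fit(k), k.mean()))
    passing = [row[0] for row in table if row[1] >= r2_cutoff]
    # Lowest power passing the cutoff, else the power with the best fit
    chosen = passing[0] if passing else max(table, key=lambda row: row[1])[0]
    return chosen, table
```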
Genes assigned to the same color-coded module showed similar expression patterns (Fig. 3b-A), whereas correlations between different modules were relatively low (Fig. 3b-B). We then examined the correlation of each module with sepsis (Fig. 3b-C). The darkmagenta module showed the strongest negative correlation with sepsis (r = -0.78), whereas the brown module showed the strongest positive correlation (r = 0.7). Scatter plots of the brown module genes (module membership versus gene significance) revealed a significant correlation (p < 0.05; Fig. 3b-D). Given this strong positive association with sepsis, the genes in the brown module were selected as sepsis-related module genes for subsequent analysis.
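Module-trait correlations such as r = 0.7 for the brown module are typically computed from the module eigengene, the first principal component of the module's standardized expression. A scatter plot like Fig. 3b-D compares module membership (each gene's correlation with the eigengene) with gene significance, here assumed to be the absolute correlation of each gene with sepsis status. The Python sketch below works under those assumptions, with sepsis coded as 1 and controls as 0.

```python
import numpy as np

def module_eigengene(module_expr):
    # First principal component of the standardized module (samples x genes)
    z = (module_expr - module_expr.mean(axis=0)) / module_expr.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * s[0]
    # Orient the eigengene so that it tracks the module's average expression
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def membership_and_significance(module_expr, trait):
    me = module_eigengene(module_expr)
    module_trait_r = pearson(me, trait)
    mm = np.array([pearson(g, me) for g in module_expr.T])  # module membership
    gs = np.array([abs(pearson(g, trait)) for g in module_expr.T])  # gene significance
    return module_trait_r, mm, gs
```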
2.4. Expression differences and biological pathway enrichment analysis of IRGs in sepsis
To better understand IRGs in sepsis, we first obtained a set of IRGs from the ImmPort database and intersected it with the disease-related genes identified by WGCNA; the overlap is shown in a Venn diagram (Fig. 4A). This yielded 108 IRDEGs associated with sepsis (Table 2). To compare their expression patterns, we generated a heatmap with the R package pheatmap (Fig. 4B), which showed clearly distinct expression of these 108 genes between sepsis and non-sepsis patients. Genes significantly upregulated in sepsis patients were predominantly enriched in pathways such as Osteoclast Differentiation, B Cell Receptor Signaling Pathway, Th17 Cell Differentiation, and T Cell Receptor Signaling Pathway. Conversely, genes significantly upregulated in healthy individuals were enriched in pathways including the Chemokine Signaling Pathway, Th17 Cell Differentiation, JAK-STAT Signaling Pathway, and PD-L1 Expression and PD-1 Checkpoint Pathway in Cancer. These findings highlight distinct immunological landscapes in sepsis patients compared with healthy individuals.
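The gene selection and heatmap steps can be sketched in Python as follows. The intersection mirrors the Venn diagram in Fig. 4A. The row-wise z-scoring mirrors pheatmap's `scale = "row"` option, which we assume was used for Fig. 4B although the text does not state it. Gene names in the usage are placeholders.

```python
import numpy as np

def immune_module_genes(module_genes, immport_genes):
    # Genes present in both the WGCNA module and the ImmPort immune gene list
    return sorted(set(module_genes) & set(immport_genes))

def scale_rows(mat):
    # Row-wise z-scores (genes x samples); R's sd() divides by n - 1
    mat = np.asarray(mat, dtype=float)
    mean = mat.mean(axis=1, keepdims=True)
    sd = mat.std(axis=1, ddof=1, keepdims=True)
    return (mat - mean) / sd
```

Row scaling makes each gene's colors reflect its relative level across samples rather than its absolute expression.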
Table 2
List of sepsis-related immune genes.
IL1R2 | PROK2 | IFNAR1 | GMFG | C5AR1 | HSPA1A
IFNAR2 | TRBV7-7 | NFAT5 | IL18R1 | TNFSF13B | PLXNC1
ACVR1B | IL10RB | BTK | MR1 | TRBV6-6 | NCR3
SORT1 | TLR8 | CYBB | SLC11A1 | TANK | IGF1R
LYN | VAV1 | IL21R | HGF | JAK2 | IL1B
PIK3CB | IFNGR2 | VIM | PIK3CG | TRBV5-6 | TRBV5-5
MMP9 | IL1RAP | CHUK | NFKBIA | CSF2RA | TRBV4-1
LTB4R | IL12RB1 | TRBV7-4 | PLSCR1 | IFNGR1 | AQP9
IL10 | IL18 | MAP3K8 | SEMA4A | NFKB1 | TRBV5-1
SOCS3 | TLR1 | NAMPT | PIK3CA | FPR1 | SYK
CSF2RB | TGFB1 | TRAV38-1 | IL18RAP | MAPK14 | APOBEC3A
FGR | CMTM1 | TRBV7-8 | BCL3 | TRBV9 | ILK
C3AR1 | IL4R | PDGFC | HCK | CKLF | STAT3
PPP3CA | CRLF3 | B2M | IL1R1 | CCR1 | SOS2
HSPA1B | VAV3 | TRAJ25 | CMTM4 | MAPK1 | TRBV6-4
FCER1G | NEDD4 | TBK1 | CMTM6 | CXCL16 | BCL10
FOS | TRBV11-2 | GRB2 | RETN | FPR2 | TGFBR1
NFKBIZ | CXCR1 | IL32 | KRAS | TRAV30 | NFAT5
2.5. Functional Enrichment Analysis of IRDEGs
To elucidate the potential molecular mechanisms of the IRDEGs, we performed GO and KEGG functional enrichment analyses on the 108 IRDEGs. KEGG analysis showed significant enrichment in immune-related pathways, including Osteoclast Differentiation, Th17 Cell Differentiation, Cytokine-Cytokine Receptor Interaction, B Cell Receptor Signaling Pathway, T Cell Receptor Signaling Pathway, and JAK-STAT Signaling Pathway (Fig. 5A and Table 3). GO analysis showed enrichment in biological processes such as Cytokine-Mediated Signaling Pathway, Positive Regulation of Cytokine Production, and Leukocyte Mediated Immunity; molecular functions such as Immune Receptor Activity and Growth Factor Receptor Binding; and the cellular component T Cell Receptor Complex (Fig. 5B and Table 4). These findings indicate that the 108 IRDEGs participate in key immune pathways and processes.
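KEGG and GO over-representation results of this kind are commonly computed with a one-sided hypergeometric test followed by Benjamini-Hochberg adjustment, which is the default in tools such as clusterProfiler. The text does not name the tool, so the Python sketch below illustrates that standard approach rather than the authors' pipeline; the universe and pathway sizes in the usage are hypothetical.

```python
from scipy.stats import hypergeom

def ora_pvalue(n_universe, n_pathway, n_list, n_overlap):
    # P(X >= n_overlap) with X ~ Hypergeometric(universe, pathway genes, list size)
    return float(hypergeom.sf(n_overlap - 1, n_universe, n_pathway, n_list))

def benjamini_hochberg(pvalues):
    # BH-adjusted p-values, returned in the input order
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical: 108 genes, a 100-gene pathway, a 20,000-gene universe
p_strong = ora_pvalue(20000, 100, 108, 10)
p_weak = ora_pvalue(20000, 100, 108, 1)
```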
2.6. Screening of immune-related gene features by machine learning algorithms and Venn diagram analysis
To identify the most important features among the IRDEGs, we applied five common machine learning algorithms: elastic net, LASSO regression, random forest (RF), Boruta, and XGBoost. LASSO regression identified 53 important gene features (Fig. 6A), elastic net identified 38 (Fig. 6B), RF identified all 108 (Fig. 6C), Boruta identified 61 (Fig. 6D), and XGBoost identified 20 (Fig. 6E). As shown in the Venn diagram (Fig. 6F), 11 important immune-related genes (IIRGs) were identified by all five algorithms, and these were selected as marker genes.
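Selecting the IIRGs from the Venn diagram amounts to keeping the genes chosen by every algorithm. The Python sketch below uses placeholder gene names; the `min_votes` option, which relaxes the rule to a vote threshold, is our own addition and was not part of the analysis.

```python
from collections import Counter

def consensus_features(selections, min_votes=None):
    # selections: algorithm name -> iterable of selected genes
    if min_votes is None:
        min_votes = len(selections)  # default: genes chosen by every algorithm
    votes = Counter(g for chosen in selections.values() for g in set(chosen))
    return sorted(g for g, n in votes.items() if n >= min_votes)
```

Because the consensus is an intersection, it can never be larger than the smallest selection; here that is the 20 XGBoost features, consistent with 11 consensus genes.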
2.7. High-performance sepsis prediction models built with six machine learning algorithms
We then used six machine learning algorithms, namely naive Bayes (NB), LogitBoost, gradient boosting machine (GBM), conditional inference forest (cforest), model-averaged neural network (avNNet), and penalized discriminant analysis (pda), to build sepsis prediction models. All six models achieved high AUC values (Fig. 7A) as well as high C-index and F1-score values (Fig. 7B), indicating strong predictive performance.
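For a binary outcome such as sepsis versus control, Harrell's C-index reduces to the AUC: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The F1-score, by contrast, depends on a classification threshold. The Python sketch below computes both; the 0.5 threshold is an assumption, since the text does not state the one used.

```python
def auc_c_index(labels, scores):
    # Fraction of (case, control) pairs ranked correctly; ties count 0.5
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(labels, scores, threshold=0.5):
    # F1 = 2TP / (2TP + FP + FN) after thresholding the predicted scores
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```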
2.8. Validation of the sepsis prediction models in an independent dataset
We next validated the predictive performance of the models in an independent sepsis dataset, GSE154918. All six models achieved relatively high AUC values, each greater than 0.75, with the pda model reaching 0.901 (Fig. 8A, Table S2). The C-index and F1-score of each model are shown in Fig. 8B. Together, these results indicate that the models had good and stable predictive performance for sepsis.
2.9. Importance and contribution of gene features in different models
We next examined the contributions of the 11 IIRGs within each model. For sample prediction, MAPK14 made the largest contribution in the NB model (Fig. 9a-A), IL10 in the LogitBoost model (Fig. 9a-B), IL21R in the GBM model (Fig. 9a-C), MAPK14 in the cforest model (Fig. 9a-D), JAK2 in the avNNet model (Fig. 9a-E), and MAPK14 in the pda model (Fig. 9a-F).
At the model level, JAK2 contributed most to the NB model (Fig. 9b-A), SOCS3 to the LogitBoost model (Fig. 9b-B), NCR3 to the GBM model (Fig. 9b-C), MAPK14 to the cforest model (Fig. 9b-D), JAK2 to the avNNet model (Fig. 9b-E), and MAPK14 to the pda model (Fig. 9b-F).
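The text does not state how the gene contributions in Fig. 9a and Fig. 9b were computed. One common model-agnostic way to obtain model-level importance is permutation: shuffle one gene's values and measure how much the AUC drops. The Python sketch below illustrates that idea with a toy scorer; it is an assumption for illustration, not necessarily the method behind Fig. 9b.

```python
import numpy as np

def auc(labels, scores):
    # Pairwise AUC: fraction of (case, control) pairs ranked correctly, ties = 0.5
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels == 1][:, None] - scores[labels == 0][None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    # Mean drop in AUC when one feature column is shuffled, for each feature
    rng = np.random.default_rng(seed)
    base = auc(y, predict(X))
    drops = []
    for j in range(X.shape[1]):
        losses = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            losses.append(base - auc(y, predict(X_perm)))
        drops.append(float(np.mean(losses)))
    return drops
```

A feature the model ignores gets an importance of exactly zero, while a feature the model depends on produces a large drop.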
2.10. Analysis of the relationship between IIRGs and immune infiltration
To investigate the association between the 11 IIRGs and immune infiltration in sepsis, we assessed immune cell infiltration in the sepsis dataset using both CIBERSORT (Table S3) and ssGSEA (Table S4). CIBERSORT revealed significant differences in immune cell infiltration between sepsis patients and healthy samples (Fig. 10a-A). T cells CD4 memory resting, T cells CD8, NK cells resting, B cells naive, and T cells CD4 naive were more abundant in healthy individuals, whereas Neutrophils, T cells regulatory (Tregs), Macrophages M0, and Monocytes were more abundant in sepsis patients.
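CIBERSORT estimates cell-type fractions by regressing each sample's expression profile on a signature matrix (LM22) with ν-support vector regression, setting negative coefficients to zero and rescaling the rest to sum to one. The Python sketch below swaps in non-negative least squares, a simpler deconvolution with the same inputs and outputs, to show the idea. It is not CIBERSORT, and the signature matrix in the usage is random.

```python
import numpy as np
from scipy.optimize import nnls

def deconvolve(signature, mixture):
    # signature: genes x cell types; mixture: expression of one sample (genes,)
    coef, _ = nnls(signature, mixture)  # non-negative coefficients
    total = coef.sum()
    return coef / total if total > 0 else coef  # rescale to fractions

rng = np.random.default_rng(3)
sig = rng.uniform(0, 10, size=(50, 4))  # hypothetical signature matrix
true_fractions = np.array([0.5, 0.3, 0.2, 0.0])
mix = sig @ true_fractions  # noiseless synthetic mixture
est = deconvolve(sig, mix)
```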
Similarly, ssGSEA revealed marked differences in immune infiltration between sepsis patients and healthy individuals (Fig. 10a-B). Neutrophils, Th2 cells, Macrophages, Mast cells, iDC, DC, Th17 cells, Treg cells, and aDC were significantly enriched in sepsis patients, whereas NK CD56bright cells, TNK cells, Th1 cells, NK CD56dim cells, and Tfh cells were more prevalent in healthy individuals. These results reveal distinct immune cell profiles in sepsis patients compared with healthy individuals and motivate examining how the 11 IIRGs relate to immune responses in sepsis.
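ssGSEA scores one gene set per sample by walking down the sample's genes ranked by expression and summing the difference between a rank-weighted running fraction of gene-set hits and the running fraction of misses. The Python sketch below follows that definition with the weight exponent 0.25 that GSVA's ssGSEA uses by default. It omits GSVA's final normalization of scores across samples, and the expression values are toy data.

```python
import numpy as np

def ssgsea_score(expr, gene_set_idx, alpha=0.25):
    # expr: expression of all genes in one sample; gene_set_idx: indices of set genes
    expr = np.asarray(expr, dtype=float)
    n = expr.size
    ranks = np.empty(n)
    ranks[np.argsort(expr, kind="stable")] = np.arange(1, n + 1)  # highest -> rank n
    order = np.argsort(-expr, kind="stable")  # walk from highest to lowest expression
    in_set = np.zeros(n, dtype=bool)
    in_set[list(gene_set_idx)] = True
    hits = in_set[order]
    weights = np.where(hits, ranks[order] ** alpha, 0.0)
    p_hit = np.cumsum(weights) / weights.sum()  # weighted fraction of hits so far
    p_miss = np.cumsum(~hits) / (n - hits.sum())  # fraction of misses so far
    return float(np.sum(p_hit - p_miss))

expr = np.arange(10, 0, -1, dtype=float)  # gene 0 highest, gene 9 lowest
top = ssgsea_score(expr, [0, 1, 2])
bottom = ssgsea_score(expr, [7, 8, 9])
```

A set of highly expressed genes scores positive and a set of lowly expressed genes scores negative.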
We then examined the correlations between the expression of the 11 IIRGs and immune infiltration levels, visualized as correlation circles. In the CIBERSORT analysis, expression of the 11 IIRGs was significantly correlated with several immune cell types, notably T cells CD8, T cells CD4 memory resting, Macrophages M0, and Neutrophils (Fig. 10b-A). In the ssGSEA analysis, their expression was strongly correlated with immune cells such as CD8 T cells, B cells, Cytotoxic cells, Macrophages, NK cells, and T cells (Fig. 10b-B).
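The correlation circles summarize gene-by-cell correlation coefficients and their p-values. The text specifies Pearson correlation for the checkpoint analysis in Section 2.11 but not for immune cells, so the Python sketch below assumes Pearson here as well; gene and cell names are placeholders.

```python
from scipy.stats import pearsonr

def correlation_table(genes, cells):
    # genes: name -> expression vector; cells: name -> infiltration vector
    # Returns {(gene, cell): (r, p)} for every gene-cell pair
    table = {}
    for g, x in genes.items():
        for c, y in cells.items():
            r, p = pearsonr(x, y)
            table[(g, c)] = (float(r), float(p))
    return table

def significant_pairs(table, alpha=0.05):
    # Keep only pairs whose correlation p-value is below alpha
    return {pair: rp for pair, rp in table.items() if rp[1] < alpha}
```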
2.11. Analysis of the relationship between IIRGs and immune checkpoints
We next explored the relationship between the expression of the 11 IIRGs and immune checkpoint genes. We first compared the expression of immune checkpoint genes between sepsis patients and healthy samples and found marked differences between the two groups (Fig. 11A), suggesting a potential role of immune factors in the progression of sepsis. We then analyzed the correlations between the 11 IIRGs and immune checkpoint genes, again visualized as correlation circles (Fig. 11B). IL21R, NCR3, and TRAV30 showed significant positive correlations (Pearson correlation analysis, P < 0.05) with most immune checkpoint genes, including HLA-DPB1, HLA-DPA1, HLA-DQB2, HLA-DRB1, and HLA-DQA1. In contrast, the remaining IIRGs tended to correlate negatively with most of these same checkpoint genes (Pearson correlation analysis, P < 0.05). These findings point to complex interactions between the IIRGs and immune checkpoints in sepsis.
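The text does not name the test used to compare checkpoint gene expression between groups in Fig. 11A. A common choice is a per-gene two-sided Wilcoxon rank-sum test, sketched below in Python as an assumption; gene names and values are placeholders.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def compare_checkpoints(expr, genes, is_sepsis):
    # expr: genes x samples; returns {gene: two-sided Wilcoxon rank-sum p-value}
    expr = np.asarray(expr, dtype=float)
    is_sepsis = np.asarray(is_sepsis, dtype=bool)
    return {
        g: float(mannwhitneyu(row[is_sepsis], row[~is_sepsis], alternative="two-sided").pvalue)
        for g, row in zip(genes, expr)
    }
```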
2.12. Protein interaction network analysis of important immune-related genes
We explored the interactions among the 11 IIRGs using the STRING database (Fig. 12). The 11 genes interacted extensively with one another; JAK2 and IL10 in particular had a high degree in the network, meaning that they interacted with many of the other genes.
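A gene's degree in the STRING network is the number of distinct partners it is linked to. The Python sketch below computes degrees from a hypothetical edge list, not the actual STRING interactions among the 11 IIRGs.

```python
from collections import Counter

def node_degrees(edges):
    # Undirected degree of each node; duplicate edges and self-loops are ignored
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    deg = Counter()
    for a, b in unique:
        deg[a] += 1
        deg[b] += 1
    return deg

edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G1"), ("G3", "G4")]
deg = node_degrees(edges)
```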
2.13. Association analysis of IIRGs and drug sensitivity
To explore the relationship between the 11 IIRGs and drug sensitivity, we first used the pRRophetic package to estimate the IC50 values of 14 drugs for the samples in the sepsis dataset GSE134347, using the CCLE database as the reference. Except for PD.0325901, PF2341066, and PHA.665752, the estimated IC50 values of the remaining drugs differed significantly between sepsis patients and healthy individuals (P < 0.05; Fig. 13A), suggesting that drug response may differ between the two groups. We further analyzed the correlations between the 11 IIRGs and the estimated IC50 values and found that the IIRGs were significantly correlated with drug sensitivity (Fig. 13B). For example, GRB2 expression was positively correlated with the IC50 of Erlotinib (r = 0.51, P < 0.05) and negatively correlated with the IC50 of PD.0325901 (r = -0.61, P < 0.05).
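pRRophetic predicts IC50 values by fitting a ridge regression of drug response on gene expression in cancer cell lines and applying that model to patient expression profiles. The Python sketch below shows only the ridge step on synthetic data. It omits the batch homogenization and gene filtering that pRRophetic performs, and the penalty `lam` is an arbitrary illustrative value.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    # Closed-form ridge regression with an unpenalized intercept
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def predict_ic50(X_new, beta, intercept):
    # Apply the cell-line model to new (e.g. patient) expression profiles
    return X_new @ beta + intercept

rng = np.random.default_rng(4)
X_lines = rng.normal(size=(40, 5))  # synthetic cell-line expression
true_beta = np.array([0.5, -1.0, 0.0, 2.0, 0.3])
ic50 = X_lines @ true_beta + 2.0  # synthetic drug response
beta, intercept = fit_ridge(X_lines, ic50, lam=1e-8)
```

With a tiny penalty and noiseless data the fit recovers the generating coefficients; in practice a larger penalty stabilizes the fit when genes outnumber cell lines.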