Overview of the entire pipeline. DRPPM-PATH-SURVEIOR is implemented in the R environment, and packages will be automatically installed during runtime. There are four major components of the DRPPM-PATH-SURVEIOR (Fig. 1), which include: 1) The Interactive (UI) Mode. This feature allows for a point-and-click pathway survival analysis. The interactive mode offers the ability to perform select immune deconvolution in real time and perform univariate or complex multivariate analyses of clinical features. 2) The Pipeline (Advanced) Mode. This feature performs a complete survival analysis of pathway databases and gene features. Covariates can be included as part of the systematic screening, and the P-values are corrected by Benjamini-Hochberg. 3) Pathway Connectivity. This feature allows the user to evaluate the similarity between pathways and group pathways that are predictive of the clinical outcome. 4) Hazard Ratio Ranked Gene Set Enrichment Analysis (GSEA). This user interface performs a GSEA analysis based on the gene-level hazard ratio ranking derived from the Pipeline Mode. This feature facilitates the identification of clinically relevant pathways and, in turn, identifies regulators that can drive the underlying gene expression. Additional installation and usage instructions is available in the Supplementary Method Section.
“On-the-fly” Mode: Shiny Interface
DRPPM-PATH-SURVEIOR is a comprehensive framework for analyzing and visualizing “on-the-fly” associations of immune signatures and pathways scores with a clinical endpoint. The application facilitates the partitioning of patients based on pathway scores, estimated immune cells, and gene expression level, followed by univariate Cox-regression survival analysis and multivariate Cox-regression analysis. DRPPM-PATH-SURVEIOR uses several R packages, including survival (v3.2-11), survminer (v0.4.9), GSVA (v1.40.1) [10], and immunedeconv [31](v2.1.0). Pathway score is calculated with the gsva() function based on ssGSEA, GSVA, plage, or zscore. Immune deconvolution is performed with the immunedeconv R package, which includes several deconvolution packages, such as CIBERSORT [32], ESTIMATE [33], and MCP counter [34]. For multivariate analysis, a covariate can be selected from the user-provided meta-information file. The multivariate survival analysis can be performed through additive and multiplicative interaction of two or more variables.
To evaluate the association between pathway and survival S over time t, S(t), is defined through hazard function, h(t), as
$${S}\left({t}\right)={e}{x}{p}(-\underset{0}{\overset{{t}}{\int }}{h}\left({x}\right){d}{x})$$
$${h}\left({t}\right)={{h}}_{0}\left({t}\right)\times {{e}}^{\left({{\beta }}_{1}\times {{X}}_{1}\right)}$$
where h(t) is a hazard rate at time t and h0(t) is the baseline hazard rate, β1 is the coefficient related to hazard with β1 > 0 as high risk and β1 < 0 as low risk for X1, a dichotomized based on the gene or pathway score.
To evaluate the pathway association with survival after adjusting for patient meta information X2 is defined as
$${h}\left({t}\right)={{h}}_{0}\left({t}\right)\times {{e}}^{\left({{\beta }}_{1}\times {{X}}_{1}+{{\beta }}_{2}\times {{X}}_{2}\right)}$$
To evaluate the associated interaction between the pathway and patient meta information is defined by β3 as
$${h}\left({t}\right)={{h}}_{0}\left({t}\right)\times {{e}}^{\left({{\beta }}_{1}\times {{X}}_{1}+{{\beta }}_{2}\times {{X}}_{2}+{{\beta }}_{3}\times {{X}1\times {X}}_{2}\right)}$$
Pipeline Mode: Systematic Pathway-level Survival Analysis
To facilitate the identification of top high-risk pathways and genes, we have developed a pipeline to systematically assess pathways associated with hazard by a Cox proportional hazard function. The user can provide or select individual genes and pathway databases to perform a systematic screening. Each expression feature is stratified based on a median cutoff. The user also has the option of performing a systematic screening with the inclusion of a covariate as an additive or multiplicative interactive model. The p.value is calculated on the likelihood ratio, wald test. An adjusted p.values can be calculated based on Benjamini-Hochberg correction method. In the output table, genes and pathways are ranked by the likelihood ratio p.value.
Connectivity Mode: Pathway Gene-set Connectivity
The Connectivity Mode offers the user the ability to analyze the similarity between pathways associated with survival. The hazard ratio ranked pathways from the pipeline mode can be used as input to the DRPPM-Jaccard-Connectivity R Shiny app to generate hierarchical clustering based on gene-set similarity. A Jaccard function can calculate distance between pathways:
The Jaccard score function J for two pathways A and B is defined as
J(A, B) = | A ∩ B | / | A ∪ B |
where
A contains n set of genes, A = [a1, a2, …, an]
B contains m set of genes, B = [b1, b2, …, bm]
The Jaccard matrix can be visualized as a heatmap.
Next, the pathways can be clustered using the hclust function from R (v4.1.0) into k-groups (user-specified). Clusters can be visualized as a dendrogram. To overlap survival-associated gene expression, genes within the pathway can be displayed as a table with a flexible sorting feature and added annotation information.
GSEA Mode: Hazard Ratio Ranked Gene Set Enrichment Analysis
From the pipeline mode, we can derive a hazard ratio ranked gene list, which can then be applied as input to the DRPPM-HazardRatio-Ranked-GSEA R Shiny app. This application takes a two-column file of the gene symbols and hazards ratios, which is used as input to the GSEA function from clusterProfiler (v4.0.5). The application performs GSEA, and results can be visualized as a table with additional options for visualizing the GSEA plots through the gseaplot2 function from enrichplot (v1.12.3).