A close relationship between HP infection and immune infiltration in GC
To explore the difference of immune status between HP + GC patients and HP- GC patients, ESTIMATE algorithm and ssGSEA were carried out in 155 HP + GC patients and 276 HP- GC patients. The result demonstrated that effective memory T cell (Tem), follicular helper T cells (Tfh), γδ T cell (Tgd), dendritic cells (DC), macrophages and neutrophils were evaluated in HP + GC patients (Fig. 2A). Besides, HP + GC patients had higher immune score and ESTIMATE score than HP- GC patients, indicating the high-level immune infiltration of HP + GC patients (Fig. 2B and 2C). Therefore, it was of both clinical and scientific significance to construct an immune-related predictive signature specifically for HP + GC patients.
Construction of IRSHG via co-expression network analysis
WGCNA was performed on 153 HP + GC patients to identify the immune-related gene modules based on the expression data matrix of IRGs. The optimal soft-thresholding power of 4 was chosen because it met the scale-free topology threshold of 0.92 (Fig. 2D and 2E). The IRGs were clustered into eight modules using one-step network construction method, which was presented in a clustering dendrograms (Fig. 2F). To identify immune-related modules, Pearson correlation analysis was used to analyze the correlation of these modules with immune score and ESTIMATE score. The module‐trait association plot showed the blue module and brown module were highly correlated with immune score and ESTIMATE score (Pearson correlation coefficient > 0.7, p < 0.05), which were considered to be immune-related modules (Fig. 2G). GO and KEGG enrichment analysis revealed that the 252 IRGs in immune-related modules mainly enriched in the biological process (BP) of cytokine − mediated signaling pathway, cellular component (CC) of external side of plasma membrane, molecular function (MF) of immune receptor activity, and KEGG pathway of cytokine − cytokine receptor interaction (Fig. 2H, supplement table 1). Consequently, univariate Cox regression analysis was applied to analyze these IEGs and 23 IRGs were proved to be survival-associated (Fig S1A). Then LASSO Cox regression analyses and multivariate Cox regression were performed in the training set to construct IRSHG for predicting the prognosis of HP + GC patients (Fig. 3A and 3B, Fig S1B). The formula of IRSHG was as follows: IRSHG score = 0.706275 x TGFB1 expression + (-0.072151) x NOX4 expression + 0.462278 x F2R expression + 0.220028 x TLR7 expression + (-0.238008) x CIITA expression + (-0.466826) x RBP5 expression + (-0.795864) x KIR3DL3 expression
IRSHG has a good predictive performance in prognosis prediction
HP + GC patients in the training set, validation set and total set were divided into the high-risk group and the low-risk group based on median IRSHG score. Five different analyses were performed to evaluate the predictive power of IRSHG. Firstly, ROC curve analysis of the total set showed good predictive capability of IRSHG (AUC = 0.739 for 1-year, 0.750 for 3-years and 0.719 for 5-year survival) (Fig. 3C). Secondly, for the total set, HP + GC patients in the low-risk group were proved to have better prognoses than those in the high-risk group using Kaplan-Meier analysis (Fig. 3D). Thirdly, risk score plot demonstrated the risk score distribution, survival status and 7 genes expression profile in the high- and low-risk groups (Fig. 3E). These three analyses were also conducted on the HP + GC patients in the training set and validation set, which further verified the predictive power of IRSHG (Fig S1A and S1B). Through PCA plot and t-SNE analysis, we found that HP + GC patients in different risk groups could be well separated in two directions based on the 7 genes of IRSHG (Fig. 3F and 3G). Compared with four published prognostic signatures for GC patients, IRSHG had the highest AUC value in predicting the prognosis of HP + GC patients (Fig. 3H). Overall, above results demonstrated the excellent predictive power of IRSHG, and the total set was used for subsequent study due to the limited sample size of HP + GC patients.
CNV analysis reveals different mutation profiles in the high- and low-risk groups
Previous studies have reported that tumor microenvironment (TME) was closely associated with copy number alterations in tumor50, 51. Thus the difference of CNV between the high-risk group and the low-risk group was also analyzed. The CNV data of 55 patients in the high-risk group and 58 patients in the low-risk was publicly available, which was used for subsequent analyses. For the copy number load, patients in the high-risk group showed a lower burden of copy number gain and loss than those in the low-risk group at the focal-level, and the burden of copy number gain at arm-level also fell in the high-risk group (Fig. 4A). The distribution of the gistic-score and composite copy number alteration frequency across all chromosomes were compared between the high-risk group and the low-risk group, which was visualized in Fig. 4B. For the high-risk group, significant amplifications demonstrated peaks in 3q26.1 and 20p13, while the frequently deleted genomic regions were 2q22.3, 8p11.22 and 16q12.2. Significant amplifications (3q26.1, 4q13.2 and 7q34) and deletions (1q44, 12p13.31, and 19q13.33) within chromosomal regions were identified in the low-risk group. In addition, mutation landscape of several driver genes was demonstrated across samples organized by IRSHG (Fig. 4C). Compared with the high-risk group, the low-risk group had more mutations in several essential driver genes, such as TP53, EGFR and MET, while mutations in TGFB1, UCA1 and CCNE1 were more prevalent in the high-risk group. Notably, copy number alteration frequency of genes in PI3K signaling pathway (including PIK3CA, PIK3R1 and PTEN) was higher in the low-risk group.
Hallmark pathways enriched in the high- and low-risk groups
GSEA was conducted to investigate the cancer hallmark pathways associated with IRSHG. Several immune-related pathways, including B cell receptor signaling pathway, leukocyte transendothelial migration, natural killer cell mediated cytotoxicity, and Th1 and Th2 cell differentiation were upregulated in the high-risk group (Fig. 4D, supplement table 2). By contrast, the low-risk group was enriched in some metabolic pathways, such as carbon metabolism, DNA replication, drug metabolism, and nitrogen metabolism.
The IRSHG-integrated nomogram further improves prediction ability for prognosis
Univariate and multivariate analyses were applied to integrate clinicopathological characteristics and IRSHG, aiming to identify independent prognostic indicators for HP + GC patients. IRSHG and the clinical stage were determined as independent prognostic indicators for OS (p < 0.05), which were used for the construction of the prognostic nomogram (Fig. 5A). To provide a quantitative instrument for predicting OS of HP + GC patients, a prognostic nomogram built by IRSHG and the clinical stage was created (Fig. 5B). The calibration plot of 1-, 3- and 5-year OS demonstrated that the nomogram performed with moderate accuracy compared to an ideal model, indicating the excellent reliability and ideal consistency of the nomogram (Fig S3A, S3B and S3C). In addition, the DCA curves demonstrated IRSHG-integrated nomogram had a better net benefit compared with IRSHG and pathological stage in predicting 1-year, 3-year, and 5-year OS of HP + GC patients (Fig S3D, S3E and S3F).
Patients with high IRSHG score tend to exhibit higher immune abundance
Multiple immune-related pathways were enriched in the high-risk group. Based on this observation, ESTIMATE algorism and ssGSEA were conducted to explore the difference of immune cell infiltration between two groups (Fig. 5C). Compared with the low-risk group, the high-risk group had higher immune score, ESTIMATE score, and most immune infiltration signatures, including macrophages, Th1 cells, plasmacytoid dendritic cells (pDC), interdigitating dendritic cells (iDC), central memory T cell (Tcm), APC Co-stimulation, Tem, Type II IFN response, mast cells, natural killer cell (NK), DC, Tgd, eosinophils, cytotoxic cells and neutrophils. While Th2 cells, Th17 cells, Treg and major histocompatibility complex class (MHC) II were significantly associated with the low-risk group. Notably, the low-risk group exhibited a tendency toward a higher MHC I score than the high-risk group, although this difference was not statistically significant (p = 0.053). Using the NTP analysis, the high-risk group was found to be related to previously reported transcriptome-based GC molecular subtypes, including Type II in Cao’s classification, invasive type in Lei’s classification, and G-DIF type in Tan’s classification. Meanwhile, the low-risk group was found to be closely associated with Type I in Cao’s classification, metabolic type in Lei’s classification, and G-INT type in Tan’s classification. The metabolic type was inconsistent with the metabolic pathways enriched in the low-risk group, indicating the reliability of the NTP analysis. To analyze the correlation of estimated absolute scores for each immune cell type by ssGSEA in the high- and low-risk groups, Pearson correlation analysis was conducted, and the result was showed in the double correlation heatmap (Fig. 5D).
Patients with low IRSHG score tend to be sensitive to anti-PD1 immunotherapy
To explore the relationship between IRSHG and immunotherapy, Tide analysis was performed to predict the immunotherapy response of HP + GC patients in the high- and low-risk groups. The result showed that HP + GC patients in the low-risk group had lower Tide score than those in the high-risk group, suggesting that they may benefit more from immunotherapy (Fig. 6A). Meanwhile, more potential responders to anti-PD-1 inhibitors were observed in the low-risk group compared with the high-risk group using Chi-square test (Fig. 6B). Therefore, we further compared the expression of PDCD1 between the low-risk group and the high-risk group, and HP + GC patients in the low-risk group was proved to have higher PDCD1 expression (Fig. 6C). Considering that HP + GC patients in the low-risk group may be more sensitive to anti-PD-1 inhibitors than those in the high-risk group, an external anti-PD1 immunotherapy cohort PRJEB25780 was used to verify this prediction. Among 12 anti-PD-1 responders (including complete response and partial response) in the PRJEB25780 cohort, the IRSHG scores of 9 responders were lower than the median IRSHG score of the PRJEB25780 cohort, although not statistically significant (p = 0.11) (Fig. 6D). However, the PRJEB25780 cohort did not provide the HP infection status of each patient, thus this verification may only be used for reference.
Identification of specific chemotherapeutic drugs associated with IRSHG
After removing duplicate and invalid drugs (drugs with NA value in more than 20% of the samples), a total of 2005 drugs were obtained from three different pharmacogenomic databases (CTRP, GDSC and PRISM) for the chemotherapeutic drug sensitivity prediction. OncoPredict, an upgrade version of pRRophetic R package52, was applied to predict the drug sensitivity for each patients using ridge regression model based on the gene expression matrix. The difference of imputed drug sensitivity was compared between the high- and low-risk groups. There were 98 drugs with statistical significance (|log2(fold change)| >0.5, p < 0.05, 59 drugs for the low-risk group and 39 drugs for the high-risk group) identified via this analysis (Fig. 6E, supplement table 4). Interestingly, we found the response to these drugs might closely associated with the CNV of HP + GC patients. For example, patients in the low-risk had more MET and EGFR mutation than those in the high-risk group, which were more sensitive to Mk-2461 (c-MET inhibitor) and Pazopanib (VEGFR inhibitor) (Fig. 6E). Therefore, with the IDWAS function in OncoPredict R package, drug response sensitivity data of 98 identified drugs and CNV data of HP + GC patients were used to explore the potential biomarkers for the use of these drugs. Genes mutated with a frequency > 10% across all patients were included, which might render potential drugs ineffective or effective. Specific association between drugs and mutations was shown in a drug-mutation network (Fig S4, supplement table 5).
CMap analysis uncovers drugs which may reverse IRSHG score
Based on the basic concept called ‘signature reversion’53, the computational drug discovery was conducted to identify drugs with the ability to reverse IRSHG-associated gene expression pattern using CMap data (Fig S5A). Through Pearson correlation analysis, a total of 300 IRSHG-associated genes were identified (Fig S5B, supplement table 6). The result of CMap analysis uncovered several compounds of which gene expression patterns were oppositional to the IRSHG-specific expression patterns, and a lower CMap score indicated higher perturbation ability (Fig S5C, supplement table 7). In addition, some candidate drugs have been proven to be effective in GC with HP-infection. For example, AH-6809 (a kind of EP 2 antagonist, ranked 3th among all drugs) could inhibit HP-induced uPA and uPAR mRNA expressions, which suppressed the process of degradation of the extracellular matrix, tumor invasion, and metastasis of GC54.