Data acquisition and preprocessing
We downloaded RNA sequencing (RNA-Seq) gene expression data of 49 normal groups and 502 tumor groups from the Cancer Genome Atlas ( TCGA ) (https://portal.gdc.cancer.GOv), a variety of corresponding clinicopathological information, including age, gender, survival time, survival status, T stage, M stage, Nstage, TMNstage, At the same time, from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.GOv/GEO). Then, based on the HALLMARK data set in gene set enrichment analysis (GSEA), I downloaded 200 genes related to the hypoxia pathway. From the TIMER (The Tumor Immune Estimation Resource) website (http://cistrome.dfci.harvard.edu/TIMER ), We downloaded 1650 immune function genes.
We used the string (functional protein binding network) tool to construct the PPI network of 200 hypoxia-related genes; its purpose is to find out the core hypoxia gene.
Screening of the co-related genes of immunity and hypoxia (HMCORS)
We used the "limma" package in R language to analyze the differences between normal and tumor groups. The screening criteria were | logFC | > 2 and FDR <0.05. At the same time, the correlation of significantly different genes, immune function genes, and hypoxia core genes were analyzed, respectively. The threshold of screening was the correlation coefficient (| cor | < 0.4, p-value < 0.05). The intersection of immune and hypoxia-related genes was obtained through the online Venny tool( https://bioinfogp.cnb.csic.es/tools/venny ) to obtain the co-related genes of immunity and hypoxia (HMCORS).
KEGG and GO functional enrichment analysis
The annotation, visualization, and integrated discovery (David) performed KEGG and GO functional enrichment analysis for immune hypoxia co-related genes.
Establishment and validation of immune and hypoxia-related risk models
The most significant prognostic genome was first screened by univariate Cox regression analysis and the minor absolute shrinkage and selector operator (lasso). Then the risk coefficient of each sample was calculated according to the risk gene expression and multivariate Cox regression analysis. The risk score was calculated as follows: risk core = risk coefficient 1 * gene expression 1 +.... risk coefficient N * gene expression N. The high and low-risk groups are divided according to the median value of the risk score. The risk groups in the risk model established by the GEO data set are divided according to the optimal detection thresholds. Kaplan Meier (KM) survival analysis was performed for high and low-risk groups, and receiver operating characteristics (ROC) curves were used to evaluate the predictive value of the risk model for three years (1, 3, and 5 years).
Establishment and validation of a nomogram for predicting prognosis
Firstly, univariate and multivariate Cox regression analysis evaluated the risk score and multiple pathological and clinical factors. Then, the nomogram was constructed according to the risk score and numerous clinical characteristics (age, gender, T, M, N, and TMNstage) as prognostic factors. The calibration chart was used to evaluate the stability of the predictive ability of the nomogram.
Survival analysis of high and low-risk groups under different clinicopathological groups
Based on the "survival" package in R language, the clinicopathological parameters were divided into age < = 65, age > 65, MALE, FEMALE, M0, M1, N0-1, N2-3, T1-2, T3-4, stage-II, and stageIII-IV groups. Then the survival of KM was analyzed in different clinicopathological groups.
Scoring of Immune cell infiltration in different risk groups
CIBERSOFT calculates the composition ratio of 22 immune cells in vivo based on the deconvolution alGOrithm and the gene expression in each sample (18). We downloaded the immune score of tcga-LUSC samples calculated, based on the CIBERSOFT method, from the TIMER database (tumor immune estimation resource). At the same time, we used the "e1071", "parallel," and "preprocessor" packages in the R tool to calculate the immune cell ratio of LUSC samples in the GEO database. The standard value of screening is p-value < 0.05. At the same time, we used the "e1071", "parallel," and "preprocessor" packages in the R tool to calculate the immune cell ratio of different risk groups of LUSC in the GEO database. The standard value of screening is p-value < 0.05. Finally, we analyzed the immune scores and immune cell infiltration of different risk groups according to the "ggpubr" package in the R language.
Gene set variation analysis (GSVA)
The "GSVA" package of the R tool and multiple data sets (GO, KEGG, HALLMARK) in the gene enrichment analysis tool (GSEA) was used to analyze the enrichment pathways of different risk groups. P < 0.05 was considered statistically significant.
Prediction of chemosensitivity of patients in different risk groups
We use the "pRRophetic" package in R language to predict the half-maximal antibiotic concentration (IC50) of patients in different risk groups for chemotherapeutic drugs to infer patients’ drug resistance in high-risk and low-risk groups. The primary method is to use the genetics of drug sensitivity in Cancer (GDSC) (www.cancerrxgene.org/) cell line expression Spectrum, TCGA and GEO gene expression profiles were used to construct ridge regression model, and then pRRophetic method in R package was used to calculate IC50 of chemotherapy drugs
Establishment of CeRNA network
the encyclopedia of rna interactomes (ENCORI) and microRNA target prediction database (miRDB) were used to score the corresponding target genes mRNA, miRNA, and lncRNA of risk genes. Then, correlation analysis and survival analysis were carried out for risk genes and predicted target genes. The screening criteria were the default settings in each online website, with correlation coefficient (COR < - 0.1) and p-value < 0.05.
Immune and hypoxia co-related genes
Through the difference analysis between normal and tumor groups, 3632 genes with a significant difference were obtained. Then, the PPI network of 200 hypoxia pathway genes was constructed by using a string database, and then 49 immune core genes with the highest degree of protein interaction were screened. Cytoscape software (Cytoscape 3.7.1) was used Mapping. Draw the correlation between all hypoxia core genes Matrix heat map. The correlation between immune function genes and hypoxia core genes and differentially expressed genes were analyzed, respectively, and 399 hypoxia-related genes (HPCORS) and 1750 immune-related genes (IMCORS) were obtained. Finally, we used Venny online software (Venny 2.1.0) to get the intersection of immune and hypoxia-related genes and draw the Wayne diagram. Finally, 380 immune and hypoxia-related genes (HMCORS) were obtained.
Go, and KEGG functional enrichment analysis was performed on HMCORS
Online David analysis was performed to analyze the go and KEGG function enrichment of 380 HMGs. We found that these HMCORS were mainly enriched in the immune and hypoxia-related pathways of going analysis, such as response to hypoxia, transforming growth factor, beta activated receptor activity, blood vessel matching, regulation of immune response, vasculogenesis, Negative regulation of cytokine secret, immune response, changing growth factor-beta binding, leukotriene production involved in inflammatory response and gluconeogenesis. The KEGG pathways of immunity and hypoxia mainly include glycolysis/gluconeogenesis, Salmonella infection, ECM receptor interaction, and the HIF-1 signaling pathway.
Establish and predict immune and hypoxia-related risk models
Univariate Cox regression analysis was performed on 380 HMCORS, and 44 prognostic genes were screened. Then, Lasso regression analysis was used to select the best prognostic gene. Then the risk score of each sample was calculated through multivariate Cox regression analysis, as follows: risk score = (0.143 * PTGIS expression) + (0.413 * C10orf55 expression) + (0.046 * NAPSA) + (0.142 *MYADM), To visualize the difference between the two groups, we drew PCA principal component analysis. Secondly, the prognosis of the high-risk group in TCGA and GEO data sets was significantly worse than that of the low-risk group. ROC curve analysis showed that the AUC values in 1, 3, and 5 years of the model were 0.635 respectively, 0.681, 0.631. The correlation between four risk genes and risk scores was also analyzed. It was found that PTGIS, C10orf55, NAPSA, and MYADM were highly distributed in the high-risk group and positively correlated with the risk score.
Effects of risk model and clinicopathological parameters on the prognosis of patients
To further understand the impact of clinicopathological factors and risk models on the prognosis of patients, we found that risk score was the most significant independent factor through univariate and multivariate Cox regression analysis. At the same time, In these pathological groups, we found that the prognosis of high-risk group was worse than that of low-risk group, Including age > 65 (P = 0.003), Mal (P < 0.001), M0 (P < 0.001), N0-1 (P = 0.012), N2-3 (P = 0.001), T1-2 (P = 0.001), stage I-II (P < 0.001).
Establishment and evaluation of prognostic nomogram
The risk score and a variety of clinicopathological parameters were used as prognostic factors to establish a nomogram. The calibration map showed that the nomogram showed better prediction accuracy than the ideal model compared with a perfect model. The c-index of the nomogram for OS prediction was 0.652.
Immune cell infiltration landscape in different risk groups
To further explore the potential mechanism of the model in immunity, the CIBERSOFT algorithm is used to analyze the distribution level of immune cells and scores in high-risk and low-risk groups. We found that macrophages M0, CD4 memory resting, neutrophils, immunity, and matrix scoring are mainly distributed in high-risk groups in the TCGA data set, while B cells memory, B cells naive, T cells CD8 are spread primarily on low-risk groups; at the same time, using R-package spearman correlation analysis, the risk score was negatively correlated with B cells naive, T cells CD8, T cells folgular helper and dendritic cells resting, which was opposite to macrophages M0, CD4 memory resting, neutrophils, immune and matrix scores. Tox, PD1, CTLA4, BTL, LAG3, TIM3, MYADM, TGFB1 immune-related factors were higher in the high-risk group. In the GEO dataset, high-risk groups were mainly distributed in macrophages M1, neutrophils, T cells gamma delta, immune and matrix scores. Low-risk groups were primarily distributed in plasma cells, T cells CD8, T cells CD4 naïve, and NK cells resting. Risk scores were positively correlated with B cells naive, T cells CD4 memory resting, T cells gamma delta, neutrophils, eosinophil, immune and matrix scores, In contrast to T cells CD4 naive, T cells CD8 and NK cells resting, CTLA4, BTL, Lag3, tim3, MYADM, and TGFB1 immune-related factors were highly expressed in high-risk groups.
GSVA revealed that the high-risk group was closely related to immune response and hypoxia regulation
We performed gene set variation analysis (GSA) on tcga-LUSC RNA Seq data and GEO gene expression data, based on various data sets in GSEA (GO, KEGG, HALLMARK). The GSVA results of the two databases showed that 31, 34, and 19 related functional pathway genes were enriched in high-risk groups, and 10, 7, and 9 immune and hypoxia-related signal pathways were distributed in high-risk groups, respectively (Table 1). Finally, based on different data sets (go, KEGG, HALLMARK), we found that these hypoxia and immune-related pathways were highly distributed in the high-risk group and positively correlated with the risk score.
Sensitivity of patients in high and low-risk groups to chemotherapy drugs
To further explore the drug resistance of different risk groups, we used a prrophic algorithm to calculate the 50% inhibitory concentration (IC50) of various conventional chemotherapy drugs (bleomycin, cisplatin, docetaxel, doxorubicin, etoposide, gefitinib, gemcitabine, and paclitaxel) in patients in high-risk and low-risk groups. In the TCGA training set, doxorubicin, etoposide, and gefitinib had higher IC50 in high-risk groups. In the GEO test set, cisplatin, etoposide, and gefitinib had higher IC50 in the high-risk group.
Establishment and validation of risk genes related CeRNA network
We used the online analysis websites ENCORI (the Encyclopedia of RNA interactomes) and m miRDB (microRNA target prediction database) to analyze four risk gene-related target genes (mRNA, miRNA, lncRNA). Then, based on KM survival analysis and spersman correlation analysis, The target genes opposite to the correlation of risk genes and the survival analysis results were screened. The network diagram and Sankey diagram of Cerna of risk genes and target genes were constructed by Cytoscape software and "ggalluvial" in R language.