Screening and Validation of Hypoxia-related Signatures for Predicting Prognosis in Patients with Lung Cancer

doi:10.21203/rs.3.rs-4326548/v1


This work is licensed under a CC BY 4.0 License

Version 1


Purpose

This study was designed to screen for hypoxia-related signatures with prognostic value in lung cancer.

Methods

Using R-based bioinformatics analyses of the TCGA-LUNG cohort, hypoxia-related signatures were screened and then validated in GEO data cohorts. The expression and prognostic value of KRT16 were further validated by immunohistochemical staining of non-small cell lung cancer tissue samples.

Results

The 73 candidate genes screened from the turquoise module were intersected with the genes obtained by PPI analysis of the turquoise module, yielding 23 hub genes. Based on these 23 hub genes, a 5-gene hypoxia-risk score model (ABCC5, CSTA, ATP11B, CLCA2, KRT16) was constructed, and its predictive efficacy was validated in the external cohort GSE74777 and, for KRT16, in clinical samples. The model performed well in multiple cancers and showed good stability. The model could also be used to assess clinical features, genomic alterations, immune infiltration, immunotherapy efficacy, and chemotherapy efficacy. In multivariate Cox analysis, the hypoxia-risk score showed good independent predictive efficacy. In the validation analysis of clinical samples, high KRT16 expression was associated with poorer patient survival.

Conclusion

In this study, a 5-gene hypoxia-risk score model was constructed, which showed superior performance and served as a good independent prognostic marker in lung cancer.

Biological sciences/Cancer

Biological sciences/Genetics

Health sciences/Pathogenesis

hypoxia

lung cancer

prognosis

immune infiltration

predictive model

Hypoxia, described as an off switch for gene expression [1, 2], regulates the biological activity of cancer cells by interacting with cancer-related genes [2–5]. Hypoxia itself and hypoxia-related indicators have been reported to have important predictive value for cancer progression, metastasis, treatment response, and patient prognosis [6–14]. Predictive signatures or models based on hypoxia-related indicators have been identified in several cancers [15–19] but have rarely been reported in non-small cell lung cancer (NSCLC). With the development of next-generation sequencing technology and the establishment of many genomic databases, it has become possible to perform pan-cancer analyses and to re-evaluate the genomic alterations of a given cancer using online data [20–23], which also makes it possible to reveal correlations between certain cancers and hypoxia [15–19].

Lung cancer remains the leading cause of cancer death in both the United States and China [24–26]. Over the past 20 years, significant progress in understanding disease biology, applying predictive biomarkers, and improving treatments has transformed outcomes for many patients [24–26]. Building on this progress, we designed this study to analyze the significance of hypoxia-related signatures using publicly available NSCLC sequencing data, which may deepen understanding of the biological characteristics of NSCLC. Moreover, identifying relevant predictive markers and constructing predictive models may help guide prognostic assessment of NSCLC patients [15].

2.1 Acquisition of the raw data

Transcriptome and genome data from TCGA-LUNG were downloaded from the TCGA database, and the hallmark hypoxia gene set (hypoxia-genes; hallmark_hypoxia) was downloaded from the MSigDB online database (http://www.gsea-msigdb.org/gsea/msigdb/genesets.jsp).

2.2 Screening of hypoxia candidate gene set 1 (cd-hypoxia-geneset1)

The “limma” package was used to identify differentially expressed genes (DEGs) between normal and primary tumor samples; genes with |log2FC| ≥ 1 and padj < 0.05 were considered significantly different. The “clusterProfiler” package was used to perform KEGG pathway and GO term enrichment analyses on all DEGs to identify hypoxia-related pathways. The genes in these pathways were designated hypoxia candidate gene set 1 (cd-hypoxia-geneset1).
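The thresholding step can be illustrated with a minimal Python sketch. It assumes that per-gene log2 fold changes and raw p-values have already been computed (limma itself fits moderated linear models, which are not reproduced here) and that padj denotes Benjamini–Hochberg-adjusted p-values; the function names and the toy input are hypothetical.

```python
from typing import Dict, List, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Step up from the largest p-value so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(
    stats: Dict[str, Tuple[float, float]],
    lfc_cut: float = 1.0,
    padj_cut: float = 0.05,
) -> List[str]:
    """Keep genes with |log2FC| >= lfc_cut and BH-adjusted p < padj_cut.

    Assumption: stats maps gene -> (log2 fold change, raw p-value), e.g. as
    exported from a tumor-vs-normal comparison.
    """
    genes = list(stats)
    padj = bh_adjust([stats[g][1] for g in genes])
    return [
        g for g, q in zip(genes, padj)
        if abs(stats[g][0]) >= lfc_cut and q < padj_cut
    ]


demo = {"A": (2.0, 0.001), "B": (-1.5, 0.01), "C": (0.5, 0.0001), "D": (3.0, 0.2)}
print(select_degs(demo))  # ['A', 'B']
```

Gene C is excluded despite its small p-value because its fold change is below the cut-off, and gene D because its adjusted p-value is not significant; both criteria must hold.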

2.3 Unsupervised clustering analysis based on hypoxia-DE-genes

The tumor-versus-normal DEGs were intersected with the hallmark hypoxia gene list to obtain hypoxia-DE genes. Based on the expression levels of the hypoxia-DE genes, the “ConsensusClusterPlus” package was used to perform unsupervised clustering of tumor samples, dividing all tumor samples into two clusters, and Kaplan–Meier (KM) survival analysis was then performed to compare cluster 1 and cluster 2.
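Because KM comparisons recur throughout the analyses, the following sketch shows a Kaplan–Meier estimator and a two-group log-rank test in plain Python. It is an illustration only: the source does not state which software or test statistic was used for the KM comparisons, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate at each distinct event time.

    events[i] is 1 if the event (e.g. progression) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= deaths + censored  # censored subjects leave the risk set after t
    return curve


def logrank_test(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        groups_at_risk = [g for tt, _, g in pooled if tt >= t]
        n = len(groups_at_risk)
        n_a = groups_at_risk.count(0)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        d_a = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:  # hypergeometric variance of d_a
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For example, `logrank_test(times_c1, events_c1, times_c2, events_c2)` would compare two clusters given their PFS times and event indicators.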

2.4 Screening of hypoxia candidate gene set 2 (cd-hypoxia-geneset2)

The “limma” package was used to identify DEGs between the cluster 1 and cluster 2 subgroups, with |log2FC| ≥ 1 and padj < 0.05 as significance thresholds. These DEGs were designated hypoxia candidate gene set 2 (cd-hypoxia-geneset2).

2.5 Construction of WGCNA (weighted gene coexpression network analysis)

The union of hypoxia-genes, cd-hypoxia-geneset1, and cd-hypoxia-geneset2 was taken, and WGCNA was used to construct a coexpression network of the genes in this union.

2.6 Screening of hub genes related to prognosis

Modules associated with prognosis were selected for module membership and gene significance (GS) analysis. The intramodular, extramodular, and total connectivity of each gene were calculated. Genes with |GS| ≥ 0.08 and kwithin (intramodular connectivity) ≥ 5 were selected as hub candidate genes.
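The module-level screening can be sketched as follows, assuming the standard unsigned WGCNA adjacency a_ij = |cor(x_i, x_j)|^β, kwithin as the sum of a gene's adjacencies to the other genes in its module, and GS as the correlation between a gene's expression and the clinical trait (for example, DFI). The soft-threshold power β = 6 is an assumed placeholder because the text does not report the power used; all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def intramodular_connectivity(module: Dict[str, Sequence[float]],
                              beta: float = 6) -> Dict[str, float]:
    """kWithin: sum of unsigned adjacencies |cor|^beta to the other module genes.

    Assumption: beta = 6 is a placeholder soft-threshold power.
    """
    genes = list(module)
    k_within = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            adjacency = abs(pearson(module[gi], module[gj])) ** beta
            k_within[gi] += adjacency
            k_within[gj] += adjacency
    return k_within


def hub_candidates(module: Dict[str, Sequence[float]], trait: Sequence[float],
                   gs_cut: float = 0.08, k_cut: float = 5.0,
                   beta: float = 6) -> List[str]:
    """Genes with |GS| >= gs_cut and kWithin >= k_cut, GS = cor(expression, trait)."""
    k_within = intramodular_connectivity(module, beta)
    return [
        g for g, expr in module.items()
        if abs(pearson(expr, trait)) >= gs_cut and k_within[g] >= k_cut
    ]
```

Because kwithin is a sum over module partners, a threshold of 5 effectively requires a gene to be strongly coexpressed with at least several other genes in its module.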

Protein‒protein interaction (PPI) analysis was performed on genes within the modules associated with prognosis. Genes with a PPI degree ≥ 5 were intersected with the hub candidate genes to obtain the hub genes.
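The intersection step can be expressed in a few lines. The sketch assumes the PPI network is available as an undirected edge list (for example, exported from a PPI database; the source does not name one) and counts each distinct interaction partner once; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def ppi_degree(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct interaction partners per gene (undirected, no self-loops)."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:
            partners[a].add(b)
            partners[b].add(a)
    return {gene: len(p) for gene, p in partners.items()}


def hub_genes(edges: Iterable[Tuple[str, str]], candidates: Iterable[str],
              min_degree: int = 5) -> List[str]:
    """Intersect WGCNA hub candidates with genes whose PPI degree >= min_degree."""
    degree = ppi_degree(edges)
    return sorted(g for g in set(candidates) if degree.get(g, 0) >= min_degree)
```

Using sets of partners guards against exports that list each interaction in both directions, which would otherwise double a gene's degree.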

Univariate Cox regression analysis was performed on the hub genes to identify genes whose expression levels significantly affected progression-free survival (PFS), with p < 0.05 as the screening threshold.
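As an illustration of the univariate screening, the sketch below fits a single-covariate Cox proportional hazards model by Newton–Raphson on the partial likelihood with the Breslow approximation for ties and reports a two-sided Wald p-value. It is a teaching sketch under these assumptions, not the software used in the study (which is not specified for this step); the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def univariate_cox(times: Sequence[float], events: Sequence[int], x: Sequence[float],
                   max_iter: int = 50, tol: float = 1e-10) -> Dict[str, float]:
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson (Breslow ties).

    Returns beta, the hazard ratio per unit of x, and a two-sided Wald p-value.
    Assumes x varies within the risk sets; perfectly separated data make beta diverge.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the partial log-likelihood
        info = 0.0   # observed information (negative second derivative)
        for t in event_times:
            risk = [(xj, math.exp(beta * xj)) for tj, xj in zip(times, x) if tj >= t]
            s0 = sum(w for _, w in risk)
            s1 = sum(xj * w for xj, w in risk)
            s2 = sum(xj * xj * w for xj, w in risk)
            x_events = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei]
            d = len(x_events)
            score += sum(x_events) - d * s1 / s0
            info += d * (s2 / s0 - (s1 / s0) ** 2)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    z = beta * math.sqrt(info)  # Wald z = beta / SE, with SE = 1 / sqrt(info)
    return {"beta": beta, "hr": math.exp(beta), "p": math.erfc(abs(z) / math.sqrt(2))}
```

A hazard ratio above 1 indicates that higher expression is associated with shorter PFS; screening then keeps genes whose p-value is below 0.05.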

2.7 Construction of the regulatory network of hub genes

Based on the LncMAP database (http://bio-bigdata.hrbmu.edu.cn/LncMAP/index.jsp), the regulatory network of the hub gene, transcription factors (TFs) and lncRNAs was constructed.

2.8 Construction of the hypoxia-related risk prognostic model (hypoxia-risk score)

The 22 genes screened by univariate Cox analysis were subjected to LASSO regression analysis to remove redundant factors. Multivariate Cox regression survival analysis with forward stepwise likelihood-ratio selection (the “forward LR” method in SPSS) was then performed on the retained genes, and a risk score formula for PFS was constructed.

2.9 Prognostic efficacy analysis of the hypoxia-risk score model

Based on the TCGA-LUNG cohort, the hypoxia-risk score was calculated for each sample, samples were divided into high-risk and low-risk groups at the median score, and PFS survival analysis was performed.

2.10 External data validation for the prognostic efficacy of the hypoxia-risk score model

Based on the validation cohort GSE74777, the hypoxia-risk score was calculated for each sample, samples were divided into high-risk and low-risk groups at the median score, and PFS survival analysis was performed.

2.11 Stability assessment of the hypoxia-risk score model

In the TCGA-LUNG cohort, PFS survival analysis was performed on the hypoxia-risk score high-risk and low-risk groups of LUAD, LUSC, male, female, age ≤65, and age >65 samples.

2.12 Pancancer analysis of the hypoxia-risk score model

Based on the TCGA-CESC, TCGA-PAAD, TCGA-THYM and TCGA-UCEC cohorts, the hypoxia-risk score was calculated for each sample, samples were divided into high-risk and low-risk groups at the median score, and KM analysis of PFS was performed.

2.13 Comparison of predictive efficiency with reported hypoxia models

Based on the reported 26-gene hypoxia model, a hypoxia score was calculated for each sample in the TCGA-LUNG cohort, and an ROC curve for predicting prognosis was drawn.
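The ROC comparison at 1, 3 and 5 years can be illustrated with a simplified time-dependent AUC. This sketch assumes that cases are patients with an event by the horizon and controls are patients still event-free beyond it, and it simply drops patients censored earlier. Packages such as R's timeROC instead apply inverse-probability-of-censoring weights, so values from this sketch may differ from the reported AUCs; the function name is hypothetical.

```python
from typing import Sequence


def auc_at_horizon(times: Sequence[float], events: Sequence[int],
                   scores: Sequence[float], horizon: float) -> float:
    """Naive cumulative/dynamic AUC at a fixed time horizon.

    Cases: event observed at or before the horizon.
    Controls: still under observation beyond the horizon.
    Assumption: subjects censored before the horizon are dropped (no IPCW).
    """
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to random ranking of cases and controls, which is the reference point for interpreting the values near 0.46 to 0.64 reported for the two models.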

2.14 Analysis of the clinical characteristics of the hypoxia-risk score high- and low-risk groups

In the TCGA-LUNG cohort, the clinical characteristics (age, pathological type, sex, clinical stage, etc.) of the hypoxia-risk score high- and low-risk groups were analyzed.

2.15 Correlation analysis between hypoxia-risk score and other scores

Data on mutation load, neoantigen load, stemness index, chromosomal instability, and homologous recombination deficiency for the TCGA-LUNG cohort were obtained from the TCGA PanImmune resource (https://gdc.cancer.gov/about-data/publications/panimmune), and their correlations with the hypoxia-risk score were analyzed.

2.16 Immune infiltration analysis of the hypoxia-risk score high- and low-risk groups

The score of immune cell infiltration in the TCGA-LUNG cohort was calculated by “CIBERSORT” (model=absolute, permutation=1,000, LM22 signature).

2.17 Analysis of immune score, stromal score, and tumor purity in the hypoxia-risk score high- and low-risk groups

In the TCGA-LUNG cohort, the ESTIMATE method was used to calculate the immune score, stromal score, and ESTIMATE score of the low-risk and high-risk samples.

2.18 Somatic mutation analysis of hypoxia-risk score high- and low-risk groups

Based on the TCGA-LUNG cohort, somatic mutation analyses of hypoxia-risk score high- and low-risk groups were carried out by “maftools”.

2.19 Copy number variation analysis of the hypoxia-risk score high- and low-risk groups

Based on the TCGA-LUNG cohort, the copy number variation of the hypoxia-risk score high- and low-risk groups was analyzed separately.

2.20 Immunotherapy efficacy analysis of the hypoxia-risk score high- and low-risk groups

The GSE135222 dataset, which includes immunotherapy data, was used. The hypoxia-risk score was calculated for each sample, and samples were divided into high-risk and low-risk groups at the median score. KM analysis of PFS was performed, followed by analysis of immunotherapy efficacy in the hypoxia-risk score high- and low-risk groups.

2.21 Chemotherapy efficacy analysis of the hypoxia-risk score high- and low-risk groups

Based on the TCGA-LUNG cohort, the chemotherapy efficacy of hypoxia-risk score low-risk and high-risk group samples was analyzed separately.

2.22 Chemotherapy resistance assessment in the hypoxia-risk score high- and low-risk groups

Based on the TCGA-LUNG cohort, the "pRRophetic" package was used to estimate the IC50 values of the relevant chemotherapeutic drugs for each sample.

2.23 Independent prognostic efficacy assessment of the hypoxia-risk score

In the TCGA-LUNG cohort, with PFS as the endpoint, univariate Cox analysis and multivariate Cox analysis were performed for clinical characteristics and hypoxia-risk scores, and then forest plots were drawn.

2.24 Construction of the nomogram

In the TCGA-LUNG cohort, a nomogram combining clinical information and the hypoxia-risk score was constructed to predict PFS.

2.25 Sample Collection and Immunohistochemistry

The NSCLC samples in this study were collected at the First Affiliated Hospital of Shandong First Medical University between June 2012 and February 2020. All tumor tissue specimens were surgically resected, formalin-fixed, and paraffin-embedded (FFPE). Tissue blocks were cut into serial 3–4 μm sections for hematoxylin and eosin (H&E) staining and immunohistochemical (IHC) staining. Slides were incubated with a rabbit polyclonal anti-KRT16 primary antibody (bs-1270R, Bioss Ltd; 1:400) overnight at 4°C, followed by the SP Kit (Rabbit) (sp-0023, Bioss Ltd) and DAB staining according to the manufacturer’s protocol. All H&E- and IHC-stained slides were examined independently by two experienced pathologists. Differences among subgroups were analyzed by the Wilcoxon rank-sum test, and the association of KRT16 expression with clinicopathological characteristics was analyzed by Fisher’s exact test. Overall survival (OS) was analyzed using hazard ratio (HR) estimation and the log-rank test.
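For the 2 × 2 case, the association test between dichotomized KRT16 expression and a binary clinicopathological feature can be sketched as below. The two-sided p-value sums the probabilities of all tables with the same margins that are no more likely than the observed table, a common convention; the function name and table layout are hypothetical, and features with more than two categories would need the r × c generalization.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Hypothetical layout: rows = KRT16 high/low, columns = a binary
    clinicopathological feature.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum tables no more probable than the observed one (small relative tolerance).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

An exact test is a natural choice here because, with 32 patients and only nine in the high-expression group, expected cell counts are often too small for a chi-square approximation.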

3.1. Download of the raw data

The flow chart of this study is provided in Figure 1. The transcriptome and genome data of TCGA-LUNG were downloaded from the TCGA database (Table 1), and the hallmark hypoxia gene set (hallmark_hypoxia) was downloaded from the MSigDB website (http://www.gsea-msigdb.org/gsea/msigdb/genesets.jsp).

3.2 Screening of hypoxia candidate gene set 1 (cd-hypoxia-geneset1)

A total of 5238 differentially expressed genes (DEGs) were identified between normal and primary tumor samples. The corresponding volcano plot and heatmap of DEGs are provided in Supplementary Figure 1A and 1B. KEGG pathway enrichment analysis of all DEGs identified no hypoxia-related pathways, whereas GO term enrichment analysis identified two hypoxia-related terms: GO:0001666 (response to hypoxia) and GO:0071456 (cellular response to hypoxia). These two terms contained a total of 118 genes, which were designated hypoxia candidate gene set 1 (cd-hypoxia-geneset1).

3.3 Unsupervised clustering analysis based on hypoxia-DE-genes

Intersecting the tumor-versus-normal DEGs with the hallmark hypoxia gene list yielded 75 hypoxia-DE genes. Unsupervised clustering of tumor samples based on the expression levels of these 75 genes divided all tumor samples into two clusters (Supplementary Figure 1C and Supplementary Figure 1D). KM survival analysis showed a significant difference in PFS between cluster 1 and cluster 2 (P<0.0001, Supplementary Figure 1E).

3.4 Screening of hypoxia candidate gene set 2 (cd-hypoxia-geneset2)

DEG analysis between the cluster 1 and cluster 2 subgroups identified 2960 DEGs, which were designated hypoxia candidate gene set 2 (cd-hypoxia-geneset2) (Supplementary Figure 1F-1L). Both the heatmap of cd-hypoxia-geneset2 (Supplementary Figure 1F) and the principal component analysis (Supplementary Figure 1G) indicated clear differences between cluster 1 and cluster 2.

3.5 Construction of WGCNA coexpression network

The hypoxia-DE genes, cd-hypoxia-geneset1, and cd-hypoxia-geneset2 were merged to obtain a total of 3195 genes, and WGCNA was used to construct a coexpression network of these 3195 genes (Supplementary Figure 1H-1L).

3.6 Screening of hub genes related to prognosis

Since the turquoise module was strongly correlated with DFI and DSS, module membership and gene significance analyses were performed for the turquoise module with respect to DFI (Supplementary Figure 2A). The intramodular, extramodular, and total connectivity of each gene were then calculated. Genes in the turquoise module were screened with |GS| ≥ 0.08 and kwithin (intramodular connectivity) ≥ 5 as thresholds, yielding 73 genes (Supplementary Figure 2B). PPI analysis was performed on the genes in the turquoise module, and genes with a PPI degree ≥ 5 were intersected with the 73 genes screened by WGCNA, yielding 23 hub genes (SPRR2A, SPRR1A, PIK3CA, KRT6C, KRT6B, KRT17, KRT16, KRT14, JAG1, ITGA6, IRF6, GPC1, DVL3, DSG3, DSC3, DLX5, DLG1, CSTA, CLCA2, BMP7, ATP11B, ARRB1, ABCC5).

Univariate Cox regression analysis was performed on the 23 hub genes to identify genes whose expression level significantly affected PFS. With p < 0.05 as the screening threshold, 22 genes were obtained (SPRR2A, SPRR1A, PIK3CA, KRT6C, KRT6B, KRT17, KRT16, KRT14, JAG1, IRF6, GPC1, DVL3, DSG3, DSC3, DLX5, DLG1, CSTA, CLCA2, BMP7, ATP11B, ARRB1, and ABCC5). The PFS KM curves of six representative genes, KRT16, CLCA2, ATP11B, CSTA, ABCC5 and DAPL1, are shown in Figure 2A-2F.

3.7 Construction of the hub gene regulatory network

Based on the LncMAP database (http://bio-bigdata.hrbmu.edu.cn/LncMAP/index.jsp), TF-gene pairs interacting in lung cancer were screened. According to the lncRNA-gene-TF interactions in lung cancer, gene-lncRNA pairs were selected that were present in both the LUAD and LUSC cohorts with a probability > 0.3 in both the discovery set and the validation set. The regulatory network of the hub genes and their related TFs and lncRNAs was then constructed (Supplementary Figure 2C).

3.8 Construction of the hypoxia-related risk prognostic model (hypoxia-risk score)

LASSO regression analysis of the 22 genes screened by univariate Cox analysis removed redundant factors and retained 15 genes (Supplementary Figure 3A, 3B and 3D). Multivariate Cox regression survival analysis of these 15 genes retained 5 genes (Supplementary Figure 3C), from which the following risk score formula for PFS was constructed:

Hypoxia-risk score = 0.07 × KRT16 - 0.05 × CLCA2 + 0.293 × ATP11B - 0.114 × CSTA - 0.173 × ABCC5, where each gene symbol denotes the expression level of that gene.
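Applying the formula and the median split can be sketched as follows. The coefficients are copied from the formula above; the expression scale (for example, log2-transformed values) is an assumption because the text does not state it, and samples exactly at the median are assigned to the low-risk group here as an arbitrary choice. The function names are hypothetical.

```python
from statistics import median
from typing import Dict, Mapping

# Coefficients copied from the hypoxia-risk score formula in the text.
COEFFICIENTS: Dict[str, float] = {
    "KRT16": 0.07,
    "CLCA2": -0.05,
    "ATP11B": 0.293,
    "CSTA": -0.114,
    "ABCC5": -0.173,
}


def hypoxia_risk_score(expr: Mapping[str, float]) -> float:
    """Linear risk score for one sample; expr maps gene symbol -> expression.

    Assumption: expression is on the same scale used to fit the model
    (not stated in the text).
    """
    return sum(coef * expr[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores: Mapping[str, float]) -> Dict[str, str]:
    """Label each sample 'high' or 'low' risk relative to the cohort median.

    Assumption: samples exactly at the median are labeled 'low'.
    """
    cut = median(scores.values())
    return {sample: ("high" if s > cut else "low") for sample, s in scores.items()}
```

Because the cut-off is each cohort's own median, the TCGA-LUNG, GSE74777 and pan-cancer cohorts are split independently, consistent with the procedure described in the Methods.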

3.9 Analysis of prognostic efficacy of the hypoxia-risk score model

Hypoxia-risk scores were calculated for each sample in the TCGA-LUNG cohort, and samples were divided into high-risk and low-risk groups at the median score. KM analysis showed shorter PFS in the high-risk group than in the low-risk group (Figure 3A), consistent with the distribution plots of the hypoxia-risk score (Figure 3B) and PFS (Figure 3C). The expression heatmap showed low expression of the five model genes in the high-risk group (Figure 3D). The ROC analysis for predicting prognosis showed a moderate prognostic ability of the hypoxia-risk score model (AUC for 1-year survival: 0.615; AUC for 3-year survival: 0.642; AUC for 5-year survival: 0.64) (Figure 3E).

3.10 External data validation of the prognostic efficacy of the hypoxia-risk score model

Dataset GSE74777 was used for external validation. The hypoxia-risk score was calculated for each sample, and samples were divided into high-risk and low-risk groups at the median score. As in the TCGA-LUNG cohort, the KM analysis and the distribution plots of the hypoxia-risk score and PFS showed shorter PFS in the high-risk group than in the low-risk group (Figure 3F-3H). The expression heatmap of the five model genes in the GSE74777 dataset showed a trend similar to that in the TCGA-LUNG cohort (Figure 3I).

3.11 Stability assessment of the hypoxia-risk score model

In the TCGA-LUNG cohort, PFS survival analysis was performed separately on LUAD, LUSC, male, female, age ≤65, and age >65 samples. In each of these subgroups, the high-risk group of the hypoxia-risk score model had a worse prognosis than the low-risk group (Figure 4A-4F), indicating that the prognostic performance of the hypoxia-risk score model was stable across subgroups.

3.12 Pancancer analysis of the hypoxia-risk score model

Based on the TCGA-CESC, TCGA-PAAD, TCGA-THYM and TCGA-UCEC cohorts, hypoxia-risk scores were calculated for each sample, and they were divided into high-risk and low-risk groups according to the median. The KM analysis showed that patients in the high-risk groups had shorter PFS than patients in the low-risk groups (Figure 4G-4J).

3.13 Comparison of predictive efficacy with existing hypoxia models

Based on the reported 26-gene hypoxia model, the hypoxia score for each sample in the TCGA-LUNG cohort was calculated, and a predictive prognostic ROC curve was drawn (AUC for 1-year survival: 0.555; AUC for 3-year survival: 0.489; AUC for 5-year survival: 0.458, Figure 4K). Compared with the reported 26-gene hypoxia model, the 5-gene hypoxia-risk score model had higher AUCs at all three time points (AUC for 1-year survival: 0.615; AUC for 3-year survival: 0.642; AUC for 5-year survival: 0.64, Figure 3E), suggesting better prognostic accuracy for lung cancer patients.

3.14 Analysis of the clinical characteristics of the hypoxia-risk score high- and low-risk groups

In the TCGA-LUNG cohort, the clinical characteristics (age, pathological type, sex, clinical stage) of the hypoxia-risk score high- and low-risk groups were analyzed (Figure 5A-5D). Notably, patients with lung adenocarcinoma and those with a higher clinical stage had relatively higher hypoxia-risk scores.

3.15 Correlation of hypoxia-risk score with other scores

In the TCGA-LUNG cohort, the hypoxia risk score was positively correlated with mutation load and neoantigen load but negatively correlated with the stemness index, chromosomal instability, and homologous recombination deficiency (Figure 5E-5J).

3.16 Immune infiltration analysis of hypoxia-risk score high- and low-risk groups

Differences in the infiltrating immune populations were observed between the hypoxia-risk score high- and low-risk groups, with a higher frequency of activated NK cells, resting memory CD4+ T cells, and Tregs present in the hypoxia-risk score high group and a higher frequency of resting NK cells present in the hypoxia-risk score low group (Figure 5M).

3.17 Analysis of immune score, stromal score, and tumor purity in the hypoxia-risk score high- and low-risk groups

In the TCGA-LUNG cohort, the high-risk samples showed higher immune scores, stromal scores, and ESTIMATE scores than the low-risk samples (Figure 5K, 5L and 5N).

3.18 Somatic mutation analysis of hypoxia-risk score high- and low-risk groups

Based on the TCGA-LUNG cohort, somatic mutation analyses of the hypoxia-risk score high- and low-risk groups showed 8 frequently mutated genes shared by both groups: TP53, TTN, CSMD3, MUC16, RYR2, LRP1B, USH2A, and ZFHX4 (Supplementary Figure 4A and 4B). In addition, the low-risk group had specific gene mutations such as KMT2D, whereas the high-risk group had specific gene mutations such as KRAS (Supplementary Figure 4A and 4B).

3.19 Copy number variation analysis of the hypoxia-risk score high- and low-risk groups

The copy number variation analysis of the hypoxia-risk score high- and low-risk groups showed that the most frequent CNVs were 12 gain and 16 loss in both the low-risk group and the high-risk group (Supplementary Figure 4C-4E).

3.20 Immunotherapy efficacy analysis of the hypoxia-risk score high- and low-risk groups

The GSE135222 dataset with immunotherapy data was used. The KM analysis of PFS indicated no significant differences between the high-risk and low-risk groups (Figure 5O). Immunotherapy efficacy analysis indicated that 36% of patients in the high-risk group benefited from immunotherapy, while 30% of patients in the low-risk group benefited from immunotherapy (Figure 5P). Although patients who benefited from immunotherapy showed a slightly higher hypoxia-risk score, there was no significant difference in hypoxia-risk score compared with those who did not benefit from immunotherapy (Figure 5Q).

3.21 Chemotherapy efficacy analysis of the hypoxia-risk score high- and low-risk groups

Furthermore, chemotherapy efficacy analysis indicated that 74% of lung cancer patients in the low-risk group achieved a CR or PR to chemotherapy, compared with 65% in the high-risk group (Supplementary Figure 5A). Accordingly, patients who achieved CR to chemotherapy had a lower hypoxia-risk score than patients with PD after chemotherapy (Supplementary Figure 5B). Moreover, the hypoxia-risk score had an ROC-AUC of 0.621 in terms of predicting chemotherapy response (Supplementary Figure 5C).

3.22 Chemotherapy resistance assessment in the hypoxia-risk score high- and low-risk groups

The IC50 values of four different chemotherapeutic drugs (cisplatin, docetaxel, gemcitabine and paclitaxel) involved in each sample were calculated, and the results showed that a lower IC50 value corresponded to a lower hypoxia-risk score (Supplementary Figure 5D-5G).

3.23 Independent prognostic efficacy assessment of the hypoxia-risk score

Univariate Cox analysis was performed for clinical characteristics and hypoxia-risk scores and indicated that histological type, clinical stage and hypoxia-risk score could provide prognostic prediction of PFS in the TCGA-LUNG cohort (Supplementary Figure 6A). Then, multivariable analysis confirmed the independent predictive value of the hypoxia-risk score (HR, 2.6; 95% CI, 1.7 to 4.1; P <0.001) (Supplementary Figure 6B).

3.24 Construction of the nomogram

A nomogram combining clinical information and the hypoxia-risk score was constructed to predict PFS in lung cancer patients (Supplementary Figure 6C-6F).

3.25 Overall survival analyses of KRT16 protein expression in patients with NSCLC

In total, samples from 32 NSCLC patients were evaluable for KRT16 protein expression by immunohistochemistry (IHC). KRT16 staining was mainly localized in the cytoplasm, and staining intensity was scored as negative (0), weak (1+), moderate (2+), or strong (3+) (Figure 6A-6D). The percentage of positive cells at each intensity was estimated, and the H-score, obtained by multiplying the percentages of stained cells by their staining intensity, was defined as the KRT16 IHC score (range 0 to 300) in this study. A KRT16 IHC score of <100 was defined as low KRT16 protein expression, and a score of ≥100 as high KRT16 protein expression. The clinicopathological characteristics of NSCLC patients according to KRT16 protein expression are shown in Table 2. Twenty-three patients (71.9%) had tumors with low KRT16 protein expression, and nine patients (28.1%) had tumors with high KRT16 protein expression. Normal alveolar epithelium also showed some KRT16 protein expression, but at lower levels than tumor cells (Figure 6E), and the paired plots also showed higher KRT16 protein expression in NSCLC than in paired normal tissues (Figure 6F). Patients with high KRT16 protein expression had poorer survival than those with low expression (HR (95% CI) = 3.05 (0.60–15.44), log-rank p = 0.041) (Figure 6G). Together, these results support the prognostic value of KRT16 in NSCLC, with high KRT16 protein expression serving as a poor prognostic factor for NSCLC patients.
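The scoring rule can be written out explicitly. This sketch assumes the standard H-score formulation, in which the percentage of cells at each intensity level is weighted by that level (1+, 2+, 3+) and summed, and applies the cut-off of 100 stated above; the function names are hypothetical.

```python
def h_score(pct_weak: float, pct_moderate: float, pct_strong: float) -> float:
    """H-score = 1 * %(1+) + 2 * %(2+) + 3 * %(3+), giving a value from 0 to 300.

    Assumption: standard H-score formulation with per-intensity percentages.
    """
    parts = (pct_weak, pct_moderate, pct_strong)
    if min(parts) < 0 or sum(parts) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def krt16_expression_group(score: float, cutoff: float = 100) -> str:
    """Dichotomize as in the text: H-score < 100 is low, >= 100 is high."""
    return "high" if score >= cutoff else "low"


# A tumor with 20% weak, 30% moderate and 10% strong staining:
print(h_score(20, 30, 10), krt16_expression_group(h_score(20, 30, 10)))  # 110 high
```

The weighting lets a small fraction of strongly stained cells and a larger fraction of weakly stained cells reach comparable scores, which is why a single threshold can be applied across tumors.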

Over the past 20 years, advances in our understanding of disease biology, the application of predictive biomarkers, and improved treatments have improved patient prognosis [24–26]. However, lung cancer, particularly NSCLC, remains the most common cancer worldwide and the leading cause of cancer death in both the United States and China [24, 25]. In view of reports on hypoxia in other tumors and our previous work [1–19, 27, 28], this study was designed to further evaluate the significance of hypoxia in lung cancer.

Basic information on the raw TCGA-LUNG data is provided in Table 1. Gene expression differed significantly between cancer and normal tissues, with a total of 5238 DEGs obtained (Supplementary Fig. 1A and 1B), 75 of which were hypoxia-DEGs. Our study confirmed the presence of hypoxia-DEGs in lung cancer, similar to findings in other tumors [15–19]. Unsupervised clustering based on the expression levels of the 75 hypoxia-DEGs yielded two clusters (Supplementary Fig. 1C and 1D) with a statistically significant difference in PFS (p < 0.0001; Supplementary Fig. 1E). Subsequent analyses further confirmed the differences between the two clusters (Supplementary Fig. 1F and 1G). This potential clinical significance laid the foundation for our subsequent analyses.

By taking the union of hypoxia-genes, cd-hypoxia-geneset1, and cd-hypoxia-geneset2, a coexpression network of 3195 genes was constructed (Supplementary Fig. 1H-1L), an approach that has rarely been reported in lung cancer [29, 30]. Twenty-three hub genes (SPRR2A, SPRR1A, PIK3CA, KRT6C, KRT6B, KRT17, KRT16, KRT14, JAG1, ITGA6, IRF6, GPC1, DVL3, DSG3, DSC3, DLX5, DLG1, CSTA, CLCA2, BMP7, ATP11B, ARRB1 and ABCC5) were obtained by intersecting the 73 genes screened from the turquoise module by WGCNA with the genes selected by PPI analysis (Supplementary Fig. 2), of which 22 had a significant impact on PFS (SPRR2A, SPRR1A, PIK3CA, KRT6C, KRT6B, KRT17, KRT16, KRT14, JAG1, IRF6, GPC1, DVL3, DSG3, DSC3, DLX5, DLG1, CSTA, CLCA2, BMP7, ATP11B, ARRB1, and ABCC5) (Fig. 2A-2F). After further screening, a risk prediction model containing 5 genes (ABCC5, CSTA, ATP11B, CLCA2, KRT16) with good predictive efficacy was constructed (Supplementary Fig. 2C, Supplementary Fig. 3A-3D and Fig. 3A-3E) and further validated in the external dataset GSE74777 (Fig. 3F-3I). Moreover, PFS analyses of the hypoxia-risk score high- and low-risk groups within clinical subgroups indicated that the risk predictive model was stable (Fig. 4A-4F). Similar results were obtained in pan-cancer analyses examining the generality of the model (Fig. 4G-4J). The model also compared favorably with a previously reported prediction model (Fig. 3E and Fig. 4K) [15]. Together, these results suggest that the 5-gene risk predictive model may have clinical application potential.

In the TCGA-LUNG cohort, the clinical characteristics (age, pathological type, sex, clinical stage) differed significantly between the hypoxia-risk score high- and low-risk groups (Fig. 5A-5D), suggesting that combining the hypoxia-risk score with clinical features might help assess patient prognosis [31, 32]. Similar differences were seen in immune scores, stromal scores, tumor purity, some infiltrating immune cells, somatic mutations, copy number variations, and chemoresistance (Fig. 5K-5N, Supplementary Fig. 4 and Supplementary Fig. 5D-5G). These differences further support the relevance of the hypoxia-risk score for evaluating lung cancer patients [33]. In addition, in the TCGA-LUNG cohort, the hypoxia-risk score was significantly correlated with mutation load, neoantigen load, stemness index, chromosomal instability, and homologous recombination deficiency (Fig. 5E-5J) [33].

Although hypoxia has been reported to be associated with alterations in the tumor immune microenvironment in many tumors [31–35], no statistically significant association between the hypoxia-risk score and immunotherapy efficacy was found in GSE135222 (Fig. 5O-5Q). Similarly, the chemotherapy response evaluation in TCGA-LUNG did not yield significant results (Supplementary Fig. 5A and 5B); only the ROC analysis suggested some discriminative ability (AUC = 0.621; Supplementary Fig. 5C). This suggests that the hypoxia-risk score may have some value for predicting chemotherapy efficacy, which could be helpful in clinical practice. In multivariate Cox analysis, the hypoxia-risk score also showed good independent predictive efficacy (Supplementary Fig. 6), indicating that the 5-gene predictive model has potential for clinical application in lung cancer.

Finally, KRT16, one of the five hypoxia-risk genes, was further validated in clinical samples by immunohistochemical staining. Among the 32 paired tissues (NSCLC tumor tissues vs. their adjacent normal tissues), KRT16 protein expression was absent in 14 normal tissues but present in their matched tumor tissues, and KRT16 was overexpressed in NSCLC tumor tissues compared with normal tissues (Fig. 6E and 6F). Moreover, a high level of KRT16 protein predicted poor prognosis in NSCLC patients (Fig. 6G). Therefore, the immunohistochemical results of clinical samples further supported the prognostic value of the risk predictive model in NSCLC.

A 5-gene hypoxia-risk score model was constructed based on the TCGA-LUNG cohort. The model performed well across multiple cancer datasets and showed excellent stability, and multivariate Cox analysis confirmed its independent predictive value.
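Gene-signature risk scores of this type are commonly computed as a weighted sum of the signature genes' expression, with weights taken from a penalized Cox model such as LASSO-Cox, and patients are then split into high- and low-risk groups. Neither the fitted coefficients nor the cutoff is given in this section, so the sketch below uses hypothetical placeholder coefficients and an assumed median cutoff purely to show the mechanics. Only the five gene names come from the study.

```python
from statistics import median

# HYPOTHETICAL coefficients for illustration only; these are NOT the
# fitted coefficients of the published model.
HYPOTHETICAL_COEFS = {
    "ABCC5": 0.10,
    "CSTA": 0.05,
    "ATP11B": -0.08,
    "CLCA2": 0.04,
    "KRT16": 0.12,
}

def risk_score(expr, coefs=HYPOTHETICAL_COEFS):
    """Linear risk score: sum of coefficient * expression over model genes."""
    return sum(beta * expr[gene] for gene, beta in coefs.items())

def split_by_median(scores):
    """Label patients above the median score 'high', others 'low' (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical normalized expression values for four patients.
patients = {
    "P1": {"ABCC5": 5, "CSTA": 3, "ATP11B": 6, "CLCA2": 2, "KRT16": 4},
    "P2": {"ABCC5": 1, "CSTA": 1, "ATP11B": 1, "CLCA2": 1, "KRT16": 1},
    "P3": {"ABCC5": 8, "CSTA": 6, "ATP11B": 2, "CLCA2": 5, "KRT16": 9},
    "P4": {"ABCC5": 2, "CSTA": 1, "ATP11B": 7, "CLCA2": 1, "KRT16": 1},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(groups)  # {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

Keeping the coefficients in a single mapping makes it straightforward to substitute the published values, and a different cutoff (for example, an optimal-cutpoint method) would only change `split_by_median`.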

NSCLC

Non-small cell lung cancer

TCGA

The Cancer Genome Atlas

GEO

Gene Expression Omnibus

GSE

GEO Series (accession prefix of Gene Expression Omnibus datasets)

CESC

Cervical squamous cell carcinoma and endocervical adenocarcinoma

PAAD

Pancreatic adenocarcinoma

THYM

Thymoma

UCEC

Uterine corpus endometrial carcinoma

LUAD

Lung adenocarcinoma

LUSC

Lung squamous cell carcinoma

GO

Gene Ontology

DFI

Disease-free interval

DSS

Disease-specific survival

CNV

Copy number variation

LASSO

Least absolute shrinkage and selection operator

PPI

Protein-protein interaction

WGCNA

Weighted gene co-expression network analysis

DEGs

Differentially expressed genes

KEGG

Kyoto Encyclopedia of Genes and Genomes

AUTHOR CONTRIBUTIONS

The study was designed by Hongtao Liu, Yuan Tian and Qing Sun; Qing Sun was the corresponding author and was responsible for implementing the entire study. Data were downloaded by Hongtao Liu; data analysis and verification were performed by Hongtao Liu and Yuan Tian, respectively. The manuscript was drafted by Yuan Tian and Hongtao Liu. Clinical sample collection and immunohistochemical staining were performed by Guoxia Zhang, Yuxia Cheng and Hongtao Liu. The scientific soundness was reviewed by Liang Guo, Guoxia Zhang, Yuxia Cheng, Hongtao Liu and Qing Sun. All authors reviewed and checked the final version of the manuscript. Any inconsistencies in this study were ultimately coordinated and resolved by the corresponding author, Qing Sun.

FUNDING

This study was funded by the Shandong Provincial Qianfoshan Hospital Cultivation Fund (QYPY2020NSFC1001; Hongtao Liu), the Postdoctoral Innovation Project of Jinan (Yuan Tian) and the Shandong Province Major Science and Technology Innovation Project (2019JZZY010108; Qing Sun).

AVAILABILITY OF DATA AND MATERIALS

All public data used in this study were downloaded free of charge from online databases, and the datasets generated during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate

This study was approved by the Medicine Ethics Committee of The First Affiliated Hospital of Shandong First Medical University (Reference Number S526). Written informed consent was obtained from all individual participants involved in the study. All methods were carried out in accordance with the relevant guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Data availability

Data are available from the corresponding author on reasonable request.

Strzyz P. Cancer biology: Hypoxia as an off switch for gene expression. Nat Rev Mol Cell Biol. 2016;17(10):610. https://doi.org/10.1038/nrm.2016.119.
Thienpont B, Steinbacher J, Zhao H, D'Anna F, Kuchnio A, Ploumakis A, Ghesquière B, Van Dyck L, Boeckx B, Schoonjans L, Hermans E, Amant F, Kristensen VN, et al. Tumour hypoxia causes DNA hypermethylation by reducing TET activity. Nature. 2016;537(7618):63-68. https://doi.org/10.1038/nature19081.
Zhang J, Wu M, Wang X. The roles of hypoxia, PTEN, and Rad51 in mediating metastatic prostate cancer cells' responses to PARP inhibitor and topoisomerase 1 inhibitor. J Clin Oncol. 2012;30(15_suppl):e13564. https://doi.org/10.1200/jco.2012.30.15_suppl.e13564.
Chen X, Iliopoulos D, Zhang Q, Tang Q, Greenblatt MB, Hatziapostolou M, Lim E, Tam WL, Ni M, Chen Y, Mai J, Shen H, Hu DZ, et al. XBP1 promotes triple-negative breast cancer by controlling the HIF1α pathway. Nature. 2014;508(7494):103-107. https://doi.org/10.1038/nature13119.
Wouters BG, Koritzinsky M. Hypoxia signalling through mTOR and the unfolded protein response in cancer. Nat Rev Cancer. 2008;8(11):851-64. https://doi.org/10.1038/nrc2501.
Yang L, Taylor J, Eustace A, Irlam JJ, Denley H, Hoskin PJ, Alsner J, Buffa FM, Harris AL, Choudhury A, West CML. A Gene Signature for Selecting Benefit from Hypoxia Modification of Radiotherapy for High-Risk Bladder Cancer Patients. Clin Cancer Res. 2017;23(16):4761-4768. https://doi.org/10.1158/1078-0432.CCR-17-0038.
Brown JM, Wilson WR. Exploiting tumour hypoxia in cancer treatment. Nat Rev Cancer. 2004;4(6):437-47. https://doi.org/10.1038/nrc1367.
Wilson WR, Hay MP. Targeting hypoxia in cancer therapy. Nat Rev Cancer. 2011;11(6):393-410. https://doi.org/10.1038/nrc3064.
Bourhis J. Hypoxia response pathways and radiotherapy for head and neck cancer. J Clin Oncol. 2006;24(5):725-6. https://doi.org/10.1200/JCO.2005.04.5146.
Koukourakis MI, Bentzen SM, Giatromanolaki A, Wilson GD, Daley FM, Saunders MI, Dische S, Sivridis E, Harris AL. Endogenous markers of two separate hypoxia response pathways (hypoxia inducible factor 2 alpha and carbonic anhydrase 9) are associated with radiotherapy failure in head and neck cancer patients recruited in the CHART randomized trial. J Clin Oncol. 2006;24(5):727-35. https://doi.org/10.1200/JCO.2005.02.7474.
Pouysségur J, Dayan F, Mazure NM. Hypoxia signalling in cancer and approaches to enforce tumour regression. Nature. 2006;441(7092):437-43. https://doi.org/10.1038/nature04871.
Milosevic M, Warde P, Ménard C, Chung P, Toi A, Ishkanian A, McLean M, Pintilie M, Sykes J, Gospodarowicz M, Catton C, Hill RP, Bristow R. Tumor hypoxia predicts biochemical failure following radiotherapy for clinically localized prostate cancer. Clin Cancer Res. 2012;18(7):2108-14. https://doi.org/10.1158/1078-0432.CCR-11-2711.
Moyer MW. Targeting hypoxia brings breath of fresh air to cancer therapy. Nat Med. 2012;18(5):636-7. https://doi.org/10.1038/nm0512-636b.
Fyles A, Milosevic M, Hedley D, Pintilie M, Levin W, Manchul L, Hill RP. Tumor hypoxia has independent predictor impact only in patients with node-negative cervix cancer. J Clin Oncol. 2002;20(3):680-7. https://doi.org/10.1200/JCO.2002.20.3.680.
Eustace A, Mani N, Span PN, Irlam JJ, Taylor J, Betts GN, Denley H, Miller CJ, Homer JJ, Rojas AM, Hoskin PJ, Buffa FM, Harris AL, et al. A 26-gene hypoxia signature predicts benefit from hypoxia-modifying therapy in laryngeal cancer but not bladder cancer. Clin Cancer Res. 2013;19(17):4879-88. https://doi.org/10.1158/1078-0432.CCR-13-0542.
Yang L, Taylor J, Eustace A, Irlam JJ, Denley H, Hoskin PJ, Alsner J, Buffa FM, Harris AL, Choudhury A, West CML. A Gene Signature for Selecting Benefit from Hypoxia Modification of Radiotherapy for High-Risk Bladder Cancer Patients. Clin Cancer Res. 2017;23(16):4761-4768. https://doi.org/10.1158/1078-0432.CCR-17-0038.
Abou Khouzam R, Rao SP, Venkatesh GH, Zeinelabdin NA, Buart S, Meylan M, Nimmakayalu M, Terry S, Chouaib S. An Eight-Gene Hypoxia Signature Predicts Survival in Pancreatic Cancer and Is Associated With an Immunosuppressed Tumor Microenvironment. Front Immunol. 2021;12:680435. https://doi.org/10.3389/fimmu.2021.680435.
Salberg UB, Skingen VE, Fjeldbo CS, Hompland T, Ragnum HB, Vlatkovic L, Hole KH, Seierstad T, Lyng H. A prognostic hypoxia gene signature with low heterogeneity within the dominant tumour lesion in prostate cancer patients. Br J Cancer. 2022 Mar 24. https://doi.org/10.1038/s41416-022-01782-x.
Yang L, Roberts D, Takhar M, Erho N, Bibby BAS, Thiruthaneeswaran N, Bhandari V, Cheng WC, Haider S, McCorry AMB, McArt D, Jain S, Alshalalfa M, et al. Development and Validation of a 28-gene Hypoxia-related Prognostic Signature for Localized Prostate Cancer. EBioMedicine. 2018;31:182-189. https://doi.org/10.1016/j.ebiom.2018.04.019.
ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020;578(7793):82-93. https://doi.org/10.1038/s41586-020-1969-6.
Sidaway P. Pancreatic cancer: TCGA data reveal a highly heterogeneous disease. Nat Rev Clin Oncol. 2017;14(11):648. https://doi.org/10.1038/nrclinonc.2017.146.
Corces MR, Granja JM, Shams S, Louie BH, Seoane JA, Zhou W, Silva TC, Groeneveld C, Wong CK, Cho SW, Satpathy AT, Mumbach MR, Hoadley KA, et al. The chromatin accessibility landscape of primary human cancers. Science. 2018;362(6413):eaav1898. https://doi.org/10.1126/science.aav1898.
Breast Cancer Association Consortium; Dorling L, Carvalho S, Allen J, González-Neira A, Luccarini C, Wahlström C, Pooley KA, Parsons MT, Fortuno C, Wang Q, Bolla MK, Dennis J, Keeman R, et al. Breast Cancer Risk Genes - Association Analysis in More than 113,000 Women. N Engl J Med. 2021;384(5):428-439. https://doi.org/10.1056/NEJMoa1913948.
Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2022. CA Cancer J Clin. 2022;72(1):7-33. https://doi.org/10.3322/caac.21708.
Chen W, Zheng R, Baade PD, Zhang S, Zeng H, Bray F, Jemal A, Yu XQ, He J. Cancer statistics in China, 2015. CA Cancer J Clin. 2016;66(2):115-32. https://doi.org/10.3322/caac.21338.
Thai AA, Solomon BJ, Sequist LV, Gainor JF, Heist RS. Lung cancer. Lancet. 2021;398(10299):535-554. https://doi.org/10.1016/S0140-6736(21)00312-3.
Tian Y, Wang J, Wen Q, Su G, Sun Y. Immune subgroup analysis for non-small cell lung cancer may be a good choice for evaluating therapeutic efficacy and prognosis. Aging (Albany NY). 2021;13(9):12691-12709. https://doi.org/10.18632/aging.202941.
Tian Y, Zhang C, Ma W, Huang A, Tian M, Zhao J, Dang Q, Sun Y. A novel classification method for NSCLC based on the background interaction network and the edge-perturbation matrix. Aging (Albany NY). 2022;14(7):3155-3174. https://doi.org/10.18632/aging.204004.
Gong PJ, Shao YC, Huang SR, Zeng YF, Yuan XN, Xu JJ, Yin WN, Wei L, Zhang JW. Hypoxia-Associated Prognostic Markers and Competing Endogenous RNA Co-Expression Networks in Breast Cancer. Front Oncol. 2020;10:579868. https://doi.org/10.3389/fonc.2020.579868.
Huang D, Liu Q, Zhang W, Huang C, Zheng R, Xie G, Wang H, Jia B, Shi J, Yuan Y, Deng M. Identified IGSF9 association with prognosis and hypoxia in nasopharyngeal carcinoma by bioinformatics analysis. Cancer Cell Int. 2020;20:498. https://doi.org/10.1186/s12935-020-01587-z.
Liu Z, Tang Q, Qi T, Othmane B, Yang Z, Chen J, Hu J, Zu X. A Robust Hypoxia Risk Score Predicts the Clinical Outcomes and Tumor Microenvironment Immune Characters in Bladder Cancer. Front Immunol. 2021;12:725223. https://doi.org/10.3389/fimmu.2021.725223.
Pei JP, Zhang CD, Yusupu M, Zhang C, Dai DQ. Screening and Validation of the Hypoxia-Related Signature of Evaluating Tumor Immune Microenvironment and Predicting Prognosis in Gastric Cancer. Front Immunol. 2021;12:705511. https://doi.org/10.3389/fimmu.2021.705511.
Ouyang W, Jiang Y, Bu S, Tang T, Huang L, Chen M, Tan Y, Ou Q, Mao L, Mai Y, Yao H, Yu Y, Lin X. A Prognostic Risk Score Based on Hypoxia-, Immunity-, and Epithelial-to-Mesenchymal Transition-Related Genes for the Prognosis and Immunotherapy Response of Lung Adenocarcinoma. Front Cell Dev Biol. 2022;9:758777. https://doi.org/10.3389/fcell.2021.758777.
Lin W, Wu S, Chen X, Ye Y, Weng Y, Pan Y, Chen Z, Chen L, Qiu X, Qiu S. Characterization of Hypoxia Signature to Evaluate the Tumor Immune Microenvironment and Predict Prognosis in Glioma Groups. Front Oncol. 2020;10:796. https://doi.org/10.3389/fonc.2020.00796.
Zhang L, Wang S, Wang Y, Zhao W, Zhang Y, Zhang N, Xu H. Effects of Hypoxia in Intestinal Tumors on Immune Cell Behavior in the Tumor Microenvironment. Front Immunol. 2021;12:645320. https://doi.org/10.3389/fimmu.2021.645320.

Table 1. The characteristics of TCGA-LUNG cohort.

Characteristics	Tumor (n = 1017)	Normal (n = 110)
Sex
Male	609	62
Female	407	48
Unknown	1	0
Histologic Type
LUAD	515
LUSC	502
Clinical Stage
Stage Ⅰ	519
Stage Ⅱ	284
Stage Ⅲ	168
Stage Ⅳ	33
Unknown	13

Table 2. Correlation between KRT16 protein expression and clinicopathologic characteristics in NSCLC.

Characteristic	KRT16-Low	KRT16-High	P value
n	23	9
Age, n (%)			0.696
>65	8 (25%)	4 (12.5%)
≤65	15 (46.9%)	5 (15.6%)
Gender, n (%)			0.015
Female	8 (25%)	8 (25%)
Male	15 (46.9%)	1 (3.1%)
T stage, n (%)			0.282
T1	2 (6.2%)	3 (9.4%)
T2	15 (46.9%)	5 (15.6%)
T3	6 (18.8%)	1 (3.1%)
N stage, n (%)			0.837
N0	16 (50%)	5 (15.6%)
N1	5 (15.6%)	3 (9.4%)
N2	2 (6.2%)	1 (3.1%)
Pathological Stage, n (%)			0.385
Ⅰ	7 (21.9%)	1 (3.1%)
Ⅱ	14 (43.8%)	6 (18.8%)
Ⅲ	2 (6.2%)	2 (6.2%)
Histologic type, n (%)			0.830
Adenocarcinoma	16 (50%)	6 (18.8%)
Mucoepidermoid carcinoma	2 (6.2%)	0 (0%)
Squamous cell carcinoma	5 (15.6%)	3 (9.4%)
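Table 2 does not state which test produced its P values. Assuming Fisher's exact test, a common choice for small 2x2 tables, the sketch below recomputes the gender and age comparisons from the counts in the table. Under that assumption it gives about 0.0155 for gender and 0.696 for age, consistent with the reported 0.015 and 0.696. The rows with three categories (T stage, N stage, pathological stage, histologic type) would need an r x c extension and are not covered here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):  # P(top-left cell == x) given the fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are included
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

# Gender rows of Table 2: Female (Low 8, High 8), Male (Low 15, High 1)
print(round(fisher_exact_two_sided(8, 8, 15, 1), 4))
# Age rows of Table 2: >65 (Low 8, High 4), <=65 (Low 15, High 5)
print(round(fisher_exact_two_sided(8, 4, 15, 5), 3))
```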

No competing interests reported.
