Transcriptome and Mendelian randomization were combined to screen and validate prognostic genes associated with lipid autophagy in oral squamous cell carcinoma

doi:10.21203/rs.3.rs-4531145/v1


Research Article


This work is licensed under a CC BY 4.0 License

Version 1


Background

Cancer cells can use fatty acids produced by lipophagy to sustain growth and proliferation, but the regulation of lipophagy in oral squamous cell carcinoma (OSCC) remains poorly understood.

Methods

mRNA profiles, expression quantitative trait loci (eQTL) data, and the ieu-b-4961 dataset were retrieved from online databases. In TCGA-OSCC, differentially expressed genes (DEGs) between tumors and paracancerous tissues were screened. Weighted gene co-expression network analysis (WGCNA) was applied to obtain the key module genes highly related to lipophagy. Differentially expressed lipophagy-related genes (DE-LRGs) were then determined by overlapping the DEGs and key module genes. Next, the eQTL data of the DE-LRGs served as the exposure and OSCC as the outcome in a two-sample Mendelian randomization (MR) analysis. Sensitivity analyses and MR Steiger filtering were performed, and the genes that passed were selected as candidate genes for a prognostic risk model. Based on least absolute shrinkage and selection operator (LASSO)-Cox regression analyses, the prognostic genes were identified and a prognostic risk model was built. The tumors of OSCC patients were then divided into high- and low-risk groups based on the median risk score. Finally, the immune microenvironment was evaluated using the ESTIMATE and single-sample gene set enrichment analysis (ssGSEA) algorithms.

Results

A total of 271 DE-LRGs were determined by overlapping 4,712 DEGs and 308 key module genes. Among them, 18 exposure factors that could directly affect OSCC were taken as candidate genes. Four prognostic genes (CLTCL1, TNNC1, ALPK3, and PFKM) were then identified. Of these, CLTCL1 (odds ratio (OR) = 0.9997, 95% confidence interval (CI) = 0.9995–0.9999, P_IVW = 0.0020), PFKM (OR = 0.9997, 95% CI = 0.9995–0.9999, P_IVW = 0.0067), and ALPK3 (OR = 0.9990, 95% CI = 0.9983–0.9997, P_IVW = 0.0061) were protective factors, and TNNC1 (OR = 1.0005, 95% CI = 1.0001–1.0008, P_IVW = 0.0102) was a risk factor. A prognostic risk model was built, and the probability of overall survival (OS) was higher in the low-risk group than in the high-risk group. Furthermore, the low-risk group had higher immune, stromal, and ESTIMATE scores, and 23 immune cell types differed between the two risk groups.

Conclusion

Overall, CLTCL1, PFKM, and ALPK3 were protective factors, while TNNC1 was a risk factor for OSCC. Our findings provide a new perspective on the treatment and prognosis of OSCC.

Oral squamous cell carcinoma

lipophagy

Mendelian randomization

Transcriptome

Prognostic risk model

immune microenvironment

Oral squamous cell carcinoma (OSCC) is a commonly occurring head and neck cancer^[1]. Cancers originating from the oral cavity, tongue, lip, and mouth together represent the 8th most common cancer, with an estimated more than 300,000 new cases and 150,000 deaths annually^[2]. Unfortunately, the 5-year survival rate stands at only 47–66%, as most oral cancer cases are detected at a late stage of malignancy^[2]. A more accurate prediction of the prognosis of patients with OSCC can help physicians select appropriate treatment strategies and improve patient survival^[3]. Currently, the Tumor-Node-Metastasis (TNM) staging system is used to predict tumor prognosis and to guide physicians towards the correct treatment choice. However, the main limitation of the TNM system in OSCC is its focus on the anatomical extent of the disease. Prognosis can also be modified by additional factors, such as genetics, patient age, sex, race, or comorbidities^[4–5]. Consequently, reliable prognostic approaches are urgently needed to help clinicians select suitable individualized therapeutic strategies and thereby improve OSCC prognosis.

The autophagic degradation of lipid droplets (LDs), termed lipophagy, is a major mechanism that contributes to lipid turnover in numerous cell types. Many studies have shown the existence of lipophagy in diverse cell types, such as macrophages, neurons, and lymphocytes, and across organisms^[6]. As a dynamic source of stored lipids, LDs can be rapidly mobilized to release fatty acids that can be degraded into energy via β-oxidation, used in membrane synthesis, and/or used as lipid signaling molecules^[7]. In tumor cells, continuously activated lipid synthesis and uptake have been found to play an important role in resistance to nutrient starvation (“hunger”). Excess lipids are stored in LDs, which are then broken down to provide energy during starvation. For example, in hepatocellular carcinoma cells, the metabolic enzyme PCK1 (cytosolic phosphoenolpyruvate carboxykinase 1) has a nonmetabolic function that promotes activation of the SREBP (sterol regulatory element-binding protein) signaling pathway and enhances lipid droplet synthesis in tumor cells^[8]. Cancer cells can therefore continue to grow and proliferate using fatty acids produced by lipophagy. However, the role of lipophagy in OSCC is still poorly understood, and the mechanism of action of lipophagy-related genes (LRGs) in OSCC has not been reported.

Mendelian randomization (MR) is an application of instrumental variable analysis, which borrows statistical techniques from economics and aims to test a causal hypothesis in non-experimental data. It allows researchers to analyze the effects of the environment, drug treatments, and other factors on human biology and disease. In an MR analysis, genetic variants, commonly single nucleotide polymorphisms, are used as instrumental variables for the putative risk factor. Because it rests on three core assumptions (relevance, independence, and exclusion restriction), MR overcomes some shortcomings of traditional observational research^[9]. Randomized controlled trials (RCTs) are considered the gold standard design to infer causality. However, RCTs are expensive, time consuming, and often unfeasible to conduct, e.g. because of poor compliance with long-term follow-up and ethical issues about random treatment allocation. MR studies are often faster and cheaper, as they can be conducted using existing large-scale genome-wide association study (GWAS) data. In a two-sample MR study, the genetic variant–risk factor association and the genetic variant–outcome association come from independent study populations. An advantage of the two-sample design is that statistical power is typically greater, as existing summarized data from large-scale GWAS consortia can be used^[10]. Relatively few studies have applied MR to the relationship between lipophagy and OSCC, so this approach was adopted in this study to explore that relationship.

In this study, based on public databases, a two-sample MR approach was used to identify lipophagy-related genes that could directly affect OSCC as candidate genes. A prognostic risk model was then constructed and prognostic genes were screened by LASSO-Cox regression analysis. Analyzing the mechanism of action of these prognostic genes in OSCC provides a theoretical reference for the treatment and prognosis of OSCC.

2.1 Data resource

From The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov) database, we downloaded the TCGA head and neck squamous cell carcinoma (TCGA-HNSC) dataset. The OSCC samples were then selected and named TCGA-OSCC. The RNA-sequencing (RNA-seq) data, clinical characteristics, and overall survival (OS) information of the TCGA-OSCC cohort were downloaded. TCGA-OSCC contained 359 tumors and 32 paracancerous tissues from OSCC patients. GSE41613 (GPL570), which included OS information for 97 OSCC patients, was obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). Moreover, 9 LRGs were obtained from the Molecular Signatures Database (MSigDB, https://www.gsea-msigdb.org/gsea/msigdb), with ‘lipophagy’ as the keyword.

The genome-wide association study (GWAS) dataset ieu-b-4961 was downloaded from the Integrative Epidemiology Unit (IEU) Open GWAS database (https://gwas.mrcieu.ac.uk/) and used as the outcome variable in the two-sample MR. The ieu-b-4961 dataset covered 7,723,107 single nucleotide polymorphisms (SNPs) in oral cavity cancer (OCC) patients (n = 357) and healthy controls (n = 372,016), and all participants were of European ancestry. Moreover, the expression quantitative trait loci (eQTL) data for the differentially expressed LRGs (DE-LRGs) were also obtained from the IEU Open GWAS database and used as exposure variables.

2.2 Differential expression analysis

Using the ‘DESeq2 (v. 3.4.1)’ package^[11], the differentially expressed genes (DEGs) and differentially expressed lncRNAs (DE-LncRNAs) between the tumors and paracancerous tissues of OSCC patients in TCGA-OSCC were screened (|log₂ fold-change (FC)| > 1, adj.P < 0.05). Volcano plots of the DEGs and DE-LncRNAs were drawn with the ‘ggpubr (v. 0.6.0)’ package (https://CRAN.R-project.org/package=ggpubr), and heatmaps of the DEGs and DE-LncRNAs were generated with the ‘ComplexHeatmap (v. 2.14.0)’ package^[12].
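The screening thresholds above can be illustrated with a minimal Python sketch. It is not the authors' DESeq2 workflow: it assumes a results table (log₂ fold-change and adjusted P value per gene) has already been produced, and the gene names, values, and the `screen_degs` helper are hypothetical.

```python
from typing import NamedTuple


class DEResult(NamedTuple):
    gene: str
    log2_fc: float   # log2 fold-change, tumor vs. paracancerous tissue
    padj: float      # multiple-testing adjusted P value


def screen_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return (up, down) gene lists passing |log2FC| > cutoff and adj.P < cutoff."""
    up, down = [], []
    for r in results:
        if r.padj >= padj_cutoff or abs(r.log2_fc) <= lfc_cutoff:
            continue  # fails at least one of the two thresholds
        (up if r.log2_fc > 0 else down).append(r.gene)
    return up, down


# Hypothetical toy results, for illustration only.
toy = [
    DEResult("GENE_A", 2.3, 0.001),    # up-regulated DEG
    DEResult("GENE_B", -1.8, 0.010),   # down-regulated DEG
    DEResult("GENE_C", 0.4, 0.0001),   # significant but small effect
    DEResult("GENE_D", 3.0, 0.200),    # large effect but not significant
]
up_genes, down_genes = screen_degs(toy)
print(up_genes, down_genes)
```

Both conditions must hold at once, so a gene with a large fold-change but a non-significant adjusted P value is not counted as a DEG.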

2.3 Weighted gene co-expression network analysis (WGCNA)

Based on the 9 LRGs, the LRGs score of each sample in TCGA-OSCC was calculated by the single-sample gene set enrichment analysis (ssGSEA) algorithm of the ‘GSVA (v. 1.46.0)’ package^[13]. The LRGs scores of tumors and paracancerous tissues were compared by the Wilcoxon test (P < 0.05). To obtain the key module genes highly related to the LRGs score, WGCNA was performed with the ‘WGCNA (v. 1.71)’ package^[13]. First, hierarchical clustering was used to detect and filter out outlier samples. Next, the optimal soft threshold was determined so that the co-expression network better approximated a scale-free topology. A co-expression network containing several modules was then constructed by dynamic tree cutting (minModuleSize = 100). Pearson’s correlation between each module and the LRGs score was analyzed, and the module most strongly correlated with the LRGs score was selected as the key module (|cor| > 0.3, P < 0.05). Finally, the key module genes were screened from the key module based on module membership (MM) and gene significance (GS) (|MM| > 0.4, |GS| > 0.4).
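The MM/GS filter can be sketched as follows. This is an illustration under stated assumptions, not the WGCNA package itself: the module eigengene is taken as a given input rather than computed, MM is treated as the Pearson correlation of a gene with that eigengene, GS as its Pearson correlation with the LRGs score, and all data and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def key_module_genes(expr, eigengene, trait, mm_cutoff=0.4, gs_cutoff=0.4):
    """Keep genes with |MM| > mm_cutoff and |GS| > gs_cutoff."""
    selected = []
    for gene, values in expr.items():
        mm = pearson(values, eigengene)  # module membership
        gs = pearson(values, trait)      # gene significance for the trait
        if abs(mm) > mm_cutoff and abs(gs) > gs_cutoff:
            selected.append(gene)
    return selected


# Hypothetical toy data: five samples, three genes in one module.
eigengene = [1.0, 2.0, 3.0, 4.0, 5.0]   # assumed module eigengene
lrg_score = [0.9, 2.1, 2.9, 4.2, 5.0]   # assumed ssGSEA LRGs score
expression = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],   # strongly positive
    "GENE_B": [3.0, 1.0, 4.0, 1.0, 5.0],    # weak membership
    "GENE_C": [10.0, 8.0, 6.0, 4.0, 2.0],   # strongly negative
}
kept = key_module_genes(expression, eigengene, lrg_score)
print(kept)
```

Because the cutoffs apply to absolute values, genes that are strongly anti-correlated with the module and the score are retained alongside positively correlated ones.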

2.4 Function enrichment analysis and protein-protein interaction (PPI) network

Using the ‘VennDiagram (v. 1.7.3)’ package (https://CRAN.R-project.org/package=VennDiagram), the DEGs and key module genes were intersected to obtain the DE-LRGs. To further explore the function of the DE-LRGs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed with the ‘clusterProfiler (v. 4.7.1.3)’ and ‘org.Hs.eg.db (v. 3.16.0)’ packages (P < 0.05)^[14]. The results of the GO and KEGG analyses were displayed with the ‘enrichplot (v. 1.18.0)’ package (https://yulab-smu.top/biomedical-knowledge-mining-book/). Furthermore, a PPI network of the DE-LRGs was established based on the STRING database (https://string-db.org) to understand the interactions at the protein level (confidence score > 0.7).
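GO and KEGG over-representation analysis of this kind is typically based on a hypergeometric test. The sketch below shows that calculation for a single term, assuming a background of annotated genes and a query gene list; the function name and all counts are hypothetical, and multiple-testing correction across terms is omitted.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_query, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: annotated genes in the background set
    n_in_term:    background genes annotated to the term
    n_query:      genes in the query list (e.g. DE-LRGs)
    n_overlap:    query genes annotated to the term
    """
    total = comb(n_background, n_query)
    upper = min(n_in_term, n_query)
    tail = sum(
        comb(n_in_term, i) * comb(n_background - n_in_term, n_query - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the term,
# 4 query genes of which 2 fall in the term.
p = enrichment_p_value(n_background=20, n_in_term=5, n_query=4, n_overlap=2)
print(round(p, 4))
```

A small P value indicates that the query list contains more genes from the term than would be expected by chance given the background.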

2.5 Two-sample MR analysis

The two-sample MR analysis was performed with the ‘TwoSampleMR (v. 0.5.8)’ package^[15]. The instrumental variables (IVs) had to satisfy the following criteria: (1) IVs must be strongly associated with the exposure variables; (2) IVs must not be associated with confounding factors; (3) IVs should influence the outcome variable only through the exposure variables. The IVs in this study were selected with the extract_instruments (P < 5×10⁻⁸, clump = TRUE, r² = 0.001, kb = 50) and extract_outcome_data (proxies = TRUE, rsq = 0.8) functions, and weak IVs were filtered out (F < 10). The MR Egger^[16], weighted median^[17], inverse variance weighted (IVW)^[18], simple mode^[19], and weighted mode^[20] methods were used for the two-sample MR analysis, with the IVW test as the main method (P < 0.05). An odds ratio (OR) > 1 indicated a risk factor and an OR < 1 indicated a protective factor. Scatter plots, forest plots, and funnel plots were used to display the above results.
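To make the IVW estimate and the weak-instrument filter concrete, here is a minimal fixed-effect sketch in Python. It is an approximation for illustration only and is not the TwoSampleMR implementation: per-SNP Wald ratios are weighted by bx²/se_y², the F statistic is approximated as (beta/se)² for a single SNP, and all SNP effect sizes are hypothetical.

```python
import math


def f_statistic(beta_exposure, se_exposure):
    """Approximate single-SNP instrument strength, F = (beta / se)^2."""
    return (beta_exposure / se_exposure) ** 2


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse variance weighted estimate from summary statistics."""
    # First-order weight of each Wald ratio by/bx is bx^2 / se_y^2.
    weights = [bx * bx / (s * s) for bx, s in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    total_w = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total_w
    se = math.sqrt(1.0 / total_w)
    ci95 = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return {"beta": beta, "se": se, "or": math.exp(beta), "ci95": ci95}


# Hypothetical summary statistics for three instruments.
bx = [0.10, 0.20, 0.15]    # SNP effect on gene expression (exposure)
by = [0.02, 0.04, 0.03]    # SNP effect on the outcome (log odds)
se_y = [0.01, 0.01, 0.01]  # standard error of the outcome effect

res = ivw_estimate(bx, by, se_y)
print(res["or"], res["ci95"])
print(f_statistic(0.10, 0.02) >= 10)  # instrument kept under the F >= 10 rule
```

Exponentiating the pooled log-odds estimate gives the OR, so a negative estimate corresponds to OR < 1 (protective) and a positive one to OR > 1 (risk).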

2.6 Sensitivity analysis and MR Steiger filtering

Sensitivity analysis was carried out to re-examine the above results with the ‘TwoSampleMR (v. 0.5.8)’ package^[19], including heterogeneity, pleiotropy, and leave-one-out (LOO) tests. Specifically, the heterogeneity test was performed with the mr_heterogeneity function (IVW, P > 0.05). The pleiotropy test, which checks for confounding factors, was performed with the mr_pleiotropy_test function and the ‘MRPRESSO (v. 1.0)’ package (P > 0.05) [Verbanck M (2017). MRPRESSO: Performs the Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO) test. R package version 1.0.]. LOO analysis, carried out with the mr_leaveoneout function, was used to check the effect of each SNP on the outcome. In addition, MR Steiger filtering was applied to verify the directionality of the above results. Finally, the exposure factors that passed all the tests were further studied as candidate genes.
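The idea behind the LOO test can be sketched in a few lines: the IVW estimate is recomputed with each SNP removed in turn, and a result that changes sharply when one SNP is dropped suggests that the SNP is driving the association. The sketch assumes a fixed-effect IVW estimator; the SNP identifiers and effect sizes are hypothetical.

```python
def ivw_beta(beta_exp, beta_out, se_out):
    """Fixed-effect IVW point estimate from summary statistics."""
    num = sum(bx * by / (s * s) for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx * bx / (s * s) for bx, s in zip(beta_exp, se_out))
    return num / den


def leave_one_out(snps, beta_exp, beta_out, se_out):
    """Map each SNP to the IVW estimate obtained when that SNP is excluded."""
    estimates = {}
    for i, snp in enumerate(snps):
        keep = [j for j in range(len(snps)) if j != i]
        estimates[snp] = ivw_beta(
            [beta_exp[j] for j in keep],
            [beta_out[j] for j in keep],
            [se_out[j] for j in keep],
        )
    return estimates


# Hypothetical instruments; "rs_c" has a larger Wald ratio than the others.
snps = ["rs_a", "rs_b", "rs_c"]
bx = [0.10, 0.20, 0.10]
by = [0.02, 0.04, 0.05]
se_y = [0.01, 0.01, 0.01]

full = ivw_beta(bx, by, se_y)
loo = leave_one_out(snps, bx, by, se_y)
print(full, loo)
```

In this toy example, dropping the hypothetical outlier lowers the pooled estimate, which is the kind of shift a LOO plot is meant to expose.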

2.7 Identification of prognostic genes

In TCGA-OSCC, a total of 359 tumor samples with OS information were used as the training dataset. Signature genes related to prognosis were screened by univariate Cox analysis with the ‘survival (v. 0.4.9)’ package (P < 0.2)^[21], and the proportional hazards (PH) test was then performed on the signature genes (P > 0.05). Next, Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis was performed to further screen the prognostic genes using the ‘glmnet (v. 4.1.4)’ package^[22].

2.8 Gene set enrichment analysis (GSEA)

To learn more about potential pathways of prognostic genes, the GSEA was executed by the ‘clusterProfiler (v. 4.2.2)’ package ^[23]. ‘C2: KEGG gene sets’ was downloaded from the MSigDB database as background gene sets. Spearman’s correlation coefficients between the prognostic genes and all other genes were calculated and ranked. Subsequently, the prognostic genes were enriched in the KEGG pathways (P < 0.05). In the end, the top 10 KEGG pathways of each prognostic gene were displayed by the ‘enrichplot (v. 1.18.0)’ package (https://yulab-smu.top/biomedical-knowledge-mining-book/).

2.9 Establishment of a prognostic risk model

The risk coefficients of the prognostic genes were obtained by multivariate Cox analysis. The formula for the risk score was as follows:

$$\text{Risk score} = \sum_{i=1}^{n} \text{coef}(\text{gene}_i) \times \text{expr}(\text{gene}_i)$$

Here, ‘coef’ represents the risk coefficient of each prognostic gene and ‘expr’ represents its expression. The risk score of each sample in the training dataset was calculated and ranked. The 359 tumor samples were then divided into high- (n = 179) and low-risk (n = 180) groups based on the median risk score (median risk score = 0.9661). The risk curves and survival status of the two risk groups were drawn with the ‘survminer (v. 0.4.9)’ package (https://CRAN.R-project.org/package=survminer). Meanwhile, a Kaplan-Meier (K-M) curve was generated to compare survival between the high- and low-risk groups by the Wilcoxon test (P < 0.05). Finally, receiver operating characteristic (ROC) curves were used to assess the predictive accuracy of the prognostic risk model.
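The risk score formula and the median split can be sketched as follows. The coefficients and expression values are hypothetical placeholders (the fitted coefficients are in Table S6, which is not reproduced here). Assigning samples equal to the median to the low-risk group is an assumption: it would be consistent with the reported 179/180 split of 359 samples, but the text does not state the tie rule.

```python
from statistics import median


def risk_score(coefs, expr):
    """Sum of coefficient x expression over the prognostic genes."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


def split_by_median(scores):
    """Return (cutoff, high, low); samples at the median go to the low group."""
    cutoff = median(scores.values())
    high = [s for s, v in scores.items() if v > cutoff]
    low = [s for s, v in scores.items() if v <= cutoff]
    return cutoff, high, low


# Hypothetical coefficients, NOT the values fitted in the study.
coefs = {"CLTCL1": 0.20, "TNNC1": 0.05, "ALPK3": 0.10, "PFKM": 0.15}

# Hypothetical expression values for three samples.
samples = {
    "S1": {"CLTCL1": 1.0, "TNNC1": 1.0, "ALPK3": 1.0, "PFKM": 1.0},
    "S2": {"CLTCL1": 2.0, "TNNC1": 0.0, "ALPK3": 1.0, "PFKM": 2.0},
    "S3": {"CLTCL1": 5.0, "TNNC1": 0.0, "ALPK3": 0.0, "PFKM": 0.0},
}
scores = {name: risk_score(coefs, expr) for name, expr in samples.items()}
cutoff, high_risk, low_risk = split_by_median(scores)
print(scores, cutoff, high_risk, low_risk)
```

With an odd number of samples the median is itself one sample's score, so the strict inequality for the high-risk group makes the low-risk group larger by one.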

2.10 Verification of a prognostic risk model

After the prognostic model was constructed, the external dataset GSE41613 was introduced to evaluate it. Similarly, the 97 OSCC patients were divided into high- (n = 48) and low-risk (n = 49) groups based on the median risk score (median risk score = 2.4196). In GSE41613, the risk curves, survival status, K-M curve, and ROC curves were generated by the same methods as above.

2.11 Construction of nomogram model

Here, we introduced eight clinical characteristics to further explore their impact on prognosis in patients with OSCC: age, race, gender, pathologic TN, stage, grade, radiation therapy, and examined lymph node (ELN). Based on univariate Cox analysis, the features highly related to prognosis were selected from the clinical characteristics and the risk score (P < 0.05). The PH tests of these features were then conducted (P > 0.05), and multivariate Cox analysis was used to screen independent prognostic factors (P < 0.05). Next, a nomogram was generated to predict the OS of OSCC patients using the ‘rms (v. 6.5-0)’ package (https://CRAN.R-project.org/package=rms). Calibration curves, ROC curves, and decision curve analysis (DCA) curves were used to evaluate the predictive accuracy of the model. Moreover, the risk scores of different clinical subgroups were compared by the Kruskal-Wallis test (P < 0.05). K-M curves were used to evaluate the survival probability of different clinical subgroups (Kruskal-Wallis, P < 0.05).

2.12 Tumor microenvironment

First, we assessed the differences in immune score, stromal score, ESTIMATE score, and tumor purity between the two risk groups in TCGA-OSCC using the ‘estimate (v. 1.0.13)’ package^[24]. Subsequently, the infiltration abundances of 28 immune cells in the two risk groups of TCGA-OSCC were calculated with the ssGSEA algorithm of the ‘GSVA (v. 1.46.0)’ package^[25]. The infiltration differences between the two risk groups were compared by the Wilcoxon test (P < 0.05) to identify the differential immune cells. Spearman’s correlation between the differential immune cells and the prognostic genes was analyzed (|cor| > 0.3, P < 0.05). In addition, a total of 48 immune checkpoints were collected from the published literature^[26], and the expression differences of 47 immune checkpoints between the two risk groups were compared by the Wilcoxon test (P < 0.05). Meanwhile, Spearman’s correlation analysis of the differentially expressed immune checkpoints was performed (|cor| > 0.3, P < 0.05).
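Spearman’s correlation, used above with the |cor| > 0.3 cutoff, is the Pearson correlation of rank-transformed values. The sketch below computes the coefficient only (no P value) and assigns average ranks to ties; the abundance and expression vectors are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical immune-cell abundance and gene expression in five samples.
cell_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_expression = [2.0, 1.0, 4.0, 3.0, 5.0]

rho = spearman(cell_abundance, gene_expression)
print(rho, abs(rho) > 0.3)
```

Working on ranks makes the measure insensitive to monotone transformations of either variable, which suits enrichment scores and expression values measured on different scales.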

2.13 Molecular regulatory network

Here, miRNAs targeting the prognostic genes were predicted with the miRDB website (https://mirdb.org/). Based on these miRNAs, lncRNAs were then predicted with the starBase database (http://starbase.sysu.edu.cn/). The hub lncRNAs were obtained by overlapping the predicted lncRNAs and the DE-LncRNAs using the ‘VennDiagram (v. 1.7.3)’ package (https://CRAN.R-project.org/package=VennDiagram). Finally, based on the hub lncRNAs, miRNAs, and prognostic genes, a lncRNA-miRNA-mRNA network was constructed with the Cytoscape (v. 3.10.0) software^[27]. In addition, transcription factors (TFs) were predicted from the prognostic genes with NetworkAnalyst (https://www.networkanalyst.ca/), and a TF-mRNA network was generated in the same way.

2.14 Drug sensitivity analysis

In TCGA-OSCC, we assessed differences in response to chemotherapy drugs between the two risk groups. A total of 138 chemotherapy drugs were collected from the Genomics of Drug Sensitivity in Cancer (GDSC) database (https://www.cancerrxgene.org/), and the differences in biochemical half maximal inhibitory concentration (IC₅₀) between the two risk groups were compared by the Wilcoxon test (P < 0.05). The results were displayed with the Hiplot platform (https://hiplot.com.cn/).

2.15 Statistical analysis

All statistical analyses were carried out by the Rstudio (v. 4.2.2) software. In this study, the networks were constructed by the Cytoscape (v. 3.10.0) software ^[27]. P < 0.05 was considered statistically significant in all conditions (two-tailed).

3.1 A total of 271 DE-LRGs were associated with lipophagy

In TCGA-OSCC, a total of 4,712 DEGs were identified between the tumors and paracancerous tissues, including 2,050 up-regulated and 2,662 down-regulated DEGs (Fig. 1A-B). Interestingly, the LRGs score of tumors was significantly lower than that of paracancerous tissues (P < 0.05) (Fig. 1C). Next, WGCNA was performed to obtain the genes strongly related to LRGs in TCGA-OSCC. Hierarchical clustering indicated that there was no outlier sample in TCGA-OSCC (Fig. S1). The optimal soft threshold was 8 (R² = 0.9) (Fig. 1D), and a total of 8 modules were identified by dynamic tree cutting (Fig. 1E). The MEpurple module, which was most strongly related to the LRGs score, was determined as the key module (cor = 0.7028, P < 0.05) (Fig. 1F), and 308 key module genes were then obtained (|GS| > 0.4, |MM| > 0.4) (Fig. 1G). Finally, a total of 271 DE-LRGs were identified by overlapping the DEGs and key module genes (Fig. 1H). Notably, the GO results suggested that the DE-LRGs were enriched in muscle system processes, muscle contraction, muscle cell differentiation, etc. (Fig. 1I, Table S1). The DE-LRGs were also linked with KEGG pathways such as hypertrophic cardiomyopathy, dilated cardiomyopathy, and cardiac muscle contraction (Fig. 1J, Table S2). Moreover, a PPI network was generated based on the 271 DE-LRGs, including pairs such as ACTN2-TTN, CAMK2A-CAMK2B, and CASQ2-TRDN (Fig. S2, Table S3).

3.2 There were 18 exposure factors that could directly affect OSCC patients

The eQTL data of the 271 DE-LRGs were used as exposure factors, and OCC (ieu-b-4961) was the outcome factor. Importantly, 19 exposure factors were directly associated with OSCC, including 8 risk factors (P_IVW < 0.05, OR > 1) and 11 protective factors (P_IVW < 0.05, OR < 1) (Table 1). The scatter plots showed that the slope (IVW test) of the protective factors was negative, while the slope (IVW test) of the risk factors was positive (Fig. S3). The forest plots showed that the IVs of the protective factors were on the left side of the line in the IVW test, and the IVs of the risk factors were on the right side of the line (Fig. S4). Moreover, the IVs of the exposure factors met Mendel’s second law (Fig. S5). In the two-sample MR results, there was no heterogeneity (P > 0.05), and ZNF385D was excluded because the pleiotropy test indicated confounding factors (P < 0.05) (Table S4). The reliability of the above results was confirmed again by LOO analysis (Fig. S6). MR Steiger filtering indicated that the remaining 18 exposure factors could affect OSCC only in the expected direction (correct_causal_direction: TRUE, P < 0.05) (Table S5). In brief, the 18 exposure factors were confirmed as candidate genes for a prognostic risk model.

Table 1

Results of univariate COX regression
Gene	HR	P	PH test	OR
CLTCL1	1.281	0.02	0.33	0.999692632
PRUNE2	1.11	0.1367	0.49	0.999890524
TNNC1	1.044	0.089	0.42	1.000460202
ALPK3	1.092	0.1985	0.44	0.999011129
PFKM	1.217	0.0048	0.9	0.999716688
S100A1	1.064	0.1584	0.61	0.999085595
NEB	1.077	0.0466	0.48	0.999680126
STAC3	1.059	0.1077	0.32	0.999359735

3.3 The low-risk team had a better prognosis for OSCC patients

In the univariate Cox analysis, a total of 8 signature genes were related to the prognosis of OSCC patients (P < 0.2), and all of them passed the PH test (P > 0.05) (Table 1). Subsequently, 4 prognostic genes were screened by LASSO analysis (lambda.min = 0.01575649): CLTCL1, TNNC1, ALPK3, and PFKM (Fig. 3A). Notably, the 4 prognostic genes were co-enriched in dilated cardiomyopathy, hypertrophic cardiomyopathy (HCM), calcium signaling pathways, etc. (Fig. 3B). Next, a prognostic risk model was built based on the multivariate Cox analysis (Table S6). In TCGA-OSCC, all OSCC patients were classified into high- and low-risk groups, and there were more deaths in the high-risk group (Fig. 3C). Meanwhile, the expression levels of the 4 prognostic genes were higher in the high-risk group than in the low-risk group (Fig. 3D). The K-M curve demonstrated that the low-risk group had a better OS probability (P < 0.05) (Fig. 3E). Finally, all of the AUCs were greater than 0.6, which suggested that the model had good predictive ability (Fig. 3F). Additionally, an independent external dataset, GSE41613, was introduced to verify the model, and it likewise confirmed the reliability of the prognostic risk model (Fig. 3G-J).

3.4 Nomogram model was built to predict the OS probability of OSCC patients

The risk score, age, and pathologic N were determined as independent prognostic factors using the univariate and multivariate Cox analyses (Fig. 4A-B). Subsequently, a nomogram model was built based on these independent prognostic factors (Fig. 4C). The calibration curves suggested that the actual curves were in good agreement with the predicted curves (Fig. 4D). The DCA showed that the nomogram model had excellent net benefits (Fig. 4E). (Fig. 4F). Furthermore, white patients had a lower risk score than non-white patients (Fig. S7). K-M curves showed that patients who underwent radiation therapy had a higher OS probability (Fig. S8).

3.5 Immune microenvironment of tumors in OSCC patients was destroyed

Here, we found that the low-risk team had higher immune, stromal, and ESTIMATE scores, and a lower tumor purity in TCGA-OSCC (Fig. 5A). Next, we assessed the difference in infiltration of 28 immune cells between the two risk teams in TCGA-OSCC (Fig. 5B). Among them, there were 23 immune cells that differed between the two risk teams (P < 0.05) (Fig. 5C). In particular, plasmacytoid dendritic cells were highly correlated with ALPK3 (cor = 0.38, P < 0.05) (Fig. 5D). Moreover, a total of 24 immune checkpoints had significant differences between the two risk teams (Fig. 5E). There was a strong correlation between differentially expressed immune checkpoints, among which ICOS and CTLA4 had the strongest correlation (cor = 0.93, P < 0.05) (Fig. 5F).

3.6 Molecular regulatory networks were built based on the prognostic genes

In TCGA-OSCC, there were 2,940 DE-LncRNAs between the tumors and paracancerous tissues (Fig. 6A, 6B). Based on the 4 prognostic genes, 285 miRNAs were predicted (Table S7), and based on these 285 miRNAs, 725 lncRNAs were predicted (Table S8). By overlapping, a total of 66 hub lncRNAs were obtained (Fig. 6C). Subsequently, a lncRNA-miRNA-mRNA network was constructed based on the 4 prognostic genes, 285 miRNAs, and 66 hub lncRNAs, which contained LINC00943-hsa-miR-4319-ALPK3, LINC00839-hsa-miR-454-3p-CLTCL1, LINC01572-hsa-miR-4429-PFKM, etc. (Fig. 6D, Table S9). Meanwhile, a total of 32 TFs were predicted, and GATA2 might regulate 3 prognostic genes, PFKM, TNNC1, and ALPK3 (Fig. 6E). Furthermore, drug sensitivity analysis indicated that 38 drugs were sensitive to patients in the high-risk team and 24 drugs were sensitive to patients in the low-risk team (Fig. 6F).

OSCC is an aggressive tumor and its prognosis has exhibited little improvement in the last three decades^[28]. In recent years, epidemiologists have increasingly sought to employ genetic data to identify 'causal' relationships between exposures of interest and various endpoints.

MR, an instrumental variable approach, can provide a theoretical reference for disease prognosis^[9], but it has rarely been applied in OSCC. This study mainly used transcriptome data combined with MR analysis to screen and identify LRG-related prognostic genes in OSCC.

This study first performed differential expression analysis based on the TCGA dataset to obtain differential genes between OSCC and normal control tissues, and then intersected with LRG-related genes obtained by WGCNA to obtain 271 candidate genes (Fig. 1H). Next, we carried out MR and sensitivity analysis, selecting 18 candidate genes finally. Based on these 18 genes, univariate COX regression analysis, LASSO regression analysis and multi-factor COX regression analysis were performed. Finally, 4 genes were used to construct the risk model, which were: CLTCL1, TNNC1, ALPK3, PFKM.

CLTCL1 (clathrin, heavy chain-like 1) belongs to the clathrin family and encodes a protein of 1640 amino acids that is highly expressed in muscle tissues. Clathrins are essential for intracellular traffic. CLTCL1 was mutated in 15% of the tumors; all mutations were deleterious. The gene encodes a major structural protein of the coated pits and vesicles involved in endocytosis. By adversely affecting trafficking of growth factor receptors and cycling of integrins and cadherins, derailed endocytosis is believed to play an important role in cancer. CLTCL1 is overexpressed in several cancer types. The gene was also amplified in OSCC samples, suggesting that the mutations may be activating^[29].

Troponin C type 1 (TNNC1) is a member of the troponin family, which is known to facilitate the interaction between actin and myosin by binding to calcium. Overexpression of TNNC1 has been reported to be common in various solid tumors. However, in non-muscle cells, TNNC1 may act as a regulatory protein for cellular locomotion, cytoplasmic streaming, and cytokinesis, rather than as a structural protein. TNNC1 was over-expressed in ovarian cancer cells, and elevated TNNC1 expression regulated epithelial cancer cell motility and invasion potential via cytoskeleton reorganization. TNNC1 expression was significantly elevated in patients with tongue squamous cell carcinoma (TSCC) and was associated with cervical lymphatic recurrence. TNNC1 can predict the prognosis of TSCC and its occult cervical lymphatic metastasis^[30].

PFKM (phosphofructokinase, muscle) is a crucial regulatory target, as it serves as an activator of muscle glycolysis, which is critical for cancer dissemination. Moreover, an in silico study reported PFKM as a potential therapeutic target for cancer and aerobic glycolysis. PFKM mutations associated with different cancers, including human melanoma, breast cancer, bladder cancer, non-small-cell lung cancer, and glioma, have also been observed^[31]. PFKM is reported to be up-regulated in oral cancer^[32].

Alpha-protein kinase 3 (ALPK3), located on chromosome 15q25.2, is a poorly studied protein, but from the limited data available, it seems to be involved in the phosphorylation of cardiac-relevant transcription factors and in cardiomyocyte differentiation. Recent evidence has shown that ALPK3 participates in intercalated disc and sarcomere structural organization, and murine knock-out models show ventricular hypertrophy and impaired contractility. Abnormal calcium handling has been observed in cardiomyocytes differentiated from stem cells carrying homozygous ALPK3 variants^[33]. ALPK3 has been found to be significantly and frequently mutated in OSCC^[34].

In this study, the expression levels of the 4 genes (CLTCL1, TNNC1, ALPK3, PFKM) in tumor tissues were all higher than those in the control group in the TCGA-OSCC dataset, which was consistent with literature reports and supported the results of our data mining.

In this study, we were the first to find that the four prognostic genes were highly expressed in high-risk patients with OSCC (Fig. 3D). We also showed that the four genes were co-enriched in dilated cardiomyopathy, hypertrophic cardiomyopathy (HCM), and calcium signaling pathways (Fig. 3B). The MR results showed that TNNC1 was a risk factor, while the other three genes were protective factors, which may be attributed to heterogeneity among human populations and to potential interactions among the exposure factors (genes). A correlation between these four genes and OSCC has been reported^[29–34], but without any indication of causation. We were the first to find a possible causal relationship between them.

We then evaluated the risk model using Kaplan-Meier (KM) and receiver operating characteristic (ROC) curves, and the results showed that the model performed well; the external validation set likewise showed good predictive performance. To evaluate whether the risk score and clinical features were independent prognostic factors for OSCC, we constructed a nomogram and assessed it with calibration and ROC curves, which also indicated good performance. Boxplots of the risk score against clinical features revealed differences in risk score among clinical subgroups. Together, these results indicate that this study may have clinical significance. We further conducted immune checkpoint analysis, immune cell infiltration analysis and GSEA to explore potential mechanisms distinguishing the high- and low-risk groups, which provides ideas for studying the pathological mechanism of OSCC. Finally, we conducted a drug sensitivity analysis, which provides a reference for the treatment of OSCC and the design of clinical trials.
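
As an illustration of how such a risk model stratifies patients, the sketch below computes a risk score as a coefficient-weighted sum of gene expression values and splits patients at the median score, as described in the Methods. The coefficients and expression values are placeholders chosen for readability, not the fitted LASSO-Cox coefficients of this study, and the function names are hypothetical.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression values over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores):
    """Assign each patient to the high- or low-risk group by median score."""
    cutoff = median(scores.values())
    return {
        patient: ("high" if score > cutoff else "low")
        for patient, score in scores.items()
    }


# Placeholder coefficients (NOT the values fitted in this study).
coefficients = {"CLTCL1": 0.5, "TNNC1": 1.0, "ALPK3": -0.5, "PFKM": 2.0}

# Made-up expression profiles for three hypothetical patients.
patients = {
    "P1": {"CLTCL1": 1.0, "TNNC1": 1.0, "ALPK3": 1.0, "PFKM": 1.0},
    "P2": {"CLTCL1": 2.0, "TNNC1": 2.0, "ALPK3": 2.0, "PFKM": 2.0},
    "P3": {"CLTCL1": 0.0, "TNNC1": 0.0, "ALPK3": 0.0, "PFKM": 0.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores, groups)
```

In this sketch a patient whose score equals the median is placed in the low-risk group; the opposite convention is equally common, so the tie rule is an assumption.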

In summary, this study is the first to combine OSCC transcriptome data, LRGs and MR analysis to identify prognostic genes of OSCC through differential expression analysis, Cox regression analysis, immune infiltration analysis and other methods. The identification of these genes may have clinical significance for the diagnosis and treatment of OSCC. Although we verified the four prognostic genes using external data, this study lacked animal models, so we were unable to verify our findings in vivo. In future work, we plan to confirm the four prognostic genes in animal models of OSCC, which would help to stratify OSCC patients accurately at an early stage and thereby improve long-term prognostic outcomes. This would provide theoretical support and a reference for the clinical treatment of OSCC patients.

OSCC Oral squamous cell carcinoma

eQTL Expression quantitative trait loci

DEGs Differentially expressed genes

WGCNA Weighted gene co-expression network analysis

DE-LRGs Differentially expressed lipophagy-related genes

MR Mendelian randomization

LASSO Least absolute shrinkage and selection operator

ssGSEA Single sample gene set enrichment analysis

OR Odds ratio

CI Confidence intervals

OS Overall survival

TNM Tumor-Node-Metastasis

LDs Lipid droplets

RCTs Randomized controlled trials

TCGA The Cancer Genome Atlas

HNSC Head and neck squamous cell carcinoma

RNA-seq RNA-sequencing

GEO Gene Expression Omnibus

GWAS Genome-wide association study

IEU Integrative Epidemiology Unit

SNPs Single nucleotide polymorphisms

OCC Oral cavity cancer

IVs Instrumental variables

IVW Inverse variance weighted

ELN Examined lymph node

GDSC Genomics of Drug Sensitivity in Cancer

HCM Hypertrophic cardiomyopathy

Acknowledgements

The research team is grateful to all the participants who took part in this study, as well as the support of the Natural Science Foundation of Gansu Province.

Author information

Authors and Affiliations

Department of Stomatology, the 940th Hospital of Joint Logistics Support Force of Chinese PLA, Lanzhou 730050, Gansu, China.

Fangyu Chen*, Qianqi Yan, Ya Guo, Jin Zhang, Linhan Su, Lijian Xue*

Contributions

J.X.L. contributed to data acquisition and interpretation and critically revised the manuscript; F.Y.C. contributed to conception, design, data acquisition, analysis and interpretation, and drafted and critically revised the manuscript; Q.Q.Y. contributed to data acquisition and critically revised the manuscript; Y.G., J.Z. and L.H.S. contributed to conception, design and data interpretation and critically revised the manuscript. All authors gave final approval and agree to be accountable for all aspects of the work.

Corresponding authors

Correspondence to Lijian Xue.

Funding

This paper was supported by the Natural Science Foundation of Gansu (23JRRA530).

Data availability

The data supporting the findings of this study are available upon request from the corresponding author.

Ethics approval and consent to participate

The data used in this study have been ethically approved and informed consent has been obtained from the participants in the original research.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Bugshan A, Farooq I. Oral squamous cell carcinoma: metastasis, potentially associated malignant disorders, etiology and recent advancements in diagnosis. F1000Res. 2020 Apr 2;9:229(2020)
Law ZJ, Khoo XH, Lim PT, Goh BH, Ming LC, Lee WL, Goh HP. Extracellular Vesicle-Mediated Chemoresistance in Oral Squamous Cell Carcinoma. Front Mol Biosci. 2021 Mar 9;8:629888(2021)
Agarbati S, Mascitti M, Paolucci E, et al. Prognostic relevance of macrophage phenotypes in high-grade oral tongue squamous cell carcinomas. Appl Immunohistochem Mol Morphol. 2021;29(5):359-365(2021)
Russo D, Mariani P, Caponio V, et al. Development and validation of prognostic models for oral squamous cell carcinoma: a systematic review and appraisal of the literature. Cancers (Basel) 2021;13(22):996(2021)
Zhang J, Ma C, Qin H, et al. Construction and validation of a metabolic-related genes prognostic model for oral squamous cell carcinoma based on bioinformatics. BMC Med Genomics. 2022 Dec 24;15(1):269(2022)
Cui W, Sathyanarayan A, Lopresti M, Aghajan M, Chen C, Mashek DG. Lipophagy-derived fatty acids undergo extracellular efflux via lysosomal exocytosis. Autophagy. 2021 Mar;17(3):690-705(2021)
Zhang S, Peng X, Yang S, et al. The regulation, function, and role of lipophagy, a form of selective autophagy, in metabolic disorders. Cell Death Dis. 2022 Feb 8;13(2):132(2022)
Xu D, Wang Z, Xia Y, et al. The gluconeogenic enzyme PCK1 phosphorylates INSIG1/2 for lipogenesis. Nature. 2020;580:530-535(2020)
Birney E. Mendelian Randomization. Cold Spring Harb Perspect Med. 2022 May 17;12(4):a041302(2022)
Larsson SC, Butterworth AS, Burgess S. Mendelian randomization for cardiovascular diseases: principles and applications. Eur Heart J. 2023 Dec 14;44(47):4913-4924(2023)
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550(2014)
Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016 Sep 15;32(18):2847-9(2016)
Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics. 2013 Jan 16;14:7(2013)
Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012 May;16(5):284-7(2012)
Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, Laurin C, Burgess S, Bowden J, Langdon R, Tan VY, Yarmolinsky J, Shihab HA, Timpson NJ, Evans DM, Relton C, Martin RM, Davey Smith G, Gaunt TR, Haycock PC. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018 May 30;7:e34408(2018)
Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015 Apr;44(2):512-25(2015)
Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator. Genet Epidemiol. 2016 May;40(4):304-14(2016)
Burgess S, Scott RA, Timpson NJ, Davey Smith G, Thompson SG; EPIC- InterAct Consortium. Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur J Epidemiol. 2015 Jul;30(7):543-52(2015)
Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, Laurin C, Burgess S, Bowden J, Langdon R, Tan VY, Yarmolinsky J, Shihab HA, Timpson NJ, Evans DM, Relton C, Martin RM, Davey Smith G, Gaunt TR, Haycock PC. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018 May 30;7:e34408(2018)
Dobutamine stress test. Lancet. 1988 Dec 10;2(8624):1347-8(2018)
Liu TT, Li R, Huo C, Li JP, Yao J, Ji XL, Qu YQ. Identification of CDK2-Related Immune Forecast Model and ceRNA in Lung Adenocarcinoma, a Pan-Cancer Analysis. Front Cell Dev Biol. 2021 Jul 30;9:682002(2021)
Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33(1):1-22(2010)
Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012 May;16(5):284-7(2012)
Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-Garcia W, Treviño V, Shen H, Laird PW, Levine DA, Carter SL, Getz G, Stemke-Hale K, Mills GB, Verhaak RG. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun. 2013;4:2612(2013)
Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics. 2013 Jan 16;14:7(2013)
Xue C, Gu X, Zhao Y, Jia J, Zheng Q, Su Y, Bao Z, Lu J, Li L. Prediction of hepatocellular carcinoma prognosis and immunotherapeutic effects based on tryptophan metabolism-related genes. Cancer Cell Int. 2022 Oct 10;22(1):308(2022)
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003 Nov;13(11):2498-504(2003)
Sasahira T, Kirita T. Hallmarks of Cancer-Related Newly Prognostic Factors of Oral Squamous Cell Carcinoma. Int J Mol Sci. 2018 Aug 16;19(8):2413. doi: 10.3390/ijms19082413. PMID: 30115834; PMCID: PMC6121568(2018)
Al-Hebshi NN, Li S, Nasher AT, El-Setouhy M, Alsanosi R, Blancato J, Loffredo C. Exome sequencing of oral squamous cell carcinoma in users of Arabian snuff reveals novel candidates for driver genes. Int J Cancer. 2016 Jul 15;139(2):363-72(2016)
Yang X, Wu K, Li S, Hu L, Han J, Zhu D, Tian X, Liu W, Tian Z, Zhong L, Yan M, Zhang C, Zhang Z. MFAP5 and TNNC1: Potential markers for predicting occult cervical lymphatic metastasis and prognosis in early stage tongue cancer. Oncotarget. 2017 Jan 10;8(2):2525-2535(2017)
Ishfaq M, Bashir N, Riaz SK, Manzoor S, Khan JS, Bibi Y, Sami R, Aljahani AH, Alharthy SA, Shahid R. Expression of HK2, PKM2, and PFKM Is Associated with Metastasis and Late Disease Onset in Breast Cancer Patients. Genes (Basel). 2022 Mar 20;13(3):549(2022)
Bajrai LH, Sohrab SS, Mobashir M, Kamal MA, Rizvi MA, Azhar EI. Understanding the role of potential pathways and its components including hypoxia and immune system in case of oral cancer. Sci Rep. 2021 Oct 1;11(1):19576(2021)
Lopes LR, Garcia-Hernández S, Lorenzini M, Futema M, Chumakova O, Zateyshchikov D, Isidoro-Garcia M, Villacorta E, Escobar-Lopez L, Garcia-Pavia P, Bilbao R, Dobarro D, Sandin-Fuentes M, Catalli C, Gener Querol B, Mezcua A, Garcia Pinilla J, Bloch Rasmussen T, Ferreira-Aguar A, Revilla-Martí P, Basurte Elorz MT, Bautista Paves A, Ramon Gimeno J, Figueroa AV, Franco-Gutierrez R, Fuentes-Cañamero ME, Martinez Moreno M, Ortiz-Genga M, Piqueras-Flores J, Analia Ramos K, Rudzitis A, Ruiz-Guerrero L, Stein R, Triguero-Bocharán M, de la Higuera L, Ochoa JP, Abu-Bonsrah D, Kwok CYT, Smith JB, Porrello ER, Akhtar MM, Jager J, Ashworth M, Syrris P, Elliott DA, Monserrat L, Elliott PM. Alpha-protein kinase 3 (ALPK3) truncating variants are a cause of autosomal dominant hypertrophic cardiomyopathy. Eur Heart J. 2021 Aug 21;42(32):3063-3073(2021)
Lin LH, Chou CH, Cheng HW, Chang KW, Liu CJ. Precise Identification of Recurrent Somatic Mutations in Oral Cancer Through Whole-Exome Sequencing Using Multiple Mutation Calling Pipelines. Front Oncol. 2021 Nov 29;11:741626(2021)
