Integration of Single-Cell and Bulk RNA Sequencing Data using Ecotype Machine Learning for Prognostic Biomarker Discovery in Gastric Cancer

doi:10.21203/rs.3.rs-4602919/v1

Research Article

https://doi.org/10.21203/rs.3.rs-4602919/v1

This work is licensed under a CC BY 4.0 License

Version 1

Background

EcoTyper is a recently developed machine learning framework. This work aimed to construct an EcoTyper-related prognostic model for gastric cancer (GC).

Methods

The scRNA-seq data and bulk RNA-seq data for GC were obtained from the GEO and TCGA databases, respectively. Cell composition deconvolution was performed using CIBERSORTx. EcoTyper was employed for de novo discovery of scRNA-seq cell states and communities. Weighted Correlation Network Analysis was applied to explore the gene co-expression networks in GC. Subsequently, a risk model for ecotypes was constructed using bulk RNA-seq data.

Results

This work revealed significant differences in cell distribution between normal and primary tumor samples. Primary tumor samples showed a predominant presence of immune cells, including monocytes/macrophages and neutrophils. These immune cells were classified into two ecotypes, E1 and E2, with E2 closely linked to primary tumor samples. Using ecotype-related risk scores, GC patients were stratified into high-risk (HR) and low-risk (LR) groups. HR patients exhibited worse overall survival and higher predicted sensitivity to Mirin, Oxaliplatin, Ruxolitinib, VE-822, and MG-132. Notably, the core gene TGM2 was up-regulated in GC cells, and its silencing reduced GC cell proliferation, migration, and invasion.

Conclusion

This study constructed an EcoTyper-based prognostic model that may serve as a prognostic biomarker for GC. The model showed significant correlations with predicted responses to immunotherapy and chemotherapy. This research has also provided a potentially valuable target for GC treatment.

EcoTyper

Gastric cancer

Single-cell RNA sequencing

Prognostic model

Gastric cancer (GC) is one of the most common malignant tumors and ranks as the fourth leading cause of cancer-related death worldwide in the GLOBOCAN 2020 estimates [1]. The five-year survival rate of early GC is greater than 90%, whereas without early diagnosis the five-year survival rate of advanced GC falls to 30–40% [2, 3]. Early diagnosis of GC is therefore particularly important for treatment and prognosis. Traditional tumor markers have limited value in the diagnosis and treatment of GC because of their low sensitivity and specificity [4, 5]. Tissue biopsy, the gold standard for cancer diagnosis, is limited by its invasiveness and the risk of tumor dissemination, by the inability of a single biopsy to provide dynamic monitoring, and by sampling that cannot capture tumor heterogeneity [6]. Because imaging examinations carry a certain misdiagnosis rate and have limited differential diagnostic value, they can often serve only as a clinical reference [7, 8]. Therefore, a non-invasive or minimally invasive predictive model for the early diagnosis and prevention of GC is of great significance.

The tumor microenvironment (TME) is the internal environment in which tumor cells grow and represents their most direct ecological niche [9]. Tumor tissue is not a homogeneous population: it contains multiple cell types, and each cell type comprises different cell subsets. This is referred to as the heterogeneity of the TME. The heterogeneity of the TME in GC profoundly affects the biological behavior of tumor cells and the overall response of tumor tissue to anti-tumor drugs [10, 11]. In recent years, this heterogeneity has also been described as a "tumor ecosystem", in which tumor cells cooperate with other tumor cells and with non-tumor cells in the microenvironment, and adapt and evolve as a whole [12]. The emergence of single-cell RNA sequencing (scRNA-seq) has brought a breakthrough in the study of TME heterogeneity [13, 14]. scRNA-seq can identify cell subsets in the TME according to transcription patterns shared among cells. Using scRNA-seq, a "pre-exhausted" CD8 subpopulation associated with a better prognosis was found in tumor tissues of non-small cell lung cancer [15]. Another scRNA-seq analysis revealed that a partial epithelial-mesenchymal transition expression program in malignant epithelium is related to metastasis of head and neck squamous cell carcinoma [16]. Therefore, it is of great significance to use scRNA-seq to map the tumor ecosystem of GC in order to further clarify the molecular mechanisms of GC.

EcoTyper, a machine learning framework, efficiently identifies cell type-specific transcriptional states and their co-association patterns from bulk and single-cell expression data [17, 18]. This method provides a deep understanding of the complex cellular states and interactions in multicellular communities, which may reveal the basic units of cellular organization in human cancer and contribute to new diagnostic approaches and personalized treatment. In this study, we utilized scRNA-seq data to explore changes in immune cells between GC tissues and normal tissues at the single-cell level. Employing EcoTyper, we discovered novel cell states and ecotypes in GC tissues. Subsequently, a risk model for ecotypes was constructed using bulk RNA-seq data. The effectiveness of this risk model in predicting immunotherapy outcomes was evaluated using the Tumor Immune Dysfunction and Exclusion (TIDE) score. This work may identify novel biomarkers for GC prognostic prediction.

Source of raw data

The GSE163558 dataset, which contains 10x Genomics scRNA-seq data, was downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). It comprises 9 fresh human tissue samples from 6 patients. Of these, 3 primary tumor samples (PT1, PT2, and PT3) and 1 adjacent non-tumor sample (NT1) were used for subsequent analysis. The bulk RNA-seq data and clinical information of the normal samples (n = 32) and GC samples (n = 375) were downloaded from The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/). The bulk RNA-seq data and clinical information of the GSE15459 dataset were obtained from GEO.

Data processing of 10x scRNA-seq

Low-quality cells were excluded based on the number of unique molecular identifiers (UMIs) and total counts for each sample. To identify and remove potential multiplets, we utilized Scrublet (version 1.0). The Harmony package was utilized to mitigate batch effects. Scanpy (version 1.7.2) was used to reduce the dimensionality of the data through principal component analysis (PCA). Clustering and visualization were performed using the Leiden and UMAP algorithms (version 0.5.1). Cell type annotation was conducted based on the expression of specific markers associated with each cell type.

Correlation analysis of tumor immunity

Cell composition deconvolution was performed using CIBERSORTx. Briefly, a signature gene expression matrix was constructed based on scRNA-seq data. The raw count matrix and cell type information were extracted from the Monocyte/Macrophage and Neutrophil subcluster of the Scanpy object. The raw count matrix was then normalized within CIBERSORTx. A signature matrix in counts per million (CPM), incorporating all genes, was generated. Furthermore, FPKM values from TCGA bulk RNA-seq data were transformed into TPM values. Subsequently, cell proportions of TCGA samples were evaluated using CIBERSORTx with bulk RNA-seq data. Finally, batch correction using the S-mode was employed to address cross-platform variation in the deconvolution of TCGA RNA-seq data.
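The FPKM-to-TPM step described above can be illustrated with a minimal Python sketch. It assumes the standard per-sample rescaling (each gene's FPKM divided by the sample's FPKM total, times one million); the function name and the toy values are illustrative and are not taken from the study's pipeline.

```python
def fpkm_to_tpm(fpkm):
    """Convert one sample's FPKM values (gene -> FPKM) to TPM.

    TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6, so the TPM values of a
    sample always sum to one million.
    """
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("sample has no expressed genes")
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


# Toy sample with three genes (values are made up for illustration).
sample_fpkm = {"TGM2": 10.0, "S100A8": 30.0, "CXCR2": 60.0}
sample_tpm = fpkm_to_tpm(sample_fpkm)
```

Because the rescaling is applied per sample, TPM values are comparable across samples in a way that raw FPKM values are not, which is why the conversion precedes deconvolution.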

Ecotype discovery and recovery

EcoTyper, an integrated machine learning framework, was utilized to investigate the TME through bulk RNA sequencing or scRNA-seq. EcoTyper was employed for the de novo discovery of cell states and cell communities in scRNA-seq data. Applying non-negative matrix factorization (NMF) on correlation matrices, distinct cell states were successfully identified. To ensure the reliability of the results, we validated the extracted cell state information through cell state rediscovery on expression matrices, incorporating an adaptive false positive index (AFI) for quality control. Additionally, the co-occurrence patterns of cell states were analyzed to identify specific cell communities within the TME.

Recognition of important co-expression modules

Weighted Correlation Network Analysis (WGCNA) was applied to explore the gene co-expression networks in the context of GC. The "WGCNA" package in R was utilized to construct the sample tree, enabling the identification of outlying samples. Subsequently, a weighted co-expression network was built using the GC expression matrix, and the adjacency matrix was transformed into the topological overlap matrix (TOM). To achieve a scale-free network, the soft-thresholding power was chosen so that the scale-free topology fit index (R²) reached 0.90. Dynamic Tree Cut was then utilized to generate co-expression modules, with the minimum module size (minModuleSize) set to 100.
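To make the adjacency and TOM steps concrete, the following pure-Python sketch applies soft thresholding to a small correlation matrix and then computes the topological overlap. It assumes an unsigned network (adjacency equal to the absolute correlation raised to the power β), which the text does not specify; the analysis itself was run with the R "WGCNA" package, and the function names and toy matrix here are hypothetical.

```python
def soft_threshold_adjacency(cor, beta):
    """Unsigned adjacency a_ij = |cor_ij| ** beta with a zero diagonal."""
    n = len(cor)
    return [[0.0 if i == j else abs(cor[i][j]) ** beta for j in range(n)]
            for i in range(n)]


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity; the diagonal is zero
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


# Toy gene-gene correlation matrix for three genes (made-up values).
cor = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
adj = soft_threshold_adjacency(cor, beta=8)
tom = topological_overlap(adj)
```

Raising correlations to a power suppresses weak correlations far more than strong ones, which is what pushes the network toward a scale-free topology as β increases.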

Construction and validation of Ecotype risk features

The core genes of the tumor ecotype were determined by intersecting the results of the WGCNA analysis with those of the EcoTyper analysis of the scRNA-seq data. Subsequently, a LASSO regression analysis was conducted using the "glmnet" R package, employing the "cv.glmnet" function for 10-fold cross-validation to select prognostic genes. Based on the optimal λ value and the coefficients of the retained genes, the risk score was calculated using the following formula:

$$\text{Risk score}=\sum_{i=1}^{n} K_{i} \times A_{i}$$

A_i is the expression of gene i; K_i is the regression coefficient of prognosis-related gene i; n is the number of prognosis-related genes.

The patients were divided into high-risk (HR) and low-risk (LR) groups using the median score as the cutoff. Survival analysis was conducted using the "survminer" package, and the predictive performance of this model was evaluated using receiver operating characteristic (ROC) curve analysis.
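The risk score and the median split can be sketched in a few lines of Python. The coefficients and expression values below are invented for illustration and are not the fitted LASSO coefficients; assigning patients whose score equals the median to the LR group is also an assumption, since the text does not state how ties were handled.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of coefficient * expression over the genes in the model."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify_by_median(scores):
    """Label each patient HR (score above the median) or LR (otherwise)."""
    cutoff = median(scores.values())
    return {pid: ("HR" if score > cutoff else "LR")
            for pid, score in scores.items()}


# Hypothetical two-gene model; the study's model retained 15 genes.
coefficients = {"TGM2": 0.5, "CXCR2": -0.2}
patients = {
    "p1": {"TGM2": 0.4, "CXCR2": 0.5},
    "p2": {"TGM2": 2.0, "CXCR2": 1.0},
    "p3": {"TGM2": 1.0, "CXCR2": 0.5},
    "p4": {"TGM2": 3.0, "CXCR2": 1.5},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = stratify_by_median(scores)
```

Applying the same coefficients to an external cohort, as done for validation, only requires calling `risk_score` on that cohort's expression values.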

Independent prognostic analysis and nomogram construction

To assess the independent prognostic value of the EcoTyper model score, we conducted univariate and multivariate Cox regression analyses. A nomogram was generated using the "rms" R package to predict the overall survival of patients at 1, 3, and 5 years, incorporating age, grade, gender, stage, T-stage, and the EcoTyper risk score as predictors. Calibration analysis was used to assess the accuracy of the nomogram predictions.

Functional enrichment analysis

Gene set enrichment analysis (GSEA) was conducted using the "clusterProfiler" R package. The "c2.cp.kegg.v7.4.symbols.gmt" gene set collection from MSigDB was used to identify pathways differentially enriched between the risk groups.

Immunotherapy prediction and chemotherapy sensitivity analysis

Utilizing TIDE (http://tide.dfci.harvard.edu/login/), the immunological dysfunction and exclusion mechanisms in GC were investigated. The evaluation of immunological responses was based on well-established immune checkpoints (ICPs). Additionally, we utilized the "oncoPredict" package to predict the half maximal inhibitory concentration (IC50) values of candidate drugs with potential therapeutic efficacy against GC in both high-risk and low-risk groups.

Cell culture

Human normal gastric epithelial cells (GES-1) and GC cell lines (AGS, HGC27 and MKN28) were purchased from Beijing Institute for Cancer Research (Beijing, China) and Cell Bank of Chinese Academy of Sciences (Shanghai, China). These cells were maintained in DMEM (GES-1), Ham's F-12 (AGS), MEM (HGC27) and RPMI 1640 (MKN28) containing 10% fetal bovine serum (FBS, Gibco, Grand Island, NY, USA) and 1% penicillin-streptomycin (Solarbio, Beijing, China) in a humidified atmosphere of 5% CO₂ at 37°C.

Cell transfection

Small interfering RNA (siRNA) specifically targeting TGM2 (si-TGM2) was synthesized by GenePharma (Shanghai, China) and used for TGM2 silencing. Scrambled siRNA (si-NC) served as the negative control. MKN28 cells were transfected with si-TGM2 or si-NC using Oligofectamine™ reagent (Thermo Fisher Scientific, San Jose, CA, USA).

Quantitative real-time PCR (qRT-PCR)

To extract total RNA, cells were homogenized in 1 mL of Trizol reagent (Thermo Fisher Scientific). The RNA served as the template to synthesize cDNA using the SuperScript Reverse Transcription Kit (Invitrogen, Carlsbad, CA, USA). PCR was carried out on a 7500 Fast RT-PCR System using SYBR Green Master Mix (Applied Biosystems, Foster City, CA, USA). Data were analyzed by the 2^−ΔΔCt method.
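The 2^−ΔΔCt calculation can be written out as a short Python sketch. The reference (housekeeping) gene is not named in the text, so the arguments below simply distinguish target and reference Ct values; the function name and the example Ct values are illustrative.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference), computed separately for the
           treated sample and for the control.
    ddCt = dCt(sample) - dCt(control).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Made-up Ct values: the target amplifies two cycles later after knockdown.
fold_change = relative_expression(26.0, 18.0, 24.0, 18.0)  # 0.25
```

A ΔΔCt of 2 cycles corresponds to a fold change of 0.25, that is, expression reduced to one quarter of the control level.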

CCK-8 assay

Proliferation of MKN28 cells was examined using Cell Counting Kit-8 (Beyotime, Shanghai, China). Cells were incubated for 24 h, then seeded into 96-well plates at a density of 2000 cells/100 µL and incubated with 10 µL CCK-8 reagent for 1 h. Finally, the absorbance of each well at 450 nm was measured on a Multiskan SkyHigh Microplate Reader (Thermo Fisher Scientific).

Wound healing assay

MKN28 cells were seeded into 6-well plates at a concentration of 5×10⁵/well and cultured at 37°C until 100% confluence. An artificial and straight scratch was created utilizing a 200-µL pipette tip. Then, the suspended cells were removed by washing with PBS. Cells were cultured in serum-free medium for 24 h. At 0 and 24 h after cell culture, wound closure images were captured.

Transwell invasion assay

Transwell invasion assay was carried out to detect cell invasion of MKN28 cells applying a 24-well Transwell plate chamber with 8 µm pore size (Corning Costar, Cambridge, MA, USA). MKN28 cells were seeded into the upper chamber coated with Matrigel. The lower chamber contained FBS-free RPMI 1640 medium. Cells were cultured at 37°C for 24 h, and then the cells on the upper chamber were wiped off with a cotton swab. The invasive cells were fixed with methanol for 10 min and stained with 0.1% crystal violet. Finally, the invasive cells were observed under an optical microscope (Olympus, Tokyo, Japan).

Statistical analysis

Python 3.9 and R software version 4.2.2 were used to perform statistical analysis. P-values and false discovery rate (FDR) q-values less than 0.05 were regarded as significant. All cell experiments were run in triplicate, and the results were presented as mean ± SD and analyzed with SPSS 22.0 statistical software (IBM, Armonk, NY, USA). Statistical differences were assessed using Student's t-test and one-way ANOVA. P < 0.05 was considered statistically significant.

Cellular makeup of the carcinoma and para-carcinoma tissues in GC

After stringent quality control, a total of 14,348 high-quality cells and 25,190 genes were retained for subsequent analysis. Batch correction was then applied across the primary gastric cancer samples and the normal sample (Fig. 1A). Nine distinct cell types were annotated according to the expression profiles of characteristic cell markers (Fig. 1B-C). To assess tissue preference, the ratio of observed to expected cell numbers (Ro/e) was calculated for each cluster across different tissues [19]. Among all immune cells, monocytes/macrophages and neutrophils were predominantly distributed in the primary tissues relative to normal tissues (Fig. 1D). Functional enrichment analysis revealed the main signaling pathways associated with each cell type: B cells, endothelial cells, epithelial cells, fibroblasts, mast cells, monocytes/macrophages, neutrophils, plasma cells and T cells. Among them, monocytes/macrophages were mainly involved in apoptosis, the NOD-like receptor signaling pathway and neutrophil extracellular trap formation. Neutrophils participated in the IL-17 signaling pathway, the NF-kappa B signaling pathway and the NOD-like receptor signaling pathway (Fig. 1E).
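The Ro/e statistic can be illustrated with a small Python sketch. It assumes the usual definition, in which the expected count for a cell type in a tissue comes from the row and column totals of the cell type by tissue contingency table (as in a chi-square test); the function name and the toy counts are hypothetical and are not the study's cell numbers.

```python
def ro_e(counts):
    """Observed/expected ratio for each (cell type, tissue) pair.

    counts maps cell type -> {tissue -> observed cell number}.
    expected = row total * column total / grand total.
    """
    tissues = sorted({t for row in counts.values() for t in row})
    row_total = {c: sum(row.get(t, 0) for t in tissues)
                 for c, row in counts.items()}
    col_total = {t: sum(row.get(t, 0) for row in counts.values())
                 for t in tissues}
    grand_total = sum(row_total.values())
    result = {}
    for cell_type, row in counts.items():
        result[cell_type] = {}
        for t in tissues:
            expected = row_total[cell_type] * col_total[t] / grand_total
            observed = row.get(t, 0)
            result[cell_type][t] = (observed / expected if expected > 0
                                    else float("nan"))
    return result


# Toy table: neutrophils are over-represented in tumor tissue.
toy_counts = {
    "Neutrophil": {"tumor": 90, "normal": 10},
    "B cell": {"tumor": 60, "normal": 40},
}
ratios = ro_e(toy_counts)
```

A ratio above 1 indicates that a cell type is enriched in that tissue relative to what its overall abundance would predict, and a ratio below 1 indicates depletion.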

Subdivision of monocytes/macrophages and neutrophils

A total of 3,274 monocytes/macrophages and neutrophils were isolated. Utilizing specific immune cell markers, these cells were categorized into four distinct subclusters: dendritic cells, monocytes, macrophages and neutrophils (Fig. 2A-B). Dendritic cells were characterized by the expression of CCR7 and LAMP3, while macrophages expressed the marker genes LGMN and CTSB. Monocytes were identified by the expression of FCNA and VCAN, and neutrophils were marked by NAMPT (Fig. 2B). Notably, neutrophils could not be reliably divided into further subclusters.

De novo Discovery of Cell States and Ecotypes

We employed the EcoTyper machine learning framework to conduct cell state and ecotype discovery for dendritic cells, macrophages, monocytes, and neutrophils. Two distinct ecotypes, namely E1 and E2, were identified, with E1 representing normal tissue and E2 representing primary tumor tissue. Thus, E2 may be associated with GC development. E2 contained dendritic cells with cell state S04, monocytes with cell state S02 and neutrophils with cell state S03 (Fig. 2C). The distribution of the different cell states within the E2 ecotype is shown in Fig. 2D. Additionally, we conducted enrichment analysis to explore the functional characteristics of these three cell types. Dendritic cells were mainly involved in cytokine-cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, and the chemokine signaling pathway. Monocytes participated in the HIF-1 signaling pathway, the Ras signaling pathway and focal adhesion. Neutrophils were associated with cell adhesion molecules, focal adhesion and ECM-receptor interaction (Fig. 2E).

Gene modules screening and co-expression network construction

The signature matrix for cell states within the E2 ecotype was constructed using CIBERSORTx. Subsequently, deconvolution analysis and assessment of tumor immune response were performed on the TCGA-STAD dataset. CIBERSORTx calculated the infiltration abundances of the three cell states in the samples (Fig. 3A). These infiltration abundances were then used as the phenotype file for sample characterization in the WGCNA. The scale-free network was constructed with the soft-thresholding power set to 8, at which the scale-free fit index R² reached 0.9 (Fig. 3B). Employing the "merge dynamic" algorithm, a total of 12 modules with different colors were identified. Among these modules, the black module showed the highest correlation with dendritic cells and monocytes, while the tan module exhibited the highest correlation with neutrophils (Fig. 3C). At the gene level, genes in the black module were closely associated with dendritic cells and monocytes, whereas genes in the tan module were closely associated with neutrophils (Fig. 3D-F). The core genes of the E2 ecotype were determined by intersecting the results of the WGCNA analysis and the EcoTyper analysis. A total of 104 genes were obtained (Fig. 3G).

Establishment and validation of EcoTyper risk model

Based on the 104 core genes identified by EcoTyper, a prognostic model was developed using LASSO regression analysis. The 350 patients with complete survival information from the TCGA-STAD dataset served as the training set. Following LASSO regression analysis, 15 genes were retained in the final model: CDA, MCOLN2, S100A9, S100A8, RGS1, CXCR2, TNIP3, RILPL2, MMP25, CCL17, SERPINB2, ICAM3, S1PR5, PGLYRP1, and TGM2 (Fig. 4A-B). To assess the stability and generalizability of the prognostic model, we utilized the GSE15459 cohort as an external validation cohort. Applying the same risk formula, the risk score for each sample in the GSE15459 validation cohort was calculated. Survival analysis conducted on both the training set and the validation set showed significantly longer survival for patients in the LR group than for those in the HR group (Fig. 4C-D). HR patients exhibited worse overall survival (OS) than LR patients (Fig. 4E-F). In the training cohort, the AUC values for 1, 3, and 5 years were 0.669, 0.704, and 0.762, respectively (Fig. 4G). In the validation cohort, ROC curves gave AUC values of 0.655 at 1 year, 0.681 at 3 years, and 0.665 at 5 years (Fig. 4H).

Nomograms based on EcoTyper signatures and clinical characteristics were developed.

To verify the reliability and clinical value of the constructed signature as a prognostic factor, we compared the risk score of each GC patient with common clinical indicators and examined the correlation between each factor and patient prognosis using successive univariate and multivariate Cox analyses. Stage, T-stage, and risk score (P < 0.001) were all significantly associated with GC patient prognosis (Fig. 5A-B). We integrated the risk score and the clinical indicators to construct a nomogram for predicting survival probabilities at 1, 3, and 5 years (Fig. 5C). Calibration analysis showed that the predicted overall survival curves at 1, 3, and 5 years agreed well with the observed outcomes (Fig. 5D). We then compared the nomogram, the risk score, and common clinicopathological features, and found that the AUC value of the risk score (AUC = 0.708) was higher than those of the other clinicopathological features (Fig. 5E).

Gene set enrichment analyses and Drug sensitivity

GSEA showed that the HR group was mainly enriched in arrhythmogenic right ventricular cardiomyopathy, dilated cardiomyopathy, ECM-receptor interaction, focal adhesion, and hypertrophic cardiomyopathy (HCM) (Fig. 6A). The LR group was mainly enriched in autoimmune thyroid disease, oxidative phosphorylation, the pentose phosphate pathway and primary immunodeficiency (Fig. 6B). The response of patients to immune checkpoint blockade was predicted through TIDE analysis. The HR group exhibited a higher TIDE score than the LR group (Fig. 6C). We then analyzed the relationship between risk scores and the predicted IC50 values of five candidate anticancer agents. As shown in Fig. 6D, patients in the HR group were predicted to be more sensitive to Mirin, Oxaliplatin, Ruxolitinib, VE-822 and MG-132.

TGM2 knockdown reduced malignant phenotypes of GC cells.

We employed the R software package "maftools" to investigate and summarize mutation-related TCGA-STAD data. As depicted in Fig. 7A, the 15 core genes exhibited varying mutation frequencies. Notably, TGM2, TNIP3, and SERPINB2 displayed higher mutation frequencies than the other genes. Furthermore, these genes, particularly TGM2, exhibited elevated expression levels in GC tumors, as illustrated in Fig. 7B. Consistently, up-regulation of TGM2, TNIP3 and SERPINB2 was observed in GC cell lines (AGS, HGC27 and MKN28 cells) (Fig. 8A). TGM2 was then silenced in MKN28 cells. qRT-PCR results showed that the expression of TGM2 was markedly decreased in MKN28 cells transfected with si-TGM2 (Fig. 8B). To test whether TGM2 deficiency affects the malignant phenotypes of GC cells, CCK-8, wound healing and transwell invasion assays were performed. As shown in Fig. 8C-E, TGM2 knockdown notably inhibited proliferation, migration and invasion of MKN28 cells. Thus, these data indicated that TGM2 knockdown reduced malignant phenotypes of GC cells.

With the breakthrough of immunotherapy in solid tumors, the importance of the TME for tumor occurrence and development has become increasingly recognized [20]. Tumor cells are not isolated individuals, and their microenvironment is an active participant in tumor occurrence and development. The infiltration of various immune cells and stromal cells in the TME plays a very important role in tumor killing and tumor immune escape [21, 22]. This work analyzed scRNA-seq data from GC samples and annotated the cell clusters using marker genes. There was a significant difference in cell distribution between the normal and primary samples. Compared with normal samples, immune cells such as monocytes/macrophages and neutrophils were predominantly distributed in the primary tumor samples. The therapeutic sensitivity of a tumor depends to a large extent on the complex interactions between cancer cells and the different components of the TME, especially immune cells [23, 24]. Among them, tumor-associated macrophages (TAMs) are key cells in the TME. For instance, GC-derived mesenchymal stromal cells secrete IL-6 and IL-8 to drive M2 polarization of macrophages, thereby accelerating metastasis of GC cells [25]. Exosomes derived from M2-polarized macrophages increase cisplatin resistance in GC cells by delivering miR-21 [26]. Neutrophils are active participants in the TME and play an important role in tumor development, growth and metastasis [27]. Previous studies have confirmed that peripheral blood neutrophil counts are generally increased in cancer patients, including those with GC [28, 29]. Templeton et al. carried out a meta-analysis of 100 studies covering different cancer types and stages, and the results showed that a neutrophil-to-lymphocyte ratio > 4 was associated with lower overall survival [30]. Zhang et al. demonstrated that exosomes derived from GC cells induce autophagy and promote N2 polarization of neutrophils through the HMGB1/TLR4/NF-κB signaling pathway, which contributes to GC development [31].

EcoTyper is a machine learning framework developed for large-scale identification and validation of cell states and multicellular communities from bulk, single-cell, and spatially resolved gene expression data [17]. EcoTyper has been applied to analyze 12 cell lineages across 16 human cancers, identifying 69 cell states. Most of these cell states are distinct, are commonly present in various tumor tissues, and have prognostic significance. Steen et al. utilized EcoTyper to obtain high-resolution maps of 13 cell types from hundreds of diffuse large B cell lymphoma tumors. Forty-four cell states reflecting malignant B cells and other cell types were identified in the TME of diffuse large B cell lymphoma, revealing a rich cellular ecosystem landscape beyond the traditional classification of this disease [18]. In this work, we first applied EcoTyper to scRNA-seq data to reveal the changes in immune cells between GC tissues and normal tissues at the single-cell level. Monocytes/macrophages and neutrophils were closely associated with GC samples. These immune cells were divided into two different ecotypes, labeled E1 and E2. The E2 ecotype was closely associated with primary tumor samples and may participate in GC development.

In this work, we combined scRNA-seq data with EcoTyper and bulk RNA-seq data to build a novel prognostic model for GC. HR patients exhibited worse overall survival than LR patients. Moreover, the HR group exhibited a higher TIDE score than the LR group. Patients in the HR group were predicted to be more sensitive to Mirin, Oxaliplatin, Ruxolitinib, VE-822 and MG-132. These predictions, based on the EcoTyper model risk score, suggest that these drugs may have therapeutic potential in HR patients under certain conditions.

Among the 15 core genes of the prognostic model, TGM2, TNIP3, and SERPINB2 exhibited elevated expression levels and mutation frequencies in GC tumors, with TGM2 being the most prominent. Furthermore, in vitro experiments indicated that TGM2 deficiency significantly impeded the proliferation, migration, and invasion of GC cells. Multiple studies have confirmed the carcinogenic role of TGM2 in GC. For instance, Zhao et al. showed that heightened TGM2 expression contributes to the inflammatory response in GC [32]. Additionally, 18β-glycyrrhetinic acid has been shown to inhibit malignant phenotypes in GC cells by suppressing TGM2 expression [33]. The ¹⁸⁸Re-labeled GX1 dimer, which specifically binds to TGM2, has shown promise in inhibiting tumor angiogenesis in GC [34]. Hence, TGM2 emerges as a potential therapeutic target for GC.

This study combined scRNA-seq and bulk transcriptomic data and utilized EcoTyper analysis to investigate the cellular ecosystem in GC. An EcoTyper-based prognostic model was constructed, which may serve as a prognostic biomarker for GC. This prognostic model showed significant correlations with predicted responses to immunotherapy and chemotherapy. This research has provided a potentially valuable target for GC treatment, but further experimental studies are required for validation.

Acknowledgements

Not applicable.

Authors’ contributions

Y.Z. and T.L. conducted the bioinformatics analysis and drafted the original manuscript. T.L. and Y.Q. conceived the study and participated in the study design, performance and coordination. Y.Z., T.L., Y.Q. and K.X. performed the data acquisition and graphics production. Y.Z. and K.X. revised the manuscript. All authors reviewed and approved the final manuscript.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Availability of data and materials

The datasets analyzed for this study were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) and TCGA (https://portal.gdc.cancer.gov/).

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors report no conflicts of interest.

Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71(3):209–49.
Machlowska J, Baj J, Sitarz M, Maciejewski R, Sitarz R. Gastric Cancer: Epidemiology, Risk Factors, Classification, Genomic Characteristics and Treatment Strategies. Int J Mol Sci. 2020;21(11).
Luo M, Li L. Clinical utility of miniprobe endoscopic ultrasonography for prediction of invasion depth of early gastric cancer: A meta-analysis of diagnostic test from PRISMA guideline. Medicine. 2019;98(6):e14430.
Tsai MM, Wang CS, Tsai CY, Huang HW, Chi HC, Lin YH, et al. Potential Diagnostic, Prognostic and Therapeutic Targets of MicroRNAs in Human Gastric Cancer. Int J Mol Sci. 2016;17(6).
Wu D, Zhang P, Ma J, Xu J, Yang L, Xu W, et al. Serum biomarker panels for the diagnosis of gastric cancer. Cancer Med. 2019;8(4):1576–83.
Douda L, Cyrany J, Tachecí I. Early gastric cancer. Vnitr Lek. 2022;68(6):371–5.
Yao K, Uedo N, Kamada T, Hirasawa T, Nagahama T, Yoshinaga S, et al. Guidelines for endoscopic diagnosis of early gastric cancer. Dig Endosc. 2020;32(5):663–98.
Horisoko E, Tsushima Y, Taketomi-Takahashi A, Tokunaga M, Endo K. Essential pre-treatment imaging examinations in patients with endoscopically-diagnosed early gastric cancer. BMC Med Inf Decis Mak. 2010;10:33.
Yang Y, Meng WJ, Wang ZQ. Cancer Stem Cells and the Tumor Microenvironment in Gastric Cancer. Front Oncol. 2021;11:803974.
Zeng D, Zhou R, Yu Y, Luo Y, Zhang J, Sun H, et al. Gene expression profiles for a prognostic immunoscore in gastric cancer. Br J Surg. 2018;105(10):1338–48.
Jiang Y, Zhang Q, Hu Y, Li T, Yu J, Zhao L, et al. ImmunoScore Signature: A Prognostic and Predictive Tool in Gastric Cancer. Ann Surg. 2018;267(3):504–13.
Ren X, Kang B, Zhang Z. Understanding tumor ecosystems by single-cell sequencing: promises and limitations. Genome Biol. 2018;19(1):211.
Sathe A, Grimes SM, Lau BT, Chen J, Suarez C, Huang RJ, et al. Single-Cell Genomic Characterization Reveals the Cellular Reprogramming of the Gastric Tumor Microenvironment. Clin Cancer Res. 2020;26(11):2640–53.
Kang B, Camps J, Fan B, Jiang H, Ibrahim MM, Hu X, et al. Parallel single-cell and bulk transcriptome analyses reveal key features of the gastric tumor microenvironment. Genome Biol. 2022;23(1):265.
Guo X, Zhang Y, Zheng L, Zheng C, Song J, Zhang Q, et al. Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing. Nat Med. 2018;24(7):978–85.
Puram SV, Tirosh I, Parikh AS, Patel AP, Yizhak K, Gillespie S, et al. Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer. Cell. 2017;171(7):1611–24.e24.
Luca BA, Steen CB, Matusiak M, Azizi A, Varma S, Zhu C, et al. Atlas of clinically distinct cell states and ecosystems across human solid tumors. Cell. 2021;184(21):5482–96.e28.
Steen CB, Luca BA, Esfahani MS, Azizi A, Sworder BJ, Nabet BY, et al. The landscape of tumor cell states and ecosystems in diffuse large B cell lymphoma. Cancer Cell. 2021;39(10):1422–37.e10.
Zhang L, Yu X, Zheng L, Zhang Y, Li Y, Fang Q, et al. Lineage tracking reveals dynamic relationships of T cells in colorectal cancer. Nature. 2018;564(7735):268–72.
Roma-Rodrigues C, Mendes R, Baptista PV, Fernandes AR. Targeting Tumor Microenvironment for Cancer Therapy. Int J Mol Sci. 2019;20(4).
Jiang L, Wang YJ, Zhao J, Uehara M, Hou Q, Kasinath V, et al. Direct Tumor Killing and Immunotherapy through Anti-SerpinB9 Therapy. Cell. 2020;183(5):1219–33.e18.
Liu Z, Wang T, She Y, Wu K, Gu S, Li L, et al. N(6)-methyladenosine-modified circIGF2BP3 inhibits CD8(+) T-cell responses to facilitate tumor immune evasion by promoting the deubiquitination of PD-L1 in non-small cell lung cancer. Mol Cancer. 2021;20(1):105.
Stakheyeva M, Riabov V, Mitrofanova I, Litviakov N, Choynzonov E, Cherdyntseva N, et al. Role of the Immune Component of Tumor Microenvironment in the Efficiency of Cancer Treatment: Perspectives for the Personalized Therapy. Curr Pharm Des. 2017;23(32):4807–26.
Senthebane DA, Rowe A, Thomford NE, Shipanga H, Munro D, Mazeedi M, et al. The Role of Tumor Microenvironment in Chemoresistance: To Survive, Keep Your Enemies Closer. Int J Mol Sci. 2017;18(7).
Li W, Zhang X, Wu F, Zhou Y, Bao Z, Li H, et al. Gastric cancer-derived mesenchymal stromal cells trigger M2 macrophage polarization that promotes metastasis and EMT in gastric cancer. Cell Death Dis. 2019;10(12):918.
Zheng P, Chen L, Yuan X, Luo Q, Liu Y, Xie G, et al. Exosomal transfer of tumor-associated macrophage-derived miR-21 confers cisplatin resistance in gastric cancer cells. J Exp Clin Cancer Res. 2017;36(1):53.
Wu L, Saxena S, Singh RK. Neutrophils in the Tumor Microenvironment. Adv Exp Med Biol. 2020;1224:1–20.
Coffelt SB, Wellenstein MD, de Visser KE. Neutrophils in cancer: neutral no more. Nat Rev Cancer. 2016;16(7):431–46.
Wang TT, Zhao YL, Peng LS, Chen N, Chen W, Lv YP, et al. Tumour-activated neutrophils in gastric cancer foster immune suppression and disease progression through GM-CSF-PD-L1 pathway. Gut. 2017;66(11):1900–11.
Templeton AJ, McNamara MG, Šeruga B, Vera-Badillo FE, Aneja P, Ocaña A, et al. Prognostic role of neutrophil-to-lymphocyte ratio in solid tumors: a systematic review and meta-analysis. J Natl Cancer Inst. 2014;106(6):dju124.
Zhang X, Shi H, Yuan X, Jiang P, Qian H, Xu W. Tumor-derived exosomes induce N2 polarization of neutrophils to promote gastric cancer cell migration. Mol Cancer. 2018;17(1):146.
Cho SY, Oh Y, Jeong EM, Park S, Lee D, Wang X, et al. Amplification of transglutaminase 2 enhances tumor-promoting inflammation in gastric cancers. Exp Mol Med. 2020;52(5):854–64.
Li X, Ma XL, Nan Y, Du YH, Yang Y, Lu DD, et al. 18β-glycyrrhetinic acid inhibits proliferation of gastric cancer cells through regulating the miR-345-5p/TGM2 signaling pathway. World J Gastroenterol. 2023;29(23):3622–44.
Yin J, Xin B, Hui X, Chai N, Yao L, Hu H, et al. (188)Re-labeled GX1 dimer as a novel dual-functional probe targeting TGM2 for imaging and antiangiogenic therapy of gastric cancer. Eur J Pharm Biopharm. 2020;154:144–52.

No competing interests reported.
