Source of raw data
The GSE163558 dataset, which contains 10x scRNA-seq data, was downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). It comprises 9 fresh human tissue samples from 6 patients. Of these, 3 primary tumor samples (PT1, PT2, and PT3) and 1 adjacent non-tumor sample (NT1) were used for subsequent analysis. Bulk RNA-seq data and clinical information for normal samples (n = 32) and GC samples (n = 375) were downloaded from The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/). Bulk RNA-seq data and clinical information for the GSE15459 dataset were obtained from GEO.
Data processing of 10x scRNA-seq
Low-quality cells were excluded from each sample based on unique molecular identifier (UMI) and total count criteria. Potential multiplets were identified and removed with Scrublet (version 1.0). The Harmony package was used to mitigate batch effects. Scanpy (version 1.7.2) was used for dimensionality reduction by principal component analysis (PCA). Clustering and visualization were performed using the Leiden and UMAP algorithms (version 0.5.1). Cell types were annotated based on the expression of cell type-specific markers.
Correlation analysis of tumor immunity
Cell composition deconvolution was performed using CIBERSORTx. Briefly, a signature gene expression matrix was constructed from the scRNA-seq data. The raw count matrix and cell type information were extracted from the Monocyte/Macrophage and Neutrophil subclusters of the Scanpy object. The raw count matrix was then normalized within CIBERSORTx, and a signature matrix in counts per million (CPM), incorporating all genes, was generated. FPKM values from the TCGA bulk RNA-seq data were transformed into TPM values. Cell proportions in the TCGA samples were then estimated from the bulk RNA-seq data using CIBERSORTx. Finally, S-mode batch correction was applied to address cross-platform variation in the deconvolution of the TCGA RNA-seq data.
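The FPKM-to-TPM transformation mentioned above can be illustrated with a minimal sketch. It assumes the standard per-sample rescaling, in which each FPKM value is divided by the sample's FPKM total and multiplied by one million. The gene names and values are hypothetical, and the function name `fpkm_to_tpm` is ours rather than part of any package used in this study.

```python
def fpkm_to_tpm(fpkm):
    """Convert one sample's FPKM values to TPM.

    TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6, so the TPM values of a
    sample always sum to one million.
    """
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("sample has no expressed genes")
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


# Hypothetical FPKM values for a single bulk sample (illustrative only).
sample_fpkm = {"GENE_A": 10.0, "GENE_B": 30.0, "GENE_C": 60.0}
sample_tpm = fpkm_to_tpm(sample_fpkm)
```

Because the rescaling is done within each sample, TPM values are comparable across samples in a way that FPKM values are not.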
Ecotype discovery and recovery
EcoTyper, an integrated machine learning framework, was used to characterize the tumor microenvironment (TME) from bulk RNA-seq or scRNA-seq data. Here, EcoTyper was applied for the de novo discovery of cell states and cell communities in the scRNA-seq data. Distinct cell states were identified by applying non-negative matrix factorization (NMF) to correlation matrices. To ensure the reliability of the results, the extracted cell states were validated through cell state rediscovery on the expression matrices, with an adaptive false positive index (AFI) used for quality control. Additionally, the co-occurrence patterns of cell states were analyzed to identify specific cell communities within the TME.
Recognition of important co-expression modules
Weighted correlation network analysis (WGCNA) was applied to explore gene co-expression networks in GC. The "WGCNA" R package was used to construct a sample clustering tree and identify outlying samples. A weighted co-expression network was then built from the GC expression matrix, and the adjacency matrix was transformed into a topological overlap matrix (TOM). To approximate a scale-free network, the soft-thresholding power was selected such that the scale-free topology fit index (R²) reached 0.90. Dynamic Tree Cut was then used to generate co-expression modules, with the minimum module size (minModuleSize) set to 100.
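The soft-thresholding step can be sketched as follows, assuming that a scan over candidate powers has already produced a scale-free fit index (R²) for each power, and that the smallest power reaching the 0.90 cutoff is chosen. The fit values below are hypothetical, the function names are ours, and the unsigned adjacency |cor|^power is an assumption, since the network type is not stated here.

```python
def pick_soft_threshold(fit_by_power, r2_cutoff=0.90):
    """Return the smallest power whose scale-free fit R^2 meets the cutoff.

    Returns None when no candidate power reaches the cutoff.
    """
    for power in sorted(fit_by_power):
        if fit_by_power[power] >= r2_cutoff:
            return power
    return None


def adjacency(correlation, power):
    """Unsigned soft-thresholded adjacency: |cor| raised to the chosen power."""
    return abs(correlation) ** power


# Hypothetical fit indices from a scan over candidate powers (illustrative only).
fit_by_power = {1: 0.21, 2: 0.48, 3: 0.71, 4: 0.84, 5: 0.91, 6: 0.93}
chosen_power = pick_soft_threshold(fit_by_power)
```

Raising correlations to a power above 1 suppresses weak correlations more strongly than strong ones, which is what pushes the network toward a scale-free topology.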
Construction and validation of Ecotype risk features
The core genes of the tumor ecotypes were determined by intersecting the WGCNA results with the EcoTyper results obtained from the scRNA-seq data. Subsequently, LASSO regression was performed with the "glmnet" R package, using the "cv.glmnet" function for 10-fold cross-validation to select prognostic genes. Based on the optimal λ value and the coefficients of the selected genes, the risk score was calculated using the following formula:
$$\text{Risk score}=\sum_{i=1}^{n}K_{i}\times A_{i}$$
where $A_i$ is the expression of prognosis-related gene $i$, $K_i$ is its regression coefficient, and $n$ is the number of prognosis-related genes.
The patients were divided into high-risk (HR) and low-risk (LR) groups using the median score as the cutoff. Survival analysis was conducted using the "survminer" package, and the predictive performance of this model was evaluated using receiver operating characteristic (ROC) curve analysis.
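The risk score and the median split can be illustrated with a minimal sketch. The gene names, coefficients, and expression values are hypothetical and are not the fitted model. Assigning scores strictly above the median to the HR group is an assumption, as the handling of ties is not specified here.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Coefficient-weighted sum of expression over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores):
    """Label each patient HR if the score exceeds the cohort median, else LR."""
    cutoff = median(scores.values())
    return {pid: ("HR" if score > cutoff else "LR") for pid, score in scores.items()}


# Hypothetical LASSO coefficients and expression values (illustrative only).
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 2.0, "GENE_B": 0.0},
    "P4": {"GENE_A": 8.0, "GENE_B": 4.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

With these values the cohort median is 1.5, so P2 and P4 fall into the HR group and P1 and P3 into the LR group.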
Independent prognostic analysis and nomogram construction
To assess the independent prognostic value of the EcoTyper risk score, we conducted univariate and multivariate Cox regression analyses. A nomogram was generated using the "rms" R package to predict the overall survival of patients at 1, 3, and 5 years, incorporating age, grade, gender, stage, T-stage, and the EcoTyper risk score as predictors. Calibration analysis was used to confirm the accuracy of the nomogram predictions.
Functional enrichment analysis
Gene set enrichment analysis (GSEA) was conducted using the "clusterProfiler" R package. The "c2.cp.kegg.v7.4.symbols.gmt" gene set collection from MSigDB was used to identify pathways differentially enriched between the risk groups.
Immunotherapy prediction and chemotherapy sensitivity analysis
Using TIDE (http://tide.dfci.harvard.edu/login/), the immunological dysfunction and exclusion mechanisms in GC were investigated. Immunological responses were evaluated based on well-established immune checkpoints (ICPs). Additionally, the "oncoPredict" package was used to predict the half maximal inhibitory concentration (IC50) values of candidate drugs with potential therapeutic efficacy against GC in the high-risk and low-risk groups.
Cell culture
Human normal gastric epithelial cells (GES-1) and GC cell lines (AGS, HGC27, and MKN28) were purchased from the Beijing Institute for Cancer Research (Beijing, China) and the Cell Bank of the Chinese Academy of Sciences (Shanghai, China). The cells were maintained in DMEM (GES-1), Ham's F-12 (AGS), MEM (HGC27), or RPMI 1640 (MKN28) supplemented with 10% fetal bovine serum (FBS; Gibco, Grand Island, NY, USA) and 1% penicillin-streptomycin (Solarbio, Beijing, China) in a humidified atmosphere of 5% CO2 at 37°C.
Cell transfection
Small interfering RNA (siRNA) specifically targeting TGM2 (si-TGM2) was constructed by Genepharm (Shanghai, China) and used for TGM2 silencing. Scrambled siRNA (si-NC) served as the negative control. MKN28 cells were transfected with si-TGM2 or si-NC using Oligofectamine™ reagent (Thermo Fisher Scientific, San Jose, CA, USA).
Quantitative real-time PCR (qRT-PCR)
Total RNA was extracted from cells by homogenization in 1 mL of Trizol reagent (Thermo Fisher Scientific). The RNA served as the template for cDNA synthesis using the SuperScript Reverse Transcription Kit (Invitrogen, Carlsbad, CA, USA). PCR was carried out on a 7500 Fast RT-PCR System using SYBR Green Master Mix (Applied Biosystems, Foster City, CA, USA). Data were analyzed by the 2^(−ΔΔCt) method.
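The ΔΔCt calculation used for relative quantification can be written out as a short sketch. The Ct values are hypothetical, and the reference gene is left generic because it is not named here. The calculation also assumes roughly 100% amplification efficiency, which is what the base of 2 implies.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the delta-delta-Ct method.

    Each sample's target Ct is first normalized to its reference gene
    (delta Ct); the treated sample is then compared with the control
    sample (delta-delta Ct), and the fold change is 2 ** (-delta-delta Ct).
    """
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (illustrative only): the target gene in
# knockdown cells versus control cells.
fold_change = relative_expression(
    ct_target_treated=26.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
```

Here ΔΔCt is 2, so the target is expressed at one quarter of the control level.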
CCK-8 assay
Proliferation of MKN28 cells was examined using Cell Counting Kit-8 (CCK-8; Beyotime, Shanghai, China). Cells were cultured in an incubator for 24 h. Cells were then seeded into 96-well plates at a density of 2000 cells/100 µL and incubated with 10 µL of CCK-8 reagent for 1 h. Finally, the absorbance of each well at 450 nm was measured on a Multiskan SkyHigh Microplate Reader (Thermo Fisher Scientific).
Wound healing assay
MKN28 cells were seeded into 6-well plates at 5 × 10^5 cells/well and cultured at 37°C until 100% confluence. A straight scratch was made using a 200-µL pipette tip, and the detached cells were removed by washing with PBS. Cells were then cultured in serum-free medium for 24 h. Images of wound closure were captured at 0 and 24 h.
Transwell invasion assay
A Transwell invasion assay was carried out to assess the invasion of MKN28 cells using a 24-well Transwell chamber plate with 8 µm pores (Corning Costar, Cambridge, MA, USA). MKN28 cells were seeded into the Matrigel-coated upper chamber. The lower chamber contained FBS-free RPMI 1640 medium. Cells were cultured at 37°C for 24 h, after which the cells remaining in the upper chamber were wiped off with a cotton swab. The invasive cells were fixed with methanol for 10 min and stained with 0.1% crystal violet. Finally, the invasive cells were observed under an optical microscope (Olympus, Tokyo, Japan).
Statistical analysis
Python 3.9 and R (version 4.2.2) were used for statistical analysis. P-values and false discovery rate (FDR) q-values less than 0.05 were considered significant. All cell experiments were performed in triplicate, and the results were presented as mean ± SD and analyzed with SPSS 22.0 statistical software (IBM, Armonk, NY, USA). Statistical differences were assessed using Student's t-test and one-way ANOVA. P < 0.05 was considered statistically significant.