Data collection and processing
Whole-gene mRNA expression matrices, clinical data (phenotypes and overall survival), and single-nucleotide polymorphism mutation data of Cancer Genome Atlas (TCGA)-LUAD samples were downloaded from the UCSC website (https://xenabrowser.net/). In total, 513 tumor samples and 59 normal samples were obtained, and samples with missing information were excluded. The GSE117570 dataset, which comprises 10× scRNA-seq data of 11,485 cells, was obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), and data on P1, P3, and P4 were selected for analysis. Expression data of signature genes in particular cells were downloaded from the Tumor Immune Single-cell Hub database (http://tisch.comp-genomics.org/home/). Two GEO datasets (GSE31210 and GSE13213) with complete mRNA expression and clinical information were obtained from the GEO database to construct the validation cohort. Genes upregulated in GR were obtained from the Gene Set Enrichment Analysis (GSEA) database (https://www.gsea-msigdb.org/gsea/index.jsp, “COLDREN_GEFITINIB_RESISTANCE_UP” dataset).
scRNA-seq data collection and analysis
The R package Seurat was used to convert raw scRNA-seq data into Seurat objects, followed by quality control and the exclusion of unqualified cells. The gene expression in core cells was standardized using a linear regression algorithm, and the top 2,000 highly variable genes were identified using analysis of variance. Uniform manifold approximation and projection was used to identify cell subpopulations. The FindAllMarkers function in the R package Seurat and the COSG R package were employed to identify marker genes for all cell clusters and differentially expressed genes among the screened marker genes. The R package clusterProfiler was used for Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome enrichment analyses of top 100 marker genes in each cell cluster. A single-sample GSEA score was calculated for each cell cluster based on the expression levels of GR-related genes, and all cell samples were divided into high- and low-score groups based on the median score. Data of 50 hallmark gene sets were downloaded from the GSEA database and the R package GSVA was employed to calculate hallmark scores for each cell cluster.
Identification and validation of a prognostic signature
Univariate Cox proportional risk regression analysis was conducted on GR-related genes in the TCGA-LUAD training cohort, using p < 0.05 as the screening criterion. Ten classical algorithms (single or combined), including random forest, least absolute shrinkage and selection operator, gradient boosting machine, survival support vector machine, supervised principal components, ridge regression, partial least squares regression for Cox, CoxBoost, Stepwise Cox, and elastic network, were used to construct a prognostic signature based on the screened genes, and the C-index was calculated for all signatures to select the most suitable one. The R packages survival, pROC, and ggplot2 were used for survival analysis and time-dependent receiver operating characteristic (ROC) analysis using the TCGA-LUAD training cohort, GSE13213 and GSE31210 validation cohorts, and their combination.
Prognostic analysis of risk groups and clinical phenotypes
The proportions of clinical phenotype (sex, T stage, N stage, M stage, stage, recurrence, metastasis, and overall survival) subcategories in the high- and low-risk groups were calculated, and resistance scores were systematically compared among the subcategories. Univariate and multivariate Cox regression analyses were used to assess the influence of the prognostic signature and other common clinical phenotypic features on the prognosis of patients with LUAD.
Functional enrichment analysis
Correlation coefficients between the resistance score and all genes were calculated, and the 100 genes with the highest positive and negative correlations were screened out. The R package clusterProfiler was used for Gene Ontology (GO), KEGG, and Reactome functional enrichment analyses of the screened genes. After calculating the score for each sample based on the constructed signature, all samples were divided into high- and low-risk groups, and GSEA was applied to the differentially expressed genes between the two groups. The cancer-related hallmark gene set file (h.all.v2023.2.Hs.symbols.gmt) was downloaded from the Molecular Signatures Database (https://www.gsea-msigdb.org/gsea/msigdb). The R package GSEA (v1.38.2) was used to calculate the normalized enrichment score and false discovery rate, and the R package ggplot2 (v3.3.6) was used to display the GSEA results.
Analysis on tumor immune microenvironment (TIME) and tumor stemness
LUAD samples were ordered according to stemness indices (mRNAsi and mDNAsi) and their overall survival and distribution of high- and low-risk groups were shown. Pearson correlation coefficients between stemness indices and resistance scores were calculated. Immune-cell infiltration was analyzed using the MCPCOUNTER, EPIC, XCELL, CIBERSORT, IPS, QUANTISEQ, ESTIMATE, and TIMER algorithms in the TIMER database (https://cistrome.shinyapps.io/timer/), and the infiltration scores of all types of immune cells were compared between the high- and low-risk groups using the R package IOBR. Multiple immune microenvironment-related indices, including the microenvironment, ESTIMATE, immune, and stroma scores, were calculated and compared between the high- and low-risk groups. The expression levels of various chemokines and their receptors were compared between the high- and low-risk groups, and Pearson correlation coefficients between the expression levels of significantly differentially expressed chemokines and the resistance scores were calculated.
Analysis of drug efficacy prediction in gefitinib-resistant populations
Data of patients who received immunotherapy was downloaded from the Tumor Immune Dysfunction and Exclusion (TIDE) database (http://tide.dfci.harvard.edu), and true or false response to immunotherapy and TIDE and exclusion scores were compared between the high- and low-risk groups. GSE135222, an anti-programmed cell death protein 1 (PD-1)/PD-1 ligand (PD-L1) immunotherapy dataset from GEO, was used to evaluate the relationship between the GR signature and LUAD immunotherapy response. Treatment data of several non-immune chemotherapy drugs were downloaded from the Genomics of Drug Sensitivity in Cancer database (https://www.cancerrxgene.org/), and the R package oncoPredict was employed to evaluate the drug half-maximal inhibitory concentration (IC50) values for each sample.
Cell culture and reagents
PC-9 normal NSCLC cells were obtained from the Cell Bank of the Chinese Academy of Science (Shanghai, China). PC-9 gefitinib-resistant cells (strain CTCC-ZHYC-0141) were purchased from Meisen Cell Technology (Zhejiang, China). All cells were maintained at 37 °C in a 5% CO2 humidified atmosphere. PC-9 normal and gefitinib-resistant cells were cultured in RPMI-1640 medium supplemented with 10% fetal bovine serum and penicillin-streptomycin (100 U/ml and 100 mg/ml, respectively) (all from Thermo Fisher Scientific China, Shanghai, China). Before the start of this study, the resistance of the PC-9 resistant strain was confirmed using cell viability assays. The PC-9 gefitinib-resistant strain was maintained in medium supplemented with gefitinib at concentrations of 2.5 mM, 5 mM, and 10 mM for the first to third generations, respectively, and 10 mM from the fourth generation onward. Gefitinib (184475-35-2), AZD 7762 (1246094-78-9), and BI.2536 (755038-02-9) were purchased from Sigma-Aldrich China (Shanghai, China) and dissolved in dimethylsulfoxide at a stock concentration of 50 mM. All drugs were stored at –20 °C in 5-mL aliquots.
RNA extraction and reverse transcription quantitative (RT-q)PCR
Resistance gene expression in PC9 normal and gefitinib-resistant cells was assessed using RT-qPCR. The gene-specific qPCR primers are listed in Table S1. Total RNA was extracted from the cells using TRIzol reagent (Invitrogen Life Technologies) and reverse-transcribed into cDNA using a Takara kit (Takara Biotechnology). qPCR analysis was performed using SYBR Green master mix (Thermo Scientific).
Cell proliferation assay
CCK8 kits purchased from Sigma-Aldrich China (Shanghai, China) were used to detect the inhibitory effects of gefitinib, AZD7762, and BI.2536 on the viability of PC-9 normal and gefitinib-resistant cells. The cells were seeded in 96-well culture plates at 6 × 103 cells/well, cultured for 24 h, and exposed to increasing concentrations of gefitinib, AZD7762, and BI.2536 for an additional 24 h. Ten microliters of CCK-8 solution was added to each well, and the plates were incubated for 2 h. The absorbance at 450 nm was measured using a microplate reader (Thermo Scientific).
Statistical analysis
Data are presented as mean ± SD or SEM and were analyzed using SPSS 26.0. Differences were considered significant at *p < 0.05, **p < 0.01, and ***p < 0.001. The R packages used were mentioned above. Plots were generated using R Studio 4.3.3 or GraphPad Prism 8.0.
Data collection and processing
Whole-gene mRNA expression matrices, clinical data (phenotypes and overall survival), and single-nucleotide polymorphism mutation data of Cancer Genome Atlas (TCGA)-LUAD samples were downloaded from the UCSC website (https://xenabrowser.net/). In total, 513 tumor samples and 59 normal samples were obtained, and samples with missing information were excluded. The GSE117570 dataset, which comprises 10× scRNA-seq data of 11,485 cells, was obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), and data on P1, P3, and P4 were selected for analysis. Expression data of signature genes in particular cells were downloaded from the Tumor Immune Single-cell Hub database (http://tisch.comp-genomics.org/home/). Two GEO datasets (GSE31210 and GSE13213) with complete mRNA expression and clinical information were obtained from the GEO database to construct the validation cohort. Genes upregulated in GR were obtained from the Gene Set Enrichment Analysis (GSEA) database (https://www.gsea-msigdb.org/gsea/index.jsp, “COLDREN_GEFITINIB_RESISTANCE_UP” dataset).
scRNA-seq data collection and analysis
The R package Seurat was used to convert raw scRNA-seq data into Seurat objects, followed by quality control and the exclusion of unqualified cells. The gene expression in core cells was standardized using a linear regression algorithm, and the top 2,000 highly variable genes were identified using analysis of variance. Uniform manifold approximation and projection was used to identify cell subpopulations. The FindAllMarkers function in the R package Seurat and the COSG R package were employed to identify marker genes for all cell clusters and differentially expressed genes among the screened marker genes. The R package clusterProfiler was used for Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome enrichment analyses of top 100 marker genes in each cell cluster. A single-sample GSEA score was calculated for each cell cluster based on the expression levels of GR-related genes, and all cell samples were divided into high- and low-score groups based on the median score. Data of 50 hallmark gene sets were downloaded from the GSEA database and the R package GSVA was employed to calculate hallmark scores for each cell cluster.
Identification and validation of a prognostic signature
Univariate Cox proportional risk regression analysis was conducted on GR-related genes in the TCGA-LUAD training cohort, using p < 0.05 as the screening criterion. Ten classical algorithms (single or combined), including random forest, least absolute shrinkage and selection operator, gradient boosting machine, survival support vector machine, supervised principal components, ridge regression, partial least squares regression for Cox, CoxBoost, Stepwise Cox, and elastic network, were used to construct a prognostic signature based on the screened genes, and the C-index was calculated for all signatures to select the most suitable one. The R packages survival, pROC, and ggplot2 were used for survival analysis and time-dependent receiver operating characteristic (ROC) analysis using the TCGA-LUAD training cohort, GSE13213 and GSE31210 validation cohorts, and their combination.
Prognostic analysis of risk groups and clinical phenotypes
The proportions of clinical phenotype (sex, T stage, N stage, M stage, stage, recurrence, metastasis, and overall survival) subcategories in the high- and low-risk groups were calculated, and resistance scores were systematically compared among the subcategories. Univariate and multivariate Cox regression analyses were used to assess the influence of the prognostic signature and other common clinical phenotypic features on the prognosis of patients with LUAD.
Functional enrichment analysis
Correlation coefficients between the resistance score and all genes were calculated, and the 100 genes with the highest positive and negative correlations were screened out. The R package clusterProfiler was used for Gene Ontology (GO), KEGG, and Reactome functional enrichment analyses of the screened genes. After calculating the score for each sample based on the constructed signature, all samples were divided into high- and low-risk groups, and GSEA was applied to the differentially expressed genes between the two groups. The cancer-related hallmark gene set file (h.all.v2023.2.Hs.symbols.gmt) was downloaded from the Molecular Signatures Database (https://www.gsea-msigdb.org/gsea/msigdb). The R package GSEA (v1.38.2) was used to calculate the normalized enrichment score and false discovery rate, and the R package ggplot2 (v3.3.6) was used to display the GSEA results.
Analysis on tumor immune microenvironment (TIME) and tumor stemness
LUAD samples were ordered according to stemness indices (mRNAsi and mDNAsi) and their overall survival and distribution of high- and low-risk groups were shown. Pearson correlation coefficients between stemness indices and resistance scores were calculated. Immune-cell infiltration was analyzed using the MCPCOUNTER, EPIC, XCELL, CIBERSORT, IPS, QUANTISEQ, ESTIMATE, and TIMER algorithms in the TIMER database (https://cistrome.shinyapps.io/timer/), and the infiltration scores of all types of immune cells were compared between the high- and low-risk groups using the R package IOBR. Multiple immune microenvironment-related indices, including the microenvironment, ESTIMATE, immune, and stroma scores, were calculated and compared between the high- and low-risk groups. The expression levels of various chemokines and their receptors were compared between the high- and low-risk groups, and Pearson correlation coefficients between the expression levels of significantly differentially expressed chemokines and the resistance scores were calculated.
Analysis of drug efficacy prediction in gefitinib-resistant populations
Data of patients who received immunotherapy was downloaded from the Tumor Immune Dysfunction and Exclusion (TIDE) database (http://tide.dfci.harvard.edu), and true or false response to immunotherapy and TIDE and exclusion scores were compared between the high- and low-risk groups. GSE135222, an anti-programmed cell death protein 1 (PD-1)/PD-1 ligand (PD-L1) immunotherapy dataset from GEO, was used to evaluate the relationship between the GR signature and LUAD immunotherapy response. Treatment data of several non-immune chemotherapy drugs were downloaded from the Genomics of Drug Sensitivity in Cancer database (https://www.cancerrxgene.org/), and the R package oncoPredict was employed to evaluate the drug half-maximal inhibitory concentration (IC50) values for each sample.
Cell culture and reagents
PC-9 normal NSCLC cells were obtained from the Cell Bank of the Chinese Academy of Science (Shanghai, China). PC-9 gefitinib-resistant cells (strain CTCC-ZHYC-0141) were purchased from Meisen Cell Technology (Zhejiang, China). All cells were maintained at 37 °C in a 5% CO2 humidified atmosphere. PC-9 normal and gefitinib-resistant cells were cultured in RPMI-1640 medium supplemented with 10% fetal bovine serum and penicillin-streptomycin (100 U/ml and 100 mg/ml, respectively) (all from Thermo Fisher Scientific China, Shanghai, China). Before the start of this study, the resistance of the PC-9 resistant strain was confirmed using cell viability assays. The PC-9 gefitinib-resistant strain was maintained in medium supplemented with gefitinib at concentrations of 2.5 mM, 5 mM, and 10 mM for the first to third generations, respectively, and 10 mM from the fourth generation onward. Gefitinib (184475-35-2), AZD 7762 (1246094-78-9), and BI.2536 (755038-02-9) were purchased from Sigma-Aldrich China (Shanghai, China) and dissolved in dimethylsulfoxide at a stock concentration of 50 mM. All drugs were stored at –20 °C in 5-mL aliquots.
RNA extraction and reverse transcription quantitative (RT-q)PCR
Resistance gene expression in PC9 normal and gefitinib-resistant cells was assessed using RT-qPCR. The gene-specific qPCR primers are listed in Table S1. Total RNA was extracted from the cells using TRIzol reagent (Invitrogen Life Technologies) and reverse-transcribed into cDNA using a Takara kit (Takara Biotechnology). qPCR analysis was performed using SYBR Green master mix (Thermo Scientific).
Cell proliferation assay
CCK8 kits purchased from Sigma-Aldrich China (Shanghai, China) were used to detect the inhibitory effects of gefitinib, AZD7762, and BI.2536 on the viability of PC-9 normal and gefitinib-resistant cells. The cells were seeded in 96-well culture plates at 6 × 103 cells/well, cultured for 24 h, and exposed to increasing concentrations of gefitinib, AZD7762, and BI.2536 for an additional 24 h. Ten microliters of CCK-8 solution was added to each well, and the plates were incubated for 2 h. The absorbance at 450 nm was measured using a microplate reader (Thermo Scientific).
Statistical analysis
Data are presented as mean ± SD or SEM and were analyzed using SPSS 26.0. Differences were considered significant at *p < 0.05, **p < 0.01, and ***p < 0.001. The R packages used were mentioned above. Plots were generated using R Studio 4.3.3 or GraphPad Prism 8.0.