Identification of fibrosis feature genes
We identified 15 single-cell clusters (Fig. 1a), i.e., clusters 0−14 (Fig. 1b). A large number of differentially expressed genes existed in each cell cluster relative to the other clusters (Fig. 1c), including a large number of lineage-specific and functional genes. We annotated 15 cell-type clusters based on their marker genes, yielding 14 cell types (Fig. 1d). The proportion of each cell type in the three samples was calculated; the results showed that myeloid and endothelial cells were dominant (Fig. 1e). Specific annotations were based on markers [18], as shown in Fig. 1f: T cells (marked with CD3D,CD3E, and CD3G), CD8T cells if both CD8A/CD8B were expressed, CD4T cells if both CD4 were expressed, NK T cells if both NK cell markers were expressed, such as KLRD1 and GNLY, plasma cell-like dendritic cells (marked with GZMB, SOX4, IRF7, LILRA4, and TCF4), endothelial cells (marked with RAMP2, PLVAP, PECAM1, FLT1, ENG, CDH5, or VWF), fibroblasts (labelled with COL1A1, COL1A2, COL3A1, DCN, or PDGFRA), VSMC (labelled with ACTA2, MYH11, TAGLN, CDKN1A, and PDGFRB), myeloid cells (marked with CD14, FCGR3A, LYZ, CD68, and C1QC), B-cells (CD79A, CD79B, CD19, and MS4A1), ciliated airway epithelial cells (marked with EPCAM and TPPP3) and secretory club cells (marked with SCGB1A1). We performed and visualised the Cellchat analysis (Fig. 2a–b). Incoming and outgoing signals were mediated by different ligands and receptors and differed between cell types (Fig. 2c). Notably, many signals were transmitted to endothelial cells, and many signalling pathways emanated from secretory club cells and fibroblasts (Fig. 2d). Endothelial cells received signals mainly through the chemokine pathway (Fig. 2e), while fibroblasts sent signals mainly through the MK and MIF-related pathways (Fig. 2f). Notably, MK- and MIF-related pathways are important for IPF cell chat because they are involved in many cell-to-cell communication processes (Fig. 2g–h). The NMF algorithm identified incoming versus outgoing communication patterns with k=6, as determined by the cophenetic coefficient and silhouette score. We observed certain commonalities in the communication patterns across different cells. For example, in outgoing signalling, T, CD4+T, B, and ciliated airway epithelial cells shared a communication mode (Fig. 2i–n). Pseudotime analysis identified six key differentiation nodes of fibroblasts using the Monocle2 algorithm (Fig. 3a). Branches in single-cell trajectories were generated according to cell fate, and the branched expression analysis modelling function was used to identify genes with branch-dependent expression (Fig. 3b). Genes that were highly expressed in fibroblasts intersected with first differentiation node branch-dependent genes, resulting in 30 fibrosis feature genes that may be involved in key roles in fibrosis (Fig. 3c).
Expression and mutation of fibrosis feature genes in pan-cancer
To determine the expression patterns of fibrosis-related genes in cancers, we determined their expression levels. We found that fibrosis feature genes were differentially expressed in most types of cancers; that is, COL1A1, COL1A2, CTHRC1, and TNFRSF12A were predominantly highly expressed in pan-cancer, and A2M, ADH1B, C7, CFD, CYR61, and PTGDS were predominantly expressed at low levels in pan-cancer (Fig. 4a). Subsequently, the above results were fully validated based on paired differential analyses of cancer versus paracancerous samples and differential analyses of TCGA samples combined with GTEx (Fig. 4b–c). The CPTAC protein data behaved similarly at the TCGA transcriptional level (Fig. 4d). The reproducibility and consistency of such differential results have been demonstrated across multiple databases, tumors, methods, and histologies, suggesting that differences in the expression of fibrosis-related genes may play a role in different cancers. We examined SCNAs in tumor and normal tissues. We examined the percentage of SCNAs that occurred at a high frequency in most cancer types and at a low frequency in only a small number of tumors (Fig. 4e). Spearman's correlation coefficient between gene expression and SCNAs was calculated. The results showed that, in most tumors, fibrosis feature genes were significantly related to SCNAs (Fig. 4f). This suggests that copy number abnormalities in fibrosis feature genes could be common in cancers.
Survival and genetic alterations of fibrosis feature genes in cancers
Cox univariate analysis indicated that fibrosis-related gene expression was strongly associated with patient survival (Fig. 5a). Additionally, we analysed the SNV data of fibrosis feature genes to detect the frequency and type of variants in each cancer subtype. As shown in Fig. 5b, SKCM contained the highest number of mutations, and SNV percentage analysis showed that FN1, COL3A1, CFH, and COL1A2 had the highest mutation frequencies (Fig. 5c). Promoter methylation regulates the expression of genes associated with tumorigenesis. We found that fibrosis feature genes showed complex methylation patterns (Fig. 5d–e). In addition, although most tumor tissues showed insignificant methylation differences compared with normal tissues, the negative correlation between the expression of fibrosis genes such as CCL2, POSTN, CFH, and CTSK and DNA methylation suggests that these genes may be regulated by methylation.
Identification of the FPI
To explore the factors related to fibrosis feature genes, we calculated gene enrichment scores using the GSVA algorithm [19], which was used to establish the FPI model. The FPI was significantly downregulated in most cancers compared with the normal group (Fig. 6a). The distribution pattern of FPI in each organ was visualised using the ‘gganatogram’ package (Fig. 6b). Differential analysis of TCGA pairwise samples was consistent with these results (Fig. 6c). Notably, the FPI levels correlated with eight tumor stages (Fig. 6d–k), BLCA, BRCA, ESCA, KIRC, SKCM, STAD, TGCT, and THCA, with higher expression at higher stages.
Clinical significance of FPI
The role of the FPI in cancer survival was analysed to demonstrate its clinical significance. The pan-cancer survival panorama showed that FPI was associated with multiple survival in a wide range of cancer types (Fig. 7a). Moreover, the correlations were relatively heterogeneous, as, in some cases, fibrosis could act as a risk factor for several cancers. There are also some tumors in which FPI was a protective factor; for example, patients with higher FPI expression show worse survival in BLCA, COAD, ESCA, GBM, KIRC, KIRP, LGG, and STAD, and better survival in DLBC, LIHC, UCES, and UCS. This suggests that the features of fibrosis could play different roles in different cancer types. Additionally, we used forest plots to show the four survival endpoints using Cox survival analysis (Fig. 7b–e) and Kaplan−Meier curves through the log-rank test (p<0.001) (Fig. 7f).
FPI correlates with immune infiltration in cancer
Next, we explored whether fibrotic features were involved in the process of immune infiltration in pan-cancer. In the pan-cancer cohort, the FPI showed significant positive correlations with MHC molecules, immunosuppressive and immune-activating genes, chemokine receptors, and ligands (Fig. 8a). TIP analysis suggested that the high FPI level group was characterised by priming and activating cancer cell antigens (Fig. 8b). To elucidate the specific cell types regulated by FPI in the TME, we explored the correlation between FPI and immune-infiltrating cells using the TIMER2.0 database. Overall, various immune-infiltrating cells were associated with FPI, particularly tumor-associated fibroblasts, endothelial cells, T cells, and macrophages (M2 type) (Fig. 8c).
Association between FPI and cancer pathways
Many tumor-related features and pathways were associated with FPI levels, including angiogenesis, apoptosis, cell cycle, differentiation, epithelial-mesenchymal transition (EMT), invasion, proliferation, metastasis, and stemness (Fig. 9a). We obtained highly expressed genes in the high FPI group for KEGG enrichment analysis. FPI was associated with signal transduction, particularly with the cAMP, calcium, MAPK, and PI3K-AKT signalling pathways (Fig. 9b). Additionally, based on the transcriptomes of the two groups (top and bottom 30% of the fibrosis feature genes), we performed GSEA for the relevant cell signalling in cancer. Many pathways were activated, specifically the EMT, inflammatory response, IL-2, STAT5, and TNF-α signalling via the NF-kB and K-Ras signalling pathways (Fig. 9c). This partly explains the survival heterogeneity, suggesting that fibrosis may be a double-edged sword in tumors. Both induce malignant features and promote the immune reserve.
Potential of FPI in chemotherapy, immunotherapy, and small molecule drug therapy
We used the FPI levels to predict immunotherapy response in patients with urothelial carcinoma of the bladder (Fig. 10a–b); that is, the complete response/partial response (CR/PR) group had lower FPI, and more CR/PR patients were in the low FPI group, indicating that the low FPI group was more suitable for undergoing immunotherapy. Consistent results were observed in the immunotherapy cohort for cutaneous melanoma (Fig. 10c-d). In the Cellminer data, the FPI was positively correlated with the sensitivity Z-score for a large number of drugs (bleomycin, temsirolimus, staurosporine, rapamycin, and irofulven) and negatively correlated with some drugs (CUDC 305, palbociclib, docetaxel, and At-13378) (Fig. 10e). In the gene and chemotherapy analyses, we investigated the potential correlation between drug sensitivity and fibrosis feature genes using two different databases (CTRP and GDSC). Fibrosis feature genes showed different sensitivities to different drugs (Fig. 10f–g). To explore potential therapeutic options that could counteract FPI-mediated tumor promotion, we performed a CMap analysis. We constructed a signature of FPI-associated genes, including the 150 most significantly upregulated and 150 most significantly downregulated genes identified by comparing patients with high and low FPI levels in each cancer type. The FPI-associated signature was compared with the CMap gene signature using the eXtreme sum algorithm to obtain similarity scores for the 1288 compounds. W.13, Stock1n.35874, and exisulind exhibited significantly lower scores for most cancer types, suggesting that they could inhibit FPI-mediated anti-cancer effects (Fig. 10h).
Identification and expression of A2M in Lung Squamous Cell Carcinoma
Further examination of lung squamous cell carcinoma data through machine learning analysis revealed that A2M expression was consistently prominent across various models, as evidenced by residual analysis (Fig. 11a, b). Notably, A2M ranked highly in terms of expression in these computational models (Fig. 11c). Pan-cancer cohort analysis indicated that A2M was underexpressed in tumor tissues (Fig. 12a), with a consistent expression pattern observed in paired samples across different cancers (Fig. 12b). Spatial omics analysis provided insights into the spatial distribution of A2M, showing a gradient decrease in expression from normal to tumor regions in head and neck tumors, lung squamous cell carcinoma (LUSC), and ovarian cancer (OV), which aligns with the whole transcriptome data (Fig. 12c-e).
Immunohistochemical analysis of A2M, accessed through the Human Protein Atlas (HPA) database, suggested that A2M was either undetected or expressed at low levels in the majority of tumors, corroborating the transcriptional findings (Fig. 13a, b). Proteomic mass spectrometry data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) further substantiated these observations, with A2M protein levels found to be consistently underexpressed across various tumor types (Fig. 13c).
Single-cell RNA sequencing data highlighted that A2M was predominantly expressed in normal cell types, with its tissue structure absent in tumor tissues, elucidating the reason for its low expression in tumor tissues (Fig. 14a-c). Moreover, A2M emerged as a prognostic factor for overall survival in several cancers, including lung squamous cell carcinoma (Fig. 14d, e). Finally, the gene expression levels of A2M and Gene Set Variation Analysis (GSVA) scores for 14 tumor statuses were presented (Fig. 14f), demonstrating a significant positive correlation between A2M expression and numerous malignant tumor characteristics. It is plausible to hypothesize that higher A2M expression by tumor cells may be associated with a poorer prognosis for patients with this subtype of lung squamous cell carcinoma.