3.1 Changing trends in the stroma and CAFs
First, we used ESTIMATE and EPIC to approximately calculate the infiltration of the stroma, CAFs, and immune cells in KIRC. The ESTIMATE result found that tumor tissues had a higher level of stromal and immune infiltration than normal tissues (Figure 1A). In EPIC, the proportion of CAFs showed a tendency to increase with tumor stage and grade progression, but the proportion of CD4+ and CD8+ T cells had no significant change in KIRC (Figure 1B). To visualize the scale changes in all cells in EPIC, we show the proportion of CAFs with the progression of stage and grade using the average fraction pie chart (Figure 1C). Based on the above results, the proportion of CAFs increases during tumor progression, or can be considered as an increase in the rate of proliferation of CAFs (Figure 2).
3.2 Rough CAFs and survival
Because the CAF fraction of EPIC is estimated based on gene signatures from other tissues, we suspected that this fraction may contain different cells and defined them as rough CAFs. In the prognosis significance of the Cox hazards model analysis of rough CAFs, it was found that the rough CAFs were associated with prognosis, but they were not an independent prognosis factor (Table 1). Age, grade, stage, and endothelial cells were all independent prognostic factors. Endothelial cells were associated with good prognosis (Table 1).
3.3 Identification of gene modules correlated to CAFs
TCGA KIRC sequencing samples provided WGCNA with 530 cases to effectively and objectively identify cell-specific gene sets. We chose β = 6 (no scale R2 = 0.814) as a soft threshold to construct a scale-free network. Three modules had a correlation with the CAF fraction of about 0.7. Among them, darkturquoise and lightgreen had a very close relationship in the cluster tree, which could originate from the identical type of cells, but the grey60 module was clearly not homologous to them (Figure 3A). According to the canonical cell types in normal human kidney tissue, with the exception of epithelial cells of different microanatomical regions and immune cells, the remaining cells are mainly vascular endothelial cells, fibroblasts, and myofibroblasts[8]. We compared the endothelium, fibroblast, and myofibroblasts markers[8], with MM.cor of different modules. The results show that the darkturquoise and lightgreen modules are closely related to fibroblasts, and that the grey60 and royalblue modules are closely related to myofibroblasts and endothelial cells, respectively (Figure 3B).
3.4 Markers recognized by single-cell sequencing technology and WGCNA
In order to confirm that WGCNA can identify cell-specific genes, we compared the differences in the identified genes. The markers identified by single-cell sequencing technology can be considered as the current standard for identifying cell-specific expressed genes. Genes specifically expressed by endothelial cells and fibroblasts could be found in the RCC single-cell sequencing results (Figure 3C)[8]. False discovery rate (FDR) < 0.05 is considered significant for the cell-specific expression of genes. Since there are too many endothelial markers, we selected the 500 most specific markers (smallest FDR) for comparison. According to the results shown in Figure 3A and Figure 3B, the royalblue module defines the specific endothelial gene set, and the darkturquoise and lightgreen modules define the specific fibroblast gene set. Although the intersection of the single-cell markers and module genes is not ideal, more than 85% of the representative genes (GS.cor > 0.6 and MM.cor > 0.6) in the module are single-cell markers (Figure 3D).
3.5 WGCNA can recognize cell-specific genes
Through the comparison of single-cell markers, the representative genes in the WGCNA recognition module can be regarded as cell-specific genes. However, not all cells have cell-specific gene sets, which can be illustrated by the WGCNA KIRC results. We only identified five specific gene modules for cells (Figure 3A). CAFs contain two kinds of cells, myofibroblasts and fibroblasts. According to the clustering relationship of modules shown in Figure 2A and the correlation between markers and modules shown in Figure 2B, the grey60 module is a representative gene set of myofibroblasts, and the darkturquoise and lightgreen modules are representative gene sets of fibroblasts. The WGCNA calculation principle is to identify highly co-expressed gene sets, while the expression of cell-specific genes changes according to the change in the proportion of cells. Although other cells may interfere with the expression of these genes, the impact on their highly co-expressed relationship is limited. We have drawn a schematic diagram to show this WGCNA function (Figure 4A). Obtaining cell-specific gene sets relies on a large number of samples with high tumor heterogeneity and cells with specific expression genes.
3.6 Function of cell-specific modules
After analyzing and identifying cell-specific genes on the principle of high co-expression, we used Metascape to further verify the function of these specific genes. The modules representing five types of cell-specific expressed genes were enriched and showed their respective functions. For example, fibroblasts and myofibroblasts, both of which belong to CAFs, are functionally similar, but the latter are more abundant in muscle-related functions (Figure 4B).
3.7 Fibroblast and myofibroblast markers - module representative genes
The representative genes in the module serve as specific markers for fibroblasts and myofibroblasts. In order to ensure the quantity and quality of the markers, we chose GS.cor > 0.6 and MM.cor > 0.6 as the screening conditions (Figure 5A). The lightgreen and darkturquoise fibroblast modules contain a total of 37 genes, including one non-coding protein gene. The grey60 myofibroblast module contains 24 genes. To verify the specificity of the 36 protein-coding fibroblast genes, we used Cancer Cell Line Encyclopedia data. Regarding the possible cell types in KIRC, it was found that the expression of 30 out of 36 genes was significantly higher in fibroblasts than in other cells (Figure 5B). We defined the 36 genes as signatures of CAFs in KIRC.
Among the CAF markers and CAF-specific markers, there were six and three intersections with the signature of CAFs in KIRC, respectively. The 36 key genes showed very high co-expression (Figure 5C). CAF-specific markers showed significantly higher correlation with the signature than non-specific markers did. In CAF-specific markers, FN1 and TNC were not correlated with multiple genes in the signature. Therefore, the CAF markers exhibit tissue specificity.
3.8 Fibroblasts in CAFs – a prognostic factor
Although rough CAFs were related to prognosis, they were not an independent prognostic factor. Now, we have identified two cellular components in rough CAFs estimated by EPIC, fibroblasts and myofibroblasts. Using representative genes calculated by WGCNA as gene signatures, ssGSEA was employed to score the infiltration level of fibroblasts and myofibroblasts in each sample (Figure 6A). According to the redefinition of CAFs as fibroblasts and myofibroblasts, the Cox risk regression analysis suggests that fibroblasts in KIRC are independent prognostic factors (Table 2).
3.9 Fibroblasts in CAFs and related clinicopathological parameters
Fibroblasts in CAFs are indeed related to tumor progression in KIRC. The ssGSEA score of fibroblasts was grouped according to the median of the corresponding cases of different clinicopathological parameters. In TCGA, increased fibroblast infiltration is related to tumor tissue, age < 60, male, stage III & IV, and Grade 3 & 4 parameters (Figure 6B). Moreover, in the data verification of series GSE29609 and GSE53757, we used the same gene signatures to score fibroblast infiltration (Figure 6C). In GSE29609, highly infiltrating fibroblasts were related to a worse prognosis (Figure 6D). Regarding clinicopathological parameters, a higher fibroblast count is associated with pathological grade 3 & 4 and tumor tissue (Figure 6E). Although the TNM staging and fibroblast infiltration level did not show significant differences, the Odds Ratio (OR) value shows that their trend is consistent with the KIRC stage data in TCGA. However, gender and age do not show a clear relationship with fibroblasts.