Identification of the DEGs between the response group and the non-response group of CCRT therapy in LACC
In order to study the difference of gene expression between the CR and NR groups for LACC, the limma package was used to estimate the fold changes and standard errors by fitting a linear model for each gene for the assessment of differential expression. As shown in Fig. 1a, we set the cut-off criteria as p-value <0.05 and |log2FC| > 0.5 to screen the DEGs. There were 1358 significant differential genes in GSE56303-GPL10191, including 860 up-regulated genes (p-value < 0.05 and log2 fold change >0.5) and 498 down-regulated genes (p-value < 0.05 and log2 fold change <-0.5). There are 6048 differential genes in GSE56303-GPL16025, among which 3204 are up-regulated genes and 2844 are down-regulated. There are 4214 differentially expressed genes in GSE56363, including 2133 up-regulated genes and 2081 down-regulated genes. In summary, 580 up-regulated genes and 153 down-regulated genes represent the differentially regulated genes in two or more datasets (Fig. 1b). We used the cumulative hypergeometric distribution function to estimate if the up-regulated genes were statistically significant. The results show that the adjusted P values of the overlapping genes in GSE56306-GPL10191 and GSE56306-GPL16025 is 1.515894E-73, the adjusted P values of the overlapping genes in GSE56363 and GSE56306-GPL10191 is 4.30E-09, the adjusted P values of the overlapping genes in GSE56363 and GSE56306-GPL10191 is 7.384116E-146. In addition, the cumulative hypergeometric distribution function was used to estimate if the down-regulated genes were statistically significant. The results show that the adjusted P values of the overlapping genes in GSE56306-GPL10191 and GSE56306-GPL16025 is 2.024604E-30, the adjusted P values of the overlapping genes in GSE56306-GPL10191 and GSE56363 is 1, the adjusted P values of the overlapping genes in GSE56306-GPL16025 and GSE56363 is 0. Therefore, the up-regulated genes were statistically significant, 580 up-regulated genes were selected for further analysis.
DEGs were enriched to key functions and pathways in LACC
In order to understand the functional roles of DEGs, GO and KEGG pathway enrichment analyses were performed. The GO analysis of the 580 up-regulated DEGs was mainly enriched for the BP terms cell adhesion and extracellular matrix organization. GO analysis in the category CC showed that the DEGs were mainly accumulated in the extracellular matrix, extracellular exosome, membrane. The MF of the DEGs was mainly related to protein binding and calcium ion binding (Fig. 2a). Additionally, Up-regulated DEGs were significantly associated with Human papillomavirus infection (Fig S1), Focal adhesion, Lysosome, Hippo signaling pathway, Protein processing in endoplasmic reticulum, and ECM-receptor interaction signaling pathways (Fig. 2b). We showed the correlation between genes and pathways (Fig. 2c,Table S2). Furthermore, we analyzed the correlation between pathways and pathways, and we found that Human papillomavirus infection, Focal adhesion, and ECM-receptor interaction signaling pathways were correlated (Fig. 2d). In addition, we performed GO and KEGG enrichment analysis for each cohort (GSE56303-GPL10191, GSE56303-GPL16025, and GSE56363). The GO analysis of GSE56303-GPL10191 were mainly enriched in transcription factor activity, RNA polymerase II proximal promoter sequence-specific DNA binding, transport vesicle, and positive regulation of cell adhesion. The GO analysis of GSE56303-GPL16025 was mainly enriched in ATPase activity, chromosomal region, and DNA replication. The GO analysis of GSE56363 was mainly enriched in cell adhesion molecule binding, extracellular matrix, and extracellular matrix organization (Fig S2). Further, from the KEGG pathway enrichment analysis, we found that these 3 cohorts were mainly enriched in the pathway of the Human papillomavirus infection (Fig S3).
Identification of genes associated with clinical features based on WGCNA in LACC
In order to understand the relationship between DEGs and clinical features, we combine the data into one dataset, genes with p-value < 0.05 and log2 fold change >0.5 were involved to perform the WGCNA analysis by using WGCNA package in R software. We have chosen the soft threshold power 6 to define the adjacency matrix based on the criterion of approximate scale-free topology (Fig S4). The network and the 4 identified modules are depicted in Fig. 3a, b. Fig. 3c provides an alternate visualization of the module structure via a multi-dimensional scaling plot (standard R function cmdscale). Furthermore, we performed cluster analysis, Fig. 3d depicts the eigengene network using a dendrogram (hierarchical cluster tree). In general, 4 clusters were grouped into two clusters, the blue and grey modules are highly correlated. Combined with Fig. 3e, the blue modules were related to the therapeutic effect. The blue module has a negative correlation with CR (Correlation =-0.2 and p value=0.04) and a positive correlation with NR (Correlation =0.2 and p value=0.04). To better understand the relationships that exist between disease associated genes. We constructed PPI networks from genes enriched in six pathways (Table S2) and genes obtained by WGCNA analysis (Table S3). The analysis was performed using the STRING database and Cytoscape. PPI network contains 52 nodes and 158 edges (Fig. 3f). A total of 20 hub genes were screened out, the ranking and scores of genes were listed in Table S4, these 20 hub genes were chosen for further analysis.
The key genes were significantly down-regulated and positively correlated in LACC
In order to explore the expression patterns of the hub genes in tumors and normal tissues, the GEPIA 2 was used for detection. Compared with healthy people, the expression levels of COL1A1, COL6A1, COL6A2, LAMA4, COL6A3, LAMC1, HSPG2, ITGA9, CTGF, and PDGFRB in cervical cancer patients were significantly decreased. In addition, the expression levels of PDIA6, RPN1, and RPN2 were higher in cancer patients than in normal subjects (Fig. 4a). Furthermore, COL1A1, COL6A1, COL6A2, LAMA4, COL6A3, HSPG2, ITGA9, CTGF, and PDGFRB had significant positive correlations among each other (Fig. 4b).
Ten Key Genes Dysregulated Upon Chemoradiotherapy
Through ROC curve analysis, we found that COL1A1, COL6A1, COL6A2, LAMA4, COL6A3, LAMC1, HSPG2, ITGA9, CTGF, and PDGFRB had better diagnostic performance in distinguishing responders from non-responders in datasets of GSE56303-GPL16025 and GSE56363. The area under the curve values of these genes was all greater than 0.7 in GSE56303-GPL16025 and GSE56363 (Fig. 5a). More importantly, we compared the changes in the expression levels of these 10 key genes after radiotherapy or chemoradiotherapy, and we found that the expression levels of COL1A1, COL6A1, COL6A2 were significantly increased in cervical cancer patients after radiotherapy or chemoradiotherapy (Fig. 5b).
Six key genes were highly expressed in the Progenitor cells of cervical tissue
In order to understand the gene expression at the single-cell level, we performed the single-cell analysis of a cervical sample. The sample contained 10,000 cells, the filtering parameters were set to nFeature_RNA > 200 & nFeature_RNA < 2000 &Percent.mt< 40 & nCount_RNA < 4000, and 9955 cells were included in the analysis (Fig S5a). We selected 20 principal components (PCs) with an estimated p-value < 0.05 for subsequent analysis (Fig S5b). Afterward, the uniform manifold approximation and projection (UMAP) algorithm was applied, and cells in the cervix were successfully classified into 12 separate clusters (Fig. 6a). We showed the expression levels of the top 5 marker genes in each cluster with a heat map (Fig S5c). According to the expression patterns of the marker genes, these clusters were annotated by singleR, CellMarker, and PanglaoDB (Fig. 6b), the cell marker genes used for cluster annotation are shown in Table S5. Cluster 0, 3, and 4, containing 4126 cells, were annotated as Fibroblasts; clusters 1 and 2, containing 3215, were annotated as Smooth muscle cells; cluster 5, containing 657 cells, was annotated as Tissue stem cells; cluster 6, containing 521 cells, was annotated as Endothelial cells; cluster 7, containing 485 cells, was annotated as Progenitor cells; cluster 8, containing 389 cells, was annotated as Epithelial cells; cluster 9, containing 217 cells, was annotated as T cells; cluster 10, containing 182 cells, was annotated as Basal cells; cluster 11, containing 116 cells, was annotated as Macrophages; cluster 12, containing 47 cells, was annotated as Mast cells. We analyzed the expression of 10 key genes in cervical sample cells, and the results showed that COL1A1, COL6A1, COL6A2, COL6A3, CTGF, and PDGFRB were highly expressed in progenitor cells (cluster 7) (Fig. 6c).