Additional file 1
Table S1: The composition and number of features in each feature set.
Table S2: Summary of C-index and time-dependent AUC.
Table S3: The name and coefficient of features selected in the final image-based model.
Table S4: The likelihood ratio (LR) and corresponding p value for each model. “AJCC stage”, “cln” and “cln_im” represent models based on AJCC tumor pathologic stage, baseline variables, and the combination of baseline variables and WSI features, respectively. Abbreviations: “All”, all the patients; “AJCC stage<III”, patients within the AJCC tumor pathologic stage<III group; “AJCC stage≥III”, patients within the AJCC tumor pathologic stage≥III group; “Metastatic”, the group of patients with metastatic tumors; “Locoregional”, the group of patients with locoregional tumors.
Table S5: The median survival time of the higher- and lower-risk subgroups in each pathologically defined group of patients.
Table S6: Summary of treatment information and its association with RFS in the study cohort.
Table S7: Summary of therapeutic types among the 50 patients with available pharmaceutical treatment information.
Table S8: Summary of some omitted clinicopathologic variables routinely used for prognostic analysis in the study cohort.
Figure S1: Three examples of nucleus segmentation results. A shows an image block with a small number of nuclei; B shows a block with a higher number of nuclei; C shows a block almost entirely filled with nuclei.
Figure S2: The RFS probability curve of the 152 patients enrolled in this study.
Figure S3: Variation of the cross-validation C-index with the penalty (log-transformed). A shows models developed from baseline variables; B shows models developed from both baseline variables and WSI features.
Figure S4: The overall survival probability of subgroups stratified by the risk score. A represents all the patients; B represents the patients in AJCC stage<III; C represents the patients in AJCC stage≥III; D represents the patients with metastatic tumors; E represents the patients with locoregional tumors.
Figure S5: Dot plot of the top 20 gene ontology (GO) terms in the biological process (BP) category identified by the GOseq package. The DE Ratio is the proportion of differentially expressed genes among all the genes in a specific GO category. The GO Description displays the ID and brief information of each GO term. The color of each dot shows the adjusted p value of the GO term. The size of each dot represents the number of differentially expressed genes.
Figure S6: Dot plot of the top 20 gene ontology (GO) terms in the cellular component (CC) category identified by the GOseq package. The DE Ratio is the proportion of differentially expressed genes among all the genes in a specific GO category. The GO Description displays the ID and brief information of each GO term. The color of each dot shows the adjusted p value of the GO term. The size of each dot represents the number of differentially expressed genes.
Figure S7: Dot plot of the top 20 gene ontology (GO) terms in the molecular function (MF) category identified by the GOseq package. The DE Ratio is the proportion of differentially expressed genes among all the genes in a specific GO category. The GO Description displays the ID and brief information of each GO term. The color of each dot shows the adjusted p value of the GO term. The size of each dot represents the number of differentially expressed genes.
Figure S8: Directed acyclic graph of the enriched GO terms in the biological process category identified by the clusterProfiler package. The color represents the significance of each GO term (significance increases from yellow to red). Each arrow represents the hierarchical relationship between two terms. The shape of each node distinguishes the 10 most significant GO terms (rectangles) from the others (ellipses). Each node displays the GO ID, a brief description, the FDR, and the numbers of differentially expressed genes and of all genes.
Figure S9: Directed acyclic graph of the enriched GO terms in the cellular component category identified by the clusterProfiler package. The color represents the significance of each GO term (significance increases from yellow to red). Each arrow represents the hierarchical relationship between two terms. The shape of each node distinguishes the 10 most significant GO terms (rectangles) from the others (ellipses). Each node displays the GO ID, a brief description, the FDR, and the numbers of differentially expressed genes and of all genes.
Figure S10: Directed acyclic graph of the enriched GO terms in the molecular function category identified by the clusterProfiler package. The color represents the significance of each GO term (significance increases from yellow to red). Each arrow represents the hierarchical relationship between two terms. The shape of each node distinguishes the 10 most significant GO terms (rectangles) from the others (ellipses). Each node displays the GO ID, a brief description, the FDR, and the numbers of differentially expressed genes and of all genes.
Figure S11: An illustration of WSI processing and feature extraction. A, image foreground segmentation; B, cropping the global ROI; C, zooming in on the selected global ROI; D, block sampling; E, nucleus segmentation; F, sampling a nucleus and cropping its ROI; G, extracting texture features from each nucleus ROI; H, extracting texture features from the global ROI.
Whole slide image processing and feature extraction.
Differential gene expression analysis.
Computational formulas of texture features included in the final model.