GCNT3 expression in LUSC from tissue microarrays
The tissue microarrays included 250 LUSC samples (23 females and 224 males, with an average age of 58 years) and 119 non-cancer lung samples (13 females and 103 males, with an average age of 56 years) (GaoTable1). GCNT3 expression was significantly higher in LUSC tissues (10.888 ± 2.350) than in non-cancer tissues (3.429 ± 1.565) (P < 0.001) (GaoFigure1, Supplementary Fig. 1 and GaoTable1). IHC staining showed medium or strong GCNT3 immunoreactivity in the cancer nests of LUSC tissues, whereas non-cancer lung tissues showed weak or negative immunoreactivity (GaoFigure2). Statistical analysis of the clinicopathological data indicated that GCNT3 expression was significantly higher in LUSC patients with early T stage (1–2; 11.018 ± 2.174) than in those with advanced T stage (3–4; 9.714 ± 3.343) (P = 0.024), and significantly higher in patients with higher grade (11.491 ± 1.543) than in those with lower grade (8.962 ± 3.509) (P < 0.001) (GaoTable1).
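As a rough illustration of how a two-group comparison of this kind can be checked from summary statistics alone, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom. It assumes that the reported "±" values are standard deviations and that a t-type test is appropriate; the text does not state which test was actually used, and the function name `welch_t` is introduced here only for illustration.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations and sample sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary values reported above, ASSUMED here to be mean ± SD.
t, df = welch_t(10.888, 2.350, 250, 3.429, 1.565, 119)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A t statistic this far from zero is in line with the reported P < 0.001, although the exact P value depends on the test the authors applied.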
GCNT3 expression in pan-squamous cell carcinoma
Across squamous cell carcinomas, GCNT3 was overexpressed in CESC, ESCA and LUSC but not in HNSC (GaoFigure3).
GCNT3 overexpression in LUSC validated by integrated data of tissue microarray, external RNA-seq and microarrays
The flowchart for selecting eligible RNA-seq and microarray datasets for the comprehensive expression analysis is summarized in Supplementary Fig. 2. A total of 27 microarrays from the GEO database and two microarrays from the ArrayExpress database were included; basic information on these datasets is given in Supplementary Table 1. Together, the tissue microarray, the external microarrays and the RNA-seq datasets comprised 1632 LUSC cases and 1478 non-cancer cases. Violin plots of differential GCNT3 expression and ROC curves of its discriminatory capacity showed clear overexpression of GCNT3 in LUSC and moderate discriminatory ability in most datasets (GaoFigure1 and Supplementary Fig. 1). The forest plot of the SMD pooled across all included datasets supported overexpression of GCNT3 in LUSC (SMD = 0.57, 95% CI = 0.15–0.99) (GaoFigure4), and the SROC curve generated from the diagnostic data of all included datasets corroborated the moderate performance of GCNT3 overexpression in differentiating LUSC from non-cancer tissues (AUC = 0.61, 95% CI = 0.57–0.65) (Supplementary Fig. 3).
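The pooling of per-dataset SMDs described above can be sketched as follows. This is a minimal illustration that assumes Hedges' g as the effect size and a DerSimonian-Laird random-effects model; the text does not specify which estimator or software was used, and the function names and the three example datasets are hypothetical.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardized mean difference and its variance."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d

def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooled effect, 95% CI and between-study variance."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2

# HYPOTHETICAL per-dataset summaries:
# (tumor mean, SD, n, non-cancer mean, SD, n). Not values from the study.
datasets = [
    (7.1, 1.2, 40, 6.5, 1.0, 35),
    (8.3, 1.5, 60, 7.0, 1.4, 20),
    (6.9, 0.9, 25, 6.8, 1.1, 30),
]
gs, vs = zip(*(hedges_g(*row) for row in datasets))
pooled, ci, tau2 = pool_random_effects(gs, vs)
print(f"pooled SMD = {pooled:.2f}, 95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
```

A confidence interval that excludes zero, as reported for the pooled SMD in the text, indicates a consistent direction of effect across datasets.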
Prognostic value of GCNT3 expression for LUSC
Kaplan-Meier survival analysis and log-rank tests were conducted for the RNA-seq dataset of the TCGA-LUSC project and 15 GEO microarrays (GSE81089, GSE74777, GSE50081, GSE41271, GSE30219, GSE29013, GSE19188, GSE17710, GSE14814, GSE12428, GSE11117, GSE12472, GSE4573, GSE8894 and GSE5123). GCNT3 overexpression was associated with worse progression-free survival and overall survival in LUSC patients from GSE29013 (HR = 3.288, P = 0.034; HR = 4.776, P = 0.029) (GaoFigure5). No significant prognostic results were obtained from the other datasets. Prognostic data on overall survival, event-free survival, progression-free survival, recurrence-free survival or relapse-free survival from the included datasets are listed in GaoTable2. Forest plots of the pooled HR were stratified by overall survival, recurrence-free survival and relapse-free survival. The pooled effect of GCNT3 expression on the prognosis of LUSC patients was not significant (Supplementary Fig. 4).
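For readers unfamiliar with the two methods named above, the following is a minimal sketch of a Kaplan-Meier estimator and a two-group log-rank test. It is a textbook illustration, not the authors' pipeline; the function names and the toy follow-up times are hypothetical, and events are coded 1 for an observed event and 0 for censoring.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censored subjects leave the risk set
    return curve

def logrank(times_a, events_a, times_b, events_b):
    """Log-rank chi-square (1 df) and P value for two groups."""
    all_pairs = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in all_pairs if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p

# HYPOTHETICAL follow-up times (months) for high- and low-expression groups.
high_t, high_e = [5, 8, 12, 20], [1, 1, 1, 0]
low_t, low_e = [10, 18, 25, 30], [1, 0, 1, 0]
print(kaplan_meier(high_t, high_e))
print(logrank(high_t, high_e, low_t, low_e))
```

In practice the groups would be formed by splitting patients on GCNT3 expression; the cut-off used for that split is not stated in this section.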
Genetic alteration profile of GCNT3 in LUSC
Records from the OncoPrint module of the cBioPortal database showed two cases with missense mutations and six cases with high mRNA expression among 178 LUSC cases from the TCGA database (GaoFigure6), suggesting that high mRNA expression was the predominant type of GCNT3 alteration in LUSC. No indels (insertions and deletions) or synonymous mutations of GCNT3 in LUSC were found in other databases. GCNT3 mRNA expression was negatively correlated with the GCNT3 methylation level (r = -0.21, P < 0.001) in LUSC.
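The expression-methylation correlation above can be illustrated with a short sketch. It assumes Pearson's r, since the text does not say which coefficient was used; the function names and the paired values are hypothetical and are not taken from the TCGA data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# HYPOTHETICAL paired values: methylation beta values and mRNA expression.
methylation = [0.20, 0.40, 0.60, 0.80, 0.50]
expression = [9.1, 8.0, 8.4, 6.9, 8.8]
r = pearson_r(methylation, expression)
print(f"r = {r:.2f}, t = {r_to_t(r, len(methylation)):.2f}")
```

A weak coefficient such as r = -0.21 can still reach a small P value when the number of paired samples is large, which is why the effect size and the significance level are reported together.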
Functional enrichment analysis for GCNT3-correlated genes in LUSC
Differential expression analysis and expression correlation analysis were performed on the RNA-seq dataset from the TCGA database, the in-house microarray and 34 external microarrays with LUSC gene expression matrices (E-MTAB-5231, E-MTAB-8615, GSE103512, GSE10937, GSE11117, GSE11969, GSE12428, GSE12472, GSE135304, GSE19188, GSE1987, GSE2088, GSE21933, GSE27489, GSE27553, GSE29249, GSE30219, GSE31446, GSE31552, GSE32036-GPL6884, GSE3268, GSE33479, GSE33532, GSE40275, GSE4824-GPL96, GSE4824-GPL97, GSE49155, GSE6044, GSE62113, GSE67061-GPL6480, GSE74706, GSE81089, GSE84784 and GSE8569). A total of 48 genes and 51 genes were defined as positively and negatively correlated with GCNT3, respectively (Supplementary Fig. 5). Genes positively correlated with GCNT3 in LUSC were mainly enriched in biological processes and KEGG pathways such as cornification, skin development, keratinocyte differentiation, the amoebiasis pathway, the p53 signaling pathway and the protein digestion and absorption pathway (GaoFigure7). Genes negatively correlated with GCNT3 in LUSC were mainly enriched in biological processes and KEGG pathways including cardiac chamber development, cardiac septum development, cardiac chamber morphogenesis, the tryptophan metabolism pathway, the fluid shear stress and atherosclerosis pathway and the malaria pathway (GaoFigure8).
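Enrichment of a gene list in a GO term or KEGG pathway is commonly assessed with a one-sided hypergeometric (over-representation) test, which can be sketched as below. This is an assumption about the general approach rather than a description of the tool used in the study; the function name, the background size, the term size and the overlap are hypothetical, and only the list size of 48 comes from the text.

```python
from math import comb

def enrichment_p(universe, term_genes, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `universe` genes, of which `term_genes` belong to the term."""
    total = comb(universe, selected)
    upper = min(term_genes, selected)
    tail = sum(
        comb(term_genes, i) * comb(universe - term_genes, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# 48 positively correlated genes (from the text); the other numbers are
# HYPOTHETICAL: a 20000-gene background, a 200-gene term, 6 genes shared.
p = enrichment_p(universe=20000, term_genes=200, selected=48, overlap=6)
print(f"P = {p:.3g}")
```

Because many terms are tested at once, the resulting P values would normally be adjusted for multiple testing before terms are reported as enriched.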