A workflow of this study was illustrated in Figure 1. In summary wet lab procedures include all experiments conducted in the laboratory in combination with their related statistical analyses and dry lab refers to in-silico analyses on datasets and their statistical evaluations.
Laboratory-based (wet) analyses
RNA quality and quantity
Mean absorbance ratios of 260/280 and 260/230 were 1.96 ± 0.11 and 1.97 ± 0.06, respectively for PTC tissues and their adjacent-normal tissues. The intensity of 28s-rRNA band was approximately 1.5-2 folds brighter than 18s-rRNA, indicating the acceptable integrity of extracted RNAs (data not shown).
Expression patterns of target genes
Based on three separate unpublished studies conducted in our lab, the expression of 3 target genes were evaluated (data not provided). The expression normalization was conducted with routinely used reference gene, GAPDH alongside with our just approved SYMPK (Figure 2). For gene-A, sample 2 and sample 15 were both upregulated; while samples 16 and 17 were downregulated. Except these four mentioned samples, 13 samples out of 17 showed antithetical expression pattern. Therefore, 76.5% contradiction was observed only when GAPDH was simply replaced by SYMPK, (Figure 2A). The same irregularities were observed for gene-B (Figure 2B, 52.9% anomalies) and gene-C (Figure 2C, 29.4% anomalies).
Reciprocal analysis of reference genes
Reciprocal normalization of SYMPK and GAPDH was done as one gene was considered as reference and the other as target gene and finally, ΔΔCq was calculated (Figure 3). Unsurprisingly, GAPDH was downregulated against SYMPK (ΔΔCq =-0.48) and SYMPK was upregulated against GAPDH (ΔΔCq = 0.80). When SYMPK was considered as reference gene and GAPDH as target (Figure 3A), mean and median values were -0.48 and -1.04 respectively. Alternatively, in Figure 3B, SYMPK was normalized against GAPDH and mean= 0.8 and median= 1.36 were recorded. Therefore, mean and median were increased by 0.3 values when GAPDH was considered as reference gene. None of these gene expression fluctuations were statistically significant as their p values were greater than 0.05.
Reference genes statistics against target genes
Basic statistics comparison of target genes (A, B and C) and two reference genes (GAPDH and SYMPK) were presented in Table 1. The mean expression of Gene-A and Gene-B in adjacent-normal tissues (29.00 and 28.74 respectively) was closer to the mean expression of SYMPK (29.96); but not that of GAPDH (26.53). A same pattern was also observed in PTC tissues; as SYMPK mean expression (29.59) was closer to Gene-A (28.44) and Gene-B (29.28), but far from GAPDH (25.69). On the other hand, the mean expression of Gene-C was close to the mean expressions of GAPDH, in both adjacent-normal tissues (26.24 and 26.53 respectively) and PTC tissues (25.8 and 25.69 respectively). In both tissues, there is much more variation in GAPDH gene expression (3.85<SD<4.59) than target genes (2.65<SD<3.71). Therefore, SYMPK with the lowest SD and CqCV% value in normal tissues (1.84 and 6.15) and PTC tissues (1.62 and 5.48), was a suitable reference gene.
Bioinformatics exploration
Inter-subtypes analysis
Expression and phenotype data of 14 microarray datasets (Supplementary Table 2) were downloaded and cleaned as explained previously. FVPTC and PTC samples were analyzed as a single phenotypic group, as FVPTC is the most common variant of PTC. A total of 520 samples including 116 normal, 38 FTA, 246 PTC, 39 FTC, 27 PDTC, 52 ATC, 2 MTC entered the study and were compiled for 6331 genes held in common. Microarray probes were matched with corresponding genes and mean counts were considered for redundant genes. Further, “removeBatchEffect” was applied on the data (Figures 4 and 5).
Effect size and FWER adjusted p-value were calculated for different cancer subtypes versus normal tissues (Table 2). Accordingly, GAPDH and SYMPK showed the ES of 0.23 and 0.15 respectively in PTC subtype; however, the ES of GAPDH was statistically significant (0.001). GAPDH was also significantly expressed in FTC (p=0.001), and MTC (p=0.002) subtypes. GAPDH represented significantly large ESs in ATC (0.65) and FTC (0.39) subtypes, and again SYMPK showed insignificantly (p=1) small ESs of 0.07 and 0.15, respectively. Negligible differences between GAPDH and SYMPK expression was recorded in FTA, PDTC and MTC subtypes. Top three best reference genes in PTC subtype were GUSB (-0.02), ACTB (0.031) and HPRT1 (0.037) with the smallest insignificant ES (Figure 6A\B).The best three genes based on other subtypes were SYMPK (0.070), TBP (-0.076) and GUSB (-0.09) in ATC (Figure 6C\D); ACTB (0.036), HPRT1 (0.062) and GUSB (0.064) in FTC (Figure 6E\F); GUSB (0.02), HPRT1 (-0.05) and TBP (-0.062) in FTA (Figure 6G\H); GUSB (0.06), HPRT1 (-0.07) and PYCR1 (-0.10) in PDTC (Figure 6I\J); ACTB (0.01), B2M (0.02) and TBP (-0.07) in MTC (Figure 6K\L).
Intra-sex data analysis
Calculation of ES and FWER adjusted p-value between cancer subtypes and normal tissues, were performed separately in both sexes (Table 3). Female expression ES (FES) - male expression ES (MES) ratio in PTC tissues, was dramatically big for GAPDH (8.04) and drastically small for SYMPK (0.69). Therefore, while SYMPK was expressed equally in both sexes (0.19 and 0.28) in PTC tissues, GAPDH expression was 8 times greater in female than male (0.22 and 0.03). GAPDH gene recorded its best FES/MES ratio=1.03 in ATC subtype and its worst in PTC subtype (8.04); while SYMPK gene showed closer ratio to one in all subtypes of thyroid cancer (0.68-1.75). On the other hand, GAPDH expression was 5 times greater in male than female (0.43 and 0.09) in FTC subtype. In PDTC, GAPDH was downregulated in male (-0.11) and upregulated in female (0.22) and therefore a negative FES/MES ratio= -1.95 was observed. In contrast, a positive FES/MES ratio (0.33) was recorded for SYMPK in PDTC tissues. FTA tissues were all taken from females and could not be calculated under these circumstances. Effect size of reference genes in female were depicted according to their subtypes (Figure 7-1). Best reference gene in female with PTC (Figure 7-1B), PDTC (Figure 7-1F), and FTA (Figure 7-1J) was ACTB, while B2M and TBP were respectively the best in female with FTC (Figure 7-1D), and ATC (Figure 7-1H) subtypes. Figure 7-2, illustrated ES of reference genes in male according to their subtypes. Best reference gene in male with PTC (Figure 7-2B), PDTC (Figure 7-2F), and FTC (Figure 7-2D) was HPRT1, while SYMPK was the best in male with ATC (Figure 7-2H). In brief, better reference genes were those with FES/MES ratio close to 1.
Intra-subtype, inter-sex combined analysis
To elucidate gene expression differences between male and female, each pathology was analyzed separately (Table 4, Figure 8). The best reference gene in normal tissues was TBP with E=0.002 and it was one of the three candidate reference genes also in PTC (E=0.021) and FTC (E=0.15). In all subtypes except PDTC, the expression of SYMPK was higher in female than male, while the expression of GAPDH showed the exact opposite pattern.
Raw expression counts of 560 samples including 502 PTC and 58 normal tissues were downloaded from TCGA database and the analysis results were depicted in Figure 9. The expression ESs of our selected reference genes are summarized in Table 5. The three genes with the lowest ESs were ACTB (-.001), TBP (-0.017) and SYMPK (0.034). GAPDH showed the highest ES value= 0.06 among these eight reference genes.
Datasets basic statistics
Basic statistics of RNAseq data were represented for PTC tissues and their adjacent-normal tissues from TCGA database (Table 6). Top three genes based on the lowest SD values were respectively SYMPK (SD: 0.41), GUSB (SD: 0.44), and both TBP and HPRT1 (SD: 0.47) in PTC tissues. Same genes but with dissimilar ranking, HPRT1 (SD: 0.30), GUSB (SD: 0.31), and SYMPK (SD: 0.35) were recorded for normal tissues adjacent to the PTC tumors. In both PTC and their adjacent- normal tissues, GAPDH was ranked at the fourth position with SD= 0.50 and SD=0.39.
Basic statistics of adjacent-normal tissues and each of PTC, FTC, ATC, PDTC, MTC and FTA subtypes were calculated based on microarray pooled meta-data (Table 7). In normal tissues, the genes with the lowest CV% values were GUSB (2.77), B2M (2.86) and SYMPK (3.10) respectively and GAPDH was ranked as the fifth with the value =3.37. In PTC tissues, GUSB (2.48), GAPDH (2.56) and ACTB (2.86) respectively showed the lowest CV% values; SYMPK was the fifth in the list (3.38).
The basic statistic of all 6331 genes of GEO dataset (Supplementary Table 3), and all 25705 genes of TCGA dataset (Supplementary Table 4) were presented for the ease of use. In these tables, the mean and the SD value of prospective target genes were given to make it possible to compare them with the basic statistics of candidate reference genes.