Overview of the multi-omics study of PTC
To comprehensively understand the molecular basis of PTC with different RRs, 102 PTC patients were collected, and five different types of omics including genomics, transcriptomics, metabolomics, proteomics and phosphorylated (phospho)-proteomics were performed (Fig. S1a). Whole exome sequencing (WES) based genomics data were from 97 tumor tissue samples and 33 paired normal tissues, the RNA-sequencing (RNA-seq) based transcriptomics data (16925 genes) were from 92 tumor tissue samples and 34 paired normal tissue samples, metabolomics profiling (503 metabolites) were conducted on 102 tumor tissue samples and 37 paired normal tissue samples, and proteomics (3147 proteins) and phospho-proteomics (652 phospho-proteins) profiling were performed on 37 paired tumor-normal tissues (Fig. S1a).
Genomic profiling of the PTC patients
An average of 74 nonsynonymous somatic point mutations and 2 indels were identified in the 97 Chinese PTC patients. Consistent with most genome studies about PTC [12, 13], the most frequent somatic mutation gene was BRAF (47%, all belong to V600E mutation, Fig 1a). In addition, frequently mutated cancer-associated genes also included MUC16 (36%), RNF213 (8%) and MSH6 (7%) (Fig. 1a), showing higher mutation frequencies than the TCGA PTC dataset [13]. Here, the MUC16 mutations were specifically enriched in the PTC patients with high RR (Fig. 1b), and also associated with multiple pathological factors including high RR (P=0.027), recurrence (P=0.010), metastatic lymph node size larger than 3cm (LNM.3cm) (P=0.018), T3 stage (P=0.032), N1b stage (P=0.0061) and M1 stage (P=0.021) (Fig. 2c, examined by hypergeometric distribution). Meanwhile, this Chinese PTC cohort did not contain mutations in RAS which were frequently mutated in previous PTC studies [12-14].
There were also frequent TERT promoter mutations (C228T, 14%) in the PTC patients. The mutations were also significantly enriched in the high RR patients (Fig. 2b and 2d), and frequently overlap with certain pathological or clinical factors including high RR (P=0.0026), recurrence (P=0.0030), LNM.3cm (P=0.011), extrathyroidal extension (ETE) (P=0.0013), lymph node metastasis (LNM) (P=0), or extranodal extension (ENE) (P=0.0077) (Fig. 2d, examined by hypergeometric distribution).
Gene rearrangements in RET, NTRK and BRAF have been frequently identified in PTC [15]. Here, RET fusions (CCDC6-RET 8%, NCOA4-RET 5%) were the most frequent fusions, and multiple NTRK fusions (NTRK3-ETV6, TPR-NTRK1, ETV6-NTRK3) were also identified (Fig. 1a). In addition, several other gene fusions (FBXO25-SEPTIN14, TLK2-FAM157A, ZNF33B-NCOA4) showing rare frequencies in previous PTC studies were also identified (Fig. 1a). Interestingly, several gene fusions (NCOA4-RET, TLK2-FAM157A, ZNF33B-NCOA4, TPR-NTRK1) also showed specific enrichment in the high RR (Fig. 1b).
Multi-omics based comparison of tumor and normal tissues of PTC patients
In addition to the genomic alterations, differentially expressed molecules (DEMs) were recognized by comparing between tumor and matched normal samples based on the multi-omics profiling data (Fig. S2). As a result, four types of DEMs, including 1674 genes (P<0.01, DeSeq2 [16]), 1864 proteins (P<0.01, Wilcox-test, paired), 391 phospho proteins (P<0.01, Wilcox-test, paired) and 334 metabolites (P<0.01, Wilcox-test, paired) were recognized (Fig. 2a). The most significant DEMs in mRNA level included GGTLC3, C1QL1, PRG4, etc. The most significant metabolites included free fatty acids (FFAs), serine, citric acid, triglycerides (TGs), sphingosines (SPHs), etc. Increased FFAs in PTC tumors have also been identified by other metabolomics studies [17]. Proteins like tenascin (TNC), and Fibronectin 1 (FN1) and dipeptidyl peptidase 4 (DPP4) and phospho-proteins of major vault protein (MVP) and Fibronectin 1 (FN1) showed remarkable up-regulation in the PTC tumor tissues, while proteins thyroid peroxidase (TPO), desmin (DES) and fatty acid binding protein 4 (FABP4), and phospho-proteins of thyroglobulin (Tg), DES, hemoglobin subunit delta (HBD) and hemoglobin subunit beta (HBB) were down-regulated in the PTC tumors (Fig. 2b). TNC was reported to show remarkably high expressions in medullary TC [18], here, we found it was also up-regulated in the PTC tumors. The up-regulation of FN1 was observed in all types of TC [19]. TPO, an essential enzyme for the production of thyroid hormones, is expressed mainly in normal thyroid cells [20]. The expression levels of TPO were decreased in the PTC tumors when compared to the normal ones.
Pathways enriched by the four types of DEMs were identified respectively (P<0.05, Hypergeometric Distribution). A large fraction of the enriched pathways were metabolism pathways even for the DEMs in terms of genes, proteins and phospho-proteins (Fig. 2a), and the pathways enriched by genes, proteins and phospho-proteins simultaneously all fell in metabolism pathways including valine, leucine and isoleucine degradation, pyruvate metabolism, glycolysis/gluconeogenesis, as well as arginine and proline metabolism (Fig. 2c), where glycolysis and pyruvate metabolism were also enriched by the differentially expressed metabolites (Fig. 2d). Meanwhile, multiple metabolism pathways like oxidative phosphorylation and citrate cycle were enriched by at least three types of DEMs (Fig. 2d). These together suggest the remarkable metabolic alterations of the PTC tumor tissues.
The multi-omics based pathway analysis enable a comprehensive description of the pathway alteration. Taken glycolysis as one example (Fig. 2e), we observed that although most enzymes were down-regulated considering the mRNA expressions (e.g., HK1, PGM1), many of them were up-regulated in the protein or phospho-protein levels (e.g., ENO1, PCK1, PDHA1), suggesting the complicated post-transcriptional modifications in the glycolysis process of PTC patients. Meanwhile, the metabolite changes were mainly reflected in the reduced levels of glucose, fructose, glycerate-3P and pyruvate and increased levels of lactate (i.e., L-Lactate) (Fig. 2e). Increased levels of lactate in TC and many other caner types have been widely reported [9]. The multi-omics based pathway alteration patterns help further explain potential mechanisms including alteration of the direct enzyme LDHA (in both mRNA and protein levels) and associated up/down-stream changes (e.g., ENO1, PKM and pyruvate).
In addition to the metabolism pathways, two cell death relevant pathways, necroptosis and apoptosis, were also significantly enriched by the DEMs in terms of metabolites, proteins and phospho-proteins (Fig. 2d). Most of the DEMs in the necroptosis and apoptosis pathways were up-regulated in the PTC tumor tissues than the normal tissues (Fig. S3a-S3b).
Multi-omics based molecular features of PTC with different RRs
The molecular expression features underlying different RRs of PTC were also characterized (Fig. 3a). The high RR PTC patients showed higher expression levels in multiple lipids like TGs, FFAs and the other metabolites like histamine and kynurenine. The high RR also displayed higher expressions in genes like MMP13, CST1, COL11A1, proteins like Tg, PTRRG, VWA1 and phospho-proteins like EPPK1, ALDH1A1 and LAMC1. The intermediate RR PTC patients showed higher expressions in metabolites like several FFAs and kynurenine, genes like IGFN1, LOC391322 and ZNRD1, proteins like FTL, FABP5 and APOB, and phospho-proteins like C1QB, HBB and HBD. The low RR patients showed higher expressions in metabolites like PG (18:2_18:2) and OAHFA (18:2_18:1), genes like JSRP1, TCAP and TNNI2, proteins like ACADL, ABHD11 and FN1 and phospho-proteins like TNC, FN1 and POSTN. Comparing to the alterations between tumor and normal samples, the high RR PTC tumor samples showed reversed alterations compared to intermediate or low RR ones considering various molecules (Fig. 3a and Fig. S4a-S4d). For instance, although the PTC tumor samples showed significant reduced protein levels in Tg compared to the normal samples, the high RR PTC samples were with higher protein expressions of Tg comparing to other tumor samples (Fig. 3a and Fig. S4a-4d).
The expression profiles of the RR relevant molecules especially for the FFAs (FFA 26:2, FFA 24:2, FFA 26:4) and several proteins or phospho-proteins were highly associated (Fig. 3b, spearman correlation > 0.65 or spearman correlation < -0.65 ). The FFA 26:2, Tg, FN1 and 5-Lipoxygenase (ALOX5), phospho-FN1 and phospho-TNC harbored a relative hub position in the correlation network, suggesting their crucial roles in interactive regulations or signaling communications. ALOX5, as a non-heme iron-containing enzyme, can catalyze the peroxidation of polyunsaturated fatty acids [21]. Aberrant expression of ALOX5 has been observed in various types of cancers including PTC [22]. Here, we also found ALOX5 showed specific low expressions in high RR PTC patients, and its alterations were associated with changes in many FFAs (Fig. 3b-d).
The different RRs also displayed remarkable differences in the pathway profiles. For high RR, the metabolites in various metabolism pathways, e.g., biosynthesis of amino acids and glycolysis, were up-regulated, while the protein levels of metabolic enzymes were mainly down-regulated (Fig. 3c). Except of the direct metabolism enzymes, there were other proteins showing remarkable associations with metabolite changes in PTC (Fig. 3f), e.g., protein Tg, phospho-protein MSN (Fig. 3g-3h). For the other pathways, the high RR showed up-regulations in PI3K-AKT and TGF-beta signaling pathways (based on the mRNA expressions) and thyroid hormone synthesis (based on the protein expressions) (Fig. S4d).
Integrative correlation analysis of the multi-omics data
Based on a supervised multi-omics integrative analysis method called DIABLO (Data Integration Analysis for Biomarker discovery using Latent cOmponents) [23] to simultaneously maximize the correlations among different types of omics and identify key molecules which can discriminate different sample groups (i.e., high, intermediate and low RR groups and the normal sample group). As a result, the general correlations between metabolism, proteomics and phospho-proteomics were high (no less than 0.88) suggesting common information among metabolism, proteomics and phospho-proteomics. By contrast, the correlations between transcriptomics and the other types of omics were relatively low (less than 0.6, Fig. 4a), implying the complicated post-transcription modifications.
The key molecules were further clustered into four network modules based on their expression profiles and inter-correlations, and different modules showed distinctive expression profiles (Fig. 4b) and interaction patterns (Fig. 4c). The molecules in the first module (M1) were mainly composed of extracellular matrix (ECM) relevant proteins including FBLN5, NID1, NID2, COL4A2, TINAGL1, VWA1 and different chains of the laminin proteins (LAMA4, LAMB1, LAMC1) [24], they showed high inter-correlations and possessed higher expression levels in the high RR groups than the other tumor samples (Fig. 4b-4c), highlighting the key role of ECM interactions in PTC recurrence. The second module (M2) was composed of multiple metabolism relevant phospho-proteins like PGK1, PSMF1, PRDX1 and TOM1, they showed lower expressions in the tumor tissues, especially the high RR tumor tissues (Fig. 4b). Their expressions were also associated with the proteins IGKV3-15, RPS27A, genes MYH11 and HTR2A (Fig. 4c). The third module (M3) was the largest module, and molecules in M3 mainly showed higher expressions in the tumor tissues (regardless of the RRs) than the matched normal tissues (Fig. 4b). There were three sub-modules in M3 which were aggregated by metabolites, genes and proteins/phospho-proteins (Fig. 4c, M3). The inter-correlated metabolites in M3 were mainly lipids including phosphatidylcholines (PCs), phosphatidylethanolamines (PEs) and sphingomyelins (SMs). A large fraction of the proteins/phosph-proteins were involved in autophagy (STAT3/ATP6V1B2/ATP6V1C1/HGS/LAMTOR2/VPS13C/HSP90AA1) [25]. The genes were involved in glutathione metabolism (GGTLC3/GGT2) [25], immune response (IFNE/PRSS2) [25] and exocytosis secretion of thyroid-stimulating hormone (TRHR). Meanwhile, TRHR, C1QL4 and AMY1B possessed inter-connection positions in the network of M3. Molecules in the fourth module (M4) mainly showed higher expressions in the intermediate and low RR groups (Fig. 4b). Metabolites including FFAs, Diacylglycerols (DGs) and Phosphatidylglycerols (PGs) and the fatty acid binding protein FABP5 formed an intermediated layer linking the genes and proteins or phospho-proteins in the network of M4, and multiple proteins and phospho-proteins (ERAP2/CYBB/CD74/DYNC1H1/RAB7A) in M4 were involved in antigen processing and presenting [25] (Fig. 4c), indicating the potential interactions between fatty acid metabolism and adaptive immune functions in PTC.
Integrative stratification of PTC patients into four subtypes based on transcriptomics and metabolomics
Notably, although most high RR samples showed low expressions of molecules in M4, part of them also show similar expression profiles with the intermediate and low RR samples (Fig. 4b), implying alternative molecule subtypes different from the ATA risk classification may exist. We re-stratified the PTC patients into four subtypes based on a consensus integrative clustering analysis (see Methods, Fig. S5a) of the transcriptomics and metabolomics data (proteomics and phospho-proteomics were not considered here since the two types of omics were highly associated with metabolomics).
The four redefined subtypes showed significant differences in the transcriptional and metabolism profiles (Fig. 5a) as well as multiple clinical and mutation features including RR (P = 1.61×10-5, chi-square test), T stage (P = 4.17×10-2, chi-square test), N stage (P = 2.45×10-3, chi-square test), BRAF mutation (P = 5.67×10-3, chi-square test), TERT promoter mutation (P = 3.49×10-3 for overlap between CS4 and TERT promoter mutation, examined by hypergeometric distribution ) (Fig. 5a-5b) and ENE (P = 6.95×10-3, chi-square test, Fig. S5b). The subtype CS1 contained more low RR patients, while the other three subtypes CS2 to CS4 had more high RR patients (Fig. 5b). The subtype CS2 and CS3 had more T3 stage, N1b stage patients but had less BRAF and TERT promoter mutations (Fig. 5b). The subtype CS4 had the highest BRAF and TERT promoter mutation frequencies (Fig. 5b). Moreover, the four subtypes possessed different prognosis outcomes in terms of recurrence free survival (Fig. 5c), where the subtype CS2 and CS3 respectively showed the worst and best prognosis among the three high RR enriched subtypes (Fig. 5d-5e). By contrast, the prognosis differences based only on the RRs were not significant (Fig. S5c). Taken together, the transcriptomics and metabolomics data help redefine four meaningful PTC subtypes.
Multi-dimensional characterization of the four PTC subtypes
The four subtypes were with remarkably distinctive molecular profiles, and each subtype possessed various specifically up or down regulated genes (Fig. 6a) and metabolites (Fig. 6b). Plasminogen (PLG) was reported to show significantly lower expressions in serum samples of PTC patients than the nodular goiter patients [26]. Here, the mRNA expression levels of PLG were higher in the low RR dominated subtype CS1 than the other subtypes (Fig. 6a). HSP6A was found to be a potential biomarker to predict the prognosis of TC [27]. ECM1 is associated with tumor invasiveness and poor prognosis in various cancer types [28]. Both HSP6A and ECM1 showed CS2 specific higher expressions (Fig. 6a). Considering metabolites, the subtype CS1 had higher levels in FFAs, PGs, Fructose 1,6−diphosphate, etc.; the subtype CS2 had higher levels in stachydrine; the subtype CS3 had higher levels in TGs, citric acid, etc.; while the subtype CS4 showed higher levels in acylcarnitines, adenosine, histamine, etc. (Fig. 6b).
In addition to the molecular features, the four subtypes also showed differences in the other key aspects including tumor sizes, number of metastatic lymph nodes, tumor differentiation scores (TDSs) [6], BRAF-scores and RAS-scores. The subtype CS2 and CS3 showed larger tumor sizes than the subtype CS1 (Fig. 6c). The subtype CS1 had less number of metastatic lymph nodes than the other subtypes (Fig. 6d). The subtype CS2 showed higher TDSs than the subtype CS1 and CS4, and the subtype CS3 had higher TDSs than the subtype CS1 (Fig. 6e). Moreover, both subtype CS2 and CS3 showed higher RAS scores and lower BRAF scores comparing to the subtype CS1 and CS4 (Fig. 6f-6g). Correspondingly, the subtypes CS2 and CS3 had lower BRAF mutation frequencies than CS1 and CS4 (Fig. 5b).
Furthermore, the pathway profiles for the four subtypes were also identified. Although no significant differences in terms of the tumor sizes, number of metastatic lymph nodes, TDSs, BRAF and RAS scores were observed between the subtype CS2 and CS3 (Fig. 6c-6g, Fig. S6a), they displayed noteworthy opposite trends in the pathway profiles (Fig. 6h-6i). For the metabolism pathways, the mRNA expressions of enzymes in various metabolism pathways were up-regulated for CS2 and down-regulated for CS3 (Fig. 6h). Reversely, most immune relevant pathways were up-regulated for CS3 but down-regulated for CS2 (Fig. 6i). The up-regulation in various metabolism enzymes in CS2 imply a high metabolite consumption and can partly explain why few metabolites show higher levels in the CS2 subtype (Fig. 6b).
Single cell RNA-seq analysis has been performed to illustrate the tumor microenvironment of PTC [29]. We used a deconvolution method to predict the tumor microenvironment compositions of the PTC tumor samples based on the bulk sample RNA-seq data in our study and a previously reported single cell RNA-seq dataset of PTC [29] (see also Methods). As result, the four subtypes also displayed distinctive cell compositions, especially for the epithelium sub-populations (Fig. 6j, Fig. S6b), implying the different subtypes are oriented from different types of malignant thyrocytes (Fig. 6j).
Potential targets or biomarkers of different subtypes
According to the clinical, molecular and pathway features of the four subtypes, we summarized the four subtypes as low RR and BRAF-like (CS1), high RR and metabolism type (CS2), high RR and immune type (CS3), and high RR and BRAF-like (CS4).
Candidate druggable targets for each subtype were identified (Fig. 7a). These distinctive expression profiles of the four subtypes in the druggable targets, suggesting that different therapeutic strategies should be applied to different subtypes in PTC. Carbonic anhydrase 12 (CA12/CAXII) is one metabolic enzyme and has emerged as a one promising cancer therapeutic target [30]. Germinal center kinase (GCK) has been identified as a therapeutic target in multiple myeloma with RAS mutation [31]. Both CA12 and GCK can be candidate targets for the high RR and metabolism subtype (CS2) as both CA12 and GCK showed higher expressions for CS2 (Fig. 7a). PRMT8 can be a potential therapeutic target for colon tumor [32]. CARTPT may affect thyroid stimulating hormone levels due to its association with the neuroendocrine function of hypothalamus [33]. PRMT8 and CARTPT can be potential therapeutic targets of the high RR and immune subtype (CS3, Fig. 7a). CEACAM6 has been regarded as a potential target for cancer immunotherapies [34], and it showed a CS4 specific up-regulation among the four subtypes (Fig. 7a).
In addition, metabolites which showed significantly associations with these potential targets were also identified (Fig. 7b). These metabolites can be potential biomarkers to indicate different targets. For instance, CEACAM6 showed high positive correlations with PE (16:0p_16:1) and PE (16:0p_18:1), then high levels of the two PEs in certain PTC patients may suggest the applicability of the CEACAM6 targets.
Furthermore, to predict the four meaningful PTC subtypes based on the omics data, we constructed a subtype predictor based on the expression levels of the top ranked genes and metabolites (see Methods). The predictor can classify the four subtypes accurately, with the areas under the receiver operator characteristic curves (AUCs) 1, 0.97, 0.97, 0.99 for CS1 to CS4 on the testing dataset (Fig. 7c). Meanwhile, the predicted probabilities of being the subtype CS2 were associated with recurrence-free survival where high CS2 probabilities were associated with unfavourable prognosis (Fig. 7d).