For this study, we selected three cell line models to investigate differences in the reference proteome, novel isoforms and the alternative proteome. Two of these cell lines (PEO-4 and SKOV-3 cells) are derived from ascitic fluid from ovarian adenocarcinomas. Particularly, PEO-4 cells have a high-grade serous histology and were collected after clinical resistance from a patient who previously received cisplatin, 5-fluorouracil and chlorambucil treatment (43). PEO-4 cells have been xenografted into immune-deprived mice and found to be tumorigenic (44). SKOV-3 cells are clear cell carcinoma cells and resistant to tumour necrosis factor, diphtheria toxin, cisplatin and adriamycin (45). According to Hernandez et al. (46) and Hallas-Potts et al. (47), PEO-4 cells have a lower tumorigenicity than SKOV-3 cells when injected in nude mice. The T1074 ovarian cancer cell line was immortalized by SV40 virus and originally derived from normal human ovarian surface epithelial cells.
Differential gene expression analysis
In order to generate custom databases using OpenCustomDB, RNA-Seq data is required. From these reads, the assessment of differential gene expression can be performed. Mapping the RNA-Seq reads to the genome using RSEM and STAR enabled the identification of 117,636 transcripts expressed in 70% of four replicates between cell lines. Of these, 96,442 transcripts were shared by the three cell lines. Additionally, 1567, 2391, and 1780 transcripts were only identified in T1074, PEO-4 and SKOV-3 cells respectively (Fig. 2A). Total RNA-seq data analysis showed that 37,197 transcripts were differentially expressed (DESeq2, FDR < 0.05). Hierarchical clustering (Fig. 2B and Supplemental Table 1) indicated six main transcript clusters: upregulation in PEO-4 (cluster 1, 3117) in SKOV-3 (cluster 2, 3220), or in both PEO-4 and SKOV-3 (cluster 3, 1138 transcripts); and downregulation in SKOV-3 (cluster 4), in PEO-4 (cluster 5), and in both cancerous cells (cluster 6, 12,129 transcripts).
Mapping RNA-Seq reads to the human genome Hg38 allowed us to find 29,245 expressed genes among the three cell lines. Among these expressed genes, 420, 407 and 540 were identified to be specific for T1074, SKOV-3 and PEO-4 cells respectively (Figure. 3A). Figure 3B displays the different categories of genes annotated and the major category of these genes were annotated as non-coding (pseudogenes and lncRNAs, 60.9%), while approximately 37% of the genes were annotated as coding genes. Hierarchical clustering was performed on the expression values obtained from the DESeq2 workflow. A total of 17,368 genes were identified as significantly differentially expressed between the three cell lines (Fig. 3C and Supplemental Table 2), and of these, 2142 and 1949 genes were upregulated in PEO-4 and SKOV-3 cells respectively. On the other hand, 3345 and 2692 genes were downregulated in PEO-4 and SKOV-3 cells respectively. Between the two cancerous cell lines, 632 genes were identified as upregulated and 6608 as downregulated.
RNA-Seq based databases
We used RNA-Seq data from the ovarian epithelial cell (T1074) and the OvCa cell lines (PEO-4 and SKOV-3) to generate two cell-specific protein databases for each cell line. Figure 4 summarizes the protein types of the sequences stored in these databases. The distribution is similar for the three cell lines used and the custom 100K DB contained around 15% of wild-type (WT) RefProts, 2% of variant RefProts, 5% of WT novel isoforms, less than 1% of variant novel isoforms, 73% of WT AltProts and 5% of variant AltProts (Fig. 4).
The OpenCustomDB workflow was used to generate comprehensive transcript databases (Full DB) without limiting the maximum number of entries to 100,000. These databases included 448,569, 443,177 and 437,568 entries for T1074, PEO-4 and SKOV-3 cells, respectively. For example, for T1074 cells, 68,759 WT RefProts (15.33%), 5366 variant RefProts (1.2%), 43,609 WT novel isoforms (9.7%), 2529 variant isoforms (0.6%), 319,612 WT AltProts (71.3%) and 8694 variant AltProts (1.9%) were stored in the database. Similar ratios were observed for PEO-4 and SKOV-3 cells (Fig. 4).
Of the AltProts predicted, we mapped their transcriptomic origin by extracting information from OpenProt (Fig. 5). AltORFs overlapping a CDS in a shifted reading frame, or in 3’UTRs and ncRNA were found to be the main sources of predicted AltProts.
Additionally, a comparison was performed between the databases across the three cell lines (see Supplemental Fig. 1). In total, 282,287 AltProts were found to overlap across the three cell lines and, 15,109, 11,026 and 8897 unique AltProts were predicted in T1074, PEO-4 and SKOV-3 cells, respectively. Among the cancerous cell lines, 8055 AltProts were found to overlap. Approximately 39,000 sequences of novel isoforms were predicted to be shared across the three cell lines, with specific novel isoforms also identified in each cell line and in both cancerous cells. Almost 60,000 RefProts were found to overlap across all cell lines, with approximately 6000 being specific for each cell line. The same analysis was performed on the 100K DB, with 52,483 AltProts, 3116 novel isoforms and 10,346 RefProts being predicted to overlap across all three cell lines. A main advantage of these databases is that they contain predicted AltProt variants specific of each sample; for instance, 4321 specific AltProt variants were predicted for PEO-4 cells and, 4355 for SKOV-3 and 3540 for T1074 cells. This also shows that both cancerous cells have an increased number of transcript variants, which may be translated into mutated AltProts.
Proteome analysis of subcellular compartments
To evaluate the deeper differences in the proteome of these three different cell lines. The MS/MS data sets obtained from analysing each subcellular proteome of the three cell lines were analysed using Proteome Discoverer V2.5. Three different child processing workflows that contained three sequential Sequest HT (48) nodes were used with the databases as described in the material and methods section. We considered a protein as identified when it was present in at least one subcellular compartment in 70% of the replicates of at least one cell line. Figure 6A displays the distributions of all identified proteins. 6301 RefProts were identified in T1074 cells, 6268 in PEO-4 cells and 6319 in SKOV-3 cells. Among the identified RefProts, 234 (T1074 cells), 224 (PEO-4 cells) and 233 (SKOV-3 cells) were variants of RefProts. In addition, 137 novel isoforms were identified in T1074 cells, and 136 in PEO-4 and SKOV-3 cells. A total of 8 variants of novel isoforms were annotated in T1074 cells, and 9 in SKOV-3 and PEO-4 cells. Finally, over 500 AltProts were identified in each cell line with similar numbers of AltProts identified in SKOV-3 cells (577), T1074 (556) and PEO-4 cells (549). The number of AltProt variants identified was 12 for PEO-4 cells, and 13 for T1074 and SKOV-3 cells. Additionally, the distribution of WT and variant proteins is shown in Fig. 6B.
Subcellular fractionation was used to link (a) cellular compartment(s) to identified AltProts (Fig. 7A). The membrane-bound fraction of all three cell lines contained the highest number of identified AltProts. In Figs. 7B and C, some general descriptions of the identified AltProts are displayed. Here, the majority of the AltProts identified possess a 3’UTR origin. Additionally, the vast majority (80.9%) have a molecular weight less than 10 KDa.
In addition, we identified cell line-specific RefProts, novel isoforms and AltProts. In T1074 cells, nine specific AltProts were identified, including the variant AltProt IP_290059@Asp99fs, which was found in the cytoskeletal fraction. SKOV-3 cells also had nine cell-specific AltProts, but without any variants, and PEO-4 cells had two specific wild-type AltProts identified. The characteristics of the cell line-specific AltProts are described in Supplemental Table 3. Overall, 508 AltProts were identified shared by all three cell lines, including 11 variants.
Among the identifications, 30 AltProts were identified in both cancerous cell lines. The variant IP_715944@Leu44Pro was identified in the cytoskeletal fraction of both cell lines. The WT AltProt IP_715944 is a 4.82 KDa protein composed by 47 amino acids. It is coded in the DHX8 gene. The variant of this AltProt is the result of a base substitution (c.131T > C) observed in the transcript ENST00000587574, which changed the proline at position 44 to a leucine. To verify the impact of the mutation, the sequence was analysed using protein BLAST (49), InterProScan (50) and Phobious (51). No significant similarity or any change in the predicted domains were identified.
Next, we performed a label-free quantitative analysis on the subcellular proteomes (n = 4), which led to the identification of 1,022 RefProts with significantly altered levels (ANOVA, q-value < 0.05) in the cytoplasmic fraction, 995 in the membrane-bound fraction, 561 in the nuclear fraction, and 159 in the chromatin and 590 in cytoskeletal fractions. The used RNA-Seq derived databases allowed us to identify and quantify variant proteins, and 88 RefProt variants were found at significantly different levels in the three cell lines. Of these variants, 39 were found in the cytoplasm, 39 in membrane-bound structures, 15 in the nucleus, 6 in the chromatin fraction and 23 in the cytoskeleton. Note, that 22 of the 88 RefProt variants were found in more than one cellular fraction.
Hierarchical clustering (Supplemental Fig. 2A and Supplemental Table 4) pointed to six main groups of proteins: up-regulation in (1) PEO-4 cells, (2) SKOV-3 cells, and (3) in both cancerous cells; and down-regulation in (4) SKOV-3 cells, (5) PEO-4 cells, and (6) in both cancerous cells. Table 1 displays the number of significantly deregulated WT and RefProt variants quantified in the three cell lines.
Table 1
Wild-type and variant RefProts significantly varied (ANOVA, q-value < 0.05). The number of WT and variant RefProts is displayed for the six main clusters identified upon LFQ proteomics.
Cluster | WT RefProts | RefProt variants |
Upregulated | PEO-4 cells | 482 | 10 |
SKOV-3 cells | 383 | 6 |
Both cancerous cells | 666 | 29 |
Downregulated | PEO-4 cells | 195 | 4 |
SKOV-3 cells | 328 | 16 |
Both cancerous cells | 1154 | 54 |
An identical hierarchical clustering was performed on novel isoforms, resulting in the identification of 53 wild-type novel isoforms and three novel isoform variants that were significantly varied (ANOVA, q-value < 0.05) between the three cell lines (Supplemental Fig. 2B and Supplemental Table 5). One of these novel isoform variants, II_587587@Asn359Asp, was found upregulated in both cancerous cell lines in the cytoplasm and membrane-bound fractions. This protein is a novel isoform expressed from the PMPCB gene. A second variant, II_702738@Ala184Thr[Leu79LeuAsn72Asn], was found to be downregulated in SKOV-3 cells in the nuclear fraction. This novel isoform is encoded by the WDR18 gene and possesses a substitution in position 184 and three silent mutations. II_597059@Glu65GlnAsn139AspAla57ValLys122ArgIle6ValGlu80Lys[Val118Val] was identified as upregulated in SKOV-3 cells in the chromatin-bound fraction. This protein is a novel isoform of HLA-H, which possesses seven mutations, one of which is a silent mutation.
The same workflow was used to compare the AltProt profiles between the three studied cell lines. In total, 73 AltProts were found at significantly altered levels and 41 of these were upregulated in the ovarian cancer cells, with 12 upregulated only in PEO-4 cells, nine in SKOV-3 cells, and 20 in both. Four AltProts were found to be downregulated only in PEO-4 cells or only in SKOV-3 cells, while 36 AltProts were downregulated in both cells (Supplemental tables 6 and 7). Figure 8 shows the distribution of the significantly altered AltProts over the five different subcellular fractions. We found 11 AltProts to be significantly regulated in more than one unique compartment. IP_067626, IP_070304, IP_108778, IP_147518, IP_178464, IP_213023, IP_246003 and IP_282949 were downregulated in both cancerous cells. Interestingly, IP_582685 (translated from a ncRNA transcript of the pseudogene GDI2P1) was identified upregulated at the membrane-bound fraction of both cancerous cells. Moreover, it was also found upregulated in the cytoplasmic and nuclear fractions of SKOV-3 cells. IP_062385 (translated from the 3’UTR part of the transcript ENST00000457946.1 coded by ZMYM4 gene) was found upregulated in both cancerous cells’ cytoplasmic fractions, while it was downregulated in the cytoskeletal fraction of these cells. A similar observation was made for IP_774693 (translated from an ncRNA of TUBAP2): this AltProt was upregulated in the membrane-bound fractions of the cancerous cells yet, downregulated in their cytoplasmic fractions.
Note that only two AltProt variants were found at significantly different levels. IP_174777 is a 53-amino acid AltProt encoded from the 3’UTR RNA of the TMEM245 gene. During the creation of our databases, a single base substitution (23A > G) in transcript ENST00000374586 led to the prediction of the variant IP_174777@Asn8Ser. This mutant AltProt was identified as significantly downregulated in both cancerous cells, compared to the epithelial ovarian cell line. The second AltProt variant identified as downregulated in the cancerous ovarian cell lines was IP_304294@Leu32fs. The WT AltProt, IP_304294, is a 57-amino acid protein coded by the MTMR1 gene and is translated from the 3’UTR of the transcript ENST00000445323. A guanine deletion at position 93 results in a reading frame shift at leucine 32. This shortens the protein to 44 amino acids and substituted the last 13 amino acids. For both proteins, a cytoplasmic domain was predicted by Phobius, and this prediction remained unchanged after the frame shift.
Proteome and transcriptome functional annotation
To integrate and interpret the data obtained from the differentially expressed reference proteome and transcriptome, we used the Database for Annotation, Visualization and Integrated Discovery (DAVID) (52). This online tool allows users to perform GO term enrichment, cluster redundant enriched terms, visualize enriched pathway maps and extract gene functionality and literature.
The RefProts identified as upregulated in cancerous cells were submitted to DAVID and showed that two major cancer-related KEGG pathways (39) were significantly enriched: central carbon metabolism in cancer (hsa05230; p-value: 1.90E-04) and chemical carcinogenesis - reactive oxygen species (hsa05208; p-value: 5.26E-06). The KEGG pathway proteoglycans in cancer (hsa05205; p-value: 0.026) was significantly enriched among the downregulated cancer RefProts.
Regulated protein clusters in SKOV-3 cells were found significantly enriched for the central carbon metabolism in cancer pathways (p-value: 7.3E-5). On the contrary, no significant enrichment was identified in PEO-4 cells. Based on this difference we presented the protein and transcript expression profiles on an adapted central carbon metabolism pathway in a cancer pathway map (Fig. 9). The complete list of genes and proteins enriched for this pathway can be found in Supplemental Table 8. One observes a significant upregulation of the NRAS protein in the RAS/RAF/MEK/ERK/c-Myc pathway in SKOV-3 cells (ANOVA q-value: 0.017). On the other hand, its transcript levels were significantly downregulated in PEO-4 cells (ANOVA q-value: 0.0004). Moreover, for the other two members of the oncogene RAS family, no significant variation was found at the proteome level whereas on the transcript level, HRAS was downregulated in PEO-4 cells (ANOVA q-value: 3.7E-6) and KRAS upregulated in SKOV-3 cells (ANOVA q-value: 5.58E-5). Other differences were observed for the MEK kinases MAP2K1 and MAP2K2; for instance, MAP2K2 was significantly downregulated in both cancerous cells’ membrane-bound fraction (ANOVA q-value: 0.005) and downregulated in the PEO-4 cytoskeletal fraction (ANOVA q-value: 0.028). MAP2K1 was found downregulated in PEO-4 cells (ANOVA q-value: 2.29E-6) while its transcript level was found upregulated in SKOV-3 cells (ANOVA q-value: 1.49E-5).
In another part of the central carbon metabolism in cancer pathway, SIRT6 and SIRT3 are considered as cancer associated genes (53–55). It has been found that downregulation of SIRT6 increased ovarian cancer cells growth (55). The transcript levels of SIRT6, a tumour suppressor gene, were found downregulated in PEO-4 cells (ANOVA q-value: 4.65E-6), while the transcript levels of c-Myc, an oncogene, were upregulated in these cells (ANOVA q-value: 3.88E-5). Protein levels of SIRT3, another tumour suppressor gene, were upregulated in both cancerous cells (ANOVA q-value: 0.005), while its transcript levels were found downregulated in PEO-4 cells (ANOVA q-value: 0.0001). The expression of the oncogenic PI3K family was also found significantly regulated among the three cell lines. PIK3R1 was upregulated in both cancerous cells’ cytoplasmic fraction (ANOVA q-value: 0.037), while its transcript was only upregulated in SKOV-3 cells (ANOVA q-value: 2.31E-5). Additionally, the transcripts of PIK3CB (ANOVA q-value: 0.0001) and PIK3R2 (ANOVA q-value: 0.005) were also only upregulated in these cells. On the contrary, the PIK3CA (ANOVA q-value: 0.001) and PIK3CD (ANOVA q-value: 0.0001) transcripts were found downregulated in both cancerous cells.
Other oncogenes in the central carbon metabolism cancer pathway are members of the AKT family. AKT1 protein (ANOVA q-value: 0.0002) and transcript levels (ANOVA q-value: 2E-5) were downregulated in PEO-4 cells. For AKT2 and AKT3, no significant variation in protein expression was found, while their transcript levels were significantly downregulated in both cancerous cells (ANOVA q-value: 0.02 and 3.6E-6).
With our proteogenomic workflow, we could identify a variant form of p53 (ENSP00000269305.8: p.Pro72Arg), an amino acid substitution that stems from the c.215C > G variant in TP53. This p53 mutant was significantly downregulated in both cancerous cells’ cytoplasmic (ANOVA q-value: 0.0036) and cytoskeletal (ANOVA q-value: 0.0096) fractions, while its transcript levels were only significantly downregulated in SKOV-3 cells (ANOVA p-value: 1.17E-10). Three other RefProt variants were identified in this pathway. ENSP00000359991.5: p.Thr238Met, a mutant of PGAM1 was downregulated in both cancerous cells (ANOVA q-value: 0.0013), while two mutants of HKDC1 were upregulated in both cancerous cells; ENSP00000346643.5: p.Thr124Ile, p.Asn917Lys, p.Arg827Trp, p.Trp721Arg, [p.Phe601Phe] (ANOVA q-value: 0.008) and ENSP00000346643.5: p.Thr124Ile, p.Asn917Lys, p.Trp721Arg, [p.Phe601Phe] (ANOVA q-value: 0.023).
Crosslinking network analysis
The computational analysis of the crosslinked samples was carried out as described in (30), which allowed us to generate a protein interaction map in Cytoscape (56) (Supplemental Fig. 3). A total of 90 crosslinks were identified (Supplemental table 9), among them 20 intra-crosslinks were identified, which do not give interactome information, but might be useful for structural studies. In this protein network (Supplemental Fig. 3), 28 protein-protein interactions (PPIs) were found in PEO-4 cells (marked in purple), 27 in SKOV-3 cells (marked in blue) and 35 in T1074 cells (marked in green). From these pairs, 12 crosslink interactions were identified in at least two cell lines. Among all the crosslinked pairs, 20 involved AltProts, four crosslinks were AltProt-AltProt interactions, and 13 AltProt-RefProt crosslinks were identified. The latter were considered most important for our study as they provide hints to an AltProt’s physiological or pathological involvement.
To attribute functions to an AltProt from this set of PPIs, we retrieved the known interactions from the STRING (57, 58), BioGrid (59) and IntAct (60) databases and included the identified crosslinked interactions (Supplemental Fig. 4). Additionally, for the RefProts that did not present a referenced STRING interaction within the crosslinked network, the addition of three STRING interactors has been performed to expand the network. We observed that seven PPIs had already been described (pink lines): B2M-HLA-B, B2M-HLA-A, ITGA5-ITGA1, TUBA1C-TUBB, HIST3H2A-HIST2H3D, PRC1-ORC1 and VP39-VPS13C. Using this network, a molecular function GO term and KEGG pathway enrichment analysis was performed with the ClueGO App(61) from Cytoscape. The interactions between AltProts and RefProts were displayed along with the enriched GO terms (Fig. 10). Four direct AltProt-RefProt-GO-term interactions were detected. The AltProt IP_192190 was crosslinked to KIF13A in PEO-4 cells and linked to the vesicle-mediated transport of plasma membrane (GO:0098876), Golgi to plasma membrane protein transport (GO:0043001), protein localization to plasma membrane (GO:0072659) and post-Golgi vesicle mediated transport (GO:0006892). The AltProt IP_136846 was identified as crosslinked to LGALS1 in T1074 cells, which is linked to the GO terms viral entry into host cell (GO:0046718) and biological process involved in interaction with host (GO:0051701). Similarly, IP_235241, crosslinked to ITGA5 in T1074 cells, was linked to the phagosome KEGG pathway (KEGG:04145) and the GO terms virus receptor activity (GO:0001618), biological process involved in interaction with host (GO:0051701) and viral entry into host cell (GO:0046718). Finally, IP_183088 was crosslinked to POLD3 in T1074 and PEO-4 cells. POLD3 is part of the DNA polymerase involved in the replication and reparation of DNA and linked to the UV-damage excision repair (GO:0070914) and response to UV (GO:0009411) GO terms.
Three AltProt-GO-term/KEGG pathways indirect links were identified. IP_292259, crosslinked to TMEM260 in T1074 cells, and TMEM260 possesses a STRING interaction with TOGARAM, which is linked to the non-membrane-bounded organelle assembly (GO:0140694), spindle assembly (GO:0051225) and microtubule cytoskeleton organization involved in mitosis (GO:1902850). Additionally, TMEM260 interacts with GOLGA7, which is linked to GO terms related to the vesicle-mediated transport to the plasma membrane. In addition, two AltProts were also identified to be related to these GO terms: IP_105326 and IP_118499. The former was crosslinked to VIM in SKOV-3 cells, and VIM was crosslinked to MACF1, which is linked to vesicle-mediated transport GO terms. IP_118499 was found crosslinked to CNNM3 in SKOV-3 cells, which processes a STRING interaction with CCNL2, which was crosslinked to VPS13C, which is linked to vesicle-mediated transport GO terms.
To confirm the probability of the observed interactions, we analysed 3D models of RefProt-AltProts using unguided interaction docking between the two partners (as described in (30)). The structures of the AltProts were predicted using I-Tasser(62), while those of the interactors were predicted using ClusPro (63). The RefProt, for which the structure was predicted by AlphaFold(64), was used as a receptor of the AltProt, which was smaller in structure. By measuring the distance of the predicted interactions, we confirmed the observed interactions from XL-MS with a mean of 23.467 Å (Supplemental Fig. 5), which is consistent with the distances described in the literature for DSSO, ranging from 5.3 (34) to 30 Å (35).