Using a three-step strategy to prioritize cancer-type specific genes and identify potential drugs
We present the first systematic investigation of repurposed drugs that target genes specifically essential in 19 different cancer types. The general framework of this study is outlined in Fig. 1. First, we obtained information on whether each of the 7470 genes was essential in each of the 325 cell lines reported by the Sanger Center [22], and excluded 13 cancer types represented by fewer than five cell lines in that gene dependence project. Then, we calculated the EssentialitySpecificityScore (e score) of these genes in each cancer type. With this score, we constructed relationships between genes and cancer types. We also considered the ratio of cell lines in a cancer type for which a gene is essential, and defined it as n in the scoring function for the e score. Using these two scores, we retained only those genes exceeding certain thresholds as cancer-type specific targets.
The methodology for obtaining gene-drug records follows PanDrugs (http://www.Pandrugs.org) [7]. Gene-drug records indicate whether a gene is a potential target of a drug. The specific sources of gene-drug records and the number of records in each source are shown in Table S1. We matched all cancer-type specific gene targets with drugs based on these gene-drug records.
Meanwhile, following SLKG [23], we calculated DrugScore as described in the Methods to integrate the drug research and marketing (R&M) status and target gene type; the scoring criteria for DrugScore are shown in Table 1. When matching drugs to specific cancers through gene-drug records, drugs with a DrugScore below 0.8 were filtered out.
Hence, we identified drugs for each cancer type based on the relationship between genes and cancer types, the gene-drug records, and the R&M status of drugs. All identified drugs satisfy three criteria: n > 0.5, which ensures broad coverage within a given cancer type; e > 1.3, which indicates high specificity and therefore lower expected toxicity from inhibiting the target gene; and DrugScore ≥ 0.8, which reflects clinical safety. The numbers of identified drugs and their corresponding targets in the 19 cancer types are shown in Table S2. All information was uploaded to the EGKG (Essential Gene Knowledge Graph) (http://guolab.whu.edu.cn/egkg/index.php), which presents a potential option for precision treatment of tumors based on the specificity of essential genes. For example, there are 39 lung adenocarcinoma-specific targets in total, seven of which have matching drugs. However, only five have matching drugs with a DrugScore of at least 0.8, and the number of resulting lung adenocarcinoma-specific drugs is 32.
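The three filtering steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the thresholds are those quoted in the text, while the data structures, the function names, the "LUAD" label, and the entries GENE_X, GENE_Y, drug_low, and drug_z are hypothetical. The e score itself is taken as a precomputed input because its formula is given in the Methods, not here.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
N_MIN = 0.5          # keep targets with n > 0.5
E_MIN = 1.3          # keep targets with e > 1.3
DRUGSCORE_MIN = 0.8  # keep drugs with DrugScore >= 0.8


@dataclass(frozen=True)
class GeneScore:
    gene: str
    cancer_type: str
    n: float  # fraction of the cancer type's cell lines in which the gene is essential
    e: float  # essentiality specificity score (precomputed)


def specific_targets(scores):
    """Step 1: keep gene/cancer-type pairs that pass both score thresholds."""
    return [s for s in scores if s.n > N_MIN and s.e > E_MIN]


def match_drugs(targets, gene_drug_records, drug_scores):
    """Steps 2 and 3: join targets to drugs, then filter on DrugScore."""
    hits = []
    for t in targets:
        for drug in gene_drug_records.get(t.gene, []):
            if drug_scores.get(drug, 0.0) >= DRUGSCORE_MIN:
                hits.append((t.cancer_type, t.gene, drug))
    return hits


# Illustrative inputs only. "LUAD" is shorthand for lung adenocarcinoma.
# CDK4 and YPEL5 reuse scores quoted in the text; everything else is invented.
scores = [
    GeneScore("CDK4", "LUAD", 0.55, 2.13),
    GeneScore("YPEL5", "LUAD", 0.80, 1.656),
    GeneScore("GENE_X", "LUAD", 0.40, 2.50),  # fails the n threshold
    GeneScore("GENE_Y", "LUAD", 0.90, 1.10),  # fails the e threshold
]
gene_drug_records = {"CDK4": ["progesterone", "drug_low"], "GENE_X": ["drug_z"]}
drug_scores = {"progesterone": 1.0, "drug_low": 0.5, "drug_z": 1.0}

targets = specific_targets(scores)
hits = match_drugs(targets, gene_drug_records, drug_scores)
print([t.gene for t in targets])
print(hits)
```

In this toy input YPEL5 passes both score thresholds but has no gene-drug record, which mirrors the case of specific targets without matched drugs.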
Among our chosen 1151 cancer-type specific targets, most have no matched drugs. We also listed these 1051 targets without bound drugs in 19 cancer types. For example, the gene YPEL5 has an n value of 0.8 and an e score of 1.656 for lung adenocarcinoma. If novel drugs targeting it are developed in the future, they would be broad-spectrum in lung adenocarcinoma while possibly causing less toxicity, because the essentiality of this gene is highly specific to that cancer type. Novel drugs targeting the gene SEC63 (n: 0.650, e: 2.401) are also expected to treat lung adenocarcinoma with high safety.
The workflows for screening cancer-type specific targets and for identifying drug candidates, based on gene specificity scores, gene-drug interaction information, and drug scores, are shown in Fig. 2A and Fig. 2B, respectively. The identified targets and drugs differ in the number of cancer types to which they apply. For example, 368 identified targets are applicable to only one cancer type, and all identified targets (both with and without matching drugs, Fig. 2C and Fig. 2D) apply to fewer than 10 cancer types, indicating the specificity of these targets and suggesting that agents specifically targeting them would cause lower toxicity. As shown in Fig. 2E, 69 identified drugs apply to only one cancer type, and the cumulative number of identified drugs applicable to nine or fewer cancer types is 335, which accounts for 90.5% of all identified drugs. Although some drugs may bind multiple essential genes and we consider their aggregate effects in our screening procedure, few broad-spectrum anti-cancer drugs appear in our final list. In summary, both the targets and the drugs we identified show marked cancer-type specificity, which is consistent with our theoretical expectation.
Table 2
Sources of gene-drug records, n score, e score, and DrugScore of the seven drugs chosen for validation, and their affinities for interacting targets in lung adenocarcinoma.
Drugs | Target | Source | Source provider | n score | e score | DrugScore | AA (kcal/mol) |
Apremilast | CDK4 | TTD | TTD | 0.55 | 2.13 | 1 | -8.2 |
Colchicine | SCD | NCI | DGIdb | 0.8 | 1.32 | 1 | -6.2 |
Dexamethasone | CDK4 | NCI, CIViC | DGIdb | 0.55 | 2.13 | 1 | -9.2 |
Hydrochlorothiazide | WNK1 | PharmGKB | DGIdb | 0.5 | 1.56 | 1 | -6.1 |
Ibuprofen | VHL | NCI | DGIdb | 0.65 | 1.33 | 1 | -5.1 |
Progesterone | CDK4 | NCI | DGIdb | 0.55 | 2.13 | 1 | -9.9 |
Rosiglitazone | SCD | NCI | DGIdb | 0.8 | 1.32 | 1 | -8.9 |
Note: AA: average affinity (kcal/mol); ABS: absolute value |
Identified drugs were validated with in vitro pharmacological evidence
For lung adenocarcinoma, we selected the six genes with the highest e scores (CDK4, WNK1, HSP90B1, OGDH, VHL, SCD) and matched them with interacting drugs; no targeting drugs were found for HSP90B1 and OGDH. After removing drugs with existing cancer indications according to TTD (Table S3), as well as drugs that were not available, we selected seven drugs out of the top 32 drugs with the highest e scores for cell experiments and listed their information in Table 2. In addition, CDK4 and SCD are largely complementary in their essentiality across 11 lung adenocarcinoma cell lines (Table S4), so drugs targeting them could in principle be combined. In other words, the two genes have orthogonal essentiality profiles across lung cancer cell lines. Their targeting drugs, progesterone and rosiglitazone, were therefore combined to treat lung adenocarcinoma cell lines.
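One way to quantify how complementary two essentiality profiles are is to count the cell lines covered by either gene and those covered by both. The sketch below assumes binary (0/1) essentiality calls per cell line; the function name and the example profiles are hypothetical and are not the Table S4 data.

```python
def coverage_and_overlap(essential_a, essential_b):
    """Return (fraction of lines where either gene is essential,
    fraction of lines where both genes are essential)."""
    if len(essential_a) != len(essential_b):
        raise ValueError("profiles must cover the same cell lines")
    total = len(essential_a)
    either = sum(1 for a, b in zip(essential_a, essential_b) if a or b)
    both = sum(1 for a, b in zip(essential_a, essential_b) if a and b)
    return either / total, both / total


# Hypothetical 0/1 essentiality calls over six cell lines.
gene_a = [1, 1, 1, 0, 0, 1]
gene_b = [0, 0, 1, 1, 1, 0]

coverage, overlap = coverage_and_overlap(gene_a, gene_b)
print(f"coverage={coverage:.2f}, overlap={overlap:.2f}")
```

High coverage with low overlap corresponds to the complementary pattern described above: a two-drug combination would reach most cell lines through one target or the other.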
Many of the top-identified drugs are already registered in clinical trials on the ClinicalTrials.gov website. Of the top 32 reusable lung adenocarcinoma drugs, 23 are in clinical trials, 21 of them in cancer clinical trials. Among the top 100 reusable lung adenocarcinoma drugs, 86 are involved in clinical trials, 76 of them in cancer clinical trials. These results show that a substantial proportion of the top-ranked reusable drugs are registered in clinical trials, which supports the reliability of our predictions (Table S5).
To validate the virtual screening results of EGKG for lung adenocarcinoma, the top seven drugs were selected for in vitro experimental validation. Eleven lung adenocarcinoma cell lines were used to test drug efficacy. In clinical practice, drugs that target cancer essential genes may cause a series of side effects and toxic reactions, such as lipid metabolism disorders and liver damage, because the target gene may also perform vital functions in normal cells. To address this problem, we also verified the effect of these seven drugs on ten normal cell lines from different parts of the human body.
In the anti-lung adenocarcinoma experiment, the average survival rate (Fig. 3A) and standard deviation for the 11 lung adenocarcinoma cell lines and 10 normal cell lines, together with the p value of the Student’s t-test comparing the two types of cells, are shown in Table S6. The survival rate of each lung adenocarcinoma cell line and normal cell line is also shown (Figs. 3C, 3D, 3E, 3F, figs. S1 and S2). After treatment with the seven drugs, the survival rate of lung cancer cell lines was significantly reduced compared with that of normal cell lines. More specifically, colchicine showed significant anti-cancer efficacy with no significant toxic side effects on normal cells (Fig. 3C and Fig. 3D). The p value of the Student’s t-test for colchicine survival rates was 0.00066 (Table S6), indicating that its effects on cancer cell lines and normal cell lines differed significantly. We also used IC50 values as a measure of drug efficacy; the results are shown in Table S7. Colchicine had lower IC50 values across the 11 lung cancer cell lines than the other six drugs (Fig. 3B). Together, these results show that colchicine kills a variety of lung cancer cell lines effectively while remaining well tolerated by normal cell lines.
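For reference, the two-sample Student’s t statistic used in such a comparison can be computed as in the sketch below. It assumes the equal-variance (pooled) form of the test, since the text does not specify the variant, and the survival rates are invented placeholders, not the Table S6 measurements. The p value then follows from the t distribution with the returned degrees of freedom; a library routine such as SciPy's `scipy.stats.ttest_ind` provides it directly.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom).
    """
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Invented survival rates (%) for illustration only.
cancer_survival = [40.0, 50.0, 60.0]
normal_survival = [90.0, 95.0, 100.0]

t_stat, dof = student_t(cancer_survival, normal_survival)
print(f"t = {t_stat:.3f}, df = {dof}")
```

A strongly negative t indicates that the cancer cell lines survive less than the normal cell lines after treatment.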
Because CDK4 and SCD, the target genes of progesterone and rosiglitazone, have orthogonal essentialities across lung adenocarcinoma cell lines, it is important to test the combined effect of the two drugs. We verified this in vitro on 11 lung adenocarcinoma cell lines. Clinically, rosiglitazone is indicated for diabetes mellitus. In the combination experiment, the average inhibition rate across the 11 lung adenocarcinoma cell lines reached 68.77% at a concentration of 30 µg·mL⁻¹. When the two drugs were used alone, the average inhibition rates of progesterone and rosiglitazone against lung cancer cell lines were 53.64% and 47.85%, respectively (Table S6). The combination therefore has a stronger anti-cancer effect than either drug alone (Fig. 3E, fig. S1C, and fig. S2C). We further verified the effect of the combination of progesterone and rosiglitazone on the viability of a variety of normal cell lines (Fig. 3F); the average cell survival rate was 94.31% (Table S6). In addition, the p value of the Student’s t-test for survival rates under combined treatment with progesterone and rosiglitazone was 1.26E-11 (Table S6), indicating a significant difference between its effects on cancer cell lines and normal cell lines. These results indicate that the drug combination has a stronger anti-cancer effect than either drug alone while remaining well tolerated by normal cells.
Signaling pathways associated with CDK4 and SCD and their use as targets
CDK4 (cyclin-dependent kinase 4) is a cell cycle-regulated protein kinase that binds to cyclin D during the G1 phase and promotes cell entry into the S phase, in which DNA synthesis begins. In addition to regulating the cell cycle, CDK4 is also associated with PI3K/Akt, Ras/Raf/MAPK, and other signaling pathways, affecting cell proliferation and growth. Aberrant CDK4 activity is associated with tumorigenesis and progression [35]. Therefore, CDK4 is considered a potential target for cancer treatment, and inhibitors targeting CDK4 have become a research hotspot in tumor therapy.
SCD (stearoyl-CoA desaturase) is a fatty acid desaturase responsible for converting saturated fatty acids into monounsaturated fatty acids. This conversion is one of the key steps in the synthesis of triglycerides and phospholipids. The activity of SCD is controlled by a variety of factors, including nutritional status and insulin levels. Aberrant SCD activity may be associated with obesity, diabetes, and certain cancers [36–38]. As a result, SCD has emerged as a potential drug target, and inhibitors targeting it have been studied for the treatment of related diseases. The molecular docking sites of SCD mainly include the active center, the substrate binding site, and other sites that may interact with regulatory factors. The substrate binding site is one of the important regions for molecular docking. Molecular docking and molecular dynamics simulation were carried out based on the original ligand binding sites in the crystal structures of CDK4 and SCD.
Molecular docking and molecular dynamics (MD) simulation of SCD with colchicine and rosiglitazone, and of CDK4 with progesterone
Following the above screening procedure, we obtained a total of seven drug candidates targeting lung adenocarcinoma cell lines. We obtained preliminary affinity estimates by molecular docking (Table 2). Previous studies [39] suggest that the mechanism by which gemcitabine improves lung cancer may be related to targeting the RRM1 protein (ribonucleotide reductase M1). However, the mechanism of action of our chosen drugs on lung cancer cell lines is unclear. Based on gene-drug interactions deposited in the public databases, we selected SCD as the target for colchicine and rosiglitazone, and CDK4 as the target for progesterone. SCD and CDK4 are widely essential in lung cancer cell lines: they were essential in nine and seven of the 11 lung cancer cell lines, respectively (Table S4). To explore how the three drugs interact with their potential targets, we performed molecular docking and MD simulation experiments sequentially. The docking results showed that the binding energy of colchicine to the SCD protein was −6.2 kcal/mol, that of progesterone to the CDK4 protein was −9.9 kcal/mol, and that of rosiglitazone to the SCD protein was −8.9 kcal/mol, indicating that the three drugs could bind well to their corresponding targets.
In addition, the molecular docking models of progesterone with CDK4 (PDB ID: 7SJ3) and of colchicine and rosiglitazone with SCD (PDB ID: 4ZYO) showed that progesterone docked to the binding site of the original CDK4 ligand, and that colchicine and rosiglitazone docked to the binding site of the original SCD ligand (Fig. 4). This finding supports SCD as a possible key target through which colchicine kills lung cancer cells, and supports CDK4 and SCD as possible targets through which the combination of progesterone and rosiglitazone kills lung cancer cells.
Root mean square deviation (RMSD), radius of gyration (Rg), total energy, and solvent accessible surface area (SASA) reflect the overall stability of the drug in the protein pocket and the conformational changes of the protein itself. The molecular docking results showed (Fig. 4) that colchicine, progesterone, and rosiglitazone could bind to the original ligand binding sites of SCD, CDK4, and SCD, respectively. Our MD results support this conclusion. In the MD simulations (fig. S3, S4, and S5), the three drugs moved within the corresponding protein cavities with little change in total energy (fig. S3C, S4C, and S5C). Likewise, the RMSD (fig. S3A, S4A, and S5A), Rg (fig. S3B, S4B, and S5B), and SASA results (fig. S3D, S4D, and S5D) show that the protein structures remained stable while the drugs moved within them. In summary, the MD simulation results showed that the three drugs remained stably bound at the original ligand sites of their respective targets.
In addition, the RMSD/cluster index plot over the 50 ns simulation and the protein number density map both suggest that the protein conformations remain stable after binding the three drugs at their original ligand binding sites (fig. S6). Taken together, the affinity and MD simulation results suggest that the three drugs are very likely to bind at the original ligand sites of their respective targets.
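For readers unfamiliar with these metrics, the sketch below shows the standard definitions of RMSD and Rg on plain coordinate lists. It is a simplified illustration with made-up coordinates: it assumes the frame has already been superimposed on the reference structure, a step that MD analysis tools usually perform before computing RMSD, and it is not the analysis pipeline used in this study.

```python
import math


def rmsd(coords, ref):
    """Root mean square deviation between two equally sized coordinate sets.

    No superposition is performed; the frame is assumed to be pre-aligned.
    """
    if len(coords) != len(ref):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
        for (x, y, z), (rx, ry, rz) in zip(coords, ref)
    )
    return math.sqrt(squared / len(coords))


def radius_of_gyration(coords, masses):
    """Mass-weighted root mean square distance of atoms from the center of mass."""
    total_mass = sum(masses)
    center = [
        sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
        for i in range(3)
    ]
    squared = sum(
        m * sum((c[i] - center[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(squared / total_mass)


# Made-up two-atom system: every atom is shifted by one unit along y.
reference = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame = [(0.0, 1.0, 0.0), (2.0, 1.0, 0.0)]

print(rmsd(frame, reference))
print(radius_of_gyration(reference, [1.0, 1.0]))
```

A flat RMSD trace over the trajectory indicates that the structure stays close to its starting conformation, and a flat Rg trace indicates that the protein remains equally compact.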
Most of the 370 identified drugs are repurposed as anti-cancer agents and have a restricted number of bound targets
We identified 370 drugs that bind cancer-type specific essential genes, distributed among 19 cancer types. Some of these drugs were originally approved for cancer therapy, while others were not. We extracted their treatment indications from DrugBank and show the distribution in Fig. 5A. Only 24.3% of the drugs were developed for tumor therapy, and over 70% are identified as anti-tumor candidates because they may bind targets that are essential in specific cancer cells (types). These results provide preliminary evidence that our identification strategy is capable of drug repurposing.
Ideal drugs are expected to have specific and precise targeting, while off-target binding would cause toxicity to normal cells or the human body [40]. This rule holds for our identified anti-cancer drugs. Since we assess the specificity of a drug across all of its potentially bound targets, a drug that binds many essential genes simultaneously would have compromised specificity. In Fig. 5B, we show the distribution of the number of recorded targets that are essential in any cancer cells for all 370 drugs. About half of the 370 drugs have one to three essential targets, indicating relatively high specificity of these drugs as anti-tumor agents.
We hypothesize that drug toxicity to normal cells is determined by the aggregative essentiality of all simultaneously bound targets in all cancer cells. In the inhibitory experiments, we observed a weak negative correlation (Pearson's correlation analysis, R = -0.224, p = 0.062, 77 inhibitory rates with 7 aggregative m values of all validated drugs) between the aggregative m values of potentially bound targets and the viability of the normal cells. We suspect that the correlation falls short of statistical significance because only seven drugs were tested. Based on this result, a drug with an aggregative m value below 0.5 is unlikely to strongly inhibit normal cells. Therefore, if we determine the aggregative essentiality of all the potential targets of a drug, we can roughly estimate its maximum toxicity to normal cells. Hence, we summed the essential number of all potential targets recorded in the public databases for all drugs. For example, prednisone has 10 binding targets, and the aggregative m value of the 10 targets is 0.41 (132/325). The resulting distribution (Fig. 5C) shows that about half of the 370 drugs have a maximum essentiality (m value) below 0.5. In practice, a drug cannot bind all of its potential targets simultaneously, so many more drugs are likely to be safe because their actual aggregative essentiality will be less than 0.5.
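The two quantities in this paragraph can be illustrated with a short sketch. The aggregative m value is read here as the fraction of all cell lines in which at least one of a drug's targets is essential; this is an assumption based on the prednisone example (132/325), and the exact definition is given in the Methods. The function names and the toy data are hypothetical.

```python
import math


def aggregate_m(target_to_lines, total_lines):
    """Fraction of cell lines in which at least one target is essential.

    Assumed reading of the aggregative m value, not the authors' definition.
    """
    covered = set().union(*target_to_lines.values())
    return len(covered) / total_lines


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Arithmetic from the prednisone example in the text.
prednisone_m = round(132 / 325, 2)

# Toy drug with two hypothetical targets over six hypothetical cell lines.
toy_m = aggregate_m({"T1": {"c1", "c2"}, "T2": {"c2", "c3"}}, total_lines=6)

# Toy m values against toy normal-cell viabilities (perfectly anti-correlated).
toy_r = pearson_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])

print(prednisone_m, toy_m, toy_r)
```

Under this reading, overlapping essential cell lines among a drug's targets are counted once, so the aggregative m value cannot exceed 1.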