3.1. Docking-based virtual screening
The full list of 2266 compounds is presented in Supplementary Table 1. Among the 2266 phytochemical structures from the PhytoHub database, 94 compounds have high affinity for the binding cavity, with docking scores < -10 kcal/mol; 2051 compounds have docking scores from -5 kcal/mol to -10 kcal/mol; 113 compounds have docking scores from 0 kcal/mol to -5 kcal/mol; and 8 compounds are considered unable to bind to PTPN2 (positive docking scores). Based on the structures of the 94 compounds with docking scores < -10 kcal/mol, we identified 9 phytochemical structural frameworks for potential PTPN2 inhibitors: flavonoids, stilbenes, alkaloids, carotenoids, coumarins, ellagitannins, diterpenoids, curcuminoids, and phenolic acids (Table 1, Fig. 1).
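The score-based grouping described above can be illustrated with a minimal Python sketch. The function name `score_bin`, the bin labels, and the treatment of scores lying exactly on a boundary are assumptions made for illustration; the text does not state how boundary values were assigned. The example scores are either taken from Table 1 or are clearly marked as hypothetical.

```python
from collections import Counter

def score_bin(score: float) -> str:
    """Assign a docking score (kcal/mol) to one of the four ranges in the text.

    Boundary handling (e.g. a score of exactly -10) is an assumption.
    """
    if score > 0:
        return "non-binding (> 0)"
    if score >= -5:
        return "weak (0 to -5)"
    if score >= -10:
        return "moderate (-5 to -10)"
    return "high affinity (< -10)"

# Two scores from Table 1 plus three hypothetical values for the other bins.
example_scores = [-14.734, -10.006, -7.2, -3.1, 1.4]
counts = Counter(score_bin(s) for s in example_scores)
print(counts)

# Consistency check on the counts reported in the text.
reported = {"high": 94, "moderate": 2051, "weak": 113, "non_binding": 8}
print(sum(reported.values()))  # total number of screened compounds
```

The four reported group sizes add up to the 2266 screened compounds, so every compound falls into exactly one range.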
Table 1
94 phytochemical compounds with potential PTPN2 inhibition from PhytoHub database
Structure | PhytoHub Identifier | Docking score (kcal/mol) | MM/GBSA (kcal/mol) | pred-IC50 (µM) |
1 | PHUB002821 | -14.734 | -29.16 ± 4.96 | 5.084 |
2 | PHUB002824 | -14.144 | -25.45 ± 9.98 | 4.843 |
3 | PHUB002837 | -13.327 | -16.42 ± 4.92 | 4.843 |
4 | PHUB002830 | -13.095 | -33.76 ± 3.52 | 4.843 |
5 | PHUB002823 | -12.551 | -35.40 ± 4.20 | 5.492 |
6 | PHUB002833 | -12.494 | -18.79 ± 6.09 | 4.843 |
7 | PHUB002826 | -12.167 | -25.76 ± 6.20 | 5.521 |
8 | PHUB002832 | -11.903 | -29.62 ± 3.60 | 4.843 |
9 | PHUB002835 | -11.752 | - | 5.521 |
10 | PHUB002828 | -11.581 | - | 5.084 |
11 | PHUB002831 | -11.153 | - | 4.843 |
12 | PHUB002827 | -10.858 | - | 5.743 |
13 | PHUB002141 | -10.796 | - | 5.739 |
14 | PHUB002834 | -10.639 | - | 4.843 |
15 | PHUB002143 | -10.523 | - | 5.309 |
16 | PHUB002822 | -10.495 | - | 4.341 |
17 | PHUB002825 | -10.255 | - | 4.843 |
18 | PHUB002820 | -10.177 | - | 5.084 |
19 | PHUB002147 | -10.291 | - | 5.388 |
20 | PHUB002774 | -10.211 | - | 5.908 |
21 | PHUB001051 | -10.010 | - | 5.953 |
22 | PHUB001874 | -11.319 | - | 5.370 |
23 | PHUB001316 | -10.035 | - | 5.565 |
24 | PHUB002987 | -10.366 | - | 5.565 |
25 | PHUB002569 | -10.212 | - | 10.956 |
26 | PHUB000218 | -10.967 | - | 8.758 |
27 | PHUB000217 | -10.260 | - | 8.758 |
28 | PHUB000219 | -10.256 | - | 10.970 |
29 | PHUB002882 | -11.025 | - | 6.639 |
30 | PHUB001600 | -10.346 | - | 5.252 |
31 | PHUB001640 | -10.340 | - | 6.639 |
32 | PHUB001602 | -10.094 | - | 12.116 |
33 | PHUB001488 | -11.890 | -36.28 ± 4.81 | 6.056 |
34 | PHUB001486 | -11.784 | -25.60 ± 6.56 | 6.056 |
35 | PHUB001412 | -10.696 | - | 6.056 |
36 | PHUB001487 | -10.625 | - | 6.056 |
37 | PHUB001494 | -10.389 | - | 6.325 |
38 | PHUB001495 | -11.163 | - | 5.050 |
39 | PHUB001414 | -10.083 | - | 5.460 |
40 | PHUB001557 | -10.216 | - | 7.242 |
41 | PHUB000422 | -10.381 | - | 4.872 |
42 | PHUB000423 | -10.376 | - | 6.103 |
43 | PHUB000411 | -10.273 | - | 4.089 |
44 | PHUB000425 | -11.579 | -8.52 ± 8.01 | 5.094 |
45 | PHUB000850 | -10.006 | - | 5.129 |
46 | PHUB001954 | -11.336 | -27.78 ± 2.88 | 5.184 |
47 | PHUB001912 | -10.418 | - | 12.237 |
48 | PHUB001952 | -10.44 | - | 5.363 |
49 | PHUB001951 | -10.145 | - | 5.303 |
50 | PHUB001958 | -10.083 | - | 5.339 |
51 | PHUB000353 | -10.782 | - | 9.873 |
52 | PHUB000355 | -10.322 | - | 14.834 |
53 | PHUB000466 | -10.303 | - | 9.873 |
54 | PHUB000463 | -10.111 | - | 14.834 |
55 | PHUB000260 | -10.501 | - | 8.046 |
56 | PHUB000254 | -10.213 | - | 5.193 |
57 | PHUB001439 | -10.418 | - | 5.524 |
58 | PHUB002001 | -11.129 | -22.03 ± 4.29 | 5.363 |
59 | PHUB001975 | -10.200 | - | 3.653 |
60 | PHUB002053 | -11.366 | -19.90 ± 4.92 | 7.393 |
61 | PHUB002051 | -10.813 | - | 6.915 |
62 | PHUB002055 | -10.674 | - | 15.252 |
63 | PHUB002054 | -10.204 | - | 11.290 |
64 | PHUB002795 | -10.795 | - | 6.229 |
65 | PHUB002796 | -10.658 | - | 6.342 |
66 | PHUB002797 | -10.436 | - | 7.242 |
67 | PHUB002794 | -10.394 | - | 5.549 |
68 | PHUB002752 | -10.887 | - | 5.566 |
69 | PHUB002759 | -10.805 | - | 5.461 |
70 | PHUB002725 | -10.529 | - | 4.468 |
71 | PHUB002758 | -10.487 | - | 5.888 |
72 | PHUB002751 | -10.309 | - | 5.888 |
73 | PHUB002762 | -10.264 | - | 5.888 |
74 | PHUB002744 | -10.087 | - | 5.923 |
75 | PHUB002719 | -10.051 | - | 3.263 |
76 | PHUB002761 | -10.041 | - | 5.888 |
77 | PHUB002748 | -10.024 | - | 5.923 |
78 | PHUB002757 | -10.017 | - | 5.958 |
79 | PHUB002477 | -10.036 | - | 11.059 |
80 | PHUB000635 | -11.073 | -17.28 ± 5.74 | 7.688 |
81 | PHUB000522 | -10.435 | - | 7.025 |
82 | PHUB000521 | -10.538 | - | 7.556 |
83 | PHUB000519 | -10.219 | - | 7.598 |
84 | PHUB000641 | -10.592 | - | 7.688 |
85 | PHUB002786 | -10.556 | - | 6.512 |
86 | PHUB002779 | -10.112 | - | 7.947 |
87 | PHUB001991 | -10.840 | - | 6.511 |
88 | PHUB001989 | -10.256 | - | 6.555 |
89 | PHUB002801 | -10.116 | - | 6.343 |
90 | PHUB002790 | -10.267 | - | 6.424 |
91 | PHUB001751 | -10.018 | - | 11.126 |
92 | PHUB002618 | -10.143 | - | 7.810 |
93 | PHUB002885 | -10.472 | - | 8.342 |
94 | PHUB002863 | -10.408 | - | 9.067 |
The first group, consisting of 32 flavonoids and flavonoid metabolites, has the best average docking score. Among them, the sulfate and glucuronide metabolites of hesperetin are the structures with the strongest affinity for the PTPN2 binding cavity. Figure 2 shows the conformation in the binding cavity and the 2D interactions of some flavonoid structures, including PHUB002821 (1), PHUB002824 (2), PHUB002837 (3), and PHUB002830 (4). All structures form multiple interactions with PTPN2 residues at the active site, such as hydrogen bonds with Tyr48, Asp50, Lys122, Asp182, Cys216, Ser217, Ala218, Gly219, Ile220, Gly221, Arg222, Met256, Gly257, and Gln260. Among them, Tyr48, Gly219, and Arg222 form hydrogen bonds with all six hesperetin derivatives. Meanwhile, ring A and ring B of the flavonoid framework also form many hydrophobic interactions with Tyr48, Asp50, Val51, Phe183, Ala218, Ile220, and Met256.
The second group includes eight stilbenes, most of which are trans-resveratrol derivatives. Their conformations all fit within the catalytic site of PTPN2; the two best compounds are PHUB001488 (33) and PHUB001486 (34). The sulfate groups of the A ring form multiple hydrogen bonds with residues Ser217 to Arg222, Tyr48, Lys122, Asp182, and Ala218 of the protein. Ring A is sandwiched between Ala218 and Tyr48 through pi-sigma and pi-pi stacking interactions, while ring B interacts hydrophobically with Asp50, Val51, and Met256 (Fig. 3).
The seven alkaloids and three amines in group 3 have docking scores from -10.006 to -11.579 kcal/mol. Among them, PHUB000425 (44) and PHUB001954 (46) are the two compounds with the highest affinity for PTPN2. PHUB000425 interacts with PTPN2 through many hydrogen bonds with active site residues such as Asp50, Lys122, Gly221, Arg222, and Met256. The indole skeleton forms a sulfur-X interaction with Met256, and the amine group in the pyridine ring forms a pi-cation interaction with Phe183. Meanwhile, PHUB001954 binds to PTPN2 mainly through hydrogen bonds with Asp50, Ser217, Ala218, Gly219, and Arg222 (Fig. 4).
In addition, some other compounds also show very strong affinity for PTPN2 such as PHUB002001 (58), PHUB002053 (60), and PHUB000635 (80) (Fig. 5).
3.2. Molecular dynamics simulation (MDs)
To evaluate the stability of the protein-ligand complexes, 100 ns MDs were performed for the complexes of PTPN2 with PHUB002821, PHUB002824, PHUB002837, PHUB002830, PHUB002823, PHUB002833, PHUB002826, PHUB002832, PHUB001488, PHUB001486, PHUB000425, PHUB001954, PHUB002001, PHUB002053, and PHUB000635, for the complex of ABBV-CLS-484 with PTPN2, and, for comparison, for the protein without ligand (apo-protein).
The stability of the apo-protein and of the protein in each complex during the simulations was evaluated through the RMSD of the carbon backbone. The RMSD of the apo-protein is around 0.17 nm, while the RMSD of the protein in the complexes is around 0.15 nm (Fig. 5). This suggests that PTPN2 in complex with the ligands or with ABBV-CLS-484 is more stable than the apo-protein during the 100 ns simulation.
The mobility of the protein residues in the complexes and in the apo-protein was evaluated by the RMSF. During the 100 ns simulation, the residues in the protein pocket, such as Tyr48, Asp50, Val51, Lys122, Pro181, Asp182, Phe183, Gly184, Cys216, Ser217, Ala218, Gly219, Ile220, Gly221, Arg222, Met256, Gly257, Gln260, and Gln264, consistently have lower RMSF values in the ligand complexes (except for PHUB000425) than in the apo-protein (Fig. 5), suggesting that the interactions between these ligands and the protein may stabilize the protein structure and these residues.
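For readers less familiar with the two stability measures used here, the following NumPy sketch shows how RMSD per frame and RMSF per atom are commonly defined. It is an illustration only, not the analysis pipeline used in this study: the function names are hypothetical, the trajectory is synthetic, and the frames are assumed to have already been superimposed on the reference structure (no fitting step is included).

```python
import numpy as np

def rmsd_per_frame(traj: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """RMSD of every frame relative to a reference structure.

    traj: (n_frames, n_atoms, 3) coordinates, assumed pre-aligned to ref.
    ref:  (n_atoms, 3) reference coordinates. Units follow the input (e.g. nm).
    """
    sq_dist = ((traj - ref) ** 2).sum(axis=2)   # squared displacement per atom
    return np.sqrt(sq_dist.mean(axis=1))        # average over atoms, per frame

def rmsf_per_atom(traj: np.ndarray) -> np.ndarray:
    """RMSF of every atom around its time-averaged position."""
    mean_pos = traj.mean(axis=0)                # (n_atoms, 3) average structure
    sq_dist = ((traj - mean_pos) ** 2).sum(axis=2)
    return np.sqrt(sq_dist.mean(axis=0))        # average over frames, per atom

# Synthetic example (assumption): 100 frames, 5 atoms, small random fluctuations.
rng = np.random.default_rng(0)
reference = rng.normal(size=(5, 3))
trajectory = reference + rng.normal(scale=0.05, size=(100, 5, 3))

print(rmsd_per_frame(trajectory, reference).mean())
print(rmsf_per_atom(trajectory))
```

Under these definitions, a lower RMSF for a pocket residue in a complex than in the apo-protein means that the residue fluctuates less around its average position when the ligand is bound.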
The binding free energy (ΔGbind) between PTPN2 and the ligands PHUB002821, PHUB002824, PHUB002837, PHUB002830, PHUB002823, PHUB002833, PHUB002826, PHUB002832, PHUB001488, PHUB001486, PHUB000425, PHUB001954, PHUB002001, PHUB002053, PHUB000635, and ABBV-CLS-484 over the 100 ns simulation was calculated according to the MM/GBSA method [33] (Table 1). The binding free energy of ABBV-CLS-484 to PTPN2 is -16.31 ± 6.42 kcal/mol. The hesperetin metabolites (PHUB002821, PHUB002824, PHUB002837, PHUB002830, PHUB002823, PHUB002833, PHUB002826, and PHUB002832) have binding free energies from -16.42 ± 4.92 kcal/mol (PHUB002837) to -35.40 ± 4.20 kcal/mol (PHUB002823), and the two trans-resveratrol metabolites PHUB001488 and PHUB001486 have binding free energies of -36.28 ± 4.81 kcal/mol and -25.60 ± 6.56 kcal/mol, respectively. The binding free energy is -27.78 ± 2.88 kcal/mol for PHUB001954, -22.03 ± 4.29 kcal/mol for PHUB002001, -19.90 ± 4.92 kcal/mol for PHUB002053, and -17.28 ± 5.74 kcal/mol for PHUB000635. These results show that all of the phytochemical compounds except PHUB000425 have a more favorable mean binding free energy for PTPN2 than ABBV-CLS-484. PHUB000425 is the only compound that binds to PTPN2 more weakly than the co-ligand ABBV-CLS-484, with a binding free energy of -8.52 ± 8.01 kcal/mol. This suggests that these agents, with the exception of PHUB000425, may be potent PTPN2 inhibitors.
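The comparison with the co-ligand can be reproduced from the MM/GBSA values in Table 1 with a short Python sketch. The variable names are illustrative, and the comparison uses the mean values only; it ignores the reported standard deviations, so small differences from the reference (for example PHUB002837) should be read with caution.

```python
# Mean ΔGbind and standard deviation (kcal/mol), as reported in Table 1 and the text.
REFERENCE_NAME = "ABBV-CLS-484"
REFERENCE_DG = (-16.31, 6.42)

LIGAND_DG = {
    "PHUB002821": (-29.16, 4.96),
    "PHUB002824": (-25.45, 9.98),
    "PHUB002837": (-16.42, 4.92),
    "PHUB002830": (-33.76, 3.52),
    "PHUB002823": (-35.40, 4.20),
    "PHUB002833": (-18.79, 6.09),
    "PHUB002826": (-25.76, 6.20),
    "PHUB002832": (-29.62, 3.60),
    "PHUB001488": (-36.28, 4.81),
    "PHUB001486": (-25.60, 6.56),
    "PHUB000425": (-8.52, 8.01),
    "PHUB001954": (-27.78, 2.88),
    "PHUB002001": (-22.03, 4.29),
    "PHUB002053": (-19.90, 4.92),
    "PHUB000635": (-17.28, 5.74),
}

# Rank ligands from most to least favorable mean binding free energy.
ranked = sorted(LIGAND_DG, key=lambda name: LIGAND_DG[name][0])

# Split by whether the mean is more negative than the reference mean.
stronger = [n for n in ranked if LIGAND_DG[n][0] < REFERENCE_DG[0]]
weaker = [n for n in ranked if LIGAND_DG[n][0] >= REFERENCE_DG[0]]

print("Most favorable:", ranked[0])
print(f"More favorable than {REFERENCE_NAME}:", len(stronger))
print(f"Less favorable than {REFERENCE_NAME}:", weaker)
```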
3.3. Prediction of PTPN2 inhibitory activity using machine learning
Hyperparameter tuning was performed on the 540 compounds of the training set using 5-fold cross-validation. The tuned hyperparameters of the random forest algorithm are the number of trees (n_estimators), the minimum number of samples required to split a node (min_samples_split), the minimum decrease in impurity required for a split (min_impurity_decrease), the number of features considered at each split (max_features), the maximum depth of each tree (max_depth), the split criterion (criterion), and the method of selecting samples for training each tree (bootstrap). The best combination, saved as best_params_, is n_estimators = 200, min_samples_split = 9, min_impurity_decrease = 0.0, max_features = 'sqrt', max_depth = 10, criterion = 'absolute_error', and bootstrap = False. We then trained a random forest regression model to predict PTPN2 inhibitory activity on the entire training set using best_params_. The final model was evaluated on the external test set, giving R2 = 0.81, MAE = 0.30, and RMSE = 0.40. The correlation between experimental and predicted pIC50 values is shown in Fig. 6.
The model was then used to predict the PTPN2 inhibitory activity of the compounds selected by molecular docking; the predicted IC50 values of the phytochemical compounds are presented in Table 1. The phytochemical compounds are predicted to have PTPN2 inhibitory activity with IC50 values ranging from 15.25 µM to 3.26 µM. Although the predicted activities indicate that these compounds inhibit PTPN2 much more weakly than the co-ligand (ABBV-CLS-484), the majority of the structures have a predicted IC50 below 10 µM, suggesting that they may be of interest in the search for PTPN2 inhibitors for anti-tumor immunotherapy. In addition, the structures that bind to PTPN2 and remain stable during the molecular dynamics simulations have predicted IC50 values ranging from 7.69 µM to 4.84 µM, indicating that these are promising structures.
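Because the model is assessed on pIC50 values while Table 1 lists IC50 in µM, the standard conversion between the two scales may be helpful. The sketch below assumes the usual definition pIC50 = -log10(IC50 in mol/L); the function names are hypothetical and the text does not state how the conversion was carried out.

```python
import math

def pic50_to_ic50_um(pic50: float) -> float:
    """Convert pIC50 (IC50 in mol/L, negative log10) to IC50 in micromolar."""
    return 10 ** (6 - pic50)

def ic50_um_to_pic50(ic50_um: float) -> float:
    """Convert IC50 in micromolar to pIC50."""
    return 6 - math.log10(ic50_um)

# The extremes of the predicted IC50 range quoted in the text (µM).
for ic50 in (15.25, 3.26):
    print(ic50, round(ic50_um_to_pic50(ic50), 2))
```

On this scale a predicted IC50 of 10 µM corresponds to a pIC50 of 5, so the compounds with IC50 below 10 µM are those with a predicted pIC50 above 5.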
3.4. In silico ADME properties
The ADME properties of the lead compounds were analyzed using the SwissADME server (Supplementary Table 2). None of the hesperetin and trans-resveratrol derivatives meet the druglikeness criteria of the SwissADME server. However, because these compounds are known metabolites of hesperetin and trans-resveratrol, the bioavailability score assessment may not be of concern. Only PHUB001954 and PHUB002001 satisfy all druglikeness criteria; however, PHUB001954 has been given a PAINS warning due to its indole_3yl_alk structure. In the pharmacokinetic assessment, with the exception of PHUB001486 and PHUB002001, which are predicted to inhibit CYP1A2, the agents do not show a risk of cytochrome P450 inhibition.