2.1 Data
2.1.1 Protein-protein interaction network
The Protein-Protein Interaction (PPI) network is derived from the Human Interactome dataset established by Albert-László Barabási and colleagues, which comprises 332,749 pairwise interactions among 18,508 distinct human proteins[22]. We use the Python NetworkX package[23] to extract the largest connected subgraph of the PPI network as our protein network. After preprocessing, the final PPI network contains 311,210 interaction pairs connecting 17,329 unique proteins.
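To make this preprocessing step concrete, the following minimal sketch (assuming a hypothetical edge-list file with columns protein_a and protein_b) shows how the largest connected component can be extracted with NetworkX:

```python
# Minimal sketch of extracting the largest connected component of the PPI
# network with NetworkX. The file name and column names are placeholders.
import networkx as nx
import pandas as pd

edges = pd.read_csv("human_interactome.csv")               # columns: protein_a, protein_b (assumed)
G = nx.from_pandas_edgelist(edges, "protein_a", "protein_b")

largest_cc = max(nx.connected_components(G), key=len)      # node set of the largest component
ppi = G.subgraph(largest_cc).copy()                        # working PPI network
print(ppi.number_of_nodes(), ppi.number_of_edges())
```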
2.1.2 Drug-protein associations
Drug-protein associations are derived from the Drug-Gene Interaction Database (DGIdb)[24], a compendium that integrates drug-gene interactions from a variety of publications, databases, and online resources. DGIdb[24] combines expert curation with text-mining techniques to standardize information across 41 distinct sources, assembling 54,591 drug-gene interactions involving 41,102 genes and 14,449 drugs. In our study, we retrieved the genes encoding the target proteins of each drug from DGIdb[24] and mapped these genes to their corresponding proteins. The resulting drug-protein interaction network contains 7,713 distinct drug-protein associations between 1,161 drugs and 2,019 proteins.
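One possible way to assemble this drug-protein edge list is sketched below; the file layout, column names, and gene-to-protein mapping table are assumptions for illustration only:

```python
# Illustrative sketch: map DGIdb drug-gene interactions to proteins and keep
# only proteins present in the PPI network. All file and column names are hypothetical.
import pandas as pd

dgidb = pd.read_csv("dgidb_interactions.tsv", sep="\t")           # columns: drug_name, gene_name (assumed)
gene_to_protein = pd.read_csv("gene_protein_map.csv")              # columns: gene_name, protein_id (assumed)
ppi_proteins = set(pd.read_csv("ppi_proteins.csv")["protein_id"])  # nodes of the PPI network from 2.1.1

drug_protein = (dgidb.merge(gene_to_protein, on="gene_name")
                     [["drug_name", "protein_id"]]
                     .drop_duplicates())
drug_protein = drug_protein[drug_protein["protein_id"].isin(ppi_proteins)]
```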
2.1.3 Disease-protein associations
Disease-protein associations were obtained from the DisGeNET[25] repository, one of the most comprehensive collections of genes implicated in human diseases. DisGeNET[25] integrates data from expert-curated databases with knowledge extracted from the scientific literature by text mining, thereby harmonizing and standardizing information on disease-associated genes and genetic variants from diverse sources. The resource covers the full spectrum of human diseases as well as physiological and pathological phenotypes. The current release of DisGeNET[25] includes over 24,000 unique diseases and traits, 17,000 genes, and 117,000 genomic variants. In our study, we identified the protein-coding genes associated with each disease and mapped them to their corresponding proteins. The resulting disease-protein interaction network contains 11,482 disease-protein associations across 634 diseases and 9,109 proteins.
2.1.4 Drug–disease associations
In this study, we compiled a dataset of 1,948 known drug indications from the repoDB database[26]. Only small-molecule drugs approved by the United States Food and Drug Administration (FDA) were considered, and each drug's generic name was standardized using Medical Subject Headings (MeSH) and the Unified Medical Language System (UMLS) vocabulary. A majority (75%) of the drugs were indicated for fewer than three diseases, whereas only 4% had therapeutic indications spanning more than ten distinct diseases. Regarding disease coverage, 70% of the diseases had fewer than five associated drugs, 16% were treated by five to ten drugs, and 14% had more than ten drugs available for treatment.
2.2 Drug-disease pairs score
2.2.1 Network-based proximity between drugs and diseases
A drug-disease proximity measure[9] has been developed that quantifies the relationship between drug targets and disease proteins based on their relative positions within the interaction network, thereby facilitating the estimation of a drug's therapeutic potential. The measure is defined as follows:
The closeness between a drug and a disease is quantified using the shortest path distances between drug targets and disease-associated proteins in the network. Let \(\:S\) denote the set of proteins implicated in the disease, \(\:T\) the set of proteins targeted by the drug, and \(\:d(s,t)\) the shortest path length between nodes \(\:s\) and \(\:t\) in the network; the closest-distance measure between the drug and the disease is then defined as:
$$d_{c}\left(S,T\right)=\frac{1}{\|T\|}\sum_{t\in T}\min_{s\in S}d\left(s,t\right)\quad\left(1\right)$$
To evaluate the statistical significance of the proximity between a drug and a disease \(\:\left(T,S\right)\), this methodology builds a reference distance distribution corresponding to the expected distances between two randomly selected groups of proteins matching the size and degrees of the original disease proteins and drug targets in the network. The reference distribution was generated by calculating the proximity between such randomly selected groups, a procedure repeated 1,000 times. The mean \(\:{\mu}_{d(S,T)}\) and standard deviation \(\:{\sigma}_{d(S,T)}\) of the reference distribution were then used to convert an observed distance into a normalized distance, defining the proximity measure:
$$z\left(S,T\right)=\frac{d(S,T)-\mu_{d(S,T)}}{\sigma_{d(S,T)}}\quad\left(2\right)$$
This algorithm defines a drug as being proximal to a disease (suggesting a therapeutic effect) when the computed proximity metric \(\:z\le\:-0.15\).
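The proximity measure of Eq. (1)-(2) can be sketched as follows; exact-degree binning is used here as a simplification of the degree-preserving random sampling described above, and all function names are illustrative:

```python
# Hedged sketch of network proximity: closest distance d_c(S, T) and its
# z-score against 1,000 degree-matched random drug/disease protein sets.
import random
from collections import defaultdict
import networkx as nx
import numpy as np

def closest_distance(G, S, T):
    """Eq. (1): mean over drug targets t of the shortest distance to any disease protein s."""
    total = 0.0
    for t in T:
        lengths = nx.single_source_shortest_path_length(G, t)
        total += min(lengths[s] for s in S)
    return total / len(T)

def degree_matched_sample(G, nodes, bins):
    """Draw random nodes whose degrees match those of `nodes` (exact-degree bins)."""
    return [random.choice(bins[G.degree(n)]) for n in nodes]

def proximity_z(G, S, T, n_random=1000, seed=0):
    """Eq. (2): z-score of the observed distance against the random reference distribution."""
    random.seed(seed)
    bins = defaultdict(list)
    for node, deg in G.degree():
        bins[deg].append(node)
    d_obs = closest_distance(G, S, T)
    reference = [closest_distance(G,
                                  degree_matched_sample(G, S, bins),
                                  degree_matched_sample(G, T, bins))
                 for _ in range(n_random)]
    return (d_obs - np.mean(reference)) / np.std(reference)

# The drug is considered proximal (potentially therapeutic) when the returned z <= -0.15.
```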
2.2.2 Construct the training data set
We map drug targets and disease proteins onto the human interactome network and then use the drug-disease proximity measure[9] to compute a network-based score that quantifies the therapeutic effect of a drug on a disease. If the proximity score \(\:z\le -0.15\), the drug is defined as proximal to the disease, that is, the drug is considered to have a therapeutic effect on the disease; otherwise it is regarded as having no efficacy. In this study, 1,785 unknown drug-disease pairs were selected as negative samples through a random negative sampling strategy. Of the 1,948 known therapeutic drug-disease associations, the 1,487 pairs with scores below \(-0.15\) were used as true positive data, and of the 1,785 negatively sampled drug-disease pairs, the 461 pairs with scores above \(-0.15\) were used as true negative data. The true positive and true negative data were then merged to form the training set (Fig. S1). To facilitate subsequent model training, the network proximity scores were converted according to the following formula:
$$z_{new}=-\log\left(e^{z}\right)\quad\left(3\right)$$
After conversion (with the logarithm taken to base 10, so that \(z=-0.15\) corresponds to \(z_{new}\approx 0.065\)), the drug is considered proximal to the disease if \(z_{new}\ge 0.065\).
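An illustrative sketch of the label construction is given below; the data structures are assumptions, and the logarithm is taken to base 10 in line with the 0.065 threshold:

```python
# Sketch of assembling the training set (Fig. S1): known indications with
# z <= -0.15 become positives, randomly sampled unknown pairs with z > -0.15
# become negatives, and scores are rescaled with Eq. (3) (base-10 logarithm).
import numpy as np

def z_to_znew(z):
    # z_new = -log10(e^z) = -z * log10(e); z = -0.15 maps to z_new ~ 0.065
    return -z * np.log10(np.e)

def build_training_set(known_pairs, sampled_pairs, z_scores):
    """known_pairs / sampled_pairs: lists of (drug, disease); z_scores: dict keyed by pair."""
    positives = [(p, 1, z_to_znew(z_scores[p])) for p in known_pairs if z_scores[p] <= -0.15]
    negatives = [(p, 0, z_to_znew(z_scores[p])) for p in sampled_pairs if z_scores[p] > -0.15]
    return positives + negatives
```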
2.3 The overall framework of Meta-DEP
The Meta-DEP framework is presented in Fig. 1. The Meta-DEP model discerns the pathways underlying drug-disease interactions by traversing the shortest paths between drug targets and diseases within a complex, heterogeneous network composed of drugs, proteins, and diseases. To capture the overarching connectivity of this heterogeneous graph, it first applies Metapath2vec[27] to generate feature vectors for the constituent nodes. Then, to capture the detailed mechanism of drug action, the embeddings of the nodes along the shortest paths between a drug and a disease are fed into an RNN[28] module that models their sequential dependencies. Moreover, Meta-DEP incorporates a dual attention[29] mechanism consisting of path attention and node attention. This design aggregates node embeddings along the identified pathways, assigning a relative importance to each node's contribution within a path and weighting the relevance of distinct paths to the final prediction. Finally, to enhance predictive capability, the Meta-DEP architecture employs a multitask learning strategy involving both regression and classification: the regression task predicts the proximity score between drugs and diseases, while the classification task performs a binary prediction of drug efficacy.
2.3.1 Metapath related to drug disease effects
To accurately model the effects of a drug, it is imperative to identify the paths that convey drug-disease interaction information and effectively represent the mechanism of action. We prioritize shortest paths because the shorter the distance between a drug and its disease targets, the higher the likelihood of a therapeutic effect[30–32]. Using the shortest-paths algorithm from the Python NetworkX package[23], such pathways can be identified systematically. Given a drug and a disease, the set of shortest paths connecting them within the drug-protein-disease heterogeneous network comprises \(PATH=\{path_{1},path_{2},\dots,path_{L}\}\), where each \(path_{i}=\{node_{m1}\to node_{m2}\to \dots \to node_{disease}\}\). Here, \(node_{m1}\) and \(node_{m2}\) denote intermediate nodes along a specific path, and \(L\) denotes the number of distinct shortest paths. These paths consist of interconnected nodes that bridge the starting drug node and the terminal disease node, thereby elucidating potential routes through which the drug may exert its effect on the disease.
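The path set for one drug-disease pair can be collected with NetworkX as in the sketch below, where `hetero` is assumed to be the heterogeneous graph containing drug, protein, and disease nodes:

```python
# Sketch: collect PATH, the set of all shortest paths between a drug node and
# a disease node, using NetworkX.
import networkx as nx

def shortest_path_set(hetero, drug, disease):
    """Return PATH = [path_1, ..., path_L]; each path is a node list from drug to disease."""
    try:
        return list(nx.all_shortest_paths(hetero, source=drug, target=disease))
    except nx.NetworkXNoPath:
        return []
```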
2.3.2 Metapath2vec captures global connectivity information of drug-protein-disease heterogeneous networks.
In the heterogeneous network of drugs, proteins, and diseases, which contains multiple node types and edges representing their interactions, Metapath2vec uses predefined metapaths to model the complex relationships among these entities and thereby capture global connectivity information (t-SNE[33] dimensionality-reduction visualizations of the feature representations obtained by different representation learning methods[27, 34–38] are shown in Fig. S3). Within this drug-protein-disease heterogeneous network, a specific metapath is defined as \(\left(\text{drug},\text{to},\text{protein}\right),\left(\text{protein},\text{to},\text{protein}\right),\left(\text{protein},\text{to},\text{disease}\right)\), which signifies a trajectory that connects a drug node through one or more intervening protein nodes to a disease node; here, \(\text{drug}\) denotes a drug node, \(\text{protein}\) a protein node, and \(\text{disease}\) a disease node. Based on this predefined metapath, Metapath2vec[27] generates node sequences by performing random walks over the network topology. It then uses the Skip-gram model to learn low-dimensional vector embeddings \(E\in R^{N\times d}\) for every node, where \(N\) is the total number of nodes in the network and \(d\) is the dimensionality of the embedding space. This allows Metapath2vec[27] to capture meaningful representations that encode the structural and relational properties of the network.
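A minimal Metapath2vec-style sketch is shown below: metapath-guided random walks followed by Skip-gram training with gensim (version >= 4 assumed). The symmetric extension of the metapath, the node-type attribute `ntype`, and the walk parameters are illustrative assumptions; an off-the-shelf implementation such as PyTorch Geometric's MetaPath2Vec could be used instead.

```python
# Illustrative Metapath2vec-style embedding: metapath-guided random walks over
# the heterogeneous graph, then Skip-gram (sg=1) training with gensim.
import random
import networkx as nx
from gensim.models import Word2Vec

def metapath_walk(G, start, metapath, walk_length):
    """Walk so that each step moves to a neighbor whose 'ntype' matches the next metapath type."""
    walk = [start]
    i = 0
    while len(walk) < walk_length:
        next_type = metapath[(i + 1) % len(metapath)]
        candidates = [n for n in G.neighbors(walk[-1]) if G.nodes[n]["ntype"] == next_type]
        if not candidates:
            break
        walk.append(random.choice(candidates))
        i += 1
    return [str(n) for n in walk]

def metapath2vec_embeddings(G, metapath=("drug", "protein", "protein", "disease", "protein", "protein"),
                            walks_per_node=10, walk_length=20, dim=128):
    # The metapath is extended symmetrically (... disease -> protein -> protein -> drug ...)
    # so that walks can cycle back to drug nodes; this extension is an assumption.
    walks = [metapath_walk(G, n, metapath, walk_length)
             for n in G.nodes if G.nodes[n]["ntype"] == metapath[0]
             for _ in range(walks_per_node)]
    model = Word2Vec(sentences=walks, vector_size=dim, window=5, min_count=1, sg=1, workers=4)
    return model.wv    # node id (as string) -> d-dimensional embedding E
```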
2.3.3 Recurrent Neural Network
Given a drug–disease pair, the node embeddings generated by Metapath2vec[27], and the shortest-path set \(PATH\), we employ an RNN to encode both long-term and short-term dependencies along each metapath; such sequential dependencies are crucial to the interpretability of the model. For a node \(node_{m}\) in a path \(path_{i}\), the input to the RNN is the node embedding \(X\) generated by Metapath2vec[27]. After the network receives the input \(X_{t}\) at time \(t\), the value of the hidden layer is \(S_{t}\) and the output value is \(O_{t}\). The value of \(S_{t}\) depends not only on \(X_{t}\) but also on \(S_{t-1}\). Specifically, this can be expressed by the following formulas:
$$O_{t}=g\left(V S_{t}\right)\quad\left(4\right)$$
$$S_{t}=f\left(U X_{t}+W S_{t-1}\right)\quad\left(5\right)$$
Where \(X\) is a vector representing the value of the input layer; \(S\) is a vector representing the value of the hidden layer; \(U\) is the weight matrix from the input layer to the hidden layer; \(W\) is the recurrent weight matrix of the hidden layer; \(O\) is a vector representing the value of the output layer; and \(V\) is the weight matrix from the hidden layer to the output layer. \(f\) is the ReLU nonlinear activation function, and \(g\) denotes the Softmax function. The outputs at all nodes are aggregated by the attention module for each path to represent the entire path and contribute to the final prediction. Because the shortest paths differ in length, we pad them as follows: assuming the maximum path length is set to \(l_{max}\), paths shorter than \(l_{max}\) are filled with a padding value \(pad\) (such as 0), and subsequent processing ignores these padded positions so that they do not affect performance.
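The sketch below (PyTorch; dimensions and padding value are assumptions) illustrates Eq. (4)-(5) applied to a padded path, with a mask recording which positions are real nodes:

```python
# Sketch of the recurrent encoder over padded shortest paths. Hidden states
# correspond to O_t in Eq. (4)-(5); padded positions are tracked by a mask.
import torch
import torch.nn as nn

class PathRNN(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # ReLU recurrence, matching the nonlinearity f in Eq. (5)
        self.rnn = nn.RNN(input_size=dim, hidden_size=dim, nonlinearity="relu", batch_first=True)

    def forward(self, x):          # x: (n_paths, l_max, d) padded node embeddings
        out, _ = self.rnn(x)       # out: hidden states at every position
        return out

def pad_path(node_embs, l_max, dim, pad_value=0.0):
    """Stack the Metapath2vec embeddings of one path and pad it to l_max."""
    k = min(len(node_embs), l_max)
    x = torch.full((l_max, dim), pad_value)
    x[:k] = torch.stack(node_embs[:k])
    mask = torch.zeros(l_max, dtype=torch.bool)
    mask[:k] = True                # True = real node, False = padding
    return x, mask
```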
2.3.4 Node attention
In the context of a shortest metapath \(path_{p}\) connecting a drug-disease pair, the RNN (Recurrent Neural Network)[28] produces hidden states for each node, denoted \(O_{p}\in R^{l_{max}\times d}\), \(O_{p}=\{o_{1},o_{2},\dots,o_{pad},o_{pad}\}\), where \(o_{pad}\) denotes the hidden states at padded positions. Initially, the values at these padded positions are masked by setting them to negative infinity. Subsequently, a linear layer is applied, followed by a Softmax activation function, to condense the embeddings into a single value that signifies the importance, or weight, of each node within the metapath:
$$\varOmega_{p}=O_{p}W_{n}\quad\left(6\right)$$
Where \(W_{n}\in R^{d\times 1}\) is a learnable parameter and \(\varOmega_{p}\in R^{l_{max}\times 1}\) represents the weight of each node in the path \(path_{p}\), with \(\varOmega_{p}=\{w_{1},w_{2},\dots,w_{l_{max}}\}\). For a node \(j\) in \(path_{p}\), its normalized weight is calculated as follows:
$$w_{j}^{\prime}=\frac{e^{w_{j}}}{\sum_{k=1}^{l_{max}}e^{w_{k}}}\quad\left(7\right)$$
Then we aggregate the hidden states of these nodes, weighted by \(w_{k}^{\prime}\), to obtain the embedding of \(path_{p}\):
$$e_{path_{p}}=\sum_{k=1}^{l_{max}}w_{k}^{\prime}o_{k}\quad\left(8\right)$$
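A possible PyTorch realization of the node-attention step (Eq. (6)-(8)) is sketched below; here the padded positions are masked at the attention-score level so that they receive zero weight after the Softmax:

```python
# Sketch of node attention: a linear layer scores every position (Eq. 6),
# padded positions are masked with -inf before the Softmax (Eq. 7), and the
# hidden states are pooled into one embedding per path (Eq. 8).
import torch
import torch.nn as nn

class NodeAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_n = nn.Linear(dim, 1, bias=False)              # W_n in Eq. (6)

    def forward(self, O_p, mask):                             # O_p: (L, l_max, d); mask: (L, l_max)
        scores = self.w_n(O_p).squeeze(-1)                    # Omega_p: (L, l_max)
        scores = scores.masked_fill(~mask, float("-inf"))     # ignore padded positions
        weights = torch.softmax(scores, dim=-1)               # w'_j
        return torch.sum(weights.unsqueeze(-1) * O_p, dim=1)  # e_path_p: (L, d)
```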
2.3.5 Path attention
Following the aggregation step, for a given drug-disease pair, the embeddings of all shortest metapaths are obtained: \(E_{PATH}=\{e_{path_{1}},e_{path_{2}},\dots,e_{path_{L}}\}\). The path attention is analogous to the node attention:
$$\varOmega_{PATH}=E_{PATH}W_{p}\quad\left(9\right)$$
$$w_{path_{p}}^{\prime}=\frac{e^{w_{path_{p}}}}{\sum_{k=1}^{L}e^{w_{path_{k}}}}\quad\left(10\right)$$
$$y_{s}=f\left(\sum_{k=1}^{L}w_{path_{k}}^{\prime}e_{path_{k}}\right)\quad\left(11\right)$$
$$y^{\prime}=\sigma_{p}\left(\sum_{k=1}^{L}w_{path_{k}}^{\prime}e_{path_{k}}\right)\quad\left(12\right)$$
Where \(\sigma_{p}\) is the Softmax function and \(f\) is a nonlinear activation layer. The final predicted outcome \(y^{\prime}\) represents the probability that the drug will be effective in treating the disease, while the score \(y_{s}\) quantifies the proximity between the drug and the disease, serving as an indicator of the drug's therapeutic impact on the disease.
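The path-attention step and the two prediction heads of Eq. (9)-(12) could be realized as in the sketch below; the linear output heads (and their dimensions) are assumptions added so that \(y_{s}\) is a scalar score and \(y^{\prime}\) a class probability:

```python
# Sketch of path attention and the dual outputs: path embeddings are weighted
# and pooled (Eq. 9-10), one head produces the proximity score y_s (Eq. 11),
# and the other the efficacy probability y' (Eq. 12).
import torch
import torch.nn as nn

class PathAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_p = nn.Linear(dim, 1, bias=False)                    # W_p in Eq. (9)
        self.reg_head = nn.Sequential(nn.Linear(dim, 1), nn.ReLU()) # f(...) for y_s
        self.cls_head = nn.Linear(dim, 2)                           # two-class logits

    def forward(self, E_PATH):                                      # E_PATH: (L, d)
        weights = torch.softmax(self.w_p(E_PATH).squeeze(-1), dim=0)    # w'_path_k
        pooled = torch.sum(weights.unsqueeze(-1) * E_PATH, dim=0)       # (d,)
        y_s = self.reg_head(pooled).squeeze(-1)                         # proximity score
        logits = self.cls_head(pooled)
        y_prob = torch.softmax(logits, dim=-1)                          # sigma_p(...) = y'
        return y_s, logits, y_prob
```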
2.3.6 Objective function
Within the scope of this research, the Meta-DEP model is trained concurrently on two learning objectives: a regression task and a classification task. For the regression component, the model employs the Mean Squared Error (MSE) as its loss function, a widely adopted metric for measuring the discrepancy between predicted and true values. The MSE is defined as the mean of the squared differences between each predicted value and its true counterpart:
$$L_{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_{s,i}-y_{i}\right)^{2}\quad\left(13\right)$$
The categorical prediction task is approached as a two-class classification problem, for which the cross-entropy loss (CrossEntropyLoss) is used for optimization:
$$L_{CE}=-\sum_{i=1}^{n}y_{i}\log\left(y_{i}^{\prime}\right)\quad\left(14\right)$$
In our multi-objective training process, we aim to minimize the combination of two losses with \(\:{l}_{2}\) regularization:
$$L=L_{MSE}+\alpha L_{CE}+\lambda\left\|\theta\right\|_{2}^{2}\quad\left(15\right)$$
Where \(\alpha\) is a hyperparameter that weights the classification task, \(\theta\) is the set of parameters to be learned in Meta-DEP, \(\lambda\) is the coefficient of the \(l_{2}\) regularizer used to prevent over-fitting, and \(\left\|\theta\right\|_{2}^{2}\) is the squared \(l_{2}\) norm of \(\theta\).
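The combined objective of Eq. (13)-(15) can be written as in the sketch below, where the \(l_{2}\) term is applied through the optimizer's weight decay, a common equivalent, and the values of \(\alpha\) and \(\lambda\) are placeholders:

```python
# Sketch of the multi-task loss L = L_MSE + alpha * L_CE (+ l2 via weight decay).
import torch
import torch.nn as nn

mse = nn.MSELoss()
ce = nn.CrossEntropyLoss()
alpha = 0.5                      # classification-task weight (placeholder)

def meta_dep_loss(y_s, target_score, logits, target_label):
    """y_s, target_score: (batch,); logits: (batch, 2); target_label: (batch,) long."""
    return mse(y_s, target_score) + alpha * ce(logits, target_label)

# l2 regularization (lambda * ||theta||^2) applied via weight decay, e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```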
2.3.7 Performance evaluation and experiment setup
Meta-DEP training is a dual-task training of regression and classification (for the regression and classification dual-task ablation experiment, see Fig. S2A). Given a drug-disease pair, we input all the shortest paths between the drug and the disease into Meta-DEP, which predicts both the probability that the drug has a therapeutic effect on the disease and a score for the magnitude of its pharmacodynamic effect. In this study, the Pearson correlation coefficient was used as the performance indicator to evaluate and optimize the feature representation, model architecture, and hyperparameter settings through five-fold cross-validation. First, keeping the model architecture unchanged and the hyperparameters in a moderate range, we adjusted the feature representation until the performance indicator reached a temporary optimum (for the influence of feature representation on model performance, see Fig. S2B). Then, with the feature representation fixed and the hyperparameters still in a moderate range, we adjusted the model architecture to find a temporary optimal performance indicator (for the influence of model architecture on model performance, see Fig. S2C). After determining the feature representation and model architecture, the combination of hyperparameters was optimized to achieve the best performance.
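The cross-validation loop used for tuning could look like the sketch below; `train_and_predict` stands for a model-specific training routine and is assumed here:

```python
# Sketch of five-fold cross-validation with the Pearson correlation between
# predicted scores and network-proximity scores as the tuning criterion.
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import KFold

def cross_validate(pairs, true_scores, train_and_predict, n_splits=5, seed=0):
    """true_scores: np.ndarray aligned with `pairs`; train_and_predict(train_idx, test_idx) -> predictions."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    correlations = []
    for train_idx, test_idx in kf.split(pairs):
        predictions = train_and_predict(train_idx, test_idx)
        correlations.append(pearsonr(predictions, true_scores[test_idx])[0])
    return float(np.mean(correlations))
```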
2.9 Apoptosis detection by TUNEL assay
AC16 cells were seeded into 12-well plates. When the confluence reached 70%, AC16 cells were treated with 100 µmol/L hydrogen peroxide and 10 µmol/L bifendate for 24 h. After incubation, cells were washed once with PBS, fixed with 4% paraformaldehyde for 30 minutes, and washed twice with PBS after fixation. Immunostaining permeabilization solution (P0260, Beyotime Biotechnology, China) was added and incubated at room temperature for 5 minutes; the cells were then washed twice with PBS, 50 µl of TUNEL staining solution (C1088, Beyotime Biotechnology, China) was added to each sample, and the samples were incubated at 37°C for 60 minutes in the dark, after which the cells were washed three times with PBS. DAPI staining solution (C1005, Beyotime Biotechnology, China) was added, incubated for 20 minutes, and washed off with three PBS washes. Finally, 500 µl of PBS was added to each well, and the fluorescence intensity was detected with a high-content microscope (ImageXpress Micro Confocal, MD, USA).