Gene set analysis
Distribution of KDTN and KDSN in the DTSG set. First, the distributions of KDT and KDS for the DTSG set were analyzed. Each individual distribution and the three-dimensional distribution for the DTSG set were examined to elucidate the relationship between KDT and KDS. As shown in Figure 2, both KDT and KDS followed scale-free, power-law distributions.
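A power-law claim of this kind can be checked roughly by plotting the distribution on log-log axes and estimating the slope. The sketch below is a minimal illustration, assuming that KDT and KDS are per-gene counts (degrees); the function names and the toy values are hypothetical and are not taken from the study. A least-squares fit on log-log axes is only a quick diagnostic, and maximum-likelihood fitting is generally preferred for a formal test.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: P(k)}, the fraction of genes having each degree k."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def loglog_slope(dist):
    """Least-squares slope of log P(k) versus log k (needs >= 2 distinct k > 0)."""
    pts = [(math.log(k), math.log(p)) for k, p in dist.items() if k > 0 and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Toy degrees (illustrative only): counts fall off as k^-2.
toy_degrees = [1] * 16 + [2] * 4 + [4] * 1
toy_dist = degree_distribution(toy_degrees)
slope = loglog_slope(toy_dist)
print(f"estimated log-log slope: {slope:.2f}")
```

Because the toy counts were chosen to decay as k^-2, the estimated slope is close to -2; for a power law P(k) ∝ k^-γ, the negative of the slope approximates γ.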
Construction of the drug–gene network. Before a drug–gene network was constructed, the distributions of KDT and KDS for the DTSG set were compared with those for the D1 and D3 sets to determine whether the DTSG set was representative of the D1 and D3 sets and to show the tendency of the DTSG set. The results showed that the distributions of KDT and KDS for the DTSG set were representative of those for D1 and D3 (Fig. 3). Once representativeness was confirmed, an integrative drug–gene interaction network for the DTSG set and the drugs was visualized. However, visualizing all the target genes and sensitive genes made it difficult to observe the characteristics of the network intuitively. Therefore, only the target genes and sensitive genes for about 5% of the 371 drugs included in the DTSG set, taken in alphabetical order, were analyzed.
To construct a drug-target subnetwork, 16 drugs binding to 84 genes were extracted. To construct a drug-signature subnetwork, 13 drugs that affect the expression level of 103 genes were extracted. The drug-target subnetwork and the drug-signature subnetwork were merged into a drug–gene subnetwork containing 189 nodes (19 drugs and 170 genes) and 266 edges. As shown in Figure 4 (in which the black nodes represent drugs, the orange nodes represent sensitive genes, and the green nodes represent target genes), PTGER2 (purple node) was the only gene included as both a target gene and a sensitive gene for the 19 drugs in the subnetwork. Except for PTGER2, the target genes and sensitive genes were mutually exclusive and independent (Fig. 4).
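The merging step described above can be sketched as a union of two edge lists, with the genes that appear in both roles recovered by a set intersection. This is a minimal illustration under assumed inputs: the edge-list representation, the function name `merge_subnetworks`, and the placeholder drug and gene labels are hypothetical and do not reproduce the study's data.

```python
def merge_subnetworks(target_edges, signature_edges):
    """Merge (drug, gene) edge lists from the two subnetworks.

    Returns the drug nodes, the gene nodes, the genes that occur as both
    a target gene and a sensitive gene, and the typed edge set.
    """
    target_genes = {gene for _, gene in target_edges}
    sensitive_genes = {gene for _, gene in signature_edges}
    drugs = {drug for drug, _ in target_edges} | {drug for drug, _ in signature_edges}
    edges = {(d, g, "target") for d, g in target_edges}
    edges |= {(d, g, "signature") for d, g in signature_edges}
    return {
        "drugs": drugs,
        "genes": target_genes | sensitive_genes,
        "shared_genes": target_genes & sensitive_genes,
        "edges": edges,
    }


# Placeholder edges (illustrative only).
toy_target = [("drugA", "GENE1"), ("drugA", "GENE2"), ("drugB", "GENE2")]
toy_signature = [("drugB", "GENE3"), ("drugC", "GENE2")]

merged = merge_subnetworks(toy_target, toy_signature)
n_nodes = len(merged["drugs"]) + len(merged["genes"])
print(n_nodes, len(merged["edges"]), sorted(merged["shared_genes"]))
```

Keeping the edge type on each edge means that a drug–gene pair present in both subnetworks is counted as two edges; whether the study counted such pairs once or twice is not stated, so this is a design assumption of the sketch.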
Enrichment analysis
GO/KEGG pathway analysis for the DSG, DTSG, and DTG sets. Through GO analysis, the cellular components, biological processes, and molecular functions associated with each gene set were investigated. In the DTG set, 257 genes were associated with 13 cellular components, 28 biological processes, 20 molecular functions, and 11 KEGG pathways (FDR-adjusted p-value <0.05). In the DSG set, 8,770 genes were associated with 41 cellular components, 63 biological processes, 21 molecular functions, and 8 KEGG pathways (FDR-adjusted p-value <0.05). In the DTSG set, 463 genes were associated with 24 cellular components, 95 biological processes, 35 molecular functions, and 28 KEGG pathways (FDR-adjusted p-value <0.05).
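The text reports FDR-adjusted p-values but does not name the adjustment procedure. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up adjustment, one widely used FDR method; treating it as the procedure used here is an assumption, and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four enrichment terms.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adj = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(toy_adj) if q < 0.05]
print(toy_adj, significant)
```

A term is then retained when its adjusted value falls below the chosen threshold (0.05 in the text).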
GO analysis revealed that most proteins synthesized by drug-sensitive genes were located in inner cellular zones such as the nuclear chromosome, nuclear pore, and nucleosome rather than in outer cellular zones such as the cell wall. The proteins synthesized by drug-sensitive genes were shown to be involved in gene transcription, gene expression regulation, and DNA replication, and to function in DNA, RNA, and protein binding. In contrast, GO analysis showed that most proteins synthesized by drug-target genes were located in outer cellular zones and played roles, for example, in receptor complexes, voltage-gated channel complexes, synapses, and cell junctions. Most proteins synthesized by drug-target genes were found to be involved in the catabolic processes of cGMP and cAMP and in transmission and transport processes; they functioned in ion channel and enzyme activity.
KEGG pathway analysis (Fig. 5) revealed that drug-sensitive genes were involved in central dogma-related pathways such as the spliceosome, transcriptional regulation, and protein processing. In contrast, drug-target genes were involved in neural signaling pathways such as addiction to nitrogen, nicotine, and morphine, serotonergic synapses, and retrograde endocannabinoid signaling.
As shown in Figure 5, only 6 of the 114 terms (GO and KEGG) associated with the three gene sets (DTG, DSG, and DTSG) overlapped. Two cellular component terms overlapped between the DTG and DTSG sets: postsynaptic membrane and voltage-gated calcium channel complex. Of the molecular function terms, 3′,5′-cyclic-nucleotide phosphodiesterase activity and 3′,5′-cyclic-AMP phosphodiesterase activity overlapped between the DTG and DTSG sets. Of the biological process terms, only one overlapped between the DTG and DTSG sets: cAMP catabolic process. Similarly, one KEGG pathway term, the morphine addiction pathway, overlapped between the DTSG and DTG sets. These results suggest that drug-target genes and drug-sensitive genes are exclusive and independent in terms of their cellular locations, genetic functions, processes, and pathways.
Transcription factor (TF) analysis
In gene set analysis, it is important not only to characterize the gene set but also to identify the number and types of TFs, as this can improve understanding of gene regulatory networks. Thus, we examined whether the number of TFs involved differed among the gene sets. We used X2Kweb (20) as a TF analysis tool to examine the binding frequency and types of TFs for each gene set. The results showed that TFs bound to DNA strands on average six times per gene in the DSG set, which was three-fold greater than the TF binding in the DTG and DTSG sets (both two times per gene on average; Fig. 6a). In total, 737 TFs were associated with the three gene sets. Of these, only 30 TFs overlapped between two or more gene sets, as shown in Figure 6b. Therefore, the TFs involved in each gene set largely differed. Of the 30 overlapping TFs, 9 were derived from essential genes in humans (<10% of all human genes are considered essential) (21).
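The two summary quantities used above, the mean number of TF binding events per gene and the TFs shared by two or more gene sets, can be computed as in the following sketch. The input layout (a mapping from gene to its list of binding TFs, and a mapping from gene set to its TFs), the function names, and the placeholder labels are assumptions made for illustration; they do not reflect the X2Kweb output format.

```python
from collections import Counter


def mean_tf_bindings_per_gene(bindings):
    """bindings: {gene: [TF, ...]} with one list entry per binding event."""
    return sum(len(tfs) for tfs in bindings.values()) / len(bindings)


def tfs_shared_by_multiple_sets(tfs_by_set):
    """tfs_by_set: {set_name: iterable of TFs}. Return TFs found in >= 2 sets."""
    counts = Counter(tf for tfs in tfs_by_set.values() for tf in set(tfs))
    return {tf for tf, n in counts.items() if n >= 2}


# Placeholder data (illustrative only).
toy_bindings = {"GENE1": ["TF_A", "TF_B", "TF_A"], "GENE2": ["TF_C"]}
toy_tfs_by_set = {
    "DSG": {"TF_A", "TF_B", "TF_C"},
    "DTG": {"TF_C", "TF_D"},
    "DTSG": {"TF_D", "TF_E"},
}

mean_bindings = mean_tf_bindings_per_gene(toy_bindings)
shared_tfs = tfs_shared_by_multiple_sets(toy_tfs_by_set)
print(mean_bindings, sorted(shared_tfs))
```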
Core gene analysis of DTN and DSN
The core genes in KDT and KDS for the DTG and DSG sets were characterized by applying a peeling algorithm. Each network that included >50 genes was analyzed according to m-core. As a result, m-coreDSN had 1–36 core gene groups, whereas m-coreDTN had 1–17 core gene groups. Figure 7 shows the gene ontological characterization of each network according to m-core. In the cellular component analysis, the core genes of the DSN and DTN showed exclusive distributions. Proteins synthesized by the core genes of the DSN were located in the cytosol, cytoplasm, nuclear chromosome, and nucleosome. Conversely, proteins synthesized by the core genes of the DTN were located in the synapses, dendrites, plasma membrane, and axon terminus. In the molecular function analysis, the core genes of the DTN and DSN were also exclusively distributed. Proteins synthesized by the core genes of the DSN functioned in cell–cell adhesion and in protein heterodimerization activity by binding proteins and cadherin. The proteins synthesized by the core genes of the DTN functioned in ion binding, hormone binding, chemical receptor activity, and enzyme activity (Supplementary Fig. 1A). In the biological process analysis, the core genes of the DSN and DTN also showed exclusive distributions. Proteins synthesized by the core genes of the DSN were involved in the PERK-mediated unfolded protein response, response to hypoxia, positive regulation of angiogenesis, and regulation of cell death. In contrast, proteins synthesized by the core genes of the DTN were involved in the response to drugs, dopamine transport, receptor signaling pathways, and monoterpenoid metabolic processes (Supplementary Fig. 1B).
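The peeling idea mentioned above can be illustrated with the standard k-core decomposition, in which the node of lowest remaining degree is removed repeatedly and each node is assigned the highest core level it survives to. This sketch assumes that the m-core used here is analogous to the usual k-core of an undirected, unweighted graph; the exact definition applied to the DSN and DTN is not given in this section, and the function name and toy graph are hypothetical.

```python
def core_numbers(adjacency):
    """Core number of every node by iterative peeling.

    adjacency: {node: set of neighbours} for an undirected graph.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    level = 0
    while remaining:
        # Peel the node with the lowest remaining degree.
        node = min(remaining, key=lambda n: degree[n])
        # The core level never decreases as peeling proceeds.
        level = max(level, degree[node])
        core[node] = level
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Toy graph (illustrative only): a triangle a-b-c with a pendant node d on a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
toy_cores = core_numbers(toy_graph)
print(toy_cores)
```

Nodes sharing the highest core number form the innermost group, and grouping nodes by core number yields nested shells comparable to the core gene groups described in the text. This simple version scans all remaining nodes at each step, which is adequate for small networks; bucket-based implementations run in linear time.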
In summary, the genes in the DSG, DTG, and DTSG sets could be clearly distinguished by their GO and KEGG pathway characteristics. In addition, the numbers and types of TFs differed between the DSG and DTG sets, as did the binding frequencies of the TFs on DNA strands. Finally, m-core analysis showed that the core genes of the DSN and DTN exhibited reciprocal and balanced characteristics.