The SARS-CoV-2-Human PPI dataset is denoted as HxC, where the rows represent the human proteins and the columns represent the viral proteins. In this case, we obtain clustered itemsets of SARS-CoV-2 proteins that have a similar interaction pattern for a subset of human proteins. We have also used the transposed form of the SARS-CoV-2-Human PPI dataset, denoted as CxH. The aim of the transposed data is to obtain clusters of human proteins for a subset of SARS-CoV-2 proteins. Here, each bicluster specifies that a subset of SARS-CoV-2 proteins interacts with a subset of human proteins, and vice versa.
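The two representations can be illustrated with a minimal sketch, assuming the known interactions are available as (human protein, viral protein) pairs. The helper names `build_hxc` and `transpose` and the placeholder protein labels are hypothetical and are not taken from the study.

```python
def build_hxc(pairs):
    """Build the binary human-by-viral matrix (HxC) from interaction pairs."""
    humans = sorted({h for h, _ in pairs})
    virals = sorted({c for _, c in pairs})
    known = set(pairs)
    # Entry is 1 when the human protein (row) interacts with the viral protein (column).
    hxc = [[1 if (h, c) in known else 0 for c in virals] for h in humans]
    return hxc, humans, virals


def transpose(matrix):
    """Swap rows and columns, turning HxC into CxH."""
    return [list(col) for col in zip(*matrix)]


# Placeholder labels only (hypothetical, not real interaction data).
pairs = [("H1", "C1"), ("H1", "C2"), ("H2", "C1"), ("H3", "C2")]
hxc, humans, virals = build_hxc(pairs)
cxh = transpose(hxc)  # rows are now viral proteins, columns human proteins
```

Both matrices carry the same interactions; only the roles of objects (rows) and items (columns) are exchanged, which is what lets the same biclustering step cluster either viral or human proteins.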
Table 1 Statistics for the generated outputs
Dataset | # Biclusters | # Rules | Rules > 0.7 Confidence | Non-redundant Rules | # Predicted Interactions
HxC OLD | 10 | 1 | 0 | 0 | 0
CxH OLD | 9 | 1 | 0 | 0 | 0
HxC | 34 | 30 | 9 | 5 | 48
CxH | 34 | 20 | 11 | 7 | 8
We use both HxC and CxH as input to the Bimax algorithm; a summary of the generated biclusters is given in Table 1. The table lists the statistics for two versions of each dataset, i.e., before (HxC OLD, CxH OLD) and after (HxC, CxH) applying the rule of transition to the original known protein-protein interaction information. The updated datasets give clearly improved results: for example, HxC goes from 10 biclusters and no predicted interactions to 34 biclusters and 48 predicted interactions.
4.1 Interactions prediction from biclusters
Interactions can be predicted from the obtained list of biclusters. Let us take the HxC dataset as an example and consider two obtained biclusters of the form <IS_CP1 : IS_HP1> and <IS_CP2 : IS_HP2>, where <IS_CPi : IS_HPi> denotes the itemsets of the ith bicluster, consisting of SARS-CoV-2 proteins and human proteins. To predict new interactions from the biclusters themselves, we take advantage of the set-difference operator. Proceeding in this way, we obtain a list of predicted interactions; Table 2 shows two sets of examples from this list.
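The text does not spell out how the set-difference operator is applied, so the following is only one possible reading, offered as an assumption rather than the authors' rule: when two biclusters share at least one viral protein, the human proteins unique to one bicluster are paired with the viral proteins unique to the other, and pairs that are already known are dropped. The function name and the placeholder labels are hypothetical.

```python
def predict_from_bicluster_pair(b1, b2, known):
    """Candidate interactions from two biclusters via set difference.

    Each bicluster is a (viral_proteins, human_proteins) pair of sets.
    ASSUMPTION: overlap in viral proteins is taken as evidence of a similar
    interaction pattern; this pairing rule is illustrative only.
    """
    (cp1, hp1), (cp2, hp2) = b1, b2
    if not (cp1 & cp2):
        return set()  # no shared viral protein, nothing to infer
    candidates = {(c, h) for c in cp2 - cp1 for h in hp1 - hp2}
    candidates |= {(c, h) for c in cp1 - cp2 for h in hp2 - hp1}
    return candidates - known  # keep only interactions not already known


# Placeholder labels (hypothetical, not real proteins).
b1 = ({"C1", "C2"}, {"H1", "H2"})
b2 = ({"C1", "C3"}, {"H2", "H3"})
# Every (viral, human) pair inside a bicluster is a known interaction.
known = {(c, h) for cp, hp in (b1, b2) for c in cp for h in hp}
predicted = predict_from_bicluster_pair(b1, b2, known)
```

Under this assumed rule, only pairs lying outside both biclusters can be predicted, which keeps the output disjoint from the known interactions.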
Table 2 Predicted interactions from biclusters
4.2 Prediction by generating rules from the biclusters
This section applies the approach introduced in Section 2. The CxH dataset yields rules between human proteins that hold for the corresponding SARS-CoV-2 proteins, and the HxC dataset yields rules between SARS-CoV-2 proteins that hold for the corresponding human proteins. The obtained rules are filtered in two steps. First, we filter the rules by their support and confidence values. Because the dataset is sparse, we fix the support thresholds experimentally at low values: for the CxH dataset, the minimum support count is 4 with a minimum confidence level of 70%; for the HxC dataset, the minimum support count is 11 with the same 70% minimum confidence level. Next, redundancy is eliminated by keeping only the most general rules. Interaction prediction is then performed on the remaining rules. In the final results, the HxC dataset predicts 48 unique, novel interactions and the CxH dataset predicts 8. To keep the predictions unambiguous, when the same interaction is predicted more than once we discard the instances with lower confidence. The predicted interactions from both datasets are merged, and the resulting PPI network is shown in Figure 2. We find 3 interactions common to both datasets, so 48 + 8 - 3 = 53 unique novel interactions are finally obtained. For most predicted interactions the confidence level lies between 85% and 90%, followed by the 90% to 95% interval; a histogram of these values is shown in Figure 2 alongside the interaction network. The figure shows that SARS-CoV-2 proteins ORF8, NSP12, and E act as network hubs. All SARS-CoV-2 proteins are highlighted in yellow, whereas the human proteins are in pink. The width of each edge increases with the confidence level of the interaction, and each edge label shows that confidence value.
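The support and confidence filter can be sketched as follows. This is a simplified illustration, not the procedure of Section 2: it assumes single-antecedent rules mined directly from row-wise transactions, omits the redundancy-elimination step, and uses hypothetical function names, placeholder labels, and a toy support threshold. It treats the 70% level as inclusive, which is also an assumption.

```python
from itertools import permutations


def mine_pair_rules(transactions, min_support_count, min_confidence):
    """Return rules (a, b, support_count, confidence) for a -> b meeting both thresholds.

    transactions maps each row object (e.g. a human protein in HxC) to the
    set of column items (e.g. viral proteins) it interacts with.
    """
    items = sorted(set().union(*transactions.values()))
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for s in transactions.values() if a in s)
        count_ab = sum(1 for s in transactions.values() if a in s and b in s)
        if count_a == 0 or count_ab < min_support_count:
            continue
        confidence = count_ab / count_a  # fraction of rows with a that also have b
        if confidence >= min_confidence:
            rules.append((a, b, count_ab, confidence))
    return rules


def predict_from_rules(transactions, rules):
    """Predict (row, item) pairs where the antecedent holds but the consequent is absent.

    If several rules predict the same pair, only the highest confidence is kept.
    """
    predicted = {}
    for a, b, _, confidence in rules:
        for row, s in transactions.items():
            if a in s and b not in s:
                key = (row, b)
                predicted[key] = max(confidence, predicted.get(key, 0.0))
    return predicted


# Toy HxC-style transactions with placeholder labels (hypothetical data).
transactions = {
    "H1": {"C1", "C2"}, "H2": {"C1", "C2"}, "H3": {"C1", "C2"},
    "H4": {"C1"}, "H5": {"C2"},
}
rules = mine_pair_rules(transactions, min_support_count=3, min_confidence=0.7)
predicted = predict_from_rules(transactions, rules)
```

In this toy case both C1 -> C2 and C2 -> C1 have a support count of 3 and a confidence of 0.75, so H4 is predicted to interact with C2 and H5 with C1. Keeping the maximum confidence per pair mirrors the step of discarding lower-confidence duplicates.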
4.3 Gene Ontology based justification of the predicted interactions
To justify the predictions, we look for biological relevance. For this, we use DAVID (http://david.abcc.ncifcrf.gov), a freely available online bioinformatics resource that provides functional annotations for large sets of genes. Among the various kinds of information this tool provides, we opt for the gene ontology (GO) based study and the KEGG pathway. We find GO terms from all three domains covered by GO (Biological Process, Cellular Component, and Molecular Function). Moreover, we extract the non-redundant, informative GO terms, based on the p-values taken from the DAVID results, using another tool, REVIGO (http://revigo.irb.hr/). It identifies outliers in the submitted list of GO terms by checking semantic similarity and outputs a list sorted by the dispensability value of each GO term. More unique terms have lower dispensability.
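The final selection step can be sketched as a threshold filter over (p-value, dispensability) pairs followed by a sort on dispensability. The sketch below assumes the DAVID p-values and REVIGO dispensability scores have already been exported into plain records; the function name and record layout are hypothetical. The four sample rows are copied from the Biological Process block of Table 3.

```python
def filter_go_terms(terms, max_p_value, max_dispensability):
    """Keep terms below both thresholds, sorted by ascending dispensability."""
    kept = [
        t for t in terms
        if t["p_value"] < max_p_value and t["dispensability"] < max_dispensability
    ]
    # Ties in dispensability are broken by the smaller p-value.
    return sorted(kept, key=lambda t: (t["dispensability"], t["p_value"]))


# Rows taken from Table 3 (ORF9B, Biological Process), deliberately unsorted.
terms = [
    {"go_id": "GO:0045727", "p_value": 6.42e-2, "dispensability": 0.516},
    {"go_id": "GO:0000184", "p_value": 9.57e-3, "dispensability": 0.019},
    {"go_id": "GO:0016192", "p_value": 1.52e-2, "dispensability": 0.216},
    {"go_id": "GO:1903955", "p_value": 5.11e-3, "dispensability": 0.0},
]

# Thresholds of the Table 3 Biological Process block.
table_like = filter_go_terms(terms, max_p_value=1e-1, max_dispensability=0.6)
# A stricter, purely illustrative dispensability cut-off.
stricter = filter_go_terms(terms, max_p_value=1e-1, max_dispensability=0.2)
```

With the Table 3 thresholds all four rows are retained in order of increasing dispensability; tightening the dispensability cut-off keeps only the most unique terms.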
4.3.1 For the predicted interactions obtained from the biclusters
As shown in Table 2, ORF9B is predicted to interact with 24 human proteins. Table 3 lists the GO terms used to verify these predictions. The molecular function terms of the predicted human proteins show that they are related to similar binding activities. Previous work has shown that ORF9B also has the functional property of membrane binding.
Table 3 Significant GO terms found in the human proteins that are predicted to interact with SARS-CoV-2 Protein ORF9B
GO-id | Term | % of Proteins | P value | Dispensability
GO Category: Biological Process (P-value < 1E-01, Dispensability < 0.6)
GO:1903955 | positive regulation of protein targeting to mitochondrion | 12.5 | 5.11E-3 | 0
GO:0000184 | nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | 12.5 | 9.57E-3 | 0.019
GO:0006413 | translational initiation | 12.5 | 1.25E-3 | 0.106
GO:0007265 | Ras protein signal transduction | 8.33 | 8.40E-2 | 0.183
GO:0016192 | vesicle-mediated transport | 12.5 | 1.52E-2 | 0.216
GO:0006364 | rRNA processing | 12.5 | 2.89E-2 | 0.335
GO:0034058 | endosomal vesicle fusion | 8.33 | 1.12E-2 | 0.347
GO:0045070 | positive regulation of viral genome replication | 8.33 | 3.20E-2 | 0.445
GO:0008333 | endosome to lysosome transport | 8.33 | 4.76E-2 | 0.453
GO:0016239 | positive regulation of macroautophagy | 8.33 | 2.83E-2 | 0.468
GO:0045727 | positive regulation of translation | 8.33 | 6.42E-2 | 0.516
GO Category: Cellular Component (P-value < 1E-01, Dispensability < 0.6)
GO:0010494 | cytoplasmic stress granule | 12.5 | 9.35E-04 | 0
GO:0016020 | membrane | 29.17 | 5.05E-2 | 0
GO:0005622 | intracellular | 25 | 2.30E-2 | 0.075
GO:0030123 | AP-3 adaptor complex | 8.33 | 1.3E-2 | 0.227
GO:0005829 | cytosol | 41.67 | 1.51E-2 | 0.327
GO:0005765 | lysosomal membrane | 12.5 | 4.63E-2 | 0.33
GO:0005730 | nucleolus | 20.83 | 2.10E-2 | 0.377
GO:0030897 | HOPS complex | 8.33 | 1.75E-2 | 0.417
GO:0031519 | PcG protein complex | 8.33 | 3.35E-2 | 0.491
GO:0030529 | intracellular ribonucleoprotein complex | 16.66 | 6.45E-04 | 0.54
GO Category: Molecular Function (P-value < 1E-01, Dispensability < 0.05)
GO:0044822 | poly(A) RNA binding | 54.17 | 1.26E-09 | 0
GO:0000166 | nucleotide binding | 25 | 5.61E-05 | 0
GO:0004004 | ATP-dependent RNA helicase activity | 12.5 | 2.92E-3 | 0
GO:0003729 | mRNA binding | 12.5 | 1.04E-2 | 0
GO:0008494 | translation activator activity | 8.33 | 1.11E-2 | 0
GO:0008143 | poly(A) binding | 8.33 | 1.60E-2 | 0
GO:0005515 | protein binding | 70.83 | 2.10E-2 | 0
GO:0003676 | nucleic acid binding | 20.83 | 3.11E-2 | 0
Table 4 Significant GO terms found in the human proteins that are predicted to interact with SARS-CoV-2 Protein ORF8
GO-id | Term | % of Proteins | P value | Dispensability
GO Category: Biological Process (P-value < 1E-01, Dispensability < 0.3)
GO:0000398 | mRNA splicing, via spliceosome | 14.28 | 2.57E-02 | 0
GO:0017148 | negative regulation of translation | 9.52 | 6.36E-02 | 0.205
GO Category: Cellular Component (P-value < 1E-01, Dispensability < 0.2)
GO:0016020 | membrane | 33.33 | 1.16E-2 | 0
GO:0071013 | catalytic step 2 spliceosome | 9.52 | 8.24E-2 | 0
GO:0005737 | cytoplasm | 42.85 | 8.30E-2 | 0.15
GO Category: Molecular Function (P-value < 1E-01, Dispensability < 0.05)
GO:0044822 | poly(A) RNA binding | 33.33 | 1.12E-2 | 0
4.3.2 For the predicted interactions obtained from the rules
Among the predicted interactions, SARS-CoV-2 protein ORF8 has the largest number, with 21 human proteins. It is followed by SARS-CoV-2 proteins NSP12 and E, which are predicted to interact with 16 and 8 human proteins, respectively. Tables 4, 5, and 6 summarize the informative GO terms that verify the similar biological activities for viral proteins ORF8, NSP12, and E, respectively.
Table 5 Significant GO terms found in the human proteins that are predicted to interact with SARS-CoV-2 Protein NSP12
GO-id | Term | % of Proteins | P value | Dispensability
GO Category: Biological Process (P-value < 1E-04, Dispensability < 0.05)
GO:0007077 | mitotic nuclear envelope disassembly | 35.71 | 2.88E-08 | 0
GO:0006409 | tRNA export from nucleus | 28.57 | 1.77E-06 | 0
GO:0010827 | regulation of glucose transport | 28.57 | 1.95E-06 | 0
GO:0075733 | intracellular transport of virus | 28.57 | 7.39E-06 | 0
GO:1900034 | regulation of cellular response to heat | 28.57 | 2.37E-05 | 0
GO:0031047 | gene silencing by RNA | 28.57 | 7.66E-05 | 0
GO:0016925 | protein sumoylation | 28.57 | 8.96E-05 | 0
GO Category: Cellular Component (P-value < 1E-01, Dispensability < 0.75)
GO:0044613 | nuclear pore central transport channel | 21.43 | 3.65E-05 | 0
GO:0001527 | microfibril | 14.285 | 7.11E-03 | 0
GO:0005829 | cytosol | 42.85 | 7.06E-02 | 0.1
GO:0031012 | extracellular matrix | 21.42 | 1.82E-02 | 0.5
GO:0005643 | nuclear pore | 14.28 | 5.01E-02 | 0.7
GO:0005578 | proteinaceous extracellular matrix | 21.42 | 1.51E-02 | 0.7
GO Category: Molecular Function (P-value < 1E-01, Dispensability < 0.05)
GO:0017056 | structural constituent of nuclear pore | 21.43 | 1.14E-04 | 0
GO:0005515 | protein binding | 92.86 | 2.65E-03 | 0
GO:0005487 | nucleocytoplasmic transporter activity | 14.29 | 1.68E-02 | 0
GO:0004386 | helicase activity | 14.29 | 6.35E-02 | 0
GO:0005178 | integrin binding | 14.29 | 7.80E-02 | 0.04
GO:0030023 | extracellular matrix constituent conferring elasticity | 14.29 | 3.84E-03 | 0.31
GO:0005201 | extracellular matrix structural constituent | 14.29 | 5.04E-02 | 0.4
While the activities are highly cohesive within each table, each individual table has a distinct set of functions. Table 4 lists significant GO terms such as membrane and poly(A) RNA binding, indicating the involvement of many human proteins in these. This is an important observation, as these proteins are expected to interact with ORF8, which plays the main role in host-virus interaction. SARS-CoV-2 protein NSP12 is a multifunctional protein mainly involved in the transcription and replication of viral RNAs. Table 5 contains several GO terms related to RNA functions in the biological process category, such as tRNA export from nucleus and gene silencing by RNA. Similarly, Table 6 shows that many human proteins are involved in the molecular function of protein transporter activity, which suggests that these proteins are highly involved in transporting molecules across biological membranes. Hence, the prediction of human proteins that interact with viral protein E is intuitive, as E is a small membrane protein with a major role in the assembly of virions.
Table 6 Significant GO terms found in the human proteins that are predicted to interact with SARS-CoV-2 Protein E
GO-id | Term | % of Proteins | P value | Dispensability
GO Category: Biological Process (P-value < 1E-01, Dispensability < 0.7)
GO:0006626 | protein targeting to mitochondrion | 42.86 | 2.38E-05 | 0
GO:0007605 | sensory perception of sound | 28.57 | 3.12E-02 | 0
GO:0072321 | chaperone-mediated protein transport | 28.57 | 1.90E-03 | 0.42
GO:0015031 | protein transport | 28.57 | 9.08E-02 | 0.68
GO Category: Cellular Component (P-value < 1E-03, Dispensability < 0.7)
GO:0042719 | mitochondrial intermembrane space protein transporter complex | 42.86 | 9.03E-07 | 0
GO:0005739 | mitochondrion | 71.43 | 3.77E-04 | 0.28
GO:0005743 | mitochondrial inner membrane | 71.43 | 4.88E-06 | 0.65
GO:0042721 | mitochondrial inner membrane protein insertion complex | 28.57 | 9.87E-04 | 0.68
GO Category: Molecular Function (P-value < 1E-01, Dispensability < 0.5)
GO:0042803 | protein homodimerization activity | 42.86 | 2.50E-2 | 0
GO:0008565 | protein transporter activity | 28.57 | 2.53E-2 | 0
GO:0005215 | transporter activity | 28.57 | 6.97E-2 | 0
GO:0008270 | zinc ion binding | 42.86 | 6.0E-2 | 0.06
GO:0051087 | chaperone binding | 28.57 | 2.94E-2 | 0.46
4.4 Justification of the predicted interactions using KEGG pathway
Along with the GO study, we also examine the KEGG pathways obtained from the DAVID tool. KEGG pathways are important in bioinformatics research for understanding genomes, biological pathways, diseases, drugs, etc.
4.4.1 For the predicted interactions obtained from the biclusters
Here we discuss KEGG pathway enrichment for ORF9B as an example. The tool reveals that the human proteins predicted to interact with ORF9B are related to metabolic pathways and to the measles and herpes simplex infection pathways. Many of the proteins are also related to the B-signaling pathway, RNA degradation, etc.
4.4.2 For the predicted interactions obtained from the rules
The KEGG pathways obtained using the tool suggest that the probable interactions of ORF8 with human proteins involve multiple cellular activities that may lead to human T-cell leukemia virus infection. Human proteins predicted to interact with ORF8 are involved in the pathways of HTLV-I infection, apoptosis, hepatitis C, Epstein-Barr virus infection, the insulin signaling pathway, etc. For the predicted interactions with NSP12, the pathways include metabolic pathways, RNA transport, etc. Similarly, SARS-CoV-2 protein E is expected to interact with the human proteins ALG11 and IDE, which are involved in metabolic pathways, Alzheimer’s disease, etc.