We collected 332 host proteins that were identified to interact with 27 nCoV19 viral proteins by Gordon et al.6 To assemble the interactome of these host proteins, we compiled known PPIs from HPRD10 (Human Protein Reference Database) and BioGRID11 (Biological General Repository for Interaction Datasets) and predicted novel PPIs by applying the HiPPIP algorithm described earlier.32 Note that the interactome is a human protein interactome, not a host-virus interactome; its relevance to COVID19 is that the core proteins for which it is assembled are those that viral proteins bind to. HiPPIP predicted ~2,600 PPIs, of which ~600 are already cataloged in HPRD and BioGRID, leaving ~2,000 PPIs to be considered novel PPIs of the host proteins. An additional 3,500 known PPIs were not predicted by HiPPIP. This is expected: the HiPPIP prediction threshold was fixed previously32 to achieve high precision at the cost of recall, which is required for adoption into biology; in other words, it is set to predict only a few PPIs out of the hundreds of thousands of unknown PPIs, but those are expected to be highly accurate. As reported in Supplementary File 1, all 16 HiPPIP-predicted PPIs that were tested experimentally in our other studies were validated as true; the experiments were carried out by diverse research labs. Overall, the host protein (HoP) interactome consists of 4,408 proteins and 6,076 interactions (Supplementary Data File 1). A partial network of host proteins and their novel interactors is shown in Figure 2A (see Supplementary Figure S1 for the full network of novel interactors).
We checked whether any of the ~2,000 novel PPIs appear in the recent interactome maps HuRI (HI-Union)14 and BioPlex.15 While there was no overlap with the HuRI union dataset, 8 PPIs were present in the BioPlex map (ADAM9-ADAM32, P3H3-OS9, PVR-NECTIN2, SRRM2-SNIP1, PABPC4-LUC7L2, PRKACA-AKAP1, NDUFA13-ECSIT, and NPTX1-NPTX2). The small overlap is not surprising because even high-throughput biotechnological methods discover different parts of the interactome with only small overlaps with each other,6 thus demonstrating complementary strengths.14
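The bookkeeping behind these counts can be illustrated with a minimal sketch. It assumes that PPIs are treated as undirected pairs, that "novel" means predicted but absent from the known set, and that overlap with an external map is a plain set intersection. The function names and the toy proteins (H1, H2, A, B, ...) are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, FrozenSet, Iterable, Set

Edge = FrozenSet[str]


def edge(a: str, b: str) -> Edge:
    """Represent an undirected PPI as an order-independent pair."""
    return frozenset((a, b))


def assemble_interactome(
    core: Iterable[str], known: Iterable[Edge], predicted: Iterable[Edge]
) -> Dict[str, set]:
    """Collect known and predicted PPIs that involve at least one core protein."""
    core_set = set(core)
    known_all = set(known)
    # Keep only edges touching a core (host) protein.
    known_core = {e for e in known_all if e & core_set}
    predicted_core = {e for e in predicted if e & core_set}
    # Novel = predicted, but not already cataloged in the known set.
    novel = predicted_core - known_all
    interactions = known_core | predicted_core
    proteins: Set[str] = set()
    for e in interactions:
        proteins |= e
    return {
        "known": known_core,
        "novel": novel,
        "interactions": interactions,
        "proteins": proteins,
    }


# Toy data (hypothetical identifiers, for illustration only).
core = {"H1", "H2"}
known = {edge("H1", "A"), edge("H2", "B"), edge("A", "B")}
predicted = {edge("H1", "A"), edge("H1", "C"), edge("H2", "D")}
result = assemble_interactome(core, known, predicted)

# Overlap of the novel PPIs with an external map (also hypothetical).
external_map = {edge("H2", "D"), edge("X", "Y")}
overlap = result["novel"] & external_map
```

In this toy case the known edge A-B is dropped because it touches no core protein, which mirrors the idea that the interactome is assembled around the host proteins only.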
Applications of this network are two-fold: (1) biologists, who typically focus their research on specific proteins or a pathway, may look up the novel interactions relevant to that protein or pathway (e.g.26); and (2) computational systems biologists may investigate it in conjunction with transcriptomic/proteomic data (e.g.30,33-35). To facilitate (1), we are making these results available over an interactive webserver, and to facilitate (2), we are releasing the data as downloadable files in various formats.
We employed ‘Netbox’36 to identify modules based on network topology. It expands the core proteins by adding nodes from the interactome whose number of links to core proteins is statistically significant compared to their degree in the human interactome. From this network, it identifies highly interconnected modules. It was able to connect 323 proteins (220 host proteins and 103 linker proteins) into 21 modules, of which 14 modules had 4 or more nodes each (Supplementary Figure S2). For comparison, when novel PPIs are not included, it connects 199 proteins (138 host proteins and 61 linker proteins) into 18 modules, of which 10 had 4 or more proteins each. The scaled modularity score (z-score compared to corresponding random networks) was 17.0 with novel PPIs and 14.5 without novel PPIs. Bioinformatic analysis of the computed modules showed that five modules formed with novel interactors had statistically significant enrichment of Gene Ontology biological process terms: epigenetic regulation of gene expression (p-value=3.3E-04, odds ratio=10.4), nuclear transport (p-value=2.4E-12, odds ratio=21.6), cilium organization (p-value=1.28E-03, odds ratio=7.8), ribonucleoprotein complex biogenesis (p-value=0, odds ratio=22.4), and vesicle-mediated transport between endosomal compartments (p-value=9.4E-06, odds ratio=123.4) (Figure 2C i-vi). When novel PPIs were excluded, some of these associations were missed and the modules were smaller, but three additional functional modules were found: cell cycle G2/M phase transition (p-value=0.0019, odds ratio=21.7, 20 proteins), DNA replication (p-value=0.0049, odds ratio=55.25, 3 proteins) and cell-cell signaling by Wnt (p-value=0.0049, odds ratio=9.3, 24 proteins) (Supplementary Table S2).
ACE2 Interactome
SARS-CoV-2 engages the host receptor ACE2 (angiotensin-converting enzyme 2) for cell entry.37 Viral entry happens prior to the interaction of the viral proteins with host cellular proteins; it was the latter that was studied by Gordon et al.6 Therefore, ACE2 was not part of the 332 core genes considered in constructing the interactome. Owing to its crucial role in nCoV19 infection, we assembled its known and novel PPIs separately and found that it was connected to four host proteins (SIL1, LOX, MDN1 and NINL) through an intermediate interactor, i.e. separated by two edges, where one or both of the intermediary PPIs are novel predicted ones (see red edges in Figure 2B).
These connections reveal interesting insights. ACE2 is a key player in the renin-angiotensin hormone system that regulates blood pressure and electrolyte balance.38 In line with this, we found that its interactors AGT (angiotensin), GHRL, CLTRN and POMC are associated with the Reactome pathway peptide hormone metabolism (p-value=2.9E-05). ACE2 and its interactors were also enriched in the Gene Ontology biological process circulatory system process (ACE2, AGT, NTS, POMC, GHRL and the host protein MYL4; p-value=0.001). Three host proteins are associated with numerous vascular and cardiac phenotypes: LOX with abnormality of blood volume homeostasis, aortic root aneurysm, ascending aortic dissection, carotid artery dilatation, coronary artery atherosclerosis, cystic medial necrosis of the aorta, descending thoracic aorta aneurysm, dilatation of the cerebral artery, left ventricular failure and peripheral arterial stenosis; MYL4 with paroxysmal atrial fibrillation and bradycardia; and SIL1 with abnormal aldolase level.
The co-morbidity of hypertension, diabetes and cardiovascular disease among the group of COVID19 patients with a high fatality rate warrants a closer look at ACE2 and other host proteins linked to cardiac and vascular phenotypes.
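The two-edge search used for ACE2 in this section can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes undirected edge lists for known and novel PPIs, and it reports a path only when at least one of its two edges is novel. The function name and the placeholder identifiers (I1, I2, HOST_A, ...) are hypothetical; the actual intermediate interactors are those shown in Figure 2B.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def two_edge_links(
    source: str,
    targets: Set[str],
    known_edges: Iterable[Pair],
    novel_edges: Iterable[Pair],
) -> List[Tuple[str, str, str]]:
    """Find source-intermediate-target paths in which at least one edge is novel."""
    known_list = list(known_edges)
    novel_list = list(novel_edges)
    novel = {frozenset(e) for e in novel_list}

    # Build an undirected adjacency map over known and novel PPIs together.
    adjacency = defaultdict(set)
    for a, b in known_list + novel_list:
        adjacency[a].add(b)
        adjacency[b].add(a)

    paths = []
    for mid in sorted(adjacency[source]):
        for target in sorted(adjacency[mid] & targets):
            if target == source:
                continue
            hops = (frozenset((source, mid)), frozenset((mid, target)))
            # Keep the path only if one or both hops are predicted (novel) PPIs.
            if any(hop in novel for hop in hops):
                paths.append((source, mid, target))
    return paths


# Toy network with placeholder names (not the real ACE2 neighbourhood).
known = [("ACE2", "I1"), ("I1", "HOST_A"), ("I2", "HOST_B")]
novel = [("ACE2", "I2"), ("I1", "HOST_C")]
hosts = {"HOST_A", "HOST_B", "HOST_C"}
paths = two_edge_links("ACE2", hosts, known, novel)
```

Here the path through I1 to HOST_A is excluded because both of its edges are already known, whereas the other two paths each contain a novel edge.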
Wiki-CORONA: A web server of novel host PPIs
The HoP interactome is available on a website called Wiki-CORONA (http://hagrid.dbmi.pitt.edu/corona/). It has advanced-search capabilities, and presents comprehensive annotations, namely Gene Ontology, diseases, drugs and pathways, of the two proteins of each PPI side-by-side. Here, a user can query for results such as “PPIs where one protein is anti-viral and the other is involved in immunity”, and then see the results with the functional details of the two proteins side-by-side. The PPIs and their annotations also get indexed in major search engines like Google and Bing. Querying by biomedical associations is a unique feature which we developed in Wiki-Pi that presents known interactions of all human proteins.39
Transcriptome Analysis
A significantly large number of proteins in the interactome were differentially expressed in epithelial cells infected with SARS coronavirus (GSE17400, Calu-3 cells, 48 hours post-infection; p-value=4.76E-12). Several proteins also showed differential expression at the transcriptome level after infection by the Urbani strain of SARS coronavirus (GSE37827, Calu-3 cells, 72 hours post-infection) and in peripheral blood mononuclear cells of SARS patients (GSE173940). These latter two datasets of differential expression did not show statistically significant overlaps; yet, the transcriptomic evidence highlights key protein-encoding genes associated with viral infection that interact with the core proteins considered in this study. As several of the interactors here are revealed through computational prediction, the information that they are differentially expressed in SARS/SARS-CoV-2 infections presents an opportunity to prioritize novel PPIs for further study.
Melo et al. had identified 120 differentially expressed genes (DEGs) associated with nCoV19 infection in the A549 cell line.5 Of these, only 2 were common with the 332 host proteins identified through the AP-MS study6 (‘host proteins’). However, our study revealed several interesting links between the two sets: (a) 31 DEGs are direct interactors of 38 host proteins, with some DEGs interacting with multiple host proteins; (b) thirteen novel PPIs exist between the two sets: AAR2-SAMHD1, TUBGCP2-C1R, IMPDH2-C1S, GOLGA7-TCIM, RAB8A-STEAP1, GDF15-EHF, REEP5-PDK4, FAM162A-PARP14, STOML2-CDH1, FGA-RAB14, FBXL12-C19orf66, ECSIT-C19orf66 and EIF4H-PTPN12; (c) 108 DEGs and 285 host proteins are highly interconnected through 808 common interactors (statistically significant overlap with odds ratio=1.5, p-value=7.12E-54); and (d) pathway enrichment analysis of the overlapping interactome (consisting of shared interactors and the DEGs and host proteins that they interact with) revealed several immune-related pathways with FDR-corrected p-value<0.05.
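Overlap statistics of the kind quoted above (an odds ratio with a p-value) are commonly derived from a 2x2 contingency table. The text does not state which test was used, so the sketch below is an assumption: it computes the sample odds ratio and a one-sided hypergeometric (Fisher-type) upper-tail p-value using only the standard library. The function name, the universe size and the toy sets are hypothetical.

```python
from math import comb
from typing import Set, Tuple


def overlap_enrichment(
    set_a: Set[int], set_b: Set[int], universe_size: int
) -> Tuple[float, float]:
    """Return (odds ratio, one-sided p-value) for the overlap of two sets.

    The p-value is the hypergeometric probability of observing an overlap at
    least as large as the one seen, given the set sizes and the universe size.
    """
    n_a, n_b = len(set_a), len(set_b)
    both = len(set_a & set_b)
    a_only = n_a - both
    b_only = n_b - both
    neither = universe_size - both - a_only - b_only
    if neither < 0:
        raise ValueError("universe_size is smaller than the union of the sets")

    # Sample odds ratio of the 2x2 table; infinite if an off-diagonal cell is 0.
    if a_only and b_only:
        odds_ratio = (both * neither) / (a_only * b_only)
    else:
        odds_ratio = float("inf")

    # Upper tail of the hypergeometric distribution: P(overlap >= both).
    total = comb(universe_size, n_b)
    tail = sum(
        comb(n_a, k) * comb(universe_size - n_a, n_b - k)
        for k in range(both, min(n_a, n_b) + 1)
    )
    return odds_ratio, tail / total


# Toy example: two sets of 8 items sharing 4, drawn from a universe of 20.
interactors_of_degs = set(range(0, 8))
interactors_of_hosts = set(range(4, 12))
odds_ratio, p_value = overlap_enrichment(
    interactors_of_degs, interactors_of_hosts, universe_size=20
)
```

The choice of universe (for example, all proteins in the human interactome) strongly affects both numbers, so it should be reported alongside any such statistic.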
2,630 proteins in the interactome that are supported by the above mentioned transcriptomic and proteomic evidence are listed in Supplementary Data File S2. In fact, the selected novel interactors shown in Figure 2A all have transcriptomic/proteomic evidence.
We studied tissue-specific expression of the proteins in the interactome using GTEx data.41 Genes with an expression level greater than 1 TPM (transcripts per million) and relative expression at least 5-fold higher in a particular tissue (tissue-enriched) or a group of 2-7 tissues (group-enriched) were considered. As expected, many genes showed specific expression in lung, which is the target tissue of the virus, and in spleen, which regulates the immune response of the host (Figure 3). New PPIs were found between host proteins and 37 lung-specific proteins and 49 spleen-specific proteins. Host proteins also interacted with several brain- and heart-specific proteins, which is of importance as cerebrovascular diseases and coronary heart diseases are co-morbidities among COVID-19 non-survivors.42
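The tissue-enriched and group-enriched criteria described above can be sketched as a simple classifier. This is an illustration under stated assumptions rather than the exact procedure applied to GTEx: it assumes that the fold change is measured between the lowest-expressed tissue inside the candidate group and the highest-expressed tissue outside it, and that every tissue in the group must exceed 1 TPM. The function name and the toy expression values are hypothetical.

```python
from typing import Dict, List, Tuple


def classify_tissue_specificity(
    tpm_by_tissue: Dict[str, float],
    min_tpm: float = 1.0,
    fold: float = 5.0,
    max_group: int = 7,
) -> Tuple[str, List[str]]:
    """Label a gene as tissue-enriched, group-enriched or not enriched."""
    # Rank tissues from highest to lowest expression.
    ranked = sorted(tpm_by_tissue.items(), key=lambda kv: kv[1], reverse=True)
    # Try a single tissue first, then groups of 2..max_group tissues.
    for size in range(1, min(max_group, len(ranked) - 1) + 1):
        lowest_in_group = ranked[size - 1][1]
        highest_outside = ranked[size][1]
        if lowest_in_group > min_tpm and lowest_in_group >= fold * highest_outside:
            label = "tissue-enriched" if size == 1 else "group-enriched"
            return label, [tissue for tissue, _ in ranked[:size]]
    return "not enriched", []


# Toy expression profiles in TPM (hypothetical values).
single = classify_tissue_specificity(
    {"lung": 60.0, "spleen": 8.0, "heart": 2.0, "brain": 1.0}
)
group = classify_tissue_specificity(
    {"lung": 50.0, "spleen": 40.0, "heart": 4.0, "brain": 2.0}
)
broad = classify_tissue_specificity(
    {"lung": 10.0, "spleen": 9.0, "heart": 8.0, "brain": 7.0}
)
```

Checking the smallest group first means a gene that qualifies for a single tissue is never reported as group-enriched.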
Gene Ontology Term Enrichment
PML bodies and the midbody may function as subcellular targets of nCoV19
Gene Ontology enrichment analysis of the interactome identified several subcellular locations that may be targeted by nCoV19. These included points of virus entry, such as the cell-substrate junction and nuclear periphery, and specific sites from which viral proteins may potentiate viral replication and gene expression and modulate the immune response of the host, such as the midbody, nuclear chromatin and PML body (each term with p-value<0.0001). PML (promyelocytic leukaemia) bodies are nuclear sub-compartments that repress viral replication through entrapment or epigenetic silencing of the viral genomes.43 Components of PML bodies activate interferon-stimulated genes and cytokines, and may also be upregulated on induction of interferons.43 Therefore, it is conceivable that viruses target PPIs in these structures to circumvent anti-viral defences of the host cell. Sixty-one proteins in the HoP interactome are PML components. These include the host protein AKAP8L, which has been known to promote retroviral gene expression, and 55 known interactors and 5 novel interactors (RNF111, SP140, ELF4, NFE2, CIART) of other host proteins. Our model predicted an interaction of EIF4E2 with SP140, an interferon-inducible PML component; nCoV19 may perturb this PPI. The midbody is a microtubule-rich structure that connects the daughter cells and marks the site of abscission during cytokinesis. Viruses have been known to recruit to the host cell membrane certain protein complexes that also localize to the midbody during cytokinesis, in order to promote membrane scission and thereby the release of viruses.44 This co-opting of proteins may explain the enrichment of midbody proteins in the HoP interactome. Eighty-three proteins in the HoP interactome, including 11 host proteins (RHOA, CENPF, CIT, RAB8A, NUP62, SCCPDH, SPART, RDX, ARF6, CNTRL and RALA), 63 known interactors and 9 novel interactors (KIF4A, BIRC5, INCENP, ALKBH4, DNM2, DDX11, ARL2BP, ABRAXAS2 and WIS), localize to the midbody.
Cell cycle phase transitions in the host may be modulated by nCoV19
Enriched biological processes in the interactome included (G1/S and G2/M) mitotic cell cycle phase transitions, regulation of vesicle-mediated transport, covalent chromatin modification and nuclear transport (p-value<0.0001). The response of the host cell to nCoV19 infection has been shown to be significantly delayed and devoid of several anti-viral mechanisms.5 During early stages of the infection, it is possible that the virus induces a G1/S phase transition to surreptitiously synergize the replication of the viral genome with that of the host genome.45 In the later stages, it may block the G2/M phase transition to maximise the levels of viral genome.45 We found novel (predicted) interactions of host proteins with 34 proteins involved in cell cycle phase transition: ANAPC4, ANAPC7, ARPP19, CCNB3, CDC14B, CDC16, CDC7, CEP164, CETN2, CLSPN, CRLF3, DCTN1, DNM2, DYNC1H1, E2F6, ENSA, FBXL7, GFI1, GML, HYAL1, INHBA, JADE1, NEUROG1, NPAT, ORC2, PPM1D, RAD17, SPDYA, TAOK2, TICRR, TRIAP1, XPC, ZFP36L1, ZNF655.
Pathway Associations
Resveratrol-modulated sub-network of genes involved in the tristetraprolin pathway
Using WebGestalt,46 we compiled the list of the Reactome pathways (Figure 4), which showed a statistically significant enrichment of several pathways related to viral entry and infection such as infectious disease, HIV life cycle, vesicle-mediated transport and membrane trafficking. Several immunity-related pathways that mediate the host response, such as MyD88-dependent TLR4 signalling and ISG15 anti-viral mechanism, were also identified.
The transcriptional profile of the host cell after nCoV19 infection had revealed a remarkably limited anti-viral response compared to that elicited by seasonal influenza-A and respiratory syncytial viruses.5 This prompted us to inspect a post-transcriptional regulatory pathway that was enriched in the HoP interactome, namely, tristetraprolin (ZFP36) binds and destabilizes mRNA (p-value<0.0001). ZFP36 is an RNA-binding protein that targets AU-rich sites in the mRNA transcripts coding for immune proteins and destabilizes them by promoting the deadenylation of their poly(A) tails.47,48 YWHAB increases cytoplasmic localization of ZFP36, possibly preventing destabilization of these transcripts and attenuation of the immune response.49 We extracted the direct PPIs of the 17 genes belonging to this pathway from the HoP interactome and isolated this sub-network for further inspection (Figure 5). Our predictions show that the host protein DCAF7, which is known to function as a scaffold protein and a facilitator of PPIs, interacts with YWHAB and with ACE1, which belongs to the class of receptors targeted by nCoV19 (Figure 5). This raises the possibility that the virus protein Nsp9 (which interacts with DCAF7) may perturb YWHAB-induced cytoplasmic localization of ZFP36 through its action on DCAF7. Nsp9 may activate or promote the sequestration of YWHAB with DCAF7, thereby reducing the capacity of YWHAB to form a complex with ZFP36. The resulting ZFP36-mediated destabilization of immune gene transcripts may then lead to a weakened immune response, creating a conducive environment for nCoV19 infection. We also identified 3 drugs targeting the proteins in this sub-network using DrugBank:50 resveratrol, which targets KHSRP and APP, known interactors of the host protein EXOSC2, which is involved in the tristetraprolin (TTP) pathway; staurosporine, which targets TTP-associated MAPKAPK2, which has been predicted to interact with PABPC1; and dacarbazine, which targets the host protein POLA2 (Figure 5).
Gene expression profiles induced by these drugs in various cell lines were found to have a negative correlation with SARS-associated gene expression profiles, namely, those of lung fibroblast MRC5 cells infected with SARS-CoV and of peripheral blood mononuclear cells of SARS patients (analysis using NextBio; https://www.nextbio.com).51,52 Resveratrol has been proposed as a therapeutic option for nCoV19 based on its antagonistic properties against MERS-CoV.53
Genetic Disorder Enrichment Analysis
Network proximity of genes associated with diabetes and hypertension to the host proteins
We studied the association of interactome genes with genetic disorders/traits in the OMIM database. 155 genes in the interactome, including 9 host protein-encoding genes, 121 known interactors and 25 novel interactors of host proteins, were found to be associated with 35 disorders (overlap with each disease had p-value<0.05). These included 13 types of cancers, 7 metabolic disorders, 4 neurological disorders, 3 developmental disorders, 2 eye-related disorders, 2 vascular diseases, 1 infectious disease, 1 inflammatory disorder, 1 respiratory disorder and 1 skin disease (Figure 6 and Table 1). Some of the diseases enriched in the interactome are co-morbidities among non-survivors and critically ill COVID patients (e.g. diabetes, hypertension, cerebrovascular events and cancer).42,54 Thirteen genes in the interactome were associated with non-insulin dependent diabetes mellitus (odds ratio=10.8, p-value=4.38E-10), 6 genes with essential hypertension (odds ratio=12, p-value=2.34E-05), 3 genes with ischemic stroke (odds ratio=14.4, p-value=0.0017) and 10 genes with lung cancer (odds ratio=14.1, p-value=2.36E-09). Network proximity of the proteins associated with these co-morbid conditions to the nCoV19 host proteins may explain why patients with these conditions are more severely affected by the viral infection. Further investigations are necessary to dissect these co-morbidities. Treatment strategies that prevent the deterioration of the underlying genetic conditions must be devised to combat COVID-19 in susceptible individuals. Additionally, neurological disorders such as Alzheimer’s disease (odds ratio=15.3, p-value=5.13E-07) and schizophrenia (odds ratio=12, p-value=4.19E-06) were found to be enriched in the interactome, warranting further investigations into these potential co-morbidities.
Interconnections to Ciliary Proteins
The SARS coronavirus that emerged in 2002 is known to induce necrosis in ciliated airway epithelium of humans in a species-specific manner.55 nCoV19's host receptor ACE2 is highly expressed in ciliated respiratory cells.56 Cilia may serve as virus entry points and potential modulators of viral pathogenesis. This conjecture prompted us to investigate the ciliary association of the host proteins and their interactors in the HoP interactome. For this, we studied its overlap with an interactome of 165 ciliary proteins that we constructed in a similar manner. The ciliary protein interactome contained 1,665 proteins. 617 of these proteins, including 30 core ciliary proteins, are also found in nCoV19’s host protein interactome, and the overlap was found to be statistically significant (p-value=2.24E-10, odds ratio=1.22). Fourteen novel predicted interactions connected host proteins to ciliary proteins: NUP98-CHMP5, G3BP1-DNAH1, SEPSECS-DNAH1, NEK9-IFT43, TLE1-DNAH5, ATP6AP1-CETN2, C1orf50-ZMYND12, RAB10-IFT172, TOR1AIP1-GPR161, DNAJC19-CETN3, NLRX1-IFT46, FKBP7-TTC30B, POLA2-TMEM216 and NDUFB9-DRC7.
Pathway analysis of the 617 common proteins (i.e., common to HoP and cilia interactomes) revealed two interesting pathways: budding and maturation of HIV virion (p-value=1.29E-06; odds ratio=8.8) and anti-viral mechanism by IFN-stimulated genes (p-value=0.013; odds ratio=2.98). We predicted that the ciliary protein CHMP5, which is involved in the former pathway, interacts with the host protein NUP98, which is involved in the latter pathway. This prompted us to ask whether the predicted interaction connects the functional modules of viral budding and interferon (IFN) signaling.
Novel interaction of NUP98 with CHMP5 may activate an IFN-stimulated pathway that interferes with viral budding
We extracted the PPIs of the 20 proteins belonging to viral budding and IFN pathways and isolated this sub-network, containing 171 proteins and 176 PPIs, for further analysis. Firstly, we identified 343 functional interactions (i.e. activation, inhibition etc.) among 98 proteins in the network. Strikingly, distinct functional modules were identified for both the pathways; CHMP5 seemed to serve as a connector from the viral budding pathway to the IFN pathway through NUP98 (Figure 7). The gene UBC was shared between the clusters.
We then checked whether the genes in these modules were differentially expressed in Calu-3 lung cells infected with SARS CoV Urbani (for 72 hours) versus mock-infected cells. This was done to identify the functional interactions that remain active during viral infection. It was assumed that differential expression of the genes would directly impact the proteins encoded by them and their interactions. Twenty genes, including NUP98 and CHMP5, were found to be differentially expressed (Figure 7). Viruses hijack the ESCRT/VPS4 (endosomal sorting complex required for transport) machinery of the host cell to release viral particles through membrane scission.57 This machinery is normally recruited during endocytic and membrane repair processes in the host cell. The process of membrane scission is catalyzed by various ESCRT-III proteins including CHMP5.57 VPS4 is an ATPase that is found in the cytoplasm in its inactive form. Activation of VPS4 and its ATPase activity is essential for membrane budding and the release of viral particles.57 VPS4 is activated on membranes in the presence of its co-activator VTA (also known as LIP5). VTA is delivered to the membranes by ESCRT-III proteins such as CHMP5.57 Hence, the interaction of VPS4 and VTA is facilitated by CHMP5. However, when interferons are induced in the host cell following viral infection, ISGs (interferon-stimulated genes) such as ISG15, a dimer homologue of ubiquitin, may be activated.57 ISG15 may then conjugate to CHMP5 and promote its accumulation in the membrane, effectively blocking the interaction of VTA with VPS4 and preventing viral budding.57 The novel interaction of CHMP5 with NUP98 may serve as the critical juncture at which the IFN-stimulated anti-viral mechanism interferes with viral budding. NUP98, a protein induced on viral expression, has been shown to promote anti-viral gene expression in Drosophila.58 Both CHMP5 and NUP98 are overexpressed following SARS CoV Urbani infection.
This interaction may serve as a signal for the initiation of ISG15-mediated interference of viral budding. ISG15 may further regulate this mechanism through feedback inhibition of NUP98. Hence, potentiation of this anti-viral mechanism through administration of recombinant interferon alfa-2b and interferon alfacon-1 may be a feasible therapeutic option for nCoV19. Both these interferons induce gene expression profiles negatively correlated with SARS-associated profiles. The machinery of ESCRT-III and VPS4 is co-opted into two subcellular structures that are intricately linked to cilia function, namely, the centrosomes and the midbody.44 It is important to study these structures as potential modulators of viral infections.