HIR V2: a human interactome resource for the biological interpretation of differentially expressed genes via gene set linkage analysis

doi:10.21203/rs.3.rs-26127/v1

Software

This work is licensed under a CC BY 4.0 License

Version 1

Background

To facilitate biomedical studies of disease mechanisms, a high-quality interactome that connects functionally related genes is needed to help investigators formulate pathway hypotheses and to interpret the biological logic of a phenotype at the biological process level.

Results

Interactions in the updated version of the human interactome resource (HIR V2) were inferred from 36 mathematical characterizations of 6 types of data that suggest functional associations between genes. This update of the HIR consists of 88,069 gene pairs whose functional associations are of strengths similar to those of well-studied protein interactions. An estimated 57.04% of these functional interactions represent protein interactions, which are expected to cover 32.48% of the true human protein interactome. The gene set linkage analysis (GSLA) tool was developed on the basis of the high-quality HIR V2 to identify the potential functional impacts of observed transcriptomic changes, helping to elucidate their biological significance and complementing the widely used enrichment-based gene set interpretation tools.

Conclusions

We present the HIR V2, a high-quality functional interactome of human genes, along with the gene set linkage analysis (GSLA) webtool, which uses the HIR V2 to interpret the biological significance of transcriptionally changed genes. A case study shows that the annotations reported by the HIR V2/GSLA system are more comprehensive and concise than those obtained with widely used gene set annotation tools such as PANTHER and DAVID. The HIR V2 and GSLA are available at http://human.biomedtzc.cn.

Epigenetics & Genomics

human

functional interaction

database

gene set linkage analysis

transcriptomic analysis tool

Over the last two decades, advances in omics technology have provided powerful tools for elucidating the mechanisms of human diseases and accelerating drug discovery [1–3]. Compared with traditional approaches, which focus on a limited number of significantly changed genes, omics-based tools offer a global overview of the functional association network of genes in a cell or an organism [4, 5]. A high-quality functional interaction network that groups functionally associated genes can not only facilitate the elucidation of biological pathways, helping investigators focus on the most likely candidate genes when extending existing mechanisms, but also facilitate the interpretation of biologically relevant functional impacts at the subsystem (or biological process) level.

Although omics technology offers many opportunities in human research, it also poses challenges, such as analysing and interpreting vast and complex omics data efficiently [6]. To derive the underlying design logic of physiological processes from molecular-level descriptions, existing omics-based methods mostly rely on enrichment analysis of the observed transcriptomic changes (OTCs). Enrichment-based approaches evaluate whether the changed genes are enriched or clustered in a certain biological process. Many enrichment-based annotation tools have been developed to analyse OTCs, including the widely used PANTHER [7], KEGG [8], and DAVID [9].

In many cases, the observed OTCs can be successfully summarized into established biological concepts with these strategies. In practice, however, enrichment-based methods are frequently reported to yield only conceptually general terms (such as GO:0051704, multi-organism process) or even no enriched annotation terms at all. Like the absence of annotation terms, conceptually general terms provide little assistance to human research because they do not accurately describe the observed OTCs. Moreover, even when no established biological concept accurately describes the OTCs themselves, established concepts may still be needed to interpret their functional impacts. For example, observed OTCs may collectively lead to GO:2000563 (positive regulation of CD4-positive, alpha-beta T cell proliferation), even when the OTCs themselves are not enriched in this term (see Discussion for details).

To meet this challenge, we developed gene set linkage analysis (GSLA) to interpret the potential functional impacts of observed OTCs, especially when no established or suitable biological concept is available to describe the changes themselves. GSLA assigns an observed OTC to an established biological process if the OTC has strong functional associations with the genes of that process.

Previously, we developed a high-quality functional interactome, the HIR [10], and its associated GSLA service to interpret the potential functional impacts of observed OTCs. In one application, this approach supported the analysis of multiomics profiles of human bone marrow stem cells rescuing fulminant hepatic failure (FHF) in pig models. The HIR and GSLA identified a key signalling process that other tools did not detect. Subsequent experiments confirmed that the cytokine regulating this process improved animal survival in both pig and rat models [11]. That report described the first identification of a potential therapeutic strategy that may promote hepatic cell regeneration in FHF.

Since 2013, researchers have generated abundant data suggesting functional interactions among human genes. In this work, we present an updated version, the HIR V2, and its associated GSLA webtool. We show that the HIR V2 performs best among the available interactomes in grouping functionally related genes. The HIR V2 integrates six types of functional association data from 9 public databases (data released before 2018). It includes 88,069 functional gene associations, which are expected to cover 32.48% of the protein-protein interactions in humans, and approximately 57.04% of these associations are expected to represent protein-protein interactions. A case study also shows that the biological processes identified by the HIR V2 and the GSLA webtool were more comprehensive and informative for experimental investigators than those identified by the widely used annotation tools PANTHER [7] and DAVID [9].

We first describe the workflow used to predict functional associations between human genes for the HIR V2. We then describe the implementation of the GSLA tool, which uses the high-quality HIR V2 to interpret the potential functional impacts of observed OTCs. Finally, we describe the back-end implementation of the HIR V2/GSLA website.

Data integration for the prediction of functional associations in humans

To predict functional associations between human genes, we selected six types of evidence collected from seven public databases (data released before 2018): 22,004 expression profiles (Coxpresdb) [12], 288,375 gene annotations (GOC) [13], 59,617 subcellular gene localizations (Compartments) [14], 156,859 domain interactions (IDDI [15] and Pfam [16]), 20,567 phylogenetic profiles (DIOPT) [17], and 9,220 human proteins, together with proteins from Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, and Schizosaccharomyces pombe, used to compute interologs (Inparanoid) [18] (Fig. 1). From these six types of evidence, we derived 36 feature values to measure the strength of functional associations (Additional file 1: Table S1).

In addition to these six types of evidence, protein-protein interactions were considered evidence of strong functional interactions between genes. We collected 319,696 protein-protein interactions reported in experimental studies of humans from two public databases, BioGRID [19] and IntAct [20] (Fig. 1 and Additional file 2: Table S2). To ensure the quality of these experimentally reported interactions, we removed interactions that were reported in fewer than two independent studies, as well as those reported only in high-throughput experiments. The remaining 4,509 high-quality protein-protein interactions were used to train the prediction model, so that the inferred functional associations would be as strong as protein-protein interactions. UniProt [21] and BioMart [22] were used to convert the various gene IDs to unique HGNC IDs, with the HGNC database [23] as the reference (Fig. 1).
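The filtering rule above can be sketched in Python as follows. This is a minimal sketch, not the authors' pipeline: it reads the rule as keeping only interactions supported by at least two distinct publications, at least one of which used a low-throughput method. The `Evidence` record layout and its field names are hypothetical; BioGRID and IntAct exports encode publications and detection methods in their own formats, which would first need to be parsed and mapped to HGNC IDs.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Evidence(NamedTuple):
    """One reported observation of a protein-protein interaction (hypothetical layout)."""
    gene_a: str            # HGNC ID of the first interactor
    gene_b: str            # HGNC ID of the second interactor
    pubmed_id: str         # publication reporting the interaction
    high_throughput: bool  # True if the detection method was high-throughput


def filter_high_quality(records: Iterable[Evidence]) -> set:
    """Keep pairs seen in >= 2 independent studies and not only in high-throughput experiments."""
    studies = defaultdict(set)             # pair -> distinct publications
    has_low_throughput = defaultdict(bool)  # pair -> any low-throughput evidence?
    for r in records:
        pair = frozenset((r.gene_a, r.gene_b))  # undirected interaction
        studies[pair].add(r.pubmed_id)
        if not r.high_throughput:
            has_low_throughput[pair] = True
    return {pair for pair, pmids in studies.items()
            if len(pmids) >= 2 and has_low_throughput[pair]}
```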

Computation and evaluation of feature values

Thirty-six feature values derived from the six types of functional association evidence were used to characterize functional interactions between human genes (Additional file 1: Table S1); the detailed equations are provided on the HIR V2 website. These 36 features comprise 1 homologous interaction feature, 3 phylogenetic profile features, 23 domain interaction features, 4 subcellular colocalization features, 2 coexpression features, and 3 shared annotation features (Additional file 3: Table S3).
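To make the idea of a pairwise feature value concrete, the sketch below computes two generic examples: a Jaccard index over two genes' annotation sets (one plausible form of a shared annotation feature) and a Pearson correlation over two expression profiles (one plausible form of a coexpression feature). These formulas are illustrative assumptions only; the actual definitions of the 36 features are those given on the HIR V2 website.

```python
def jaccard(terms_a, terms_b):
    """Shared-annotation style feature: |A ∩ B| / |A ∪ B| for two genes' annotation sets."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def pearson(x, y):
    """Coexpression-style feature: Pearson correlation of two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant profiles; 0 is an arbitrary convention
    return cov / (var_x * var_y) ** 0.5
```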

Not all of these 36 features are suitable for separating protein interactions from random gene pairs. We therefore retained only features that showed a strong correlation with functional associations, which increases the signal-to-noise ratio in the subsequent inference of functional associations. To evaluate how well each feature indicates functional associations, we used the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. When a single feature is used to distinguish protein-protein interactions from random gene pairs, each cut-off on its value yields a pair of sensitivity and specificity values; plotting these pairs across cut-offs gives the ROC curve (X-axis, 1 − specificity; Y-axis, sensitivity). Features with AUCs higher than 0.6 were considered informative of functional associations (Additional file 4: Fig. S1). In total, 18 features with AUCs higher than 0.6 were selected for the subsequent prediction of functional associations between human genes (Additional file 3: Table S3 and Additional file 4: Fig. S1).
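Per-feature screening by ROC AUC can be sketched as below, assuming that larger feature values indicate stronger association (a feature with the opposite orientation would need its sign flipped first). The AUC is computed with the rank-based Mann-Whitney formulation, which equals the area under the empirical ROC curve obtained by sweeping all cut-offs; `select_features` and its input layout are hypothetical names for illustration.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve: P(random positive > random negative), ties counted as 1/2."""
    labelled = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores])
    ranks = [0.0] * len(labelled)
    i = 0
    while i < len(labelled):
        # Find the run of tied scores starting at i and give each the average rank.
        j = i
        while j + 1 < len(labelled) and labelled[j + 1][0] == labelled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, labelled) if label == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


def select_features(feature_values, threshold=0.6):
    """feature_values maps a feature name to (scores on positive pairs, scores on random pairs)."""
    aucs = {name: roc_auc(pos, neg) for name, (pos, neg) in feature_values.items()}
    return sorted(name for name, auc in aucs.items() if auc > threshold), aucs
```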

Inference of functional associations between human genes

The LibSVM package was used to train the model and predict functional associations [24, 25] (Fig. 1). As positive examples representing strong functional associations between human genes, we used the 4,509 high-quality protein-protein interactions that were confirmed by experiments and published before 2018. Negative examples were randomly generated gene pairs, from which pairs overlapping the positive examples were removed. A random gene pair may still be functionally associated, although the probability is low. We set the positive-to-negative ratio in the training dataset to 1:100 to reflect that only a very small fraction of gene pairs are functionally associated and to reduce the false positive rate.

This prediction approach may be considered an implementation of transfer learning. The same evidence of functional association can be used to predict both protein interactions and functional gene associations, and protein interactions can be regarded as one type of strong functional gene association. Thus, the "knowledge" (i.e., the classification model) gained from predicting protein interactions may be used to infer functional associations between genes. Gold-standard protein interactions are available from experimental reports, but no well-established gold-standard dataset exists for strong functional gene associations. The transfer learning strategy addresses this lack of a gold standard by applying the knowledge gained in predicting protein interactions (a special form of strong functional association) to the inference of functional associations between genes.
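Constructing the training set with a 1:100 positive-to-negative ratio can be sketched as follows. Negatives are drawn uniformly from all unordered pairs of distinct genes, and any pair that coincides with a positive example is discarded, as described above; the function name, the fixed seed, and the uniform sampling scheme are illustrative assumptions rather than details confirmed by the text.

```python
import random


def sample_negatives(genes, positives, ratio=100, seed=0):
    """Draw ratio * len(positives) random gene pairs that do not overlap the positive examples.

    genes: iterable of gene IDs; positives: set of frozenset({gene_a, gene_b}).
    """
    rng = random.Random(seed)
    genes = sorted(set(genes))
    target = ratio * len(positives)
    available = len(genes) * (len(genes) - 1) // 2 - len(positives)
    if target > available:
        raise ValueError("not enough distinct gene pairs for the requested ratio")
    negatives = set()
    while len(negatives) < target:
        pair = frozenset(rng.sample(genes, 2))  # two distinct genes, unordered
        if pair not in positives:  # remove overlaps with positive examples
            negatives.add(pair)
    return negatives
```

Each sampled pair would then be converted to its vector of selected feature values before training.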

For model training, we used a soft-margin support vector machine (SVM) with a Gaussian kernel. The two parameters, s (kernel width) and C (soft margin), were optimized by 5-fold cross-validation to maximize the harmonic mean of sensitivity and specificity, and the final model was trained with the optimized s and C. An external validation dataset consisting of 435 protein interactions published after December 31, 2017, together with randomly generated negative examples, was used to validate the model, which showed a sensitivity of 32.48% and a specificity of 99.98%. We also evaluated the sensitivity of HPRD, HI-III, HIPPIE, STRING, and UniHI, i.e., how well the interactions in each database covered these newly published interactions. The comparison results are shown in Additional file 5: Table S4.
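The parameter search can be sketched as a stratified 5-fold cross-validation that scores each (s, C) combination by the harmonic mean of sensitivity and specificity. To keep the sketch self-contained, the classifier is passed in as a hypothetical `fit_predict` callback rather than calling LIBSVM directly; with LIBSVM's `svm-train`, such a callback could use `-t 2` (RBF kernel) with `-c` set to C and `-g` set to gamma, where gamma = 1/(2s²) under one common convention for the kernel width (an assumption, since the text does not state how s maps to LIBSVM's gamma).

```python
import itertools
import random


def harmonic_sens_spec(y_true, y_pred):
    """Harmonic mean of sensitivity and specificity for binary labels (1 = interaction)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return 2 * sens * spec / (sens + spec) if sens + spec else 0.0


def stratified_folds(y, k=5, seed=0):
    """Split sample indices into k folds that preserve the positive/negative ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def grid_search(X, y, widths, costs, fit_predict, k=5):
    """Return (mean CV score, s, C) for the best kernel width s and soft-margin cost C."""
    folds = stratified_folds(y, k)
    best = None
    for s, c in itertools.product(widths, costs):
        scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                                [X[i] for i in test_idx], s, c)
            scores.append(harmonic_sens_spec([y[i] for i in test_idx], preds))
        mean_score = sum(scores) / k
        if best is None or mean_score > best[0]:
            best = (mean_score, s, c)
    return best
```

The harmonic mean penalizes a model that buys high specificity at the cost of very low sensitivity (or vice versa), which matters when negatives outnumber positives 100 to 1.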

Applying this model to all human gene pairs yielded 83,125 predicted functional associations. We added 4,944 experimentally reported interactions to these inferred functional interactions, giving a total of 88,069 interactions in the HIR V2 dataset. The following equation was used to estimate the proportion of human protein-protein interactions covered by the predicted functional interactome:

$$N_{\text{predict}} = N_{\text{interactome}} \times \text{sensitivity} + \left(N_{\text{all-pairs}} - N_{\text{interactome}}\right) \times \left(1 - \text{specificity}\right)$$

where N_interactome is the expected number of all protein-protein interactions in humans; N_all-pairs is the number of all gene pairs in humans; N_predict is the number of predicted gene associations; and sensitivity and specificity are the accuracy measures obtained when the prediction model was validated with the newly published protein interactions. Solving this equation gives an estimated human protein interactome size of 1.52 × 10⁵, corresponding to 1 protein interaction among every 1,230 gene pairs. This is comparable to the reported fraction of protein interactions in yeast (1/775 [26]). Based on the estimated interactome size (1.52 × 10⁵) and the estimated sensitivity (32.48%, the more conservative of the training-stage sensitivity of 32.88% and the evaluation-stage sensitivity of 32.48%), the predicted interactions in the HIR V2 are expected to include 49,249 protein interactions. Therefore, 57.04% of the HIR V2 functional interactions (49,249 out of 86,359) are expected to represent protein interactions.
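The interactome-size estimate follows from solving the equation above for N_interactome: the expected number of predictions is the true interactions recovered (N_interactome × sensitivity) plus the false positives among all remaining pairs. A minimal sketch follows; the function names are hypothetical, and because the text does not state the gene count behind N_all-pairs, no paper-specific inputs are hard-coded.

```python
def estimate_interactome_size(n_predict, n_all_pairs, sensitivity, specificity):
    """Solve N_predict = N_int * sens + (N_all - N_int) * (1 - spec) for N_int."""
    fpr = 1.0 - specificity  # false positive rate on non-interacting pairs
    return (n_predict - n_all_pairs * fpr) / (sensitivity - fpr)


def expected_ppi_fraction(n_predict, n_interactome, sensitivity):
    """Expected true protein interactions among the predictions, and their share of all predictions."""
    expected_ppis = n_interactome * sensitivity
    return expected_ppis, expected_ppis / n_predict
```

For example, with a hypothetical 1.8 × 10⁸ gene pairs, a sensitivity of 0.30, and a specificity of 0.9998, a model reporting 80,970 pairs implies an interactome of 150,000 interactions, of which 45,000 are expected among the predictions. Because the false positive term scales with N_all-pairs, the estimate is quite sensitive to the gene count used.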

Gene set linkage analysis tool

The GSLA web tool was first developed together with the predicted Human Interactome Resource (HIR 2013) [10] to interpret the potential functional impacts of OTCs in humans. GSLA applies two tests (Q1 and Q2) to ensure that the reported functional associations between two gene sets are significant (Fig. 2). Q1 tests whether the density of gene associations between two functionally associated gene sets is higher than the background density of gene associations between two random gene sets. Q2 assumes that such a high density can be observed only in the biologically correct interactome and not in random interactomes. In other words, when the HIR V2 is compared with a random gene association network consisting of the same genes, with each gene having the same number of neighbours, the density between the two gene sets should be higher in the HIR V2.

In biological terms, Q1 examines the strength of the functional associations between two gene sets, while Q2 verifies that the observed strong association results from a biologically correct network topology (i.e., our knowledge of molecular mechanisms) rather than from the composition of the two gene sets. Some genes, known as hubs, have considerably more neighbours than others, so gene sets containing many hubs are more likely to be connected to other gene sets. Q2 removes this confounding effect of gene set composition, ensuring the biological significance of the detected associations. Q1 and Q2 are thus related but distinct hypotheses that complement each other, improving the sensitivity and specificity of GSLA. By default, GSLA reports a functional association between two gene sets when the density exceeds 0.01 (Q1) and p < 0.001 (Q2).
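The two tests can be sketched as below. The sketch assumes that the Q1 density is the fraction of distinct cross-set gene pairs linked in the interactome, and it implements the Q2 null model (same genes, same number of neighbours per gene) by degree-preserving double-edge swaps with an empirical permutation p-value. The density definition, the number of swaps, and the p-value formula are assumptions for illustration, not the GSLA server's code; note also that reaching p < 0.001 requires at least 1,000 randomized networks.

```python
import random


def inter_set_density(edges, set_a, set_b):
    """Q1 statistic (assumed form): fraction of distinct cross-set gene pairs linked in the network."""
    pairs = {frozenset((a, b)) for a in set_a for b in set_b if a != b}
    if not pairs:
        return 0.0
    return sum(1 for pair in pairs if pair in edges) / len(pairs)


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Rewire a network by double-edge swaps so that every gene keeps its number of neighbours."""
    edge_list = [tuple(e) for e in edges]
    edge_set = set(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c
        new_1, new_2 = frozenset((a, d)), frozenset((c, b))
        # Reject swaps that would create self-loops or duplicate edges.
        if len({a, b, c, d}) < 4 or new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new_1, new_2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_set


def gsla_test(edges, set_a, set_b, n_random=1000, seed=0, min_density=0.01, max_p=0.001):
    """Return (density, permutation p-value, significant?) for a link between two gene sets."""
    rng = random.Random(seed)
    observed = inter_set_density(edges, set_a, set_b)  # Q1
    n_swaps = 10 * len(edges)  # amount of rewiring per random network (an assumption)
    as_extreme = 0
    for _ in range(n_random):
        random_net = degree_preserving_shuffle(edges, n_swaps, rng)
        if inter_set_density(random_net, set_a, set_b) >= observed:
            as_extreme += 1
    p_value = (as_extreme + 1) / (n_random + 1)  # Q2
    return observed, p_value, observed > min_density and p_value < max_p
```

Each randomization here rewires a full copy of the network, so on an interactome with tens of thousands of edges a practical implementation would need to be considerably more efficient than this sketch.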

Construction of the HIR V2/GSLA website

To deploy the online database, we used an LNMP stack (Linux, Nginx, MySQL, and PHP), with MySQL storing the data. The web application was developed with the PHP Laravel framework, and the front end was implemented as a single-page application (SPA) with Vue.js, an open-source JavaScript library for building web interfaces. Cytoscape [27] was used to visualize the functional association networks.

Functional gene association network evaluation

To evaluate the quality of the updated human functional gene association interactome, we measured the ability of the HIR V2 to group functionally associated genes together, i.e., how accurately the functions of a gene can be predicted from those of its network neighbours. In this guilt-by-association gene function prediction assay, we compared the HIR V2 with the human interactomes HIPPIE [28], HPRD [29], PICKLE [30], UniHI [31], and STRING [32], as well as with the previous version of the HIR (HIR 2013) [10]. For each gene in each interactome, its GO biological process annotations were predicted as the terms enriched among the annotations of its first-degree network neighbours, using the term enrichment tool PANTHER [7]. Because the data integrated into the HIR V2 predate December 31, 2017, we collected 13,648 genes from the GO database that received new annotations after that date. These genes carry a total of 398,441 annotations, 118,748 of which were newly reported since 2018, and they were used to evaluate the quality of the HIR V2.
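The guilt-by-association assay can be sketched as follows: for a query gene, count the GO terms annotated to its first-degree neighbours and keep the terms that are over-represented relative to all other annotated genes. PANTHER's own overrepresentation test is not reproduced here; this sketch substitutes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) without multiple-testing correction, and the function names and `alpha` cut-off are hypothetical. The query gene's own annotations are excluded so that they cannot leak into its prediction.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def predict_terms(gene, neighbours, annotations, alpha=0.05):
    """Predict GO terms for `gene` from terms over-represented among its first-degree neighbours.

    neighbours: gene -> iterable of neighbour genes; annotations: gene -> set of GO term IDs.
    """
    background = [g for g in annotations if g != gene]  # exclude the query gene itself
    nbrs = {g for g in neighbours.get(gene, ()) if g in annotations and g != gene}
    counts = {}
    for g in nbrs:
        for term in annotations[g]:
            counts[term] = counts.get(term, 0) + 1
    predicted = {}
    for term, k in counts.items():
        m = sum(1 for g in background if term in annotations[g])
        p_value = hypergeom_sf(k, len(background), m, len(nbrs))
        if p_value <= alpha:
            predicted[term] = p_value
    return predicted
```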

Precision-recall curves were used to compare the overall accuracy of new annotation prediction across the seven interactomes. Recall measures the proportion of the 118,748 new annotations that were successfully predicted, while precision measures the proportion of PANTHER-predicted annotations that are consistent with the known annotations (both new and old). Each annotation predicted by PANTHER has an enrichment significance value. A more lenient (higher) cut-off on this value reports more annotations, increasing recall but also the false positive rate; a stricter (lower) cut-off reports fewer annotations, lowering recall but increasing precision. A precision-recall curve shows precision and recall across all cut-offs and therefore gives a more comprehensive view of interactome quality. A higher area under the precision-recall curve indicates an interactome that better supports "guilt-by-association" prediction of gene functions.
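Given predicted (gene, term, p-value) triples, the precision-recall curve described above can be computed as below, with recall measured against the new annotations and precision against all known annotations, as in the text. Each distinct p-value serves as a cut-off, and the area is approximated with the trapezoidal rule along the curve in cut-off order; this integration choice is an assumption, since the text does not say how the area was computed.

```python
def precision_recall_curve(predictions, new_annotations, known_annotations):
    """predictions: list of (gene, term, p_value); annotation sets hold (gene, term) tuples.

    Returns (cut-off, recall, precision) for each distinct p-value used as a cut-off.
    """
    curve = []
    for cutoff in sorted({p for _, _, p in predictions}):
        reported = {(g, t) for g, t, p in predictions if p <= cutoff}
        recall = len(reported & new_annotations) / len(new_annotations)
        precision = len(reported & known_annotations) / len(reported)
        curve.append((cutoff, recall, precision))
    return curve


def area_under_pr(curve):
    """Trapezoidal area along the curve, in order of increasing cut-off (recall never decreases)."""
    points = [(r, p) for _, r, p in curve]
    return sum((r2 - r1) * (p1 + p2) / 2 for (r1, p1), (r2, p2) in zip(points, points[1:]))
```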

As shown in Fig. 3, the HIR V2 ranks highest by a considerable margin, indicating its strong ability to group functionally related genes together. Notably, second place was occupied by the previous version, HIR 2013, which, although published in 2013, still performed better than several interactomes that include very recent data. Although the curves of HIPPIE, HPRD, PICKLE, and UniHI have high-precision regions, they do not reach the high-recall region. In contrast, the curve of STRING reaches the high-recall region, but its precision remains low and does not increase considerably even in the low-recall region. This suggests that STRING contains a high proportion of weak functional gene associations, which may raise the false positive rate during function prediction. Overall, both versions of the HIR show a balance between coverage and accuracy, and their overall quality, even that of the 2013 version, exceeds that of the other interactomes compared.

Web interface of the HIR V2/GSLA

The HIR V2 provides a user-friendly interface with two search modes: a single-gene search mode and a multiple-gene search mode (Fig. 4A). Queries can use either gene names or HGNC IDs. The single-gene mode shows putative functional associations involving the query gene, and the multiple-gene mode shows functional associations among the query genes. Fig. 4B shows the functional associations reported by the HIR V2 in tabular form; they are also displayed as a network graph on the right side of the query interface. Clicking an edge shows the feature values that the model used to predict that interaction. We also provide a score that measures the reliability of each predicted functional interaction. A score between 0 and 1 indicates that the decision falls within the error margin, with smaller scores indicating lower confidence; a score of 1 indicates that the decision falls outside the error margin and is therefore reliable. In the graphical view, the thickness of each line increases with the reliability of the predicted association. Users can also click a node to view detailed information on the corresponding gene. A full dump of the predicted interactome is available for download from the HIR V2 website, and more details about the HIR V2/GSLA are provided in the help section.
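One reading of the reliability score, assuming it is derived from the SVM decision value f(x) of a positively classified pair (f(x) > 0, with f(x) < 1 meaning the pair lies inside the soft margin), is a simple truncation at 1. The sketch below illustrates that reading together with a linear mapping from score to edge width; both functions and the pixel range are hypothetical and are not taken from the HIR V2 source.

```python
def reliability_score(decision_value):
    """Hypothetical mapping of an SVM decision value f(x) > 0 to a score in (0, 1].

    0 < f(x) < 1: inside the soft margin, so the score equals f(x) (lower confidence);
    f(x) >= 1: outside the margin, so the score is capped at 1.
    """
    if decision_value <= 0:
        raise ValueError("only positively classified pairs are reported")
    return min(decision_value, 1.0)


def edge_width(score, min_px=1.0, max_px=6.0):
    """Line thickness grows linearly with the reliability score (pixel range is arbitrary)."""
    return min_px + (max_px - min_px) * score
```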

The HIR V2 website also provides the GSLA online service, which interprets the potential functional impacts of an uploaded gene set. Fig. 4C shows the main GSLA interface, which accepts six types of human gene identifiers for query OTCs: gene name, HGNC ID, UniProt ID, Ensembl gene ID, Ensembl protein ID, and NCBI Entrez ID. Because the server works internally with HGNC IDs only, all submitted IDs are automatically mapped to HGNC IDs before further computation (Fig. 4C); submitting HGNC IDs directly is therefore recommended. Users can adjust the criteria for reporting significant functional associations (the Q1 and Q2 tests described above), and an email address is required before submission. For optimal results, we recommend submitting the top 50–200 changed genes of the observed OTCs. The first ten lines of the result file list the analysis parameters (Fig. 4D), followed by a table presenting the functionally associated biological processes, the functional associations between genes in the reported biological processes, and the genes in the query OTCs.
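Preparing a query in the recommended size range can be sketched as follows. The text does not say how the top changed genes should be ranked, so this sketch assumes ranking by absolute log2 fold change among genes passing an adjusted p-value filter; the `to_hgnc` mapping table (for example, built from HGNC's downloadable ID cross-references) and all function and parameter names are hypothetical.

```python
def select_query_genes(results, to_hgnc, top_n=200, max_padj=0.05):
    """Pick up to top_n HGNC IDs from differential-expression results.

    results: iterable of (gene_id, log2_fold_change, adjusted_p); to_hgnc: gene_id -> HGNC ID.
    Returns (selected HGNC IDs, gene IDs that could not be mapped).
    """
    ranked, unmapped = [], []
    for gene_id, log2_fc, padj in results:
        if padj > max_padj:
            continue  # not significantly changed
        hgnc_id = to_hgnc.get(gene_id)
        if hgnc_id is None:
            unmapped.append(gene_id)
            continue
        ranked.append((abs(log2_fc), hgnc_id))
    ranked.sort(reverse=True)  # largest absolute fold change first
    selected, seen = [], set()
    for _, hgnc_id in ranked:
        if hgnc_id not in seen:  # several input IDs may map to the same HGNC ID
            seen.add(hgnc_id)
            selected.append(hgnc_id)
        if len(selected) == top_n:
            break
    return selected, unmapped
```

Reporting unmapped IDs explicitly makes it easy to see which genes would be lost in the automatic conversion to HGNC IDs.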

Using the HIR V2/GSLA system to reanalyse the Treg-DC dataset

Regulatory T cells (Tregs) play a pivotal role in maintaining immune homeostasis, including immune tolerance to self and the prevention of excessive immune responses [33–36]. Tregs suppress the activities of CD4+ and CD8+ effector T cells and natural killer (NK) cells and inhibit dendritic cell (DC) maturation [37–40]. However, these Treg-mediated suppressive activities can also contribute to the immune escape of pathogens or tumours [41]. Because the modulation of DC function by human Tregs had received little research attention, Mavin et al. studied one suppressive modality of Tregs: the indirect dampening of immune activation through suppression of DCs [42]. They found that Treg-treated DCs (Treg-DCs) impaired CD8+ T cell alloreactive responses and skewed the polarization of naive CD4+ T cells towards a regulatory phenotype owing to decreased IL-12 secretion by Treg-DCs.

Because previous studies had focused narrowly on the ability of Treg-cultured DCs to stimulate CD4+ T cell proliferation, Mavin et al. performed a microarray analysis to search for molecular evidence of Treg-mediated modulation of DC function (GEO database, GSE72893) [42–44]. They reported that Treg-DCs form a population distinct from both mature and immature DCs; compared with mature DCs, 51 and 93 genes were significantly over- and underexpressed, respectively, in Treg-DCs. In this study, we reanalysed the differentially expressed genes in the microarray dataset GSE72893 [42]. As shown in Fig. 5, both DAVID and GO enrichment analysis identified cytokine-mediated pathways, consistent with the original publication (Additional file 6: Table S5 and Additional file 7: Table S6). However, both tools missed several functional impacts that were experimentally reported in the same publication, such as the suppression of CD8+ T cell proliferation and the reduction of IL-6 secretion, as well as several functional impacts reported in independent studies of similar subjects, such as the negative regulation of CD4+ T cell proliferation and the involvement of certain chemokine receptors (i.e., CCR2 and CXCR3) (Additional file 8: Table S7). Overall, DAVID reported 134 biological process terms in 18 clusters, GO enrichment analysis reported 47 terms, and the HIR V2/GSLA reported 67 terms, of which 39 (29.10%), 35 (74.47%), and 32 (47.76%), respectively, were supported by previously published results. Thus, GSLA provided more comprehensive and relevant annotations while maintaining acceptable accuracy. Many annotations identified by GSLA but missed by the other tools are supported by other studies and could provide clues for further research.

Many efforts to build a reference human interactome preceded our study, and many human interactomes now provide experimentally reported protein-protein interactions or predicted molecular interactions. For example, BioGRID [19] and IntAct [20] collect molecular interactions reported by experiments, whereas others, such as STRING [32], provide predicted molecular interactions. Experimentally reported interactions are generally considered more accurate than predicted ones, but their number is too small. Moreover, most experimentally reported interactions come from high-throughput experiments, which have high false positive rates, and some experimentally confirmed interactions lack biological significance, such as true physical interactions between proteins that share no subcellular compartment under normal physiological conditions. Predicted interactions, in contrast, are limited in reliability. STRING, a widely used interactome of predicted interactions, contains 7,195,686 predicted human interactions, which cover a very high proportion (78.63%) of the human interactome; however, its reliability is only 1.66%, meaning that only 1.66% of STRING interactions are expected to represent protein interactions. Consistent with these limitations, in the evaluation of new gene annotation prediction described above (Fig. 3), the HIR V2 performed better than both the experimentally reported interaction databases and the predicted interactomes. Surprisingly, the previous version of the HIR, developed in 2013, also performed better than the other interactomes. Both the HIR V2 and HIR 2013 show a balance between sensitivity and reliability (Fig. 3).

In conclusion, the HIR V2 is a high-quality reference functional interaction network that complements existing resources for functional gene interaction analyses.

Based on the high-quality HIR V2, GSLA can interpret the functional impacts of observed OTCs in humans. Because GSLA evaluates the density of functional interactions between the individual genes of two gene sets, it requires an interactome with both high precision and high coverage, which the HIR V2 provides and which, as described above, previously developed interactomes do not. When these interactomes were evaluated for functional impact prediction, the HIR V2 performed best; a similar observation was made for the HIR 2013 [10]. Existing human interactomes are therefore less effective for GSLA than the high-quality interactomes that we developed specifically for humans.

The HIR V2/GSLA system extends the capability of existing enrichment-based gene set annotation tools, which categorize observed OTCs into established biological processes. GSLA is advantageous for interpreting the functional impacts of OTCs when no established biological concept describes them; in such cases, enrichment-based tools cannot provide instructive annotations, whereas the HIR V2/GSLA system may still help investigators understand how the observed changes connect to related physiological processes. In addition, the HIR V2 provides researchers with a useful, high-quality functional association resource for describing the molecular mechanisms of their genes of interest.

The HIR V2 is a high-quality, reliable resource that investigators can query for functional associations between human genes to better understand the molecular mechanisms of their genes of interest. GSLA, built on the functional association network of the HIR V2, interprets the potential functional impacts of transcriptionally changed genes, especially when no established biological process accurately describes the observed OTCs. In a case study, the annotations reported by the HIR V2/GSLA system were more comprehensive and concise than those of widely used enrichment-based annotation tools, including PANTHER and DAVID.

Project name: HIR V2
Project home page: http://human.biomedtzc.cn
Operating systems: Linux
Programming language: Java, PHP
Other requirements: No
License: Laravel source code is licensed under the MIT license.

Any restrictions to use by non-academics: No.

HIR human interactome resource

GSLA gene set linkage analysis

OTCs observed transcriptomic changes

FHF fulminant hepatic failure

AUC area under the curve

ROC receiver operating characteristic

Tregs regulatory T cells

NK natural killer

DC dendritic cell

Treg-DCs Treg-treated DCs

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and materials

The predicted interactome HIR V2 and its associated GSLA web tool are available at http://human.biomedtzc.cn.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by the National Natural Science Foundation of China (grant nos. 31571356 and 81830073). The funding body had no role in the design of the study; in the collection, analysis, or interpretation of data; or in writing the manuscript.

Authors' contributions

YT developed the computational workflow and analysed the data with the help of HY. JJ developed the web database and tools. LR wrote the manuscript. HB built the interaction model. XB evaluated the functional association network of HIR V2. WP performed interaction prediction. QL reanalysed the microarray dataset. PC and HY provided initial conceptualizations of the HIR V2/GSLA system. XC designed and coordinated the project and, together with YT, wrote the first draft of this manuscript. All authors reviewed and edited the final manuscript.

Acknowledgements

We thank Prof. Xiao-Ming Zhao for useful discussion and Taizhou Bigdata AI Research Center for providing computing resources.

Authors' information

Yu-Tian Tao, Xiao-Bao Ding, Jie Jin, Hai-bo Zhang, Wen-Ping Guo, and Li Ruan

Institute of Big Data and Artificial Intelligence in Medicine, School of Electronics and Information Engineering, Taizhou University, Taizhou, 318000, China

Qiao-lei Yang, Peng-Cheng Chen, and Heng Yao

Institute of Pharmaceutical Biotechnology, School of Medicine, Zhejiang University, Hangzhou, 310058, China

Xin Chen (corresponding author)

Institute of Big Data and Artificial Intelligence in Medicine, School of Electronics and Information Engineering, Taizhou University, Taizhou, 318000, China

Joint Institute for Genetics and Genome Medicine between Zhejiang University and University of Toronto, Zhejiang University, Hangzhou, 310058, China

Institute of Pharmaceutical Biotechnology, School of Medicine, Zhejiang University, Hangzhou, 310058, China

Tel/Fax: +86-571-88208595; Email: [email protected]

References

Hasin Y, Seldin M, Lusis A. Multi-omics approaches to disease. Genome Biol. 2017;18. doi:10.1186/s13059-017-1215-1.
Mishra N. Science of omics: Perspectives and Prospects for human health care. Integr Mol Med. 2016;3. doi:10.15761/IMM.1000258.
Vlaanderen J, Moore LE, Smith MT, Lan Q, Zhang L, Skibola CF, et al. Application of OMICS technologies in occupational and environmental health research; current status and projections. Occup Environ Med. 2010;67:136–43.
Pinu FR, Beale DJ, Paten AM, Kouremenos K, Swarup S, Schirra HJ, et al. Systems Biology and Multi-Omics Integration: Viewpoints from the Metabolomics Research Community. Metabolites. 2019;9:76.
Ning M, Lo EH. Opportunities and challenges in omics. Transl Stroke Res. 2010;1:233–7.
Barash CI. Omics challenges and unmet translational needs. Appl Transl Genom. 2016;10:1.
Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, et al. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res. 2017;45:D183–9.
Kanehisa M, Sato Y, Furumichi M, Morishima K, Tanabe M. New approach for understanding genome variations in KEGG. Nucleic Acids Res. 2019;47:D590–5.
Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4:44–57.
Zhou X, Chen P, Wei Q, Shen X, Chen X. Human interactome resource and gene set linkage analysis for the functional interpretation of biologically meaningful gene sets. Bioinformatics. 2013;29:2024–31.
Shi D, Zhang J, Zhou Q, Xin J, Jiang J, Jiang L, et al. Quantitative evaluation of human bone mesenchymal stem cells rescuing fulminant hepatic failure in pigs. Gut. 2017;66:955–64.
Obayashi T, Kagaya Y, Aoki Y, Tadaka S, Kinoshita K. COXPRESdb v7: a gene coexpression database for 11 animal species supported by 23 coexpression platforms for technical evaluation and evolutionary inference. Nucleic Acids Res. 2019;47:D55–62.
Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 2015;43 Database issue:D1049–56.
Binder JX, Pletscher-Frankild S, Tsafou K, Stolte C, O’Donoghue SI, Schneider R, et al. COMPARTMENTS: unification and visualization of protein subcellular localization evidence. Database (Oxford). 2014;2014:bau012.
Kim Y, Min B, Yi G-S. IDDI: integrated domain-domain interaction and protein interaction analysis system. Proteome Sci. 2012;10 Suppl 1:S9.
El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47:D427–32.
Hu Y, Flockhart I, Vinayagam A, Bergwitz C, Berger B, Perrimon N, et al. An integrative approach to ortholog prediction for disease-focused and other functional studies. BMC Bioinformatics. 2011;12:357.
O’Brien KP, Remm M, Sonnhammer ELL. Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 2005;33 Database issue:D476–80.
Oughtred R, Stark C, Breitkreutz B-J, Rust J, Boucher L, Chang C, et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 2019;47:D529–41.
Orchard S, Ammari M, Aranda B, Breuza L, Briganti L, Broackes-Carter F, et al. The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014;42 Database issue:D358–63.
UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47:D506–15.
Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, et al. BioMart--biological queries made easy. BMC Genomics. 2009;10:22.
Yates B, Braschi B, Gray KA, Seal RL, Tweedie S, Bruford EA. Genenames.org: the HGNC and VGNC resources in 2017. Nucleic Acids Res. 2017;45:D619–25.
Winters-Hilt S, Yelundur A, McChesney C, Landry M. Support vector machine implementations for classification & clustering. BMC Bioinformatics. 2006;7 Suppl 2:S4.
Chang C-C, Lin C-J. LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol. 2011;2:1–27.
Yu H, Braun P, Yildirim MA, Lemmens I, Venkatesan K, Sahalie J, et al. High-quality binary protein interaction map of the yeast interactome network. Science. 2008;322:104–10.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–504.
Alanis-Lobato G, Andrade-Navarro MA, Schaefer MH. HIPPIE v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks. Nucleic Acids Res. 2017;45:D408–14.
Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, et al. Human Protein Reference Database--2009 update. Nucleic Acids Res. 2009;37 Database issue:D767–72.
Gioutlakis A, Klapa MI, Moschonas NK. PICKLE 2.0: A human protein-protein interaction meta-database employing data integration via genetic information ontology. PLoS ONE. 2017;12:e0186039.
Chaurasia G, Iqbal Y, Hänig C, Herzel H, Wanker EE, Futschik ME. UniHI: an entry gate to the human protein interactome. Nucleic Acids Res. 2007;35 Database issue:D590–4.
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43 Database issue:D447–52.
Josefowicz SZ, Lu L-F, Rudensky AY. Regulatory T cells: mechanisms of differentiation and function. Annu Rev Immunol. 2012;30:531–64.
Arce-Sillas A, Álvarez-Luquín DD, Tamaya-Domínguez B, Gomez-Fuentes S, Trejo-García A, Melo-Salas M, et al. Regulatory T Cells: Molecular Actions on Effector Cells in Immune Regulation. J Immunol Res. 2016;2016:1720827.
Galgani M, De Rosa V, La Cava A, Matarese G. Role of Metabolism in the Immunobiology of Regulatory T Cells. J Immunol. 2016;197:2567–75.
Romano M, Fanelli G, Albany CJ, Giganti G, Lombardi G. Past, Present, and Future of Regulatory T Cell Therapy in Transplantation and Autoimmunity. Front Immunol. 2019;10:43.
Trzonkowski P, Szmit E, Myśliwska J, Dobyszuk A, Myśliwski A. CD4+CD25+ T regulatory cells inhibit cytotoxic activity of T CD8+ and NK lymphocytes in the direct cell-to-cell interaction. Clin Immunol. 2004;112:258–67.
Chang W-C, Li C-H, Chu L-H, Huang P-S, Sheu B-C, Huang S-C. Regulatory T Cells Suppress Natural Killer Cell Immunity in Patients With Human Cervical Carcinoma. Int J Gynecol Cancer. 2016;26:156–62.
Pedroza-Pacheco I, Madrigal A, Saudemont A. Interaction between natural killer cells and regulatory T cells: perspectives for immunotherapy. Cell Mol Immunol. 2013;10:222–9.
Onishi Y, Fehervari Z, Yamaguchi T, Sakaguchi S. Foxp3+ natural regulatory T cells preferentially form aggregates on dendritic cells in vitro and actively inhibit their maturation. Proc Natl Acad Sci USA. 2008;105:10113–8.
Maldonado RA, von Andrian UH. How tolerogenic dendritic cells induce regulatory T cells. Adv Immunol. 2010;108:111–65.
Mavin E, Nicholson L, Rafez Ahmed S, Gao F, Dickinson A, Wang X-N. Human Regulatory T Cells Mediate Transcriptional Modulation of Dendritic Cell Function. J Immunol. 2017;198:138–46.
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013;41 Database issue:D991–5.
Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207–10.

Supplemental legends

Additional file 1: Table S1. Evidence of functional associations and the methods used to compute feature values.

Additional file 2: Table S2. Number of protein interactions and their component proteins collected from three databases.

Additional file 3: Table S3. Feature quality assessment.

Additional file 4: Fig. S1. Receiver operating characteristic (ROC) curves of the 36 feature values. Features with an area under the curve (AUC) above 0.6 were selected for use in the support vector machine (SVM) model that predicts functional interactions between genes.

Additional file 5: Table S4. Evaluation of the predicted interactions in different datasets.

Additional file 6: Table S5. Annotations produced by the GO enrichment analysis tool for downregulated genes.

Additional file 7: Table S6. Annotations produced by DAVID for downregulated genes.

Additional file 8: Table S7. Annotations produced by the HIR V2/GSLA for downregulated genes.
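The feature-selection criterion stated in the Fig. S1 legend (keep a feature only if its ROC AUC exceeds 0.6) can be illustrated with a minimal Python sketch. It assumes each feature is a list of scores over gene pairs, paired with binary labels (1 = positive pair, 0 = negative pair), and computes the AUC through the equivalent Mann-Whitney U statistic. The function names, feature names, and toy data are hypothetical and are not taken from the HIR V2 pipeline.

```python
from typing import Dict, List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC computed as the Mann-Whitney U statistic, using average ranks for ties."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores so they share one average rank.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative examples")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U counts (positive, negative) pairs where the positive scores higher.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def select_features(features: Dict[str, List[float]], labels: List[int],
                    threshold: float = 0.6) -> Dict[str, float]:
    """Return the features whose AUC is strictly above the threshold, with their AUCs."""
    aucs = {name: roc_auc(values, labels) for name, values in features.items()}
    return {name: auc for name, auc in aucs.items() if auc > threshold}


if __name__ == "__main__":
    # Hypothetical toy data: four gene pairs, two positive and two negative.
    labels = [1, 1, 0, 0]
    features = {
        "coexpression": [0.9, 0.8, 0.3, 0.1],   # separates the classes perfectly
        "shared_domain": [0.5, 0.2, 0.4, 0.6],  # separates them poorly
    }
    print(select_features(features, labels))
```

On the toy data, the first feature has an AUC of 1.0 and is kept, while the second has an AUC of 0.25 and is discarded. Because the rule is one-sided, a feature whose AUC falls well below 0.5 (that is, one that is informative but inversely oriented) is also dropped unless its sign is flipped beforehand.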
