Identification of All-to-All Protein-Protein Interactions Based on Deep Hash Learning

doi:10.21203/rs.3.rs-778066/v1

Research Article

https://doi.org/10.21203/rs.3.rs-778066/v1

This work is licensed under a CC BY 4.0 License

Background: Protein-protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. Computational prediction of PPI is well accepted because it is inexpensive and efficient compared with wet-lab experiments. When a new protein arrives and one wants to determine whether it interacts with any existing protein, current computational prediction methods usually compare the new protein with the existing proteins one by one, in pairs. This is time-consuming.

Results: We propose a more efficient model, the Deep Hash Learning Protein-and-Protein Interaction (DHL-PPI) model, to predict all-to-all PPI relationships in a database. First, DHL-PPI encodes a protein sequence into a binary hash code based on features extracted from the sequence using deep learning. This encoding scheme turns the PPI discrimination problem into a much simpler search problem. A protein with a binary code can be regarded as a number. In the prescreening stage of PPI prediction, the string-matching problem of searching a string against a database of M proteins becomes a much simpler one: finding a number in a sorted array of length M. This prescreening narrows the whole database down to a much smaller candidate set for further confirmation. Finally, DHL-PPI uses the Hamming distance to determine the final PPI relationship.
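
The prescreen-and-confirm idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DHL-PPI implementation: the hash codes are taken as given (no deep network is shown), the prescreen is assumed to select codes that share the query's leading bits, and the names `prescreen`, `predict`, `prefix_bits`, and `max_dist` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def prescreen(sorted_codes, query, n_bits, prefix_bits):
    """Binary-search a sorted array for codes sharing the query's leading bits.

    Assumption: candidates are codes with the same `prefix_bits`-bit prefix.
    The abstract does not specify the actual prescreen rule.
    """
    shift = n_bits - prefix_bits
    low = (query >> shift) << shift        # smallest code with that prefix
    high = low | ((1 << shift) - 1)        # largest code with that prefix
    left = bisect_left(sorted_codes, low)
    right = bisect_right(sorted_codes, high)
    return sorted_codes[left:right]


def predict(sorted_codes, query, n_bits=8, prefix_bits=4, max_dist=1):
    """Confirm candidates by Hamming distance (hypothetical threshold)."""
    candidates = prescreen(sorted_codes, query, n_bits, prefix_bits)
    return [c for c in candidates if hamming(c, query) <= max_dist]


# Toy database of 8-bit codes, stored as a sorted array of integers.
database = sorted([0b10110010, 0b10110011, 0b10111111, 0b00010010, 0b11110000])
hits = predict(database, 0b10110110)
```

Because each code is an ordinary integer, the lookup costs two binary searches on the sorted array, and the Hamming distance is computed only for the few candidates that survive the prescreen.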

Conclusions: The experimental results confirm that DHL-PPI is feasible and effective. On a dataset with strictly negative PPI examples from four species, DHL-PPI is superior to or competitive with other state-of-the-art methods in terms of precision, recall, or F1 score. Furthermore, in the prediction stage, DHL-PPI reduces the usual time complexity from O(M^2) to O(M log M) for predicting all-to-all PPI interactions between any pairs among M proteins in a database. A protein database can be stored in the proposed encoding scheme, ready to be searched, which makes it a potentially novel encoding scheme for coping with the search problem in large databases.
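
To make the complexity claim concrete, the following sketch sorts the M codes once (O(M log M)) and then runs one binary search per protein (O(log M) each), instead of comparing all M^2 pairs. It is a rough illustration rather than the authors' procedure: the prefix-based candidate rule, the distance threshold, the exclusion of self-pairs, and the function name `all_to_all` are assumptions, and the O(M log M) bound holds only while each candidate set stays small.

```python
from bisect import bisect_left, bisect_right


def all_to_all(codes, n_bits=8, prefix_bits=4, max_dist=1):
    """Return predicted interacting pairs among all proteins in `codes`.

    `codes` maps a protein name to its integer hash code (hypothetical input).
    """
    # Sort once: O(M log M).
    entries = sorted((code, name) for name, code in codes.items())
    keys = [code for code, _ in entries]
    shift = n_bits - prefix_bits
    pairs = set()
    for code, name in entries:
        # Assumed prescreen: the range of codes sharing the leading bits.
        low = (code >> shift) << shift
        high = low | ((1 << shift) - 1)
        left = bisect_left(keys, low)      # O(log M)
        right = bisect_right(keys, high)   # O(log M)
        for other_code, other_name in entries[left:right]:
            if other_name == name:
                continue  # self-pairs are skipped in this sketch
            # Confirmation step: Hamming distance on the small candidate set.
            if bin(code ^ other_code).count("1") <= max_dist:
                pairs.add(tuple(sorted((name, other_name))))
    return pairs


toy_codes = {
    "P1": 0b10110010,
    "P2": 0b10110110,
    "P3": 0b00010010,
    "P4": 0b10111111,
}
predicted = all_to_all(toy_codes)
```

If many proteins shared the same prefix, the inner loop would grow and the total cost would drift back toward O(M^2), so the benefit depends on how evenly the learned codes spread over the code space.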

Bioinformatics

Protein-protein interaction

Deep learning

Binary Hash code

Binary search

Hamming distance

No competing interests reported.

Editorial decision: Major revision
27 Jan, 2022
Reviews received at journal
26 Jan, 2022
Reviews received at journal
05 Dec, 2021
Reviewers agreed at journal
28 Nov, 2021
Reviewers agreed at journal
28 Nov, 2021
Reviewers invited by journal
04 Sep, 2021
Editor assigned by journal
04 Sep, 2021
Editor invited by journal
01 Sep, 2021
Submission checks completed at journal
01 Sep, 2021
First submitted to journal
03 Aug, 2021
