Background
Long non-coding RNAs (lncRNAs) can exert their functions by forming triplexes with DNA. Current methods for predicting triplex formation rely mainly on mathematical statistics based on base-pairing rules. However, these methods have two main limitations: (i) they identify a large number of triplex-forming lncRNAs, yet the small number of experimentally verified triplex-forming lncRNAs suggests that not all of them can form triplexes in practice; and (ii) their predictions consider only theoretical relationships and lack features derived from experimentally verified data.
Results
In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), which is the first machine learning model for DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites from experimentally verified data, with high-level features learned by deep neural networks. In 5-fold cross-validation, the average areas under the ROC and precision-recall (PRC) curves are 0.9949 and 0.9999 for triplex-forming lncRNA prediction, and 0.8775 and 0.9692 for DNA site prediction, respectively. We also briefly summarize the cis and trans targeting of triplex-forming lncRNAs.
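As an illustration of how cross-validated figures of this kind are typically obtained, the following minimal sketch averages the area under the ROC curve and the area under the precision-recall curve over held-out folds. It is not the TriplexFPP implementation: the function names (`auroc`, `average_precision`, `cross_validated_metrics`) are hypothetical, the precision-recall area is approximated by average precision, and each fold is assumed to contain at least one positive and one negative example.

```python
from statistics import mean

def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Approximate the area under the precision-recall curve by average precision."""
    # Sort by descending score; tied scores keep their input order (assumption).
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / hits

def cross_validated_metrics(folds):
    """Average both metrics over (labels, scores) pairs, one pair per held-out fold."""
    return (mean(auroc(y, s) for y, s in folds),
            mean(average_precision(y, s) for y, s in folds))
```

In a 5-fold setting, `folds` would hold five `(labels, scores)` pairs, each produced by a model trained on the other four folds; the two returned numbers correspond to the averaged ROC and PRC areas.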
Conclusions
TriplexFPP can identify the most likely triplex-forming lncRNAs among all lncRNAs with computationally defined triplex-forming capacity, and can predict the potential of a DNA site to form a triplex. It may provide insights for the exploration of lncRNA functions.