We developed an alignment-free multimodal deep learning framework (named lncRNA_Mdeep) to distinguish lncRNAs from protein-coding transcripts (Figure 1, Methods). In statistical prediction, the jackknife test, q-fold cross-validation (CV) test, and independent dataset test are often used to examine the effectiveness of a predictor in practical application [26]. Of the three, the jackknife test is deemed the least arbitrary because it always yields a unique result for a given benchmark dataset [27]. However, for large-scale datasets the jackknife test is computationally expensive. To reduce the computational cost while still evaluating the generalization performance of a predictor, we adopted the 10-fold cross-validation (10CV) test and the independent dataset test, as done by most investigators [17, 28-30]. In the 10CV test, the transcripts in the training set are randomly partitioned into 10 subsets of approximately equal size; each subset is singled out in turn as the test set, and the remaining 9 subsets are used for training. This process is repeated for 10 iterations, each time holding out a different subset, and the results from the 10 folds are averaged to produce a single estimate [31, 32]. In the independent dataset test, all transcripts in the testing set lie outside the training set.
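As an illustration, the 10CV protocol described above can be sketched in a few lines of Python; the use of scikit-learn's StratifiedKFold and the `build_model`/`evaluate` helpers are assumptions for illustration, not the implementation used in lncRNA_Mdeep.

```python
# Minimal sketch of the 10CV protocol (illustrative, not the authors' code).
# build_model() and evaluate() are hypothetical helpers.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def run_10cv(X, y, build_model, evaluate, n_splits=10, seed=42):
    """Randomly partition transcripts into 10 subsets and hold out each in turn."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        model = build_model()                        # fresh model for each fold
        model.fit(X[train_idx], y[train_idx])        # train on the other 9 subsets
        scores.append(evaluate(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))                    # average over the 10 folds
```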
To evaluate the performance of lncRNA_Mdeep, we first investigated its performance with different model architectures on the human dataset in the 10CV test and examined the effects of different hyper-parameters in the DNNs and the CNN. We then compared lncRNA_Mdeep with eight existing state-of-the-art methods (i.e., CNCI [14], CPAT [15], PLEK [16], lncRNA-MEDL [17], CPC2 [19], lncRNAnet [20], LncFinder1 and LncFinder2 [21]) on the human dataset and 11 cross-species datasets in the independent test. LncFinder1 denotes LncFinder without secondary structure, and LncFinder2 denotes LncFinder with secondary structure.
LncRNA_Mdeep is implemented in Python 3 using Keras 2.2.4 [33] with the TensorFlow-GPU (1.9.0) backend [34]. All experiments were run on an Ubuntu system with an NVIDIA TITAN V GV100.
Performance of lncRNA_Mdeep
Performance of different model architectures
We separately implemented the DNN model with the OFH feature as input (named OFH_DNN), the DNN model with the k-mer feature as input (named k-mer_DNN), the CNN model with one-hot encoding as input (named One-hot_CNN), the pairwise combinations of these models (i.e., OFH_DNN + k-mer_DNN, k-mer_DNN + One-hot_CNN, and OFH_DNN + One-hot_CNN), and the fusion of all three models (lncRNA_Mdeep) on the Human training dataset in the 10CV test. The results are shown in Table 1, from which we can see that the accuracy, Sn, Sp and MCC of lncRNA_Mdeep are 98.73%, 98.95%, 98.52% and 0.9748, respectively. Comparing OFH_DNN, k-mer_DNN, One-hot_CNN and lncRNA_Mdeep, the accuracy of lncRNA_Mdeep is 2.99%, 2.20%, and 2.91% higher than that of OFH_DNN, k-mer_DNN, and One-hot_CNN, respectively; its MCC is 0.0577, 0.0441, and 0.0579 higher; its Sn is 4.51%, 2.55%, and 1.94% higher; and its Sp is 1.48%, 1.86%, and 3.89% higher. These results show that by incorporating three different input modalities, lncRNA_Mdeep achieves better performance than any individual model. In addition, k-mer_DNN shows the best performance among the three individual models (i.e., OFH_DNN, k-mer_DNN, and One-hot_CNN).
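For readers interested in how such a three-branch multimodal network can be assembled, the following is a minimal Keras sketch; the layer widths, the OFH input dimension, the convolution settings, and the fusion layer are illustrative assumptions rather than the published lncRNA_Mdeep architecture.

```python
# Illustrative three-branch fusion model (a sketch, not the published architecture).
# The OFH dimension (64), layer widths, and filter settings are assumptions.
from keras.layers import Input, Dense, Conv1D, GlobalMaxPooling1D, concatenate
from keras.models import Model

ofh_in    = Input(shape=(64,),      name="ofh")     # OFH feature vector
kmer_in   = Input(shape=(4096,),    name="kmer")    # 6-mer frequencies (4^6)
onehot_in = Input(shape=(3000, 4),  name="onehot")  # 4 x 3000 one-hot, given as (length, channels)

ofh_branch  = Dense(128, activation="relu")(ofh_in)           # OFH_DNN branch
kmer_branch = Dense(128, activation="relu")(kmer_in)          # k-mer_DNN branch
cnn         = Conv1D(64, 10, activation="relu")(onehot_in)    # One-hot_CNN branch
cnn_branch  = GlobalMaxPooling1D()(cnn)

merged = concatenate([ofh_branch, kmer_branch, cnn_branch])   # fuse the three modalities
out = Dense(1, activation="sigmoid")(Dense(64, activation="relu")(merged))

model = Model(inputs=[ofh_in, kmer_in, onehot_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```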
Comparing the pairwise combinations of the three individual models with lncRNA_Mdeep, the accuracy of lncRNA_Mdeep is 2.76%, 0.37%, and 1.13% higher than that of OFH_DNN + k-mer_DNN, k-mer_DNN + One-hot_CNN, and OFH_DNN + One-hot_CNN, respectively, and its MCC is 0.0537, 0.0074, and 0.0222 higher. These results show that fusing all three models achieves better performance than fusing any two of them.
Furthermore, we also compared lncRNA_Mdeep with a voting-based decision fusion strategy. As shown in Table 1, lncRNA_Mdeep outperforms the voting fusion strategy by 0.31% in accuracy and 0.0059 in MCC. Taken together, the results in Table 1 show that lncRNA_Mdeep is a superior deep learning framework that can effectively distinguish lncRNAs from protein-coding transcripts.
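For contrast with the fused network sketched above, the voting baseline in Table 1 can be expressed as a simple majority vote over the three individual models; the 0.5 decision threshold below is an assumption.

```python
# Majority-voting baseline used for comparison in Table 1 (illustrative sketch).
# p_ofh, p_kmer, p_onehot are the predicted probabilities from the three models.
import numpy as np

def vote_fusion(p_ofh, p_kmer, p_onehot, thr=0.5):
    votes = (np.stack([p_ofh, p_kmer, p_onehot]) >= thr).astype(int)  # hard 0/1 votes
    return (votes.sum(axis=0) >= 2).astype(int)                       # majority wins
```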
Effects of different hyper-parameters
We evaluated the effects of two hyper-parameters: k in the k-mer feature and maxlen for padding the one-hot encoding. The accuracies of k-mer_DNN and One-hot_CNN on the Human training dataset in the 10CV test at different values of k and maxlen are shown in Figure 2. As shown in Figure 2A, k-mer_DNN achieves its highest accuracy when k = 6, and Figure 2B shows that One-hot_CNN achieves its highest accuracy when maxlen = 3000. Therefore, we set k = 6 when extracting the k-mer feature and fix the one-hot encoding of a transcript as a 4 × 3000 matrix. All other hyper-parameters in lncRNA_Mdeep were selected with the hyperopt [35] strategy and are listed in Additional file 2.
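A minimal sketch of the two input encodings with the selected hyper-parameters (k = 6, maxlen = 3000) is given below; the frequency normalization and the right-truncation of long transcripts are illustrative assumptions.

```python
# Illustrative encodings with the selected hyper-parameters k = 6 and maxlen = 3000.
# Normalizing by the number of k-mer windows and truncating long transcripts are assumptions.
from itertools import product
import numpy as np

KMERS = ["".join(p) for p in product("ACGT", repeat=6)]   # 4^6 = 4096 features
KMER_IDX = {kmer: i for i, kmer in enumerate(KMERS)}
BASE_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_feature(seq, k=6):
    """Frequency of each k-mer along the transcript."""
    vec = np.zeros(len(KMERS))
    for i in range(len(seq) - k + 1):
        j = KMER_IDX.get(seq[i:i + k])
        if j is not None:
            vec[j] += 1
    return vec / max(len(seq) - k + 1, 1)

def one_hot(seq, maxlen=3000):
    """4 x maxlen one-hot matrix, zero-padded (or truncated) to maxlen."""
    mat = np.zeros((4, maxlen))
    for i, base in enumerate(seq[:maxlen]):
        if base in BASE_IDX:
            mat[BASE_IDX[base], i] = 1
    return mat
```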
Comparison with other existing methods
We compared lncRNA_Mdeep with eight other existing alignment-free methods (i.e., CNCI, CPAT, PLEK, lncRNA-MEDL, CPC2, lncRNAnet, LncFinder1 and LncFinder2) on the human and cross-species datasets. LncRNA_Mdeep was trained on the Human training dataset; since most existing methods do not provide a retraining option, we used their pre-trained models.
Comparison performance on human dataset
We first compared the performance of lncRNA_Mdeep and the eight other existing methods on the Human testing dataset. The results are shown in Table 3: lncRNA_Mdeep achieves an accuracy of 93.12%, which is 6.72%, 5.14%, 15.41%, 7.65%, 15.14%, 0.94%, 6.90%, and 6.24% higher than that of CNCI, CPAT, PLEK, lncRNA-MEDL, CPC2, lncRNAnet, LncFinder1, and LncFinder2, respectively. The MCC and Sp of lncRNA_Mdeep are 0.8653 and 88.97%, which are at least 0.0183 and 1.24% higher than those of the eight other methods. Although CNCI achieves a sensitivity of 97.42%, which is 0.15% higher than that of lncRNA_Mdeep, it shows lower performance in terms of accuracy, Sp, and MCC.
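For reference, the four reported metrics follow directly from the confusion-matrix counts, with lncRNAs taken as the positive class (an assumption consistent with the text):

```python
# Evaluation metrics used throughout (lncRNAs treated as the positive class).
import numpy as np

def metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)                                   # sensitivity
    sp = tn / (tn + fp)                                   # specificity
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))    # Matthews correlation coefficient
    return acc, sn, sp, mcc
```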
To further evaluate the "memory" effect, we trained three lncRNA_Mdeep models (namely model-1, model-2, and model-3) on the Human training dataset, the Human gene-wise training dataset, and the Human non-gene-wise training dataset, respectively, and compared their performance with that of the eight other existing methods on the Human gene-wise testing dataset. The results are shown in Additional file 3: lncRNA_Mdeep model-1 achieves the best performance, and model-2 and model-3 still perform better than most of the existing methods. Comparing model-2 with model-3, model-2 shows the better performance. These results indicate that even when no originating gene is shared between the training and testing transcripts, lncRNA_Mdeep can still achieve superior performance. It should be pointed out that none of the other compared methods were retrained, and some transcripts in their training datasets may be identical to transcripts in the Human gene-wise testing dataset; thus the prediction performance of these compared methods might be over-estimated.
Comparison performance on cross-species datasets
We also compared the performance of lncRNA_Mdeep and the eight other existing methods using the 11 cross-species datasets as independent testing datasets. In these tests, lncRNA_Mdeep and the eight other predictors were trained on the human dataset, and no additional training on other species was performed. The results are shown in Table 4. On the mouse testing dataset, lncRNA_Mdeep achieves 92.52% accuracy, which is 5.43%, 2.05%, 20.63%, 3.99%, 12.09%, 0.71%, 4.05%, and 3.53% higher than that of CNCI, CPAT, PLEK, lncRNA-MEDL, CPC2, lncRNAnet, LncFinder1, and LncFinder2, respectively. On the other 10 cross-species testing datasets, the accuracies of lncRNA_Mdeep for Arabidopsis, Bos taurus, C. elegans, chicken, chimpanzee, frog, fruit fly, gorilla, pig and zebrafish are 95.73%, 97.33%, 98.87%, 96.06%, 96.76%, 96.80%, 96.10%, 96.65%, 96.87%, and 96.76%, respectively. LncRNA_Mdeep shows the best performance on 5 of the 11 cross-species testing datasets, lncRNA-MEDL on 3, LncFinder2 on 3, and CPAT on 1. These results show that lncRNA_Mdeep has superior performance for distinguishing lncRNAs from protein-coding transcripts across species.
Furthermore, we compared the prediction results of lncRNA_Mdeep model-1, model-2, and model-3 on the 11 cross-species datasets. The results are shown in Additional file 4. Model-1 shows the best performance on 7 of the 11 cross-species datasets, and model-2 shows the best performance on one. Comparing model-2 with model-3, model-2 achieves better performance on all 11 cross-species datasets. These results further show that the "memory" effect on lncRNA_Mdeep is limited and that lncRNA_Mdeep has good generalization performance.