ncRNA profiles within mucosal biopsies separate CD from controls.
In ileal biopsies, the first two principal components (PCs) of the overall ncRNA transcriptomic profile explained 34% and 12% of the variance, respectively (Fig. 1a), whereas in rectal biopsies they explained only 16% and 7% (Fig. 1b). In both tissues, the first two PCs indicated that ncRNA levels have the potential to discriminate CD from non-IBD controls (Fig. 1a, b). Differential expression (DE) analysis identified a total of 89 DE ncRNAs in the ileum when comparing 274 CD cases to 71 controls. Of these, 62 were up-regulated and 27 were down-regulated in CD (Fig. 1c). A similar comparison in rectal biopsies identified 41 DE ncRNAs when 329 CD cases were compared to 61 controls (Fig. 1d). Of those, 18 were up-regulated and 23 were down-regulated in CD. Hierarchical clustering of the DE ncRNAs yielded two distinct clusters representing the CD and control groups. This pattern was observed in both ileal and rectal biopsies when using the DE ncRNAs identified in each tissue separately (Fig. 1e, f). All DE ncRNAs significant at FDR < 0.05, as well as the nominally significant DE ncRNAs (P < 0.05), are listed for both ileal and rectal biopsies in Table S3. To test whether the expression of CD-specific ncRNAs is consistent across intestinal locations, we compared the log2FC of ncRNAs observed in both ileal and rectal biopsies. Of the 130 DE ncRNAs tested, 17 were shared between the ileum and rectum, 87 were differentially expressed only in the ileum, and 25 were differentially expressed only in the rectum (Fig. 1g, Table 2, Fig. S2a). We then determined whether the disease-specific changes of the 130 DE ncRNAs occur in the same direction or magnitude regardless of tissue type by comparing the log2FC of DE ncRNAs in the ileum to those in the rectum (Fig. 1g).
Surprisingly, 88% (n=114) of the DE ncRNAs changed in the same direction in both tissue types, with a strong positive correlation (R=0.69; P < 2.2e-16). The remainder (n=16) were directionally inconsistent, with a strong negative correlation (R=-0.79; P < 2.2e-16) (Fig. 1h). These 16 ncRNAs showed unique, statistically significant differences by disease status when examined in both ileal and rectal samples. Similar to our previous analysis10, most of the DE ncRNAs found in our analysis were either antisense RNAs or lincRNAs.
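The direction comparison above can be illustrated with a minimal sketch. It assumes each tissue's DE results are available as a mapping from ncRNA name to log2FC; the names and values below are hypothetical placeholders, not data from this study, and the helper functions are illustrative rather than the pipeline actually used.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def split_by_direction(ileum_lfc, rectum_lfc):
    """Split ncRNAs measured in both tissues by sign agreement of log2FC."""
    shared = sorted(set(ileum_lfc) & set(rectum_lfc))
    concordant = [g for g in shared if ileum_lfc[g] * rectum_lfc[g] > 0]
    discordant = [g for g in shared if ileum_lfc[g] * rectum_lfc[g] <= 0]
    return concordant, discordant

# Hypothetical log2FC values (CD vs. control); NOT taken from the study.
ileum_lfc = {"ncRNA_A": 1.8, "ncRNA_B": -0.9, "ncRNA_C": 0.6,
             "ncRNA_D": 1.1, "ncRNA_E": -1.4}
rectum_lfc = {"ncRNA_A": 1.2, "ncRNA_B": -0.4, "ncRNA_C": -0.7,
              "ncRNA_D": 0.5, "ncRNA_E": -0.8}

concordant, discordant = split_by_direction(ileum_lfc, rectum_lfc)
# Correlation of log2FC across tissues within the concordant set.
r_concordant = pearson_r([ileum_lfc[g] for g in concordant],
                         [rectum_lfc[g] for g in concordant])
print(len(concordant), len(discordant), round(r_concordant, 2))
```

Splitting by sign first and then correlating within each subset mirrors the way the concordant and discordant ncRNAs are reported separately above.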
Next, to test the robustness and reliability of our findings, we compared our current results to those of our previous study10, which included a subset of ileal samples from the same RISK cohort. For this comparison, we excluded the matched ileal samples (n=212) that were shared with our previous study; in this subset analysis, 188 DE ncRNAs were observed. A direct log2FC comparison of ncRNA expression in CD patients between the two studies showed a directionally consistent pattern with a strong positive correlation (R2=0.86; P < 2.2e-16), supporting the replicability of our methods (Fig. S2b).
Taken together, these results show that changes in mucosal ncRNA levels are specific to CD and are most prevalent in the small intestine, while the rectum shows distinct changes in ncRNA signatures. In addition, among the 114 disease-specific DE ncRNAs, a small number were miRNAs (short ncRNAs of approximately 22 nucleotides), namely MIR1244-1, MIR1244-2, and MIR1244-3, regardless of tissue location.
Pathways annotated to CD specific ncRNAs.
The total ncRNA transcriptomic data appear to show larger CD-specific differences in the ileum than in the rectum. Accordingly, it was not surprising that topGO pathway analysis of the DE ncRNAs between cases and controls returned more significant pathway hits in ileal (n=136) than in rectal biopsies (n=36) (Tables S4-S5). However, 29 pathways were common to the ileal and rectal gene ontologies, including intracellular transport (GO:0046907) in the cellular component category, which was annotated by RN7SL2.
Behavior-specific DE ncRNAs in intestinal biopsies.
Next, we examined whether PC1 of the 130 DE ncRNAs observed between CD and controls could differentiate Crohn’s disease behaviors (B1, B2 and B3). As expected, PC1 had the potential to differentiate the behavior groups from one another (Fig. 2a). We therefore extended our analysis to CD disease behaviors. First, we compared ncRNA expression in each individual CD behavior group against controls in both ileal and rectal biopsies, which revealed ncRNAs specific to distinct CD behavior groups. We observed more DE ncRNAs in ileal biopsies, B1 (n=70), B2 (n=124) and B3 (n=22) (Fig. 2b, Table S6), than in rectal biopsies, B1 (n=23), B2 (n=9) and B3 (n=14) (Fig. 2c, Table S7).
Similarly, the comparison among CD disease behavior groups, from inflammatory (B1) to stricturing (B2) to penetrating (B3), showed an increasing pattern in the variance explained: 32%, 35%, and 45%, respectively (Fig. S5 a-g). The DE analysis among B1, B2 and B3 showed a similar trend, namely an ileal-centric pattern, with more DE ncRNAs observed in the ileum for B2 vs. B1 (n=35), B3 vs. B1 (n=13) and B3 vs. B2 (n=14) than in the rectum (Table S8; Fig. 2c). Interestingly, all DE ncRNAs observed in B3 vs. B1 were also observed in the B2 vs. B1 and B3 vs. B2 comparisons, potentially reflecting certain CD characteristics that may be present across B1, B2, and B3. Notably, most of them were antisense RNAs or lincRNAs (Fig. S6 a, b). The same comparisons in rectal biopsies yielded no statistically significant DE ncRNAs (Fig. 2d). Thus, our results indicate that a set of ncRNAs in ileal biopsies reflects the Montreal CD disease behaviors, whereas no such pattern was observed in rectal biopsies of CD patients.
Inflammation and disease location-specific ncRNAs in CD.
Inflammation is often a visible hallmark of CD; we therefore next examined whether ncRNA expression could distinguish inflamed from non-inflamed CD patients and, furthermore, the location of disease. We used two groupings assigned by physicians based on i) inflammatory status (inflamed vs. non-inflamed), and ii) disease (inflammation) location: ileal-centric (L1), colonic (L2), and ileocolonic (L3). We tested whether the PC1 obtained from the 130 CD-specific DE ncRNAs (Fig. 1) could differentiate inflammatory status and disease location in CD patients. Overall, PC1 obtained from ileal biopsies differed significantly between inflamed and non-inflamed/control samples. In particular, PC1 differentiated CD patients with L1 and L3 (ileal inflamed) disease locations from controls (p<2.2e-16) more strongly than it did L2 CD patients with colonic inflammation (p<2.2e-14) or patients with non-inflamed sites (p<3.5e-08) (Fig. 3a).
We therefore sought to identify DE ncRNAs specific to inflammation status and disease location, as classified by the physician’s clinical assessment. Since this study is primarily focused on CD, in which the disease largely occurs in the ileum, we restricted this analysis to ileal biopsies. To identify ncRNAs specific to disease location, the L1 and L3 CD patients (n=198) were combined as the inflamed group and compared first with non-inflamed ileal CD patients alone (n=20) and then with the combined non-inflamed+L2 group (n=76), keeping in mind that L2 CD patients were inflamed only in the colon, not in the ileum. DE analysis of the ileal biopsies identified 21 DE ncRNAs for L1+L3 vs. non-inflamed (Fig. 3c) and 31 DE ncRNAs for L1+L3 vs. non-inflamed+L2 (Fig. 3c) (Table S9). A total of 10 DE ncRNAs were shared between the two comparisons. Likewise, log-normalized FPM (fragments per million) values of the 21 DE ncRNAs differentiated the inflamed (L1+L3) and non-inflamed groups (Fig. 3e) better than those of the 31 DE ncRNAs from the other comparison (Fig. 3f), which incorporated L2 samples into the non-inflamed group. Further, the FPM analysis by specific inflammation location showed the L1 and L2 groups to be similar, and the L2 and non-inflamed groups to be more closely related (Fig. 3g-h). Based on these comparisons of inflammation status and disease location, the ncRNA transcriptomic profiles of L2 CD patients were more like those of CD patients with a non-inflamed ileum than those of patients with ileal inflammation (L1, L3).
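The PC1-based group comparisons used in this and the preceding sections can be sketched as follows. This is a minimal illustration under stated assumptions: PC1 is obtained by power iteration on the sample covariance matrix (a swapped-in technique chosen to avoid external libraries, not necessarily the method used here), and the expression values are hypothetical.

```python
import random
from math import sqrt

def first_pc(samples, n_iter=200, seed=0):
    """Return (PC1 scores, fraction of variance explained by PC1)."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix (features x features).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration converges to the leading eigenvector of cov.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue (v has unit length).
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    total_var = sum(cov[a][a] for a in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return scores, eigenvalue / total_var

# Hypothetical normalized expression of three ncRNAs; NOT study data.
controls = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7], [0.9, 2.1, 0.4]]
cases = [[3.0, 4.1, 0.6], [3.3, 3.9, 0.5], [2.8, 4.3, 0.8]]

scores, frac = first_pc(controls + cases)
control_scores, case_scores = scores[:3], scores[3:]
print(round(frac, 2), control_scores, case_scores)
```

The sign of PC1 is arbitrary, so only the separation between groups along PC1 is meaningful, not which group has positive scores. The per-sample scores would then be compared between groups with a statistical test of choice.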
DE ncRNAs RN7SL2, mir-1244-2,3,4.
miRNAs have been observed to regulate multiple facets of gene expression, including other non-coding RNAs, and are known to be dysregulated in CD 24, yet the mechanisms remain unclear. Of interest is the down-regulation of RN7SL2 by mir-125b to control cell death 25. In our analysis, we found an increase in the levels of mir-1244-2,3,4 and a decrease in the levels of RN7SL2 in CD vs. controls (significant by FDR, but not by log2FC) (Fig. 4a). Using IntaRNA to test for molecular interactions between RN7SL2 and miRNA-1244-x (2,3,4), we obtained six predicted conformations with stable base pairing (Table 3). Of these, one showed complementary base pairing between RN7SL2 nucleotides 268-289 and miRNA-1244-2,3,4 nucleotides 28-48, while a second paired RN7SL2 nucleotides 195-199 with miRNA-1244-2,3,4 nucleotides 8-12. The free energy (ΔG) values of these interactions suggest stable binding; the interactions are represented schematically in Fig. 4b, along with a more realistic molecular model generated using SRP RNA structures based on ribosomal RNA interactions (Fig. 4c) 26. Taken together, these results suggest that changes in miRNA levels during CD may have physiological impacts that can change cellular function and potentially alter disease outcomes, with RN7SL2 being a potential candidate for targeted therapy.
ncRNA as a potential tool to predict disease status, disease behaviors and disease location in IBD.
Lastly, we tested how accurately these non-coding elements predict disease vs. control status, disease behavior, and inflammation status in ileal biopsies using a RandomForest27 approach. To test whether ncRNAs can serve as a potential index of disease status, we used the entire dataset of both CD cases and controls. The model achieved an average AUC of 0.80 with 84% accuracy, reflecting the robustness of these DE ncRNAs in distinguishing CD from controls (Fig. 5a). For disease behavior, because our dataset contained fewer B2 and B3 samples than B1 samples, we randomly downsampled the larger group in each comparison to the size of the smaller group. Thus, to predict B2 and B3 from B1, we randomly downsampled B1 to mitigate sample bias. With this approach, our model predicted B2 from B3 with a mean AUC of 0.84 and 80% accuracy (Fig. 5c), B2 from B1 with 0.72 AUC and 62% accuracy (Fig. 5b), and B3 from B1 with 0.68 AUC and 68% accuracy (Fig. 5d). Likewise, for inflammatory status, the inflamed samples were downsampled to match the non-inflamed samples. Our model predicted non-inflamed (without L2) from inflamed samples better, with 0.63 AUC and 72% accuracy (Fig. 5e). Interestingly, prediction was poorer when non-inflamed and L2 samples together were tested against inflamed samples, with 0.55 AUC and 61% accuracy (Fig. 5f). Details of sample sizes in each comparison and the 5-fold cross-validation prediction results for disease status, disease behavior and inflammation status are provided in Tables S10-S15.
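The class-balancing step and the AUC metric described above can be sketched as follows. This is a minimal illustration, not the modeling code used in the study: the random forest itself is omitted, the sample identifiers and classifier scores are hypothetical, and the function names `balance` and `auc` are introduced only for this example.

```python
import random

def balance(group_a, group_b, seed=0):
    """Randomly downsample both groups to the size of the smaller one."""
    rng = random.Random(seed)
    k = min(len(group_a), len(group_b))
    return rng.sample(group_a, k), rng.sample(group_b, k)

def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive sample outranks a negative one.

    Ties contribute 0.5, which matches the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical sample identifiers: a large B1 group and a small B2 group.
b1_ids = [f"B1_{i}" for i in range(12)]
b2_ids = [f"B2_{i}" for i in range(4)]
b1_bal, b2_bal = balance(b1_ids, b2_ids)
print(len(b1_bal), len(b2_bal))

# Hypothetical held-out classifier scores (higher = more B2-like).
b2_scores = [0.70, 0.65, 0.38, 0.80]
b1_scores = [0.10, 0.35, 0.55, 0.40]
print(auc(b2_scores, b1_scores))
```

In a cross-validated setting, the balanced groups would be split into folds, a classifier trained on each training split, and the AUC computed on the held-out scores and then averaged across folds.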