Background. Intron-mediated enhancement (IME) is the potential of introns to enhance the expression of their respective genes. This essential function of introns has been observed in a wide range of species, including fungi, plants, and animals. Studies in the plant Arabidopsis thaliana have shown that enhancing introns exhibit a distinct base composition and are generally the first intron, located close to the transcription start site. However, the mechanisms underlying the enhancement are as yet poorly understood. The goal of this study was to identify potential IME-related sequence motifs and genomic features found in first introns of genes in Arabidopsis thaliana.
Results. Based on the rationale that functional sequence motifs are evolutionarily conserved, we exploited the deep sequencing information available for Arabidopsis thaliana, covering more than one thousand Arabidopsis accessions, and identified 81 candidate hexamer motifs that showed increased conservation across all accessions and also exhibited positional occurrence preferences. Of those, 71 were found to be associated with increased correlation of gene expression among the genes harboring them, suggesting a cis-regulatory role. Filtering further for effect on gene expression correlation yielded a set of 16 hexamer motifs, corresponding to five consensus motifs. While all five motifs represent new motif definitions, two are similar to the two previously reported IME motifs, whereas three are altogether novel. To identify additional IME-related genomic features, Random Forest models were trained to classify gene expression level based on an array of different sequence-related features. The results indicate that introns harbor information with regard to gene expression level and suggest that sequence-compositional features are the most informative, while position-related features, previously thought to be of central importance, were found to have lower than expected relevance.
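As a rough illustration of the positional-preference idea mentioned above, the following minimal sketch records where each hexamer occurs along a set of intron sequences, expressed as a relative position between 0 (intron start) and 1 (intron end). It is not the pipeline used in this study: the function names, the `min_count` threshold, and the toy sequences are all hypothetical, and conservation across accessions is not modeled here.

```python
from collections import defaultdict


def hexamer_positions(introns, k=6):
    """Map each k-mer to the relative positions at which it occurs.

    Relative position is 0.0 for the first window of an intron and 1.0
    for the last one. Windows containing non-ACGT characters are skipped.
    """
    positions = defaultdict(list)
    for seq in introns:
        seq = seq.upper()
        n_windows = len(seq) - k + 1
        if n_windows < 1:
            continue  # sequence shorter than k
        for i in range(n_windows):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                rel = i / (n_windows - 1) if n_windows > 1 else 0.0
                positions[kmer].append(rel)
    return positions


def mean_relative_position(positions, min_count=2):
    """Average relative position per k-mer, ignoring rarely seen k-mers.

    min_count is an arbitrary illustrative threshold, not a value from the study.
    """
    return {
        kmer: sum(p) / len(p)
        for kmer, p in positions.items()
        if len(p) >= min_count
    }


# Toy sequences for illustration only; these are not real Arabidopsis introns.
toy_introns = ["TTCGATCTGAAAAAA", "TTCGATCCCCCCCCC", "GGGGGGGGGTTCGAT"]
pos = hexamer_positions(toy_introns)
means = mean_relative_position(pos)
```

A hexamer whose relative positions cluster near one end of the intron would be a candidate for a positional occurrence preference; a real analysis would additionally need a background model and a statistical test.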
Conclusions. Exploiting deep sequencing and broad gene expression information on a genome-wide scale, this study confirmed the regulatory role of first introns, characterized their intra-species conservation, and identified a set of novel sequence motifs located in first introns of genes in the genome of the plant Arabidopsis thaliana that may play a role in inducing high and correlated gene expression of the genes harboring them.