RNA modification is a crucial post-transcriptional process that plays a pivotal role in fine-tuning gene expression, mRNA stability, splicing, translation, and ultimately, cellular function (1). These chemical modifications encompass a wide array of covalent alterations, including but not limited to methylation, pseudouridylation, and acetylation, each adding a further layer of complexity to the regulatory landscape (2). Unravelling the intricacies of RNA modifications and deciphering their underlying regulatory mechanisms have emerged as fundamental endeavours in molecular biology (3). These investigations hold the potential to provide valuable insights into disease pathogenesis and to offer novel avenues for therapeutic intervention.
In particular, RNA modification is a central concern for in vitro transcribed (IVT) mRNA vaccines, which have emerged as a promising therapeutic modality for a variety of clinical applications owing to their potential to deliver specific genetic information into target cells (4, 5). IVT-mRNA offers advantages such as high specificity, low toxicity, and the ability to encode a wide range of therapeutic proteins (6). However, the immunogenicity of IVT-mRNA vaccines can limit their clinical applicability, necessitating strategies that enhance stability and efficacy while minimizing adverse immune responses (7). Uridine modifications, including pseudouridine (Ψ) (8), N1-methylpseudouridine (m1Ψ) (9), and 5-methoxyuridine (5moU) (10), have been recognized as crucial elements in mRNA design for improving stability, translation efficiency, and the immunogenicity profile. Modification of uridine residues within the mRNA molecule is therefore one area of interest in optimizing the performance of IVT-mRNA vaccines (11). Among these uridine modifications, 5moU has recently garnered significant attention for its potential to enhance the properties of IVT-mRNA (10). 5moU is a chemical modification of RNA in which a methoxy group is attached at the C5 position of the uracil base. It is also catalogued in the MODOMICS database, which contains information on 5moU in native biological samples and its link to human disease (12).
In the context of IVT-mRNA, the 5moU modification has been associated with increased resistance to nuclease degradation, improved mRNA half-life, and reduced immunogenicity. Accurate detection of 5moU is therefore crucial for the quality control of IVT-mRNA vaccines (13). It also supports a better understanding of how the modification affects mRNA stability, translation, and immune responses, which in turn informs the functional design of IVT-mRNA vaccines.
Nonetheless, the comprehensive detection and quantification of 5moU modifications in IVT-mRNA remain challenging. Conventional experimental methods for 5moU detection, including those based on next-generation sequencing (NGS), comprise antibody-based methods (14), chemical labelling methods (15), enzymatic conversion and detection (16), and mass spectrometry (MS) (17). The major limitation of these methods for therapeutic IVT-mRNA studies is that they cannot provide read-level resolution. Fortunately, a third-generation platform, Oxford Nanopore Technologies (ONT) direct RNA sequencing, has emerged to address this issue (18). This technique detects alterations in electric current as individual RNA molecules pass through nanopores, allowing nucleotide sequences to be reconstructed from the electrical signals. It also offers long reads that can cover the entire length of transcripts, and it preserves epitranscriptomic information because no reverse transcription is needed (19). Because a modified base disrupts the electric signal while it resides in the nanopore, modifications can be identified by comparing the observed current with a reference ground truth (20, 21).
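The comparison of observed current against a reference can be illustrated with a deliberately simple sketch. It assumes a hypothetical lookup table of expected mean current and standard deviation per 5-mer, and flags events whose observed mean deviates by more than an arbitrary z-score threshold. The function names, reference values, and threshold are illustrative assumptions, not those of any published tool, which typically use more elaborate statistical or learned models.

```python
def deviation_z(observed_mean, ref_mean, ref_sd):
    """Z-score of an observed event mean against the reference for its k-mer."""
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (observed_mean - ref_mean) / ref_sd


def flag_shifted_events(events, reference, threshold=2.0):
    """Return indices of events whose current deviates from the reference.

    events    : list of (kmer, observed_mean_current) tuples
    reference : dict mapping kmer -> (expected_mean, expected_sd); hypothetical values
    threshold : absolute z-score cutoff (arbitrary, for illustration only)
    """
    flagged = []
    for idx, (kmer, observed) in enumerate(events):
        if kmer not in reference:
            continue  # no reference available for this k-mer, so skip it
        ref_mean, ref_sd = reference[kmer]
        if abs(deviation_z(observed, ref_mean, ref_sd)) >= threshold:
            flagged.append(idx)
    return flagged


# Toy data: all numbers are invented for the example.
reference = {"AGTTC": (100.0, 2.0), "TGTGC": (90.0, 3.0)}
events = [("AGTTC", 101.0), ("TGTGC", 99.0), ("AGTTC", 95.0)]
print(flag_shifted_events(events, reference))
```

A fixed threshold on a single feature is the crudest possible decision rule; it is shown only to make the idea of "deviation from a reference" concrete.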
ELIGOS, presented in 2021, was the first and is so far the only published ONT-based 5moU prediction tool. It relies on statistical comparison of site-level base-call error profiles between target and unmodified control datasets (22). Consequently, it cannot resolve the landscape of modifications on individual molecules, and it requires a control sample. Here, we present a novel machine-learning-based 5moU detection tool that allows de novo (i.e., no control sample required) and read-level analysis. We utilized signal features (i.e., the mean, standard deviation, and median of the current intensity, and the dwell time) extracted from 100% modified and unmodified IVT data. Classical machine learning algorithms, including Random Forest (RF), Support Vector Machine (SVM), and XGBoost, were used to train 5-mer-specific models (see Fig. 1). The combination of 5-mer signal features and the XGBoost algorithm achieved strong performance for read-level modification detection (maximum AUROC = 0.9567 for AGTTC, minimum AUROC = 0.8113 for TGTGC), surpassing the existing comparison model (ELIGOS AUC = 0.751 on the 5moU IVT dataset) (22).
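As a rough illustration of the four signal features and of the AUROC metric, the sketch below summarises the raw current samples of one 5-mer event and scores a set of read-level predictions. It is a minimal example under stated assumptions, not the pipeline used in this work: the function names, the sampling rate, and all numeric values are hypothetical, and the classifier itself (e.g., XGBoost) is omitted, with a plain list of scores standing in for its output.

```python
from statistics import mean, median, pstdev


def event_features(currents, sample_rate_hz):
    """Summarise the raw current samples assigned to one 5-mer event.

    currents       : list of current measurements for the event
    sample_rate_hz : sampling rate of the device (assumed value supplied by caller)
    """
    if not currents:
        raise ValueError("an event needs at least one current sample")
    return {
        "mean": mean(currents),
        "std": pstdev(currents),  # population standard deviation of the samples
        "median": median(currents),
        "dwell_time": len(currents) / sample_rate_hz,  # seconds spent in the pore
    }


def auroc(labels, scores):
    """Rank-based AUROC: fraction of (modified, unmodified) pairs ranked correctly.

    labels : 1 for modified reads, 0 for unmodified reads
    scores : classifier scores, higher meaning "more likely modified"
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy inputs: values are invented for the example.
features = event_features([80.0, 82.0, 84.0], sample_rate_hz=3000)
score = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(features, score)
```

In a 5-mer-specific setup, one such feature vector would be built for every read covering a given 5-mer, and a separate model would be trained and evaluated per 5-mer.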