Cells and viruses
In this study, we used equid alphaherpesvirus 1 (EHV-1) strain MdBio (EHV-1-MdBio)42. EHV-1 was propagated in confluent rabbit kidney (RK-13) epithelial cell line (ECACC: 00021715). RK-13 cells were cultivated in DMEM (Sigma), supplemented with 10% fetal calf serum (Gibco) and 80 μg of gentamycin per ml (Gibco) at 37 °C in the presence of 5% CO2. Virus stock solution was prepared by infecting the freshly prepared cells with 0.1 multiplicity of infection [MOI = plaque-forming units (pfu)/cell]. Infected cells were harvested when complete cytopathic effect was observed. As a next step, three successive cycles of freezing and thawing of infected cells were performed to release of viruses from the infected cells. For the time-course sequencing experiments, the cells were infected with MOI=4 of EHV-1-MdBio in three technical replicates. For the synchronization of infections, cells were incubated for 1 h at 4 °C, followed by the removal of the virus suspension and washing the cells with phosphate-buffered saline. As a next step, fresh culture medium was added to the infected cells, which were incubated for 1, 2, 4, 6, 8, 12, 18 24 and 48 hours post-infection. After the incubation, the culture medium was removed, and the infected cells were frozen at -80 °C until further use.
RNA isolation
The cells underwent RNA extraction following the spin column-based procedure detailed in the NucleoSpin RNA kit from Macherey-Nagel. Initially, the cells were bathed in a lysis buffer with chaotropic ions, a step which served to neutralize RNases. This was followed by the adhesion of DNA and RNA molecules to the silica membrane. All samples were subsequently treated with DNase I to eradicate any lingering genomic (g)DNA. The total RNA was then drawn out into nuclease-free water. Any potentially persistent gDNA was removed with the TURBO DNA-free™ Kit from Invitrogen. The concentration of the samples was ascertained using the Qubit 4.0 fluorometer and Qubit Broad Range RNA Assay Kit, both from Invitrogen. The Agilent TapeStation 4150 was employed for quality control, and only those samples with RIN scores of 9.2 or higher were selected for cDNA production for subsequent procedures.
Purification of polyadenylated RNA
The poly(A)+ RNA fraction was isolated from the total RNA samples using the Oligotex mRNA Mini Kit from Qiagen. Briefly, the samples' volumes were adjusted to 250 µL using RNase-free water. The samples were then treated with 15 µL of Oligotex suspension and 250 µL of OBB buffer, both supplied in the Qiagen kit. The samples were subsequently heated to 70°C for a duration of 3 minutes, before being cooled to 25°C for 10 minutes. Following this, the mixtures were spun down at 14,000×g for 2 minutes, after which the supernatants were disposed of. A quantity of 400 µL of OW2 wash buffer, also from the Oligotex kit, was added to the samples before they were transferred onto the spin columns from the Qiagen kit. They were then spun at 14,000×g for 1 minute. This wash step was repeated once more. Ultimately, the polyadenylated RNA fraction was collected from the membrane by introducing 50 µl of heated elution buffer (EB, Qiagen kit). The RNA was collected in 60 µl EB, and a secondary elution step was also performed to ensure maximum yield.
ONT MinION – dcDNA sequencing
Both poly(A)+-enriched and poly(A)+-enriched samples processed with Terminator were employed to construct libraries for direct cDNA sequencing on the ONT MinION device. The ONT Direct cDNA Sequencing Kit (SQK-DCS109) was used according to the instructions provided in the kit’s manual. Briefly, the RNA samples were combined with the VN primer (VNP; from the ONT kit) and 10 mM dNTPs, then heated to 65°C for 5 minutes. Following this, 5x RT Buffer, RNaseOUT (from Thermo Fisher Scientific), and Strand-Switching Primer (SSP; from the ONT Kit) were incorporated, with the mixtures then heated to 42°C for 2 minutes. The first cDNA strand was produced by adding Maxima H Minus Reverse Transcriptase enzyme (from Thermo Fisher Scientific) to the samples. RT and strand-switching reactions took place at 42°C for 90 minutes, before the enzyme was heat inactivated at 85°C for 5 minutes. The RNase Cocktail Enzyme Mix (from Thermo Fisher Scientific) was used to remove RNA from the RNA-cDNA hybrids, with this step carried out at 37°C for 10 minutes.
For the synthesis of the second strand, LongAmp Taq Master Mix [from New England Biolabs (NEB)] and PR2 Primer (PR2P) were utilized. The specific details of the PCR reactions can be found in the provided reference 42. The fragmented DNAs underwent end-repair and dA-tailing using the NEBNext End repair /dA-tailing Module (NEB) at 20°C for 5 minutes and then 65°C for 5 minutes. This was succeeded by the ligation of the sequencing adapter at room temperature for 10 minutes, with the NEB Blunt /TA Ligase Master Mix (NEB) used for this process. The ONT dcDNA libraries were marked using barcodes as detailed in the provided reference 42 and using the ONT Native Barcoding (12) Kit following the manufacturer's instructions. The prepared and tethered cDNA libraries (200 fmol/flow cell) were purified and introduced into ONT R9.4.1 SpotON Flow Cells. In total, five flow cells were used for the dcDNA sequencing.
In order to circumvent possible "barcode hopping," the earlier time point samples were sequenced separately from those taken at later time points. AMPure XP Beads were utilized after each additional enzymatic process. The samples were then eluted in UltraPure™ nuclease-free water (from Invitrogen), and their concentration was determined with a Qubit 4.0 fluorometer, in conjunction with the Qubit dsDNA HS Assay kit.
Pre-processing and data analysis
The MinION data underwent base calling using the Guppy base caller version 3.4.1, with the --qscore_filtering feature enabled. Reads achieving a Q-score of more than 7 were mapped onto a combined host genome, which comprised the Oryctolagus cuniculus (rabbit) GCF_000003625.3 and Equid alphaherpesvirus 1 (EHV-1) NC_001491.2 reference genomes. This was accomplished using the minimap2 software, with the “-ax splice -Y -C5” options. We chose to exclude MAPQ=0, secondary, and supplementary alignments from all subsequent analysis. Reads aligned to the host genome were associated with host genes based on the GCF_000003625.3_OryCun2.0_genomic.gff genome coordinates. We only used reads that corresponded with the exon structure of the host reference genes (employing a +/- 5 base pair window for matching exon start and end positions) for the computation of raw gene counts. A summary of mapped reads, virus/host ratio, read length/quality for each time point and experiment can be found in Supplementary Table S9.
Differential expression analysis of host genes
EdgeR_3.24.3 29 along with R version 3.5.1 was utilized for the differential expression (DE) analysis. We eliminated host genes that had less than ten reads across any of the three technical replicates, retaining 7308 genes above our chosen threshold (Supplementary Table S10). The three technical replicates were then categorized in EdgeR (utilizing the DGEList function) according to their respective time points (mock, 1h, 2h, 4h, 6h, 8h, 12h, 18h, 24h, 48h). We carried out data normalization (with the calcNormFactors function) using the method="TMM" option, and the robust=True option in the subsequent analysis (with the estimateDisp, glmQLFit, and glmQLFTest functions). We sought to identify key genes involved in the earliest response to viral infection in the host by testing for DE between the mock and the 1h time points. To identify genes that exhibited significantly altered expression levels (utilizing the decideTests function), we set a false discovery rate (FDR) threshold of 0.01, adjusting p-values using the Benjamini & Hochberg procedure (with the p.adjust function and method="BH" option, Hochberg and Benjamini, 1990). In addition, PCA was run using the prcomp R function on the normalized count matrix. The first two principal components were used for a coarse DEG analysis between the mean gene expression of early (mock vs. 1h-6h timepoints) and late (between mean values of 1h-6h vs. 8h-48h samples). Functional enrichment of the resulting DEG lists was done in Cytoscape using the ClueGO plugin (Reactome Pathway database, significant hits with adjusted p-value < 0.05) and manually edited for readability. The normalized pseudo-counts of DE genes, used for clustering and visualization demonstrated superb reproducibility across the three technical replicates (an average R=0.994 in all cross correlations between the three replicates and various time points), as reinforced by the first two PCA coordinates. Supplementary table S1 gathers significant DEGs across all timepoints and in reference to mock samples, and may be easily browsed and filtered according to the “contrast” column. Furthermore, Pearson correlation between normalized gene counts in the virus-host ratio across time points was determined by the corr.test function of the “psych” R package, and significantly correlated genes collected in Supplementary Table S2.
Examination of host genes exhibiting differential expression during viral infection
In order to categorize genes based on changes in their relative expression (rather than absolute expression levels) throughout the entire viral infection, we adjusted the expression of DE genes in relation to their peak expression levels. We executed cluster analysis utilizing the amap_0.8-16 R package Kmeans function with the Euclidean distance method to pinpoint genes with similar expression kinetics during viral infection. This resulted in the identification of six clusters of genes with unique expression profiles (Supplementary Table S4). We visualized the gene expression heatmap of these clusters using the pheatmap R package (version 1.0.12). The median of the relative gene expressions for each time point within each of the identified gene clusters was visualized with the ggplot2 R package (version 2.3.3.2) using the geom_smooth function.
Employing the identified subset of genes, we carried out an over-representation analysis for each cluster using PANTHER (version 16.0, using the 2020-12-01 data set release; https://doi.org/10.1093/nar/gkaa1106) software tool. The reference was the 7308 genes that were above the selected expression threshold. We compiled the results of the over-representation analysis (FDR<0.05) using the Gene Ontology (GO) biological processes and GO molecular functions annotation datasets.
We carried out a STRINGDB network analysis (version 11.5) to uncover possible gene networks in clusters 1-3, where the over-representation analysis revealed little or no enrichment in known pathways. Within these specific gene clusters, we also pinpointed the shared regulatory elements of our DE genes based on the Gene Regulatory Network database (GRNdb) data (Fang et al. 2021). For this analysis, we used the Human Embryo (E-MTAB-3929) transcription factor (TF) database, which consists of 1529 single-cell experiments from samples across five different embryonic stages, resulting in 170256 TF-gene associations between 18355 target genes and 695 TFs. We only included the high-confidence TF-gene pairs (annotated in the original SCENIC cisTarget database or inferred by orthology (Aibar et al., 2017) in our investigation.