Genome analysis
Genome sequences of SARS-CoV-2 (isolate NC_045512.2 = Wuhan-Hu-1), SARS-CoV-1 (AY291315.1 = FFM1), MERS-CoV (NC_019843.3 = EMC2012), human-pathogenic CoVs (NC_006577.2 = HKU1, AY391777.1 = OC43, NC_002645.1 = 229E, NC_005831.2 = NL63) and bat CoVs (MN996532.2 = RaTG13, KC881005.1 = RsSHC014, MG916904.1 = Ra1359) were downloaded from GenBank (https://www.ncbi.nlm.nih.gov/genbank/). Retro.hg38.v1 (https://github.com/mlbendall/telescope_annotation_db/tree/master/builds) was employed as the RE database. The database contains 28,513 REs and comprises “RepeatMasker” hits for 60 HERV families (RepeatMasker Open-4.0, http://www.repeatmasker.org/) and all LINE elements from “L1base v2” (https://l1base.charite.de/)[51]. The retro.hg38.v1 database was aligned to the CoV genomes with the genome sequence aligner “nucmer”[52] (4.0.0beta2) on galaxy.org[53] and with a local installation of “LAST” (v1250), a programme for genome-scale sequence comparison[54]. The minimum length cut-off for matches with 100% sequence identity was set stepwise to 12, 15, 18, 21, 24, and ≥ 27 nucleotides (nt), based on an immuno-relevant epitope size of about 4 to 12 amino acids (aa) (many epitopes are shorter than 8 aa, about 25% are ≤ 6 aa, but only a few are 4 aa[55]). The nucmer “-b” and “-L” options were set accordingly, and “show-coords” as well as “mummerplot” from the “MUMmer 4” package[52] were employed to extract and plot the data. For “LAST”, an RE database was first built (“lastdb -uNEAR -c RE_db retro.hg38.v1.fa”) and the CoV genomes were then compared to the RE database (“lastal -D100 RE_db CoV_genome.fa > RE_db_CoV.maf”).
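The stepwise length cut-offs can be illustrated with a minimal Python sketch. It is not the algorithm used by nucmer or LAST; it only shows what "an exact match of at least N nt" means, using a simple seed-and-extend search on the forward strand. The function name `exact_matches` and the constant `MIN_LENGTHS` are hypothetical and introduced here for illustration only.

```python
from collections import defaultdict

# Cut-offs in nucleotides for epitope sizes of 4 to 9 aa (3 nt per codon);
# the last value stands for the ">= 27 nt" class.
MIN_LENGTHS = [3 * aa for aa in range(4, 10)]


def exact_matches(query, reference, min_len=12):
    """Return maximal exact matches as (query_start, reference_start, length).

    Illustrative only: forward strand, 100 % identity, no gaps.
    """
    query, reference = query.upper(), reference.upper()

    # Index every min_len-mer of the reference by its start position.
    index = defaultdict(list)
    for i in range(len(reference) - min_len + 1):
        index[reference[i:i + min_len]].append(i)

    matches = set()
    for q in range(len(query) - min_len + 1):
        for r in index.get(query[q:q + min_len], ()):
            # A seed that can be extended to the left is part of a match
            # that was already reported from an earlier seed.
            if q > 0 and r > 0 and query[q - 1] == reference[r - 1]:
                continue
            # Extend the seed to the right while the bases agree.
            length = min_len
            while (q + length < len(query)
                   and r + length < len(reference)
                   and query[q + length] == reference[r + length]):
                length += 1
            matches.add((q, r, length))
    return sorted(matches)
```

Calling `exact_matches(cov_genome, re_sequence, min_len=n)` for each `n` in `MIN_LENGTHS` would reproduce the stepwise filtering in spirit; `cov_genome` and `re_sequence` are placeholder names for sequences loaded from the FASTA files.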
Epitope-specific antibody data in COVID-19 patients
The SARS-CoV-2 epitope-specific antibody data (IgG) in severely vs. mildly affected COVID-19 patients were taken from Schwarz et al.[56] (“Peptide microarray data – severe vs. mild – IgG”), with the peptides 1060 (NSP12, QTVKPGNFNKDFYDF, LogFC 5.3, p-value 2.4E-04, FDR-adj. p-value 2.8E-02), 1243 (NSP16, ENDSKEGFFTYICGF, LogFC 2.2, p-value 4.0E-02, FDR-adj. p-value 5.2E-01), 1227 (NSP13, IPARARVECFDKFKV, LogFC -0.9, p-value 3.2E-01, FDR-adj. p-value 5.3E-01) and 1690 (Spike, AQVKQIYKTPPIKDF, LogFC 0.2, p-value 8.3E-01, FDR-adj. p-value 8.5E-01). “L1base v2” was used for comparison with coding LINE1 sequences (https://l1base.charite.de/)[51]. Known SARS-CoV-2 B- and T-cell epitopes were taken from Phan et al.[57] and Griffoni et al.[58]. The PDB data for the SARS-CoV-2 RdRp (PDB ID: 7BW4), helicase (PDB ID: 7NNG), 2'-O-ribose methyltransferase (PDB ID: 7JYY) and spike protein (PDB ID: 7LSS) were downloaded from https://www.rcsb.org, and epitopes were displayed with “UCSF Chimera v1.15” (for macOS)[59].
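As a rough illustration of how a peptide can be compared with a coding sequence at the amino-acid level, the following Python sketch lists every contiguous stretch of `k` residues that a peptide shares with a protein sequence. This is an assumption about one possible way to do such a comparison, not the procedure used in this study; the function name `shared_windows`, the default `k = 4` (the smallest epitope size mentioned above) and the toy protein string in the usage note are all hypothetical.

```python
def shared_windows(peptide, protein, k=4):
    """Return (offset_in_peptide, window) for each k-residue window of
    `peptide` that also occurs somewhere in `protein` (exact match)."""
    peptide, protein = peptide.upper(), protein.upper()

    # Collect all k-residue windows present in the protein once.
    protein_windows = {protein[i:i + k] for i in range(len(protein) - k + 1)}

    hits = []
    for i in range(len(peptide) - k + 1):
        window = peptide[i:i + k]
        if window in protein_windows:
            hits.append((i, window))
    return hits
```

For example, `shared_windows("QTVKPGNFNKDFYDF", "MAAAKPGNFLLL")` compares peptide 1060 with a made-up toy sequence and reports the windows `KPGN` and `PGNF` at peptide offsets 3 and 4. A real comparison would read the LINE1 ORF translations from L1base instead of a toy string.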
Transcriptome analysis
Total RNA sequencing data from SARS-CoV-2-infected macrophages (BioProject ID PRJNA637580; Sequence Read Archive (SRA) IDs, mock: SRR11934391, SRR11934392, SRR11934393; infected: SRR11934394, SRR11934395, SRR11934396)[60], Calu-3 lung adenocarcinoma epithelial cells (PRJNA615032; mock: SRR11517744, SRR11517745, SRR11517746; infected: SRR11517747, SRR11517748, SRR11517749)[61] and bronchoalveolar lavage fluid (BALF) samples from intensive care COVID-19 patients (PRJNA605983; SRA: SRR11092056, SRR11092057, SRR11092058, SRR11092059, SRR11092060, SRR11092061, SRR11092062, SRR11092063, SRR11092064)[62] compared to healthy controls (PRJNA316136; SRA: SRR3286988, SRR3286989, SRR3286990, SRR3286991, SRR5515942, SRR5515943, SRR5515944)[63] were downloaded from SRA (https://www.ncbi.nlm.nih.gov/sra), quality controlled with FastQC (Babraham Institute, Cambridge, UK, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and trimmed of Illumina adapters with Trimmomatic[64]. Salmon[65] and DESeq2[66] were employed for differential RE analysis with standard parameters, after indexing the retro.hg38.v1 database (“salmon index -t retro.hg38.v1.fa -i retro.hg38.v1_index -k 31”). Heatmaps were generated with iDEP v0.92[67] and graphs with GraphPad Prism version 8.0 for OS X (GraphPad Software Inc., USA).
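The quantification step can be organised as sketched below in Python, shown for the macrophage data set. The sketch builds a sample sheet (run accession and condition, as needed for the downstream DESeq2 design) and one “salmon quant” command per run against the index named above. Several details are assumptions that are not stated in the text: single-end input (“-r”), automatic library type detection (“-l A”), and the file and directory names `<run>.trimmed.fastq.gz` and `quant/<run>`. The helper names are hypothetical.

```python
# Run accessions of the macrophage data set (PRJNA637580), grouped by condition.
MACROPHAGE_RUNS = {
    "mock": ["SRR11934391", "SRR11934392", "SRR11934393"],
    "infected": ["SRR11934394", "SRR11934395", "SRR11934396"],
}


def sample_sheet(runs):
    """Flatten {condition: [run, ...]} into a list of (run, condition) rows."""
    return [(run, condition) for condition, ids in runs.items() for run in ids]


def salmon_quant_command(run, index="retro.hg38.v1_index"):
    """Build a 'salmon quant' call for one run.

    Assumptions: single-end reads (-r), automatic library type (-l A),
    and hypothetical input/output names.
    """
    return [
        "salmon", "quant",
        "-i", index,                       # index built with 'salmon index'
        "-l", "A",                         # let Salmon infer the library type
        "-r", f"{run}.trimmed.fastq.gz",   # hypothetical trimmed FASTQ file
        "-o", f"quant/{run}",              # hypothetical output directory
    ]


if __name__ == "__main__":
    for run, condition in sample_sheet(MACROPHAGE_RUNS):
        print(condition, " ".join(salmon_quant_command(run)))
```

The resulting per-run quantification directories, together with the sample sheet, would then be imported into R for DESeq2. Paired-end data would use “-1” and “-2” instead of “-r”.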