Study Design
Patients admitted to Stanford Hospital with signs and symptoms of COVID-19 and SARS-CoV-2 infection confirmed by RT-qPCR of nasopharyngeal swabs were recruited. Venipuncture blood samples were collected in K2EDTA- or sodium heparin-coated vacutainers for peripheral blood mononuclear cell (PBMC) isolation or serology on plasma, respectively. Recruitment of COVID-19 patients, documentation of informed consent, collection of blood samples, and experimental measurements were carried out with Institutional Review Board approval (IRB-55689).
Healthy adult controls (HHCs)
The data set containing control immunoglobulin receptor repertoires has been described previously7. In summary, healthy adults with no signs or symptoms of acute illness or disease were recruited as volunteer blood donors at the Stanford Blood Center. Pathogen diagnostics were performed for CMV, HIV, HCV, HBV, West Nile virus, HTLV, TPPA (Syphilis), and T. cruzi. Volunteer age range was 17-87 with median and mean of 52 and 49, respectively.
Molecular and serological testing on COVID-19 patients
SARS-CoV-2 infection in patients was confirmed by reverse-transcription polymerase chain reaction testing of nasopharyngeal swab specimens, using the protocols described in16,17. Plasma antibody testing for IgG and IgM specific for SARS-CoV-2 spike protein receptor binding domain (RBD) was carried out with an enzyme-linked immunosorbent assay based on the protocol and antigen protein production described in18.
HTS of immunoglobulin heavy chain (IGH) libraries prepared from genomic DNA and cDNA
The AllPrep DNA/RNA kit (Qiagen) was used to extract genomic DNA (gDNA) and total RNA from PBMCs. For each blood sample, six independent gDNA library PCRs were set up using 100 ng template/library (25 ng/library for 7453-D0). Multiplexed primers to IGHJ and the FR1 or FR2 framework regions (3 FR1 and 3 FR2 libraries), per the BIOMED-2 design, were used19 with additional sequence representing the first part of the Illumina linkers. In addition, for each sample, total RNA was reverse-transcribed to cDNA using Superscript III RT (Invitrogen) with random hexamer primers (Promega). Total RNA yield varied between patients, and between 6 ng and 100 ng was used for each of the isotype PCRs, which used IGHV FR1 primers based on the BIOMED-2 design19 and isotype-specific primers located in the first exon of the constant region for each isotype category (IgM, IgD, IgE, IgA, IgG). The primers contained additional sequence representing the first part of the Illumina linkers. The different isotypes were amplified in separate reaction tubes. Eight-nucleotide barcode sequences were included in the primers to indicate sample (isotype and gDNA libraries) and replicate identity (gDNA libraries). Four randomized bases were included upstream of the barcodes on the IGHJ primer (gDNA libraries) and constant region primer (isotype libraries) for Illumina clustering. PCR was carried out with AmpliTaq Gold (Applied Biosystems) following the manufacturer's instructions, using the following program: 95°C 7 min; 35 cycles of 94°C 30 sec, 58°C 45 sec, 72°C 60 sec; and final extension at 72°C for 10 min. A second round of PCR using Qiagen’s Multiplex PCR Kit was performed to complete the Illumina sequencing adapters at the 5’ and 3’ ends of amplicons; cycling conditions were: 95°C 15 min; 12 cycles of 95°C 30 sec, 60°C 45 sec, 72°C 60 sec; and final extension at 72°C for 10 min. Products were subsequently pooled, gel purified (Qiagen), and quantified with the Qubit fluorometer (Invitrogen).
Samples were sequenced on the Illumina MiSeq (PE300) using 600-cycle kits.
Sequence quality assessment, filtering, and analysis
Paired-end reads were merged using FLASH20, demultiplexed (100% barcode match), and primer trimmed. The V, D, and J gene segments and the V-D (N1) and D-J (N2) junctions were identified using the IgBLAST alignment program21. Quality filtering of sequences included keeping only productive reads with a CDR-H3 region and a minimum V-gene alignment score of 200. For cDNA-templated IGH reads, isotypes and subclasses were called by exact matching to the constant region gene sequence upstream from the primer. Clonal identities were inferred using single-linkage clustering and the following definition: same IGHV and IGHJ usage (disregarding allele call), equal CDR-H3 length, and minimum 90% CDR-H3 nucleotide identity. A total of 518,403 clones (per sample, mean number of clones: 74,058; median number of clones: 9,030 for each isotype) were identified. A total of 6,158,222 IGH sequences amplified from cDNA were analyzed for the COVID-19 subjects (mean: 879,746 per individual; median: 910,437) and 68,831,446 sequences from healthy adult controls (mean: 603,785 per individual; median: 637,269). Each COVID-19 patient had on average 280,307 in-frame gDNA sequences and each adult control had an average of 8,402 in-frame gDNA sequences.
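The clone definition above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the tuple layout of the input, and the use of position-wise identity over equal-length CDR-H3 sequences are assumptions made for illustration. Reads are first grouped by IGHV, IGHJ, and CDR-H3 length, and any two reads in a group with at least 90% nucleotide identity are linked; single linkage means that reads joined through a chain of such links fall into one clone.

```python
from collections import defaultdict
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if not a:
        return 0.0  # empty CDR-H3 sequences are never linked
    return sum(x == y for x, y in zip(a, b)) / len(a)


def infer_clones(reads, min_identity=0.90):
    """Label each read with a clone index by single-linkage clustering.

    reads: list of (ighv, ighj, cdr3_nt) tuples with allele suffixes
    already removed (hypothetical input layout).
    Returns one integer label per read, numbered in order of appearance.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only reads sharing IGHV, IGHJ, and CDR-H3 length may be linked.
    groups = defaultdict(list)
    for idx, (v, j, cdr3) in enumerate(reads):
        groups[(v, j, len(cdr3))].append(idx)

    # Link every pair that meets the identity threshold.
    for members in groups.values():
        for a, b in combinations(members, 2):
            if identity(reads[a][2], reads[b][2]) >= min_identity:
                parent[find(a)] = find(b)

    # Convert union-find roots into consecutive clone labels.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(reads))]
```

The pairwise comparison within each group is quadratic in group size, which is acceptable for a sketch but would likely need a faster neighbor search at the read counts reported here.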
For each clone, the median somatic mutation frequency of reads was calculated. Mean mutation frequencies for all clonal lineages from a subject for each isotype were calculated from the median mutation frequency within each clone, and so represent the mean of the median values. Clones with <1% mutation were defined as unmutated and clones with ≥1% mutation were defined as mutated. Subclass fractions were determined for each subject by dividing the number of clones for a given subclass by the total number of clones for that isotype category. An expanded clone was defined as a clone found in one subject that was present in two or more of the gDNA replicate libraries. Clonal expansion in the isotype data was inferred from the gDNA data. Analyses were conducted in R22 using base packages for statistical analysis and the ggplot2 package for graphics23.
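The summary statistics described above can be sketched in Python as follows. The original analyses were conducted in R; this sketch is only an illustration, and the function names, input layouts, and the choice to key clones by clone identifier and isotype together are assumptions. Each call is meant to receive the reads of a single subject.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_mutation(reads, threshold=0.01):
    """Summarize somatic mutation for one subject.

    reads: iterable of (clone_id, isotype, mutation_frequency), with the
    frequency given as a fraction (0.01 means 1%). Hypothetical layout.
    Returns (mean of per-clone medians for each isotype,
             mutated flag for each (clone_id, isotype)).
    """
    per_clone = defaultdict(list)
    for clone_id, isotype, freq in reads:
        per_clone[(clone_id, isotype)].append(freq)

    # Median mutation frequency of the reads within each clone.
    clone_median = {key: median(freqs) for key, freqs in per_clone.items()}

    # Mean of those medians across all clones of an isotype.
    by_isotype = defaultdict(list)
    for (_, isotype), value in clone_median.items():
        by_isotype[isotype].append(value)
    isotype_mean = {iso: mean(values) for iso, values in by_isotype.items()}

    # Clones at or above the threshold count as mutated.
    mutated = {key: value >= threshold for key, value in clone_median.items()}
    return isotype_mean, mutated


def expanded_clones(replicate_hits, min_replicates=2):
    """Return clones seen in at least min_replicates gDNA replicate libraries.

    replicate_hits: iterable of (clone_id, replicate_id) for one subject.
    """
    seen = defaultdict(set)
    for clone_id, replicate_id in replicate_hits:
        seen[clone_id].add(replicate_id)
    return {c for c, reps in seen.items() if len(reps) >= min_replicates}
```

Taking the median within each clone before averaging keeps large clones from dominating the isotype-level value, since every clone contributes exactly one number regardless of its read count.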
To determine convergent rearranged IGH among patients with SARS-CoV-2 infection, heavy-chain sequences annotated with the same IGHV and IGHJ segment (not considering alleles) and the same CDR-H3 length were clustered based on 85% CDR-H3 amino acid sequence similarity using cd-hit24. To exclude IGH that are generally shared between humans and to enrich the SARS-CoV-2-specific IGH that are likely shared among the patients, clusters were selected as informative if (1) they contained at least five IGH sequences from each COVID-19 patient and were present in at least two subjects; and (2) no IGH sequences from HHC samples (collected prior to the 2019 SARS-CoV-2 outbreak) were identified in the same convergent cluster. The same selection criteria were used to determine the convergent clusters between the COVID-19 samples and previously reported IGH sequences specific to SARS-CoV-1 and SARS-CoV-2.
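The selection of informative clusters can be sketched in Python under one stated reading of the criteria: every COVID-19 patient contributing to a cluster contributes at least five sequences, at least two patients contribute, and no healthy control sequence falls in the cluster. This reading, the function name, the cohort labels, and the input layout are assumptions for illustration; cluster membership is taken as already computed (for example, from cd-hit output).

```python
from collections import Counter, defaultdict


def informative_clusters(members, min_seqs=5, min_patients=2):
    """Select convergent clusters that meet the selection criteria.

    members: iterable of (cluster_id, subject_id, cohort), one entry per
    sequence, with cohort either "COVID" or "HHC" (hypothetical labels).
    Returns the set of selected cluster ids.
    """
    patient_counts = defaultdict(Counter)
    has_control = set()
    for cluster_id, subject_id, cohort in members:
        if cohort == "HHC":
            has_control.add(cluster_id)
        else:
            patient_counts[cluster_id][subject_id] += 1

    selected = set()
    for cluster_id, per_patient in patient_counts.items():
        if cluster_id in has_control:
            continue  # criterion 2: no healthy control sequences
        enough_patients = len(per_patient) >= min_patients
        enough_seqs = all(n >= min_seqs for n in per_patient.values())
        if enough_patients and enough_seqs:  # criterion 1
            selected.add(cluster_id)
    return selected
```

A looser reading, in which only patients with at least five sequences are counted toward the two-subject minimum, would change the `all(...)` test to a count of qualifying patients.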