1. Study cohorts, data collection, and biologic samples
To identify biomarkers of disease severity, we evaluated IC protein and gene expression using complementary methodologies in three COVID-19 autopsy cohorts and in a cohort of early COVID-19 patients in the ED.
1.1. Study cohorts
1.1.1. Massachusetts General Hospital autopsy cohorts:
Two autopsy cohorts were prospectively enrolled at the Massachusetts General Hospital (MGH) in Boston, Massachusetts. These cohorts are referred as to the MGH autopsy discovery cohort and the MGH COVID-19 autopsy development cohort.
1.1.1.1 MGH autopsy discovery cohort
The cohort consisted of prospectively enrolled consecutive decedents with respiratory failure and DAD due to SARS-CoV-2 infection (n=20) during the first wave of the pandemic in Northeastern U.S. (March–May 2020) and retrospective consecutive decedents with respiratory failure and DAD (n=21) prior to the SARS-CoV-2 pandemic. This cohort has been reported previously [23]. Baseline clinical and laboratory characteristics of the cohort are described in Supplementary Tables 1 and 2. SARS-CoV-2 nuclei acid was not identified by specific in-situ hybridization in the pre pandemic autopsy lung tissues of any of the non-SARS-CoV-2 control cases as expected (Supplementary Table 20). Patients were unvaccinated against SARS-CoV-2 and did not receive COVID-19-specific therapeutics.
1.1.1.2 MGH COVID-19 autopsy development cohort
An extended cohort of consecutive COVID-19 autopsies from MGH (n=39) was prospectively enrolled for this study from March to July 2020 (Table 1). COVID-19 autopsy cases included in the MGH autopsy discovery cohort (n=20) were also part of the MGH COVID-19 autopsy development cohort.
Autopsy protocols and sample processing (MGH autopsy cohorts)
Autopsies were carried out and tissues were collected following Centers for Disease Control and Prevention (CDC) guidelines for biosafety and infection control practices. The specimen collection was performed as previously described[23]. All autopsies were performed at the MGH Department of Pathology following consent of the legal next of kin or healthcare proxy as appropriate.
Two representative FFPE-tissue blocks of the MGH autopsy cohorts were selected by a board-certified pathologist, and hematoxylin and eosin (H&E)-stained slides, chromogenic immunohistochemistry (IHC) for different markers, and chromogenic SARS-CoV-2 RNA in situ hybridization (RNA-ISH; RNAscope™, Advanced Cell Diagnostics, Inc., Newark, CA) were performed in each selected FFPE whole-tissue section. As described in our prior report[23], the two slides included the most congested and the least congested lung parenchymal region out of all histological sections. All selected slides contained sections of at least three different generations of airways (from both the conducting and respiratory zones), which could include: lobar bronchi, segmental bronchi, subsegmental bronchi, bronchioles, lobular bronchioles, terminal bronchioles, respiratory bronchioles, alveolar ducts, alveolar sacs, and alveoli.
In addition, we performed machine-learning derived morphometrics of the vascular congestion in the alveolar septa in autopsy cases of the MGH COVID-19 autopsy discovery cohort in which Nanostring RNA data was available (n=10), as detailed in section 2.2.
1.1.2 International COVID-19 autopsy cohort
The COVID-19 international lung autopsy consortium was established between March 1, 2020, and February 28, 2021 as a research collaboration between pathology, pulmonary, critical care, and infectious diseases physicians and investigators from 47 research sites in 16 countries. The consortium aimed to inform the international medical community with a standardized and systematic description of the histopathological findings and pathomechanisms of SARS-CoV-2 infection in the lung. This cohort is referred as to the international cohort.
The consortium amassed the largest international, multi-center autopsy cohort including a clinical and pathological metadata registry of 535 decedents with confirmed SARS-CoV-2 infection. The cohort includes both autopsies of hospitalized and non-hospitalized COVID-19 deceased persons. The cohort also included autopsy cases from MGH. The fact that vaccination became available worldwide after December 8, 2020, for the elderly and selected subpopulations (e.g., healthcare workers) is reflected in the absence of post-vaccination cases. Because participants were unvaccinated against SARS-CoV-2, and most did not receive targeted therapies, the cohort is suitable to study the natural history of COVID-19.
Individual-participant data and a pre-specified list of 269 variables were requested from investigators at eligible research sites and retrieved from the medical records or the laboratory information system at each participating institution. The underlying cause of death was determined in each case by the pathologist who performed the autopsy. A centralized histopathological review of all cases was performed by board-certified subspecialized thoracic pathologists at MGH blinded to clinical data. Additional details about site enrollment and data collected are included in Supplementary Tables 21-23.
Search strategy and selection criteria of collaborating research sites (International cohort)
Briefly, we searched for prospective or retrospective cohort, case-control studies, observational studies, randomized controlled trials (RCTs), and case reports describing autopsies of patients with laboratory confirmed SARS-CoV-2 infection published between Dec 31, 2019, and September 20, 2020, in MEDLINE, and Embase electronic databases, to identify potential collaborative research sites.
Research site eligibility was determined based on the content of scientific publications from the respective sites. Since the pathological description of COVID-19 autopsies was our primary study outcome, we excluded studies not including COVID-19 autopsy cohorts, population-based studies, outbreak reports, editorials, letters to editor without sufficient pathological description of autopsy cases, reviews, theoretical models, and any other studies describing COVID-19 in which autopsies were not performed.
The 9-month timeframe was chosen on the basis of expected availability of clinical and pathological autopsy data after the Wuhan Municipal Health Commission reported the first cluster of cases of SARS-CoV-2 related pneumonia in Wuhan, Hubei Province, in China on December 31, 2019[51][52]. We also included unpublished data of relevant studies available as preprints in medRxiv, bioRxiv, and arXiv databases between Dec 31, 2019, and September 20, 2020.
We included the following terms in our search strategy: (“autopsy”, OR “biopsy”, OR “pathology”, OR “pathological findings”, OR “histopathology”, OR “histopathological findings”) AND (“SARS-CoV-2”, OR “COVID-19”, OR “coronavirus disease”). We restricted our search to studies in humans. The search strategy was performed by one reviewer (YL). We additionally reviewed reference lists of all relevant studies to identify additional studies.
One reviewer (JV) reviewed articles in two stages to select potential collaborative research sites: evaluation of titles and abstracts followed by full-text review. Relevant articles were subject to a full text review by the reviewer. The senior investigator (MMK) was consulted in certain situations regarding research site inclusion. If research site eligibility could not be assessed from the full-text manuscript because of missing information, we contacted authors for clarification. We evaluated eligible articles for duplication of data on the same individuals and research autopsy cohorts and excluded manuscripts if necessary.
Reviewers (International cohort)
JV is a board-certified pathologist in anatomic pathology, clinical pathology, and medical microbiology, with fellowship training in pulmonary pathology. JV has extensive clinical and research expertise in pulmonary and infectious disease pathology and epidemiology. JV was a clinical fellow in pathology at the MGH and Harvard University when the study started. He is currently the lead of the human diagnostic pathology team at the Infectious Diseases Pathology Branch of the Centers for Disease Control and Prevention (CDC), and an adjunct assistant professor of the Department of Pathology and Laboratory Medicine at Emory University School of Medicine. YL is a board-certified infectious disease physician with extensive research and clinical expertise in COVID-19, HIV, and hepatitis B virus epidemiology and pathogenesis. YL was a clinical fellow at Brigham and Women’s Hospital and Harvard University when the study started. He is currently a clinical assistant professor of Medicine at the University of Pittsburgh Medical Center (UPMC). MMK is a board-certified pathologist in anatomic pathology with over 20 years of experience in pulmonary pathology. MMK is a Professor of Pathology at Harvard Medical School, the vice chair for Anatomic Pathology, and the director of Pulmonary Pathology at the MGH.
Selection criteria of research sites (International cohort)
To be eligible for inclusion, a research site found through a published study needed to comply with the following criteria:
(1) Include at least one autopsy case of a deceased patient with laboratory-confirmed SARS-CoV-2 infection, (2) either women or men, (3) from hospital or non-hospital settings, in which (4) the cause of death was determined by a pathologist, in which (5) minimal demographic data (age, gender, date of death) was available, and (6) availability to share pathological material from the lungs obtained during the autopsies with the main research site (The Massachusetts General Hospital). The material shared could include scanned whole slide images (WSIs) if physical material could not be sent to MGH for different reasons. Appropriate ethics approval was also an inclusion criterion for all research sites within the consortium.
Exclusion criteria (International cohort)
We excluded sites that were unable to demonstrate laboratory confirmation of SARS-CoV-2. We also excluded research sites in which COVID-19 autopsy cases were evaluated as consultations and referrals for second opinions since the clinical data from those cases was often not available.
Eligible research sites, and site enrollment (International cohort)
After the search strategy was performed, a total of 103 research sites around the world were eligible to be contacted and included in the COVID-19 International lung autopsy consortium.
We contacted the corresponding authors of the selected studies and invited them to participate in the COVID-19 International lung autopsy consortium. A list of the references of scientific publications from all research sites that were contacted and invited to join the COVID-19 International lung autopsy consortium is included in the Supplementary Information. E-mails of the authors were obtained from the published manuscripts or online scientific platforms (e.g., https://www.researchgate.net/). The authors were contacted in English, with few exceptions in which we used their language of origin, based on the official language of the country of origin of the institution they listed in the main manuscript. Authors who did not respond on at least two occasions were deemed to have not replied. In multi-institutional studies we contacted authors from different institutions. If their e-mails were not listed in the manuscript, we contacted the corresponding authors and asked for the contact information of the authors affiliated to other institutions. In addition, we contacted authors with unpublished COVID-19 autopsy cohorts via our collaborators and included their cohorts if inclusion criteria were met.
Participating research sites (International cohort)
A total of 47 research sites in 16 countries and 4 continents were included in the COVID-19 International lung autopsy consortium. A list of all included sites is shown in SupplementaryTable 21.
We endeavored to contact the authors of the other 56 studies, but 11 research groups declined our request, 12 research groups had initially accepted and then were unresponsive after four attempts, three research groups mentioned that the pathological material was unavailable as it was not collected in their studies, 29 research groups were unresponsive after two attempts, and one research group mentioned that their cases could not be shared as they belonged to a third party that had sent them to this institution for second opinion and ancillary testing. A detailed list of the 103 research sites that were contacted, and their studies (when applicable), can be found in Supplementary Table 22.
Autopsy protocols and selection of pathological material in each research site (International cohort)
Briefly, autopsies were performed following internal protocols at each participating site. A selected anatomic pathologist in each site evaluated all available postmortem lung tissue. The underlying cause of death was determined in each case by the pathologist who performed the autopsy examination. Pathologists of the participating research site were asked to select the tissue slides, sections or blocks that were most representative of the pathological process occurring within the lung of each autopsy.
The research sites were asked to share pathological material with the MGH. The acceptable lung pathological material included: a) H&E-stained slides or b) whole slide images (WSIs) of H&E-stained slides from the lung parenchyma, and c) their corresponding FFPE lung tissue blocks, or d) their corresponding unstained FFPE tissue sections. There were no limits imposed, and the pathologist in each research site could submit as many FFPE tissue blocks, unstained tissue sections, or tissue slides that were deemed necessary. As for example if the pathological process seen in one COVID-19 autopsy included only acute diffuse alveolar damage (DAD) with hyaline membranes, the pathologist in the research site selected the most representative tissue slide/block. If there were two processes occurring in the lungs of one autopsy the pathologist was invited to select the two tissue slides/blocks that were most representative. In occasions, two pathological processes were present in a single tissue slide/block. In addition, some research sites also provided pathological material from premortem lung biopsies of patient with laboratory-confirmed SARS-CoV-2 infection.
Sample processing, and histopathological review (International cohort)
Selected H&E-stained slides, and their corresponding unstained FFPE tissue sections and/or FFPE tissue blocks were sent to MGH and archived at the international autopsy biorepository. Unstained FFPE tissue sections were stained with H&E in those cases in which H&E-stained slides were not provided by the research sites. The H&E-stained slides of each autopsy were reviewed by board-certified subspecialized thoracic pathologists. Different histopathological features were systematically recorded from all available H&E-stained tissue slides, and one representative tissue section from each case was stained with immunohistochemical markers (section 2.1).
Data request (International cohort)
Pre-specified individual-participant data and a pre-specified list of 269 variables were requested from authors of all eligible research sites. Data was retrieved from the medical records or the laboratory information system in each participating institution.
The variables included participant demographic characteristics, characteristics of the autopsy, gross and microscopic autopsy findings, underlying cause of death, characteristics of the submitted histopathological material, presenting symptoms and illness onset, characteristics of SARS-CoV-2 testing, pre-existing conditions, smoking history, characteristics of hospital stay, characteristics of intensive care unit care stay, clinical complications during hospitalization, microbiological testing during hospitalization and on autopsy, laboratory testing during hospitalization (hematology testing, inflammatory markers and acute phase reactants testing, clinical biochemistry testing, arterial blood gas testing), gas exchange and respiratory mechanics parameters, pharmacotherapy during disease course and hospitalization, radiographic findings and relevant clinical outcomes (e.g. oxygen- and ventilatory-support parameters, time to death from symptom onset, length of hospital stay). A detailed list of the requested data is described in the Supplementary Table 23.
This study follows PRISMA-IPD guidelines for individual-participant data reporting[53]. The consortium protocol is registered with PROSPERO (CRD42022237969)[54] and includes a pre-specified analytical plan. Data from eligible cohorts for aggregate data statistics were extracted by one author (JV).
Characteristics of the cohort (International cohort)
Of the participants with available FFPE tissue-blocks (n=329), 71% were at least 60 years old, 36% were female, the median time between symptom onset and death was 17 days (interquartile range [IQR] 11–28; n=262), and the median time of between las hospital admission and death was 11 days (IQR 5–21; n=313). A total of 311 of 329 participants (95%) were admitted to the intensive care unit, and 174 of 303 participants (57%) required invasive mechanical ventilation with a median duration of 10 days (IQR 5–20; n=172). During their hospitalization, 12% of patients (36/307) received Remdesivir, 10% (28/291) dexamethasone, and 42% (122/291) any type of corticosteroid therapy.
1.1.3 MGH ED cohort
To identify cellular and immune responses associated with outcome, clinicians and researchers at MGH enrolled longitudinally acutely ill patients in the Emergency Department (ED) in a large, urban, academic hospital in Boston, Massachusetts (with institutional review board approval) from March 24, 2020, to April 30, 2020, during the first peak of the COVID-19 surge in the U.S. Northeast. Participant enrollment is described in previously published reports[24][55].
The cohort included patients 18 years or older with a clinical concern for COVID-19 upon ED arrival, and with acute respiratory distress with at least one of the following:
1) tachypnea ≥ 22 breaths per minute;
2) oxygen saturation ≤ 92% on room air;
3) a requirement for supplemental oxygen; or
4) positive-pressure ventilation.
Participants with symptomatic SARS-CoV-2 infection confirmed by nucleic acid tests (n=126) were included in this study. Participants had blood drawn for proteomic analysis (using the Olink® platform) and for plasma SARS-CoV-2 viral mRNA load quantification on hospital days 0, 3, and 7. Clinical course was followed to 28 days after enrollment or until hospital discharge if that occurred beyond 28 days. Data from this longitudinal cohort is publicly available to the wider research community from the Olink® website[56].
Disease severity definitions (MGH ED cohort)
Enrolled participants of the MGH ED cohort who had a positive SARS-CoV-2 nasopharyngeal RT-PCR test (n=126) were categorized into five disease groups:
A1, death within 28 days;
A2, requiring mechanical ventilation and survival to 28 days;
A3, requiring hospitalization on supplemental oxygen within 28 days;
A4, requiring hospitalization without need of supplemental oxygen within 28 days; and
A5, discharge from ED and no subsequent requirement of hospitalization within 28 days.
Severe disease was defined as belonging to group A1 or A2. Data for classification was obtained from the electronic medical record.
RNAemia clearance definitions (MGH ED cohort)
For any participants enrolled in the MGH ED cohort whom the VL at admission (day 0) was greater than or equal to 3 log copies/ml, RNAemia clearance was defined as a VL at day 7 at least unquantifiable (<2 log copies/ml) or a VL at Day 3 at least unquantifiable (<2 log copies/ml if the VL at Day 7 was not available). For any participants in whom the VL at day 0 was lower than 3 log copies/ml, RNAemia clearance was defined as an undetectable VL at day 7 or a VL at day 3 at least unquantifiable (<2 log copies/ml if Day 7 VL not available).
Characteristics of the cohort (MGH ED cohort)
Half of the participants were at least 65 years old (63/126), 41.3% were female (52/126), and the median time between symptom onset and presentation to the ED was 7 days (IQR 4-11). 39 of 126 participants (31.0%) had a baseline SARS-CoV-2 viral load above the limit of quantification (2 log10 copies/mL), 76 (60.3%) had severe disease, 73 (57.9%) required invasive mechanical ventilation, and 19 (15.1%) died of COVID-19.
Details about plasma viral load quantification and proteomics analyses are described in sections 2.7, 2.8, and 2.9.
2. Procedures
2.1 Histopathological and immunohistochemistry analyses
Semiquantitative histological and IHC analyses were performed on cases from the three autopsy cohorts to characterize histopathological features and immunostaining patterns of selected ICs (Figure 2A-E; SupplementaryTable 24).
Histopathological features and semiquantitative scoring
Histological features that were recorded in all autopsy cohorts included the phases of lung injury (ALI, acute DAD, acute and organizing DAD, predominantly organizing DAD, and fibrotic DAD), and the presence of capillary congestion and dilatation.
Alveolar capillary vascular congestion and dilatation were scored in a semiquantitative way as described in Supplementary Table 25.
Immunohistochemistry
Chromogenic immunostains were performed in an automated platform (Leica BOND-III, Leica Biosystems, Deer Park, IL) and using antibodies against PD-L1/CD274, HAVCR2/TIM-3, PD-1, and CD8. Details about the antibodies and IHC protocols used in this study are included in Supplementary Table 24.
Semiquantitative immunohistochemistry scoring
All IHC-stained slides of the MGH autopsy discovery cohort were scored in a semiquantitative fashion on a scale of 0-3 by two board-certified thoracic pathologists [ARS, MMK] that were blinded to disease outcome and clinical characteristics of the patients. Aggregate measurements were compared across all cases. In the international cohort and the extended MGH autopsy cohort, all cases were scored in a semiquantitative fashion on a scale of 0-3 by one board-certified thoracic pathologist [ARS]. All data curation and statistical analyses performed with the immunohistochemical data from both cohorts was performed by another pathologist that was not involved in the semiquantitative review, and who was also blinded to clinical characteristics and disease outcomes of the patients. Semiquantitative scales used in the current study are described in Supplementary Table 26.
For survival analyses, participants were sub-grouped in two categories based on lung tissue density of PD-L1/CD274+ and HAVCR2/TIM-3+ cells as follows:
Lung tissue PD-L1/C274+ cell density
1- Participants with low-density of PD-L1/C274+ cells: participants with none-to-rare PD-L1/CD274 staining, which correspond to a semiquantitative score = 0
2- Participants with high-density of PD-L1/C274+ cells: participants with focal, multifocal, diffuse to DIP-like PD-L1/CD274 staining, which correspond to a semiquantitative score > 0
Lung tissue HAVCR2/TIM-3+ cell density
1- Participants with low-density of HAVCR2/TIM-3+ cells: participants with none to rare/weak HAVCR2/TIM-3 staining, which correspond to a score ≤ 1
2- Participants with high-density of HAVCR2/TIM-3+ cells: participants with moderate to strong/scattered to patchy HAVCR2/TIM-3 staining, which correspond to a semiquantitative score > 1
RNA in situ hybridization
Chromogenic RNA in situ hybridization (RNA-ISH) was performed on FFPE tissue sections of the MGH COVID-19 autopsy discovery cohort (in COVID-19 and non-COVID-19 DAD controls) using SARS‐CoV‐2 RNA‐specific probes RNAscope 2.5 LS Probe-V-nCoV2019-S [Advanced Cell Diagnostics (ACD), Newark, CA, USA; 848568] and RNAscope 2.5D DAB on an automated diagnostic Leica Biosystems BOND-III apparatus (Leica Biosystems, Danvers, MA, USA). The probe is specific for the S gene encoding the spike protein. RNA-ISH was performed on 5‐µm‐thick FFPE lung tissue sections. All of the steps from baking for 1 h at 60°C to counterstaining with hematoxylin were performed on the BondRx machine. RNA unmarking was performed with Bond Epitope Retrieval Solution 2 for 15 min at 95°C, and this was followed by protease treatment for 15 min and probe hybridization for 2 h. Signals were amplified by a series of signal amplification steps, and this was followed by color development. The reactivity was visualized as brown dots. Appropriate controls were used when performing RNA-ISH to confirm RNA preservation.
RNA-ISH-stained slides from autopsies of the MGH COVID-19 discovery cohort were scored in a semiquantitative fashion by two board-certified thoracic pathologists and aggregate measurements were compared across all cases. Each case had two RNA-ISH-stained slides which were scored on a scale of 0-3, as described in Supplementary Table 27.
2.2 Machine-learning derived morphometrics
Briefly, a random forest classifier was trained and used to label every pixel in each H&E-stained lung whole slide image (WSI) as belonging in one of three categories: stroma, red blood cell (RBC), or air. Two H&E-stained histological slides were selected by the same pathologist and scanned under 80X optical magnification at approximately 8,200 pixels per micron. The two H&E-stained slides included the most congested and the least congested lung parenchymal region out of all 50 different lung tissue histological slides per autopsy (10 H&E-stained sections per each lung lobe).
The alveolar septal CVasc represents the proportion of surface area occupied by RBC pixels in (dilated) vascular lumina in one cluster of 10 contiguous alveoli that were manually annotated by a board-certified pathologist in a WSI. We annotated three clusters per WSI, and a total of six clusters per autopsy case as two slides were selected. Intra-alveolar contents (e.g., inflammatory cells, hemorrhage, fibrin, and debris) were excluded from the alveolar manual annotations. Of note, annotations were not performed within areas of lung infarction and necrosis, areas with extensive intra-alveolar hemorrhage or inflammation, areas of fibrosis, or areas in which the lung microarchitecture was severely disrupted. Analyses were performed within QuPath[57], which uses the OpenCV library for pixel classification. The final classification was applied within the clusters of alveoli to produce detailed quantitative measurements. The quantitative data of the classification in the alveolar clusters were used for morphometrical determination of the CVasc, which represents the proportion of surface area occupied by (dilated) vascular lumina in the alveolar clusters of the lung parenchyma, defined as:
Morphometric data were obtained from all of the six alveolar clusters of contiguous alveoli that were manually annotated per case. From each autopsy, we also separately designated one individual annotation with the highest CVasc, also known as “interslide-MaxCVasc”, and another annotation with the lowest CVasc, also known as “interslide-MinCVasc”, out of the six annotations of alveolar clusters per autopsy case.
Subsequently, we stratified the cases into two morphometrically defined subgroups based on the median interslide-MaxCVasc and the median interslide-MinCVasc of the COVID-19 cases of the discovery cohort. We used these morphometrically defined subgroups to evaluate differences in transcriptomic data and to perform downstream differential gene expression analyses as described in section 2.3. The two morphometrically defined subgroups per each parameter were named as follows:
1- For interslide-MaxCVasc:
-Patients with an interslide-MaxCVasc less than the median interslide-MaxCVasc of the cohort were regarded as “cases with minimally-congested-alveolar-septa in highly congested areas of the lung”
-Patients with a mean interslide-MaxCVasc equal to or greater than the median interslide-MaxCVasc of the cohort were regarded as “cases with highly-congested-alveolar-septa in highly congested areas of the lung”.
2- For interslide-MinCVasc:
-Patients with an interslide-MinCVasc less than the median interslide-MinCVasc of the cohort were regarded as “cases with minimally-congested-alveolar-septa in minimally congested areas of the lung”
-Patients with a mean interslide-MinCVasc equal to or greater than the median interslide-MinCVasc of the cohort were regarded as “cases with highly-congested-alveolar-septa in minimally congested areas of the lung”.
2.3 Transcriptomic profiling, pathway analysis, and differential gene expression analysis
Downstream differential gene expression (DGE) and pathway enrichment analyses were performed for COVID-19 patients in the MGH COVID-19 autopsy discovery cohort whose formalin-fixed paraffin-embedded lung-tissue-derived RNA was available. We compared DGE in morphometrically defined participant groups classified by a machine-learning derived analysis of the pulmonary microvasculature (Supplementary Figure 3).
FFPE tissue-derived RNA counts generated by the NanoString nCounter® Coronavirus Panel Plus[58][59] underwent quality control and normalization with housekeeping genes using the nSolver software version 4.0 (NanoString Technologies). The NanoString nCounter® Coronavirus Panel Plus was added to the nCounter® Human Organ Transplant Panel created through a collaboration between NanoString and the Banff Foundation for Allograft Pathology. The latter includes 770 genes related to immune, inflammation and tissue damage [60]. The downstream analyses were performed on R using DESeq2 package[61] and RUVSeq package[62][63] was used to remove unwanted variation with internal reference genes defined in the Banff 2019 Meeting report[58]. Gene set enrichment analysis (GSEA) was conducted using the fgsea package[64] and gene expression was compared against Molecular Signatures Database Hallmark gene set[65]. We used the whole gene set for GSEA, taking their normalized differences as input (refer to https://stephenturner.github.io/deseq-to-fgsea/). Weighted correlation network analysis (WGCNA) was used to identify gene modules and gene adjacency[66].
We used the STRING v11.5 (Search Tool for the Retrieval of Interacting Genes/Proteins) knowledgebase and software tool to assess for known and predicted protein-protein interactions (PPI), co-expression, and association[67]. The top 100 adjacent genes were assessed in the STRING and we used K-means to cluster the genes. The cluster that contained gene of interest was singled out and we selected the gene interaction with high confidence to build the final interaction network.
2.4 Single-cell RNA sequencing (scRNA-Seq) data exploration
Four publicly available single-cell RNA sequencing (scRNA-Seq) datasets[18][19][20][21] were interrogated to evaluate expression patterns of genes encoding ICs (HAVCR2, PDCD1, CD274, LGALS9, CTLA4, LAG3, TIGIT, and IL2RA) and other proteins of interest. scRNA-Seq data from the publicly available datasets[18][19][20][21] were accessed and imported to Seurat v4.0 for further analyses. Doublets were removed. We used the same cell identity as labeled in the original publications. mRNA expression levels and percentage in different cell types were extracted from the dataset expression matrices. The mean expression level for each gene was scaled against its highest absolute mean expression level among different cell types. Uniform manifold approximation and projection (UMAP) graph and heatmaps were plotted using Seurat 4.0.
2.5 Survival analyses
We explored whether the degree of PD-L1/CD274 and HAVCR2/TIM-3 expression in different cell subpopulations correlated with the timing of progression to death in autopsy cases of our cohorts. We decided not to investigate PD-1 as a prognostic marker due to its rarity of cellular staining observed in the MGH discovery cohort. Time-to-events in the two COVID-19 autopsy cohorts were defined as time-to-death from symptoms onset, length of hospital stay, time-to-death from intensive care unit (ICU) admission, time-to-death from initiation of in-hospital oxygen therapy, and time-to-death from initiation of mechanical ventilation. No data was censored. Kaplan Meier analysis was used, and groups were compared using the log-rank (Mantel-cox) test.
We also performed Cox proportional hazard model analyses with data from the two autopsy cohorts to estimate hazard ratios and to test whether the association of chromogenic IHC semiquantitative expression of PD-L1/CD274 and HAVCR2/TIM-3, and different time-to-event outcomes would be affected by different clinical variables of the cohorts, including age, sex, body mass index (BMI), corticosteroid and remdesivir therapy.
2.6 Plasma SARS-CoV-2 viral load quantification
Plasma SARS-CoV-2 viral load (VL) measurement was performed with biological samples from the MGH ED cohort as reported in previous studies[68][24]. Briefly, SARS-CoV-2 VL was quantified using the US CDC 2019-nCoV_N1 primers and probe set[68]. The lower limit of SARS-CoV-2 N gene quantification was 100 copies/mL. Samples with a positive signal but VL lower than 100 copies/mL were denoted as detectable but unquantifiable.
2.7 Proteomics analyses
Proteomic analyses used in this study have been previously described in prior studies[1][55]. Briefly, proteins in plasma from participants of the MGH ED cohort were measured using the Olink® proximity extension assay, a high-dimensional multiplex oligonucleotide-labeled antibody assay for high-specificity measurement of low-abundance proteins. Signal amplification is performed by PCR. Therefore, measurements are expressed in normalized protein expression (NPX) units, reflecting relative abundance on a log2 scale, rather than absolute concentration. This assay allows within-analyte comparisons between different samples but not between-analyte comparisons. For example, HAVCR2/TIM-3 plasma levels between two different patients can be compared, but the HAVCR2/TIM-3 plasma level cannot be directly compared to interleukin-6 plasma level in the same patient. Analytical performance validation for each protein assay is available online (www.olink.com). Information regarding protein expression at the tissue and blood cell levels, protein function, and protein localization was derived from the Human Protein Atlas[69][70].
We have previously reported that SARS-CoV-2 RNAemia is associated with adverse outcomes in COVID-19[24][71]. To evaluate the role of immune checkpoints as immunologic mediators of viral bloodstream spread, we measured plasma NPX levels of immune checkpoints in participants presenting to an emergency department and who had viral ‘clearance’ or viral ‘persistence’ at hospital day 7 as defined above in section 1.1.3.
2.8 Sparse Partial Least Squares Discriminant Analysis (sPLS-DA)
In order to define a minimal set of the most important plasma proteins associated with severe COVID-19 and RNAemia persistence, we used sparse partial least squares discriminant analysis (sPLS-DA) to evaluate participants presenting to the ED with confirmed SARS-CoV-2 infection and available clinical data within 28 days of hospital admission (n=126).
Sparse PLS-DA was conducted using the “mixOmics” package in R 4.2.0[72][73]. We included 5 components for model tuning, with ten-fold cross validation, balanced error rate (BER) and centroids distance for tuning performance measurement. Each feature was scaled before performing Sparse PLS-DA. Receiver operating characteristic (ROC) curve was plotted using the “pROC” package. 95% confidence interval was calculated using the deLong method. Correlation network graph was plotted using the “igraph” and “ggraph” package in R. Heatmap was plotted using the “pheatmap” package.
3. Other analytic considerations
We summarized continuous variables using median and IQRs. Categorical variables were evaluated using the χ2 test or Fisher’s exact test. To compare multiple groups, a Kruskal–Wallis test was used with a Dunnett test correcting for multiple comparisons. We used Spearman’s rank correlation coefficient to evaluate correlation between different continuous variables, and the P values were corrected by Benjamini–Hochberg correction. Five distinct measures of disease progression were evaluated in the MGH COVID-19 autopsy development cohort and in the COVID-19 International cohort. These included time-to-death from symptom onset, hospital length-of-stay, intensive care unit (ICU) length-of-stay, length-of-oxygen-therapy, and length-of-invasive mechanical ventilation. Each outcome of disease progression was modeled independently, and unadjusted and adjusted models were compared. Survival analyses were performed using the Kaplan–Meier method, and groups were compared by log-rank test. Cox proportional-hazards models were used to estimate hazard ratios. Analyses were performed using R (version 4.1.0), PythonSciPy v.1.4.1, JMP 16.0 (SAS Institute Inc.), and GraphPad Prism 7.0 (GraphPadSoftware, Inc.).