Mouse strains, influenza virus infection, and antibody treatments
Transgenic mouse models of breast cancer, using mammary tumor virus (MMTV) long terminal repeats, are widely used. In brief, MMTV-PyMT and MMTV-erbB2/neu/HER2 (MMTV-Her2) mice express the oncogenes Polyoma virus Middle T antigen (PyMT) and rat Erbb2 (encoding Her2), respectively, upstream of the MMTV promoter which confers expression in the mammary epithelium, as described elsewhere10,13,14. The MMTV-PyMT transgene is congenic in the FvB background, and the MMTV-Her2 transgene is congenic in the FvB and C57Bl/6 backgrounds. MMTV-Her2 mice (FvB) were crossed with IL-6 KO mice as described9,15.
Eight-week-old MMTV-PyMT and 12- to 14-week-old MMTV-Her2 mice were infected with 500 EIU Puerto Rico A/PR/8/34 H1N1 influenza A (IAV) through intranasal administration in 50 ml PBS. For viral administration, mice are anesthetized using 5% induction isoflurane and 2% maintenance, performed with a SomnoFlo Low-Flow electronic vaporizer machine in an induction chamber. After ensuring adequate anesthesia with slow and deep breathing, droplets of viral fluid are placed on the mouse’s nostrils. The mouse inhales the fluid through the nostrils. Once the fluid has been inhaled, the mouse is placed on a heating pad to recover.
For immune cell depletion experiments, mice were injected intraperitoneally with rat IgG as control (MP Biochemicals; cat# MPBio 0855951; Singapore), 100 mg anti-CD4 (Bio X cell, clone GK1.5; cat# BP003-1; Lebanon, NH) (Sup. Fig. 7), or 100 mg anti-CD8 (Bio X cell, clone2.43; cat#; Lebanon, NH) 1 day before IAV infection and every 6 days afterward, or 200 mg anti-Ly6G (Bio X cell, clone 1A8; cat#BP0075-1; Lebanon, NH) on the day of the flu infection, then 24 hours and every other day afterwards, until being euthanized. The University of Colorado Institutional Animal Care and Use Committee (IACUC) reviewed and approved all animal experiments, which were conducted in accordance with the NIH Guidelines for the Care and Use of Laboratory Animals.
SARS-CoV-2 MA10 propagation
Mouse-adapted SARS-CoV-2 MA10 (BEI Resources NR-55329) was propagated in Vero E6 cells (ATCC CRL-1586) as previously described16. Briefly, low passage Vero E6 monolayers were inoculated at an MOI of 0.01 with SARS-CoV-2 MA10. When Vero E6 monolayers exhibited 70-75% CPE (2-3 days post inoculation), supernatants were collected, clarified by centrifugation, supplemented with an additional 10% FBS, aliquoted and stored at -80°C. SARS-CoV-2 titers were determined by plaque assay on Vero-E6 cells. Vero-E6 cells were maintained at 37°C in Dulbecco’s Modified Eagle medium (DMEM, HyClone 11965-084) supplemented with 10% fetal bovine serum (FBS), 10 mM HEPES (pH 7.3) and 100 U/mL of penicillin-streptomycin.
SARS-CoV-2 MA10 infection of mice
C57BL/6J MMTV-Her2 mice at 14-19 weeks of age were anesthetized by intraperitoneal injection (i.p.) with a mixture of ketamine (80 mg/kg) and xylazine (7.5 mg/kg) in a volume of 100-200 uL. Fully anesthetized mice were inoculated intranasally (i.n.) with 104 PFU of SARS-CoV-2 MA10 diluted in PBS supplemented with 1% bovine calf serum by administration of 25 uL of inoculum in each nostril for a total volume of 50 uL. Mouse weights were collected daily for 15 days, and mice inoculated with SARS-CoV-2 MA10 exhibited weight loss beginning at 2 dpi, with greatest loss achieved at 3-4 dpi as previously reported16. As controls, C57BL/6J MMTV-Her2 mice were mock inoculated with 50 uL of PBS/1% bovine calf serum.
Immunohistochemistry (IHC) and immunofluorescence (IF) staining
Lungs and mammary glands were collected and fixed in 10% neutral buffered formalin overnight, transferred to 70% ethanol the next day, and then embedded in paraffin. Tissue was sectioned (5 μm) and used for IHC and IF. Slides were deparaffinized in three incubations of 15 min in Histo-clear (Fisher Scientific, cat# 50-899-90147; Hampton, NH) then descending 10-min ethanol incubations: three at 100%, followed by 95%, and 70% followed by 10-min H2O incubation. Heat-induced antigen retrieval was carried out for 10 min in a pressure cooker in citrate buffer (10 mM Citric Acid, pH 6.0). For IHC, samples were incubated in 1% H2O2 for 15 min to block endogenous peroxidase activity. Permeabilization was performed using 0.1% normal goat serum in 0.4% Triton-X 100 in PBS for 30 min. Sections were blocked for 1 h at room temperature (RT) with blocking solution (Abcam cat# AB64226; Cambridge, UK) containing M.O.M. blocking reagent (Vector Laboratories cat# MKB2213-1; Newark, CA), incubated with primary antibodies (Sup. Table. 3) at 4°C overnight in antibody diluent (Abcam cat# 64211; Cambridge, UK), then washed 3 times for 30 minutes each in 0.1% triton-X 100 in PBS. For IHC samples, sections were incubated in ImmPRESS HRP goat anti-rabbit or rat IgG polymer detection kit (Vector Laboratories cat# MP-7451/MP7404; Newark, CA) and ImmPACT DAB substrate, Peroxidase HRP (Vector Laboratories cat# SK4105; Newark, CA) according to the manufacturer’s instructions. IHC slides were mounted using micromount mounting medium (StatLab, cat# MMC0126; McKinney, Texas). For IF, sections were incubated with secondary antibodies for 1 h at RT in antibody diluent (Abcam cat# 64211; Cambridge, UK). Sections were then washed in 0.1% Triton-X 100 in PBS 3 times for 30 min each and were mounted using fluoroshield mounting media with DAPI (Abcam cat#104139). Immunofluorescence images were collected using a Zeiss Axiovert 200m fluorescence microscope. IHC images were collected using a Keyence BZ-X800 microscope. Sections staining, image capturing, and image analysis were done manually using ImageJ and were carried out by a researcher who was blinded to sample identities. Subsequent grouping and graphing were done by a different lab personnel unblinded after image analyses and quantification were completed.
Bronchoalveolar Lavage Fluid (BALF) processing
Bronchoalveolar lavage (BAL) was performed with 1 mL of PBS (ThermoFisher cat#14190-144; Waltham, MA) after mice were euthanized. BALF was collected and centrifuged at 500 x g for 5 min at 4ºC. Supernatant was flash frozen in liquid nitrogen and stored at -80ºC until analysis. Cells were resuspended in FACS buffer (PBS with 2% FBS and 2 mM EDTA) and were counted manually.
Flow cytometric analyses
Cells recovered from BALF were stained with antibodies (Extended Data Table. 3). Alternatively, whole lungs were harvested and digested using the method described elsewhere17. Briefly, lung digestion mix (1.5 mg/mL collagenase A (Sigma Aldrich cat# COLLA-RO; St. Louis, MO), 0.4 mg/mL deoxyribonuclease I (Worthington cat# LS002139; Lakewood, NJ), 10 mM HEPES pH 7.2, 5% FBS) was injected into the lungs through cannulae and were incubated at 37°C for 30 min. Digested lungs were passed through a 50 mm cell strainer and red blood cells were lysed using hemolytic buffer (150 mM NH4Cl, 1 mM NaHCO3, 1.1 mM Na2EDTA). Single cells were resuspended in FACS buffer and stained with antibodies (Extended Data Table 3) for flow cytometry. Data were collected on the LSR II flow cytometer (BD Biosciences) and analyzed with FlowJo software v10.
Fixed single cell RNA-seq
Cells exhibiting >80% viability were fixed in a 4% formaldehyde solution and using the Chromium Next GEM Single Cell Fixed RNA Sample Preparation Kit (10X Genomics, Pleasanton, CA). The whole transcriptome probe pairs (10X Genomics) were added to the fixed single cell suspensions to hybridize to their complementary target RNA during an overnight incubation at 42°C. After hybridization, unbound probes were removed by washing. The fixed and probe-hybridized single cell suspensions were loaded onto a Chromium X (10X Genomics) microfluidics instrument to generate partitioned nanoliter-scale droplets in oil emulsion. The target is for each droplet to contain a barcoded gel bead, a single cell, and enzyme Master Mix (10X Genomics) for probe pair ligation and gel bead primer barcode extension. The droplets in oil emulsion were placed in a thermal cycler for 60 min at 25°C, 45 min at 60°C, and 20 min at 80°C. The single cell-barcoded, ligated probe products underwent library preparation using standard 10X Genomics protocols in preparation for Illumina next-generation sequencing. The gene expression library derived from single cell-barcoded, ligated probe product were sequenced as paired-end 150 bp reads on the Illumina NovaSeq 6000 (Illumina, San Diego, CA) at the University of Colorado Genomics Shared Resource (Aurora, CO, USA) at a target depth of 20,000 reads per cell for all samples.
Data Processing for single cell RNA-seq analysis
Single cell RNA-seq (scRNAseq) fastq files were processed using Cell Ranger software (version 7.1.0)18 from 10X genomics to assign reads to genes based on Cell Ranger’s Chromium mouse transcriptome probe set (version 1.0.1). The counts were analyzed using the Seurat R package19 and cells with less than 100 genes and genes that were seen in fewer than 10 cells were excluded. The R package scDblFinder20 was used to identify and subsequently remove doublets from the data. Based on the distribution of UMIs, gene counts, and percentage of mitochondrial reads, data were filtered to remove cells with fewer than 200 and more than 7,500 UMIs or genes detected and mitochondrial reads greater than 2.5%. The data were then log transformed and scaled, regressing out cell cycle difference score, total UMI and % mitochondrial reads.
Principal Component Analysis (PCA) was performed using the top 2,000 variable genes. PCs (N=30) that captured most of the variation were then included in further data processing steps. Clusters were identified (at a resolution of 1.8) using the K-nearest neighbors’ algorithm. The top 100 enriched genes (with an adjusted P < 0.05 and higher average fold changes) within each cluster and within each sample were used for performing over-representation analysis (ORA)21 with gene sets from the MSigDB C8 collection22, PanglaoDB23, MSigDB24, GO biological processes25 and Hallmark KEGG pathways26 databases, annotated using enriched gene sets and expression of canonical cell type markers. Differentially expressed genes (DEGs) were identified using the Wilcoxon Rank Sum test within each of the cell types identified for the indicated comparisons. Gene set enrichment analysis (GSEA) was performed using the clusterProfiler R package (v 4.6.2) for the indicated comparisons and within cell types of interest. Benjamini-Hochberg method was used to calculate the adjusted P values. Raw and processed scRNAseq data will be deposited to the Gene Expression Omnibus.
Influenza virus RNA quantification
Whole lung tissue was homogenized, and RNA was isolated via TRIzol/chloroform extraction per the manufacturer’s protocol (ThermoFisher; Waltham, MA and MilliporeSigma; St. Louis, MO, respectively). RNA was reverse transcribed with an iScript cDNA synthesis Kit (Bio-Rad Laboratories, Inc., Hercules, CA) and the viral load was determined by qPCR for the PR8 acid polymerase (PA) gene compared to a standard curve of known PA copy numbers as previously described27.
Quantification and statistical analyses (mouse models)
Statistical analyses were performed using Prism 10.2.1 software (GraphPad). Investigators were not blinded to allocation during virus (IAV or SARS-CoV-2) inoculation or antibody treatment. Quantification and image analysis were done in a blinded manner. N indicates number of mice per group. A minimum of 3 slides per mouse were used for image analysis. Total Her2+ cell counts (Fig. 1c, Fig. 2b), Her2+ cells, and Her2+ Ki67+ cells were counted manually using ImageJ. Three lung sections at least 50 µm apart per mouse were counted and summed. For other image quantifications, whole lung images were divided into fields using ImageJ’s grid function, 8-10 fields were selected at random per image and counted. For experiments with two groups, a two tailed student t-test was used; for experiments with more than one group, one-way ANOVA tests were used unless otherwise stated. Data were expressed as mean ± standard deviation (s.d.). P-values ≤ 0.05 were interpreted as evidence against the null hypothesis (i.e. no effect, no difference).
Human Observational Data
For the human follow-up studies, we selected SARS-CoV-2 infections as the driver virus as SARS-CoV-2 infection due to mandatory reporting during the first phases of the pandemic provides an opportunity to utilize real-world data to investigate the hypothesis that respiratory viral infections promote metastatic disease. We used two complementary datasets: 1) The UK Biobank, a population-based study including 502,356 adult volunteers aged 40 to 69 years at recruitment from 2006 to 201028,29. 2) Breast cancer patients/survivors from the Flatiron Health's nationwide electronic health record (EHR)-derived database
30,31.
Study 1. Population-based analyses in the UK Biobank
We used data from the UK Biobank, a population-based study including 502,356 volunteers aged 40 to 69 years at recruitment from 2006 to 201028,29. Participants provided data on lifestyle, anthropometry, exposures, sociodemographic factors, medical history, and medications at baseline. Previous cancer diagnoses were obtained through (consented) linkage to the national cancer registry and confirmed SARS-CoV-2 test positivity status through linkage to national registers. Mortality data were obtained from the national death registries (NHS Digital, NHS Central Register, National Records of Scotland). We considered all-cause mortality, non-COVID-19 mortality (by excluding deaths with ICD codes U07.1 and U07.232 or any death within one month of the latest recorded test positive result), and cancer mortality (considering cause of death with ICD codes listed in Extended Data Table 1).
As summarised in Figure 4a, we included 502,356 participants, excluding those who withdrew from the study. We excluded 65,374 participants due to missing data on covariates including sex, age, BMI, ethnicity, smoking status, alcohol consumption, education, employment status, and household income (N= 63,557); missing date on SARS-CoV-2 testing when the primary cause of death was COVID-19 (N=129); and missing cancer diagnosis date if the primary cause of death was cancer (N=1,688), leaving 436,982 participants. Of these, 91,518 had been diagnosed with cancer at the latest follow-up (1st June 2022). Overall, 22,597 participants had died before the start of the COVID-19 pandemic (here defined as 1st January 2020) (N=14,974) or were diagnosed with cancer after that date (N=9,543) and were therefore excluded. Participants diagnosed with multiple cancers before baseline were also excluded (N= 6,654).
Of the 60,347 participants with a cancer diagnosis before 1st January 2020, a total of 3,462 had been reported to have tested positive for SARS-CoV-2. To ensure that test-positive and test-negative participants had similar cancer risk profiles, we adopted a non-parametric matching (without replacement) approach33 to identify (up to) 10 test-negative participants for each test-positive participant. We first matched for the cancer diagnosis date (prior to 1st January 2020) using generalised full matching34. In a second step, we performed an exact matching based on cancer type and sex. We finally matched for age, ethnicity, smoking status, alcohol consumption, education, employment status, and household income using the nearest neighbour method, an algorithm based on propensity score matching. All the matching steps were without replacement. The resulting matched population included 28,855 participants: 3,400 test-positives matched to 25,455 test-negatives. Restricting this matched population to those who tested positive before vaccine rollout (here defined as 1st December 2020) resulted in a total of 7,705 participants including 916 test positives and 6,789 matched test negatives.
Using test positivity as the predictor, we ran a series of unconditional logistic regression models for the three outcomes (all-cause, non-COVID-19, and cancer mortality). Models were adjusted for all matching factors to account for possible residual confounding. As a sensitivity analysis, we considered participants with cancer diagnosed at least five or ten years before the start of the COVID-19 pandemic in the UK by excluding the test-positive participants (and their matched test-negative controls) who were diagnosed with cancer between 1st January 2015 and 31st December 2019 (N=538 test positives and N=4,107 test negatives), and between 1st January 2010 and 31st December 2019, respectively (N=316 test positives and N=2,437 test negatives).
Data availability: This study used the UK Biobank resource under application number 69328 granting access to the corresponding UK Biobank genetic and phenotype data. The UK Biobank received ethical approval from the North West Multi-centre Research Ethics Committee (REC reference: 11/NW/0382) to obtain and disseminate participant data and samples (http://www.ukbiobank.ac.uk/ethics/).
Study 2. Flatiron - EHR-based Analyses
Data source
Flatiron Health's nationwide electronic health record (EHR)-derived database includes deidentified data from ~ 280 US cancer clinics (~ 800 sites of care). The database is longitudinal, comprising de-identified patient-level structured and unstructured data, curated via technology-enabled abstraction30,31. The majority of patients in the database originate from community oncology settings; relative community/academic proportions may vary depending on study cohort. Institutional Review Board approval of the protocol was obtained prior to study conduct and included an informed consent waiver.
Included in our study were women aged at least 18 years old at the time of initial cancer diagnosis, and who had:
- Early Breast Cancer. The cohort includes a probabilistic sample of patients with a diagnosis of Stage I - III Breast Cancer on or after January 1, 2011, including those who presented with non-metastatic disease but who subsequently developed recurrent or progressive disease, with at least two visits occurring on or after January 1, 2011 or -
- Metastatic Breast Cancer. The cohort includes a probabilistic sample of patients diagnosed with Stage IV breast cancer on or after January 1, 2011, and those who presented with earlier stage breast cancer but who subsequently developed metastatic disease on or after January 1, 2011, and who had at least two clinic encounters evident in the database occurring on or after January 1, 2011.
- Adult female patients ages 18+ years at the initial diagnosis
Real-world data source: The index date was defined as the date of the initial diagnosis of breast cancer. The COVID-19 status was defined positive if any COVID diagnosis (ICD codes B97.29, B97.21, J12.81, B34.2, U07.1) was made after the index date, and before the diagnosis of lung metastases or the last follow-up date. The start date of COVID-19 positivity status was the earliest COVID-19 diagnosis date. Baseline characteristics of gender, race, ethnicity, and age at index date were obtained from structured data.
Analyses: Baseline characteristics were summarized using descriptive statistics. Cause-specific analysis was conducted (death was censored). Univariable and multivariable Cox Proportional Hazard Models were used to evaluate the effect of COVID-19 diagnosis on the risk of metastasis to the lungs, in which COVID-19 diagnosis status was treated as a time varying covariate. The multivariable model adjusted for patient characteristics considered relevant, including age, race ethnicity, and gender. The unadjusted and adjusted hazard ratio with the corresponding 2-sided 95% confidence interval were reported. The two-sided likelihood ratio tests were conducted. The significant level was 0.05. Time to metastases to the lungs was defined as time from index date to date of metastases to the lungs. Patients without a date of pulmonary metastases were censored at the last confirmed activity date or death. Last confirmed activity was defined as the latest date of vitals record, medication administration, or reported laboratory tests/results. The statistical analyses were conducted using R version 4.1.035.