In total, 972 SARS-CoV-2 cases were identified among university students and staff over the course of term (5th October to 6th December 2020). High-quality genomes were recovered from 446/778 (57.3%) positive cases from the university testing programme, from 107/266 (40.2%) cases identified through the healthcare worker (HCW) screening programme (95 HCWs, 8 students, 4 university staff), and from 104 patients identified by hospital testing (71 SARS-CoV-2 positive patients from Cambridge University Hospitals (CUH) and 33 from other medical facilities in Cambridgeshire). A further 797 local cases identified by community testing during the study period were present within the COG-UK dataset, of which 17 were identified as students, 7 as university staff and 26 as HCWs (Figure 1). Of all identified SARS-CoV-2 cases from Cambridgeshire (university and community) during this period, 8.0% were sequenced (Extended Data Figure 1).
SARS-CoV-2 lineages and transmission clusters
Over the 9-week term, 62 Pango lineages were identified across the university and community (Figures 2a and 2c). In the university, 23 Pango lineages were identified, and 438/482 (90.9%) cases were from just 4 lineages (B.1.160.7, B.1.177, B.1.36, B.1.177.16), all of which were detected by the second week of term. Twelve lineages were only observed after the second week of term and accounted for 6.9% of cases. By comparison, 57 lineages were identified in the local community over the same 9-week period. Viral genomes containing spike protein mutations that have been linked to decreased sensitivity to antibody-mediated immunity, or that affect viral transmission, were observed in the university population: 3 sequences from the B.1.258 lineage containing the N439K mutation and ∆H69/∆V70, 2 cases of B.1.1.7 and its associated mutations12, and 88 cases of B.1.177 with the A222V mutation13.
In total, 198 putative transmission clusters were defined by CIVET, including 16 clusters of 2 or more university members. Only 8 clusters contained 5 or more university members (range 6-337); these represented 91.3% of all university cases, signifying that the majority of introductions into UoC did not cause ongoing transmission. To further investigate the largest of these (cluster 1, described below), we identified groups of identical samples (0 SNP differences), which produced 19 additional clusters (a total of 34 university clusters) for further analysis.
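The grouping of identical samples (0 SNP differences) can be illustrated with a minimal sketch. It assumes sequences are already aligned to a common reference and compares them as exact strings. The source does not state how ambiguous bases or gaps were handled, so that is not modelled here. The sample names and sequences are hypothetical toy data, not study data.

```python
from collections import defaultdict


def group_identical(aligned):
    """Group sample IDs whose aligned sequences are exactly identical (0 SNPs).

    `aligned` maps a sample ID to its aligned sequence string.
    Returns groups of two or more samples, largest group first.
    """
    groups = defaultdict(list)
    for sample_id, seq in aligned.items():
        # Normalise case so "acgt" and "ACGT" count as the same sequence.
        groups[seq.upper()].append(sample_id)
    multi = (sorted(ids) for ids in groups.values() if len(ids) >= 2)
    return sorted(multi, key=lambda ids: (-len(ids), ids))


# Hypothetical toy alignment (assumption, for illustration only).
toy = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGT",
    "s3": "ACGTACGA",
    "s4": "acgtacgt",
    "s5": "ACGTACGA",
    "s6": "TTGTACGT",
}
clusters = group_identical(toy)
print(clusters)
```

Exact string matching is the strictest possible definition. A real pipeline would likely need to decide whether positions with missing data count as differences.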
Determinants of viral spread across the university
To determine transmission dynamics following introduction into the university, we performed a detailed investigation of the largest genomic cluster (cluster 1), which accounted for 337/484 (69.6%) sequenced university cases (Figure 3). This cluster was widely dispersed across the university by the middle of term, affecting students from 29/31 Colleges, 28 undergraduate courses and 208 households in university accommodation alone (Figure 4).
Cluster 1 was classified as belonging to Pango lineage B.1.160.7. No mutations previously noted to be associated with increased transmissibility were observed in this lineage compared to other genomes in the study. Interrogation of the entire COG-UK dataset of samples from 2020 showed that this lineage was first identified in the UK on 4th October 2020, in Wales, before becoming predominantly sampled in UoC (Figure 3b). The B.1.160.7 lineage was not identified in the local community until term week 3, suggesting that the university cases were introduced from outside Cambridgeshire. This was supported by the median estimate of the time to the most recent common ancestor of cluster 1 and its most closely related cluster from Cambridgeshire community isolates, which was 115 days (C.I. 91-148) prior to the start of term. Additional analysis with A2B-COVID showed that these sequences were consistent with a single introduction into the university (Figure 3c).
National and university contact tracing data were used to identify the initial source of dispersion of this cluster. Ten students from the first two weeks of term reported visiting the same nightclub (venue A). Nine individuals either had an isolate from cluster 1 or (in the event that their sample did not yield a high-quality sequence) were household contacts of an individual with a sequenced cluster 1 isolate. No information was available for one student.
Transmission of cluster 1 was sustained from the first week of term until a national lockdown was enforced on 5th November. Students testing positive in the two weeks around lockdown reported common exposure events predominantly linked to nightclub venues (25/59 (42.4%) of exposures external to the university reported by 48 students). Venue A, identified above as the possible source of dispersion of this cluster at the start of term, was also the most common venue identified in the two weeks around lockdown (n=16). Of these 16 cases, 9 had sequences in cluster 1, and a further 5 individuals (for whom no sequence was available) were household contacts of sequenced cases in cluster 1 (Extended Data Figure 5).
To determine the impact of lockdown and other control measures within the university, a birth-death skyline model14 was used to measure changes in the effective reproduction number (Re) within cluster 1. The model indicated an initial Re at the start of term that was slightly larger than 1, albeit with wide uncertainty (median 1.11; 95% HPD: 0.24-2.08 on 5th October). Over the next 2 weeks, Re continued to rise (median 1.54; 95% HPD: 1-2.22 on 15th October), followed by a gradual decline over the following 2 weeks (Figure 5a). There was a rise immediately prior to the start of lockdown (median 1.53; 95% HPD: 1.24-1.84 on 5th November), followed by a steep decrease thereafter (median 0.25; 95% HPD: 0.09-0.44 on 19th November) (Figure 5a), consistent with declining absolute numbers of SARS-CoV-2 infections seen during this time (Figure 2c). The model estimated the median effective infectious period for individuals in the cluster at 2.91 days (95% HPD: 2.38-3.47 days) (Figure 5b). As the model does not explicitly incorporate an incubation period and assumes that individuals cannot transmit after being sampled, the effective infectious period represents the mean time from infection until testing positive and assumes perfect infection control measures thereafter. Estimates of Re and the effective infectious period are robust to model parameterisations (Extended Data Figures 6 and 7). Sampling proportion estimates largely overlap with empirical estimates based on the number of positive cases that were sequenced during each week (Figure 5c). Although sampling proportion estimates are sensitive to the prior specifications, Re estimates are unaffected (Extended Data Figure 8).
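To make the relationship between the reported quantities concrete, the sketch below converts Re, the effective infectious period and a sampling proportion into the underlying rates. It assumes the standard birth-death skyline parameterisation, in which Re = λ/δ, the becoming-uninfectious rate is δ = μ + ψ, and the sampling proportion is ψ/δ. The source does not report these rates. The Re and infectious period values are the reported medians, while the sampling proportion of 0.5 is a hypothetical placeholder.

```python
def bdsky_rates(re, infectious_period_days, sampling_proportion):
    """Convert birth-death skyline summary quantities into per-day rates.

    Assumes the standard parameterisation: Re = lambda / delta,
    delta = mu + psi, sampling proportion = psi / delta.
    """
    delta = 1.0 / infectious_period_days   # becoming-uninfectious rate
    lam = re * delta                       # transmission (birth) rate
    psi = sampling_proportion * delta      # removal by sampling
    mu = delta - psi                       # removal without sampling
    return {"lambda": lam, "delta": delta, "psi": psi, "mu": mu}


# Reported medians on 5th November; sampling proportion is hypothetical.
rates = bdsky_rates(re=1.53, infectious_period_days=2.91,
                    sampling_proportion=0.5)
for name, value in rates.items():
    print(f"{name}: {value:.4f} per day")
```

Under these assumptions, a short effective infectious period implies a high removal rate, so a given Re corresponds to a correspondingly high per-day transmission rate.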
Transmission within university households
There was evidence of transmission of SARS-CoV-2 in student accommodation in 18/34 university clusters. In cluster 1, 169/337 (50.1%) students had a virus genome sequence identical to that of at least one other student living in the same or a neighbouring household (sub-clusters of identical genomes (0 SNPs) ranging from 2 to 11 students).
The largest cluster associated with transmission in accommodation was cluster 2 (lineage B.1.36). By term week 3, this cluster involved 30 students, of whom 24 (80%) lived in the same accommodation block in College A and 4 lived in 2 separate households in the same college (Extended Data Figure 9). Interventions from the university, supported by local public health authorities, included isolation of all households in the main accommodation block and individual screening offered to all students. Half of all cases in this cluster were diagnosed by asymptomatic screening. No further genomically related isolates were identified after term week 3, indicating that the intervention was successful and that transmission had ceased.
To quantify the importance of household transmission, a Reed-Frost chain binomial model was employed to estimate the household attack rate. Using A2B-COVID, 265 households were identified in which the data were consistent with only 1 introduction of SARS-CoV-2. The probability, per household contact, that an infected person passed the virus to an uninfected individual within the same household was estimated at 7.8% (95% C.I. 6.9-8.7%).
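The Reed-Frost chain binomial model can be sketched as a forward calculation. In each generation, every remaining susceptible escapes infection from each of the I currently infectious household members independently with probability 1 - p, so the number of new infections is binomial with probability 1 - (1 - p)^I. The code below computes the resulting distribution of total household infections after a single introduction. It illustrates the model only and is not the fitting procedure used in the study, which the source does not describe. The household size of 5 is a hypothetical choice, and p = 0.078 is the reported point estimate.

```python
from functools import lru_cache
from math import comb


def final_size_distribution(household_size, p, introductions=1):
    """Return P(total infected = k) for k = 0..household_size (Reed-Frost)."""

    @lru_cache(maxsize=None)
    def further(susceptible, infectious):
        # Distribution of the number of susceptibles eventually infected,
        # given the current generation's susceptible and infectious counts.
        if infectious == 0 or susceptible == 0:
            return tuple(1.0 if j == 0 else 0.0
                         for j in range(susceptible + 1))
        escape = (1.0 - p) ** infectious  # escape all current infectives
        out = [0.0] * (susceptible + 1)
        for new in range(susceptible + 1):
            prob_new = (comb(susceptible, new)
                        * (1.0 - escape) ** new
                        * escape ** (susceptible - new))
            # The newly infected form the next infectious generation.
            for j, q in enumerate(further(susceptible - new, new)):
                out[new + j] += prob_new * q
        return tuple(out)

    result = [0.0] * (household_size + 1)
    tail = further(household_size - introductions, introductions)
    for j, q in enumerate(tail):
        result[introductions + j] = q
    return result


# Hypothetical household of 5 with one introduction; p from the text.
dist5 = final_size_distribution(5, 0.078)
for k, prob in enumerate(dist5):
    print(f"P(total infected = {k}) = {prob:.4f}")
```

In a fitting context, a distribution like this would serve as the likelihood of each household's observed outbreak size for a candidate value of p.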
Further genomic clusters where transmission between household members was implicated are outlined in supplementary table 1. They follow similar patterns, with groups of cases confined to a single college not leading to sustained transmission.
Other transmission routes among university members
In addition to household transmission, there was evidence of viral spread between students in the same course and year of study in 14/34 genomic clusters, with the highest proportion being students in their first year of study. In cluster 1, 203/337 (60.2%) students had an isolate identical to that of at least one other student studying the same course in the same year (cluster size range 2-14 students). Statistical modelling using data from cluster 1 across the term showed a bias towards infections being observed in first year students (p=0.002) (Extended Data Figure 10, model details in supplementary methods). Two further small clusters comprised postgraduate students working in the same university department.
However, we were not able to determine the probable location of transmission in most cases: there was considerable overlap between course and household clusters, as well as complex social and study networks between students (illustrated in supplementary table 1, for example in clusters 3, 4 and 10). Of note, 23/34 clusters with 2 or more genomically linked cases in the dataset contained at least one university member who could not be epidemiologically linked with any other case in their cluster.
The number of SARS-CoV-2 sequences from university staff members (n=30) was limited in comparison to students. There was evidence of transmission between staff members working in the same department, college or ancillary role in four genomic clusters. Two clusters contained staff members who shared the same household. There were 8 clusters involving both university staff and students. However, epidemiological associations between these two groups could only be identified in one cluster: a shared household between a student and a staff member working in separate university departments.
Transmission between the university and local community
We next sought to address the degree of transmission between the university and the local community. Two distinct phylogenetic approaches, shown in Figure 2, demonstrate segregation of the majority of community and university cases into separate clusters and therefore a lack of substantial cross-transmission. Of the 198 clusters across the dataset, 29 (14.6%) contained both university and community cases. Only 6 clusters contained both 5 or more university cases and 3 or more community cases.
CIVET was run separately with university and hospital (patient and healthcare worker) cases for a focused phylogenetic analysis of this setting. Associations were identified between university and hospital settings, with 17 clusters involving both university members and either patients or staff. Cluster 1 (69.6% of student cases) contained only 1 patient and 1 healthcare worker, neither of whom had an identifiable epidemiological link to students. The remaining 16 clusters comprised 133 individuals, including 26 patients, 55 hospital staff or their family members and 52 university members (including 18 staff and 15 clinical medical students). The second largest cluster of university members (n=21 university and hospital cases) included 9 medical students, 5 healthcare workers and 2 patients. Phylogenetically, the medical students and one of the healthcare workers were closely linked (Extended Data Figure 11), and analysis of these cases with A2B-COVID confirmed plausible transmission. All 9 medical students were on clinical rotations at the time of diagnosis of the index case; 7/9 lived in neighbouring households in the same college and the remaining 2 were named contacts of the index student. Plausible transmission events between this group and the other cluster members were refuted using A2B-COVID (Extended Data Figure 11).
To further investigate epidemiological associations in clusters involving university members and the local community, 1243/1455 of the cases sequenced over the sampling period were linked to national contact tracing data (excluding hospital cases). Of these, 219 (17.6%) cases reported 127 common exposure events. Cluster 1, representing 69.6% of cases within the university, included only 18/976 (1.8%) community cases; only one community case had a common exposure with a university student, dining at the same restaurant. No epidemiological links between university and community cases were identified in any other genomic cluster. Transmission suspected in 19 epidemiologically linked clusters defined by common exposures was refuted by phylogenetic variation.