Transmission pairs and serial intervals
We compiled a database of 1,407 COVID-19 transmission pairs, in which symptom onset dates and social relationships were available for both the infector and infectee of 679 transmission pairs (see Table S1 for the entire database, and supplementary materials sections 1 and 2 for more data descriptions). Household and non-household transmissions were identified from information on social relationships (e.g., family members of the same household, non-household relatives, colleagues, classmates, friends, and other face-to-face contacts). The data were reconstructed from publicly available reports of 9,120 confirmed COVID-19 cases reported by 27 provincial and 264 urban health commissions in China outside Hubei province. Data from Hubei province were excluded because information on chains of transmission was less reliable during widespread community circulation of COVID-19, whereas outside Hubei province it was more straightforward to link connected cases and derive serial intervals. We focused on 677 transmission pairs whose infectors developed symptoms from January 9 through February 13, 2020. This 36-day period covers a series of key events related to the evolving epidemiology and transmission dynamics of COVID-19 in mainland China (22–24).
We first calculated the number of transmission pairs in our database by the onset dates of infectors (fig. S2). Since a large number of infectors (339, ~50% of data) developed symptoms during January 23–29, 2020, we defined this 1-week period as the peak-week period, the earlier 14-day period (January 9–22, 2020) as the pre-peak period, and the later 15-day period (January 30–February 13, 2020) as the post-peak period. We computed the serial interval for each transmission pair as the number of days between the symptom onset date of the infector and that of the infectee. Empirical serial interval distributions for transmission pairs whose infectors developed symptoms during each of these three successive, non-overlapping periods suggest that serial intervals shortened substantially over time (Fig. 1A).
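The computation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example pairs are hypothetical, and only the period boundaries are taken from the text.

```python
from datetime import date

# Period boundaries (by infector onset date) as stated in the text.
PERIODS = [
    ("pre-peak", date(2020, 1, 9), date(2020, 1, 22)),
    ("peak-week", date(2020, 1, 23), date(2020, 1, 29)),
    ("post-peak", date(2020, 1, 30), date(2020, 2, 13)),
]

def serial_interval(infector_onset, infectee_onset):
    """Days from infector onset to infectee onset (negative if the infectee fell ill first)."""
    return (infectee_onset - infector_onset).days

def period_of(infector_onset):
    """Return the period label for an infector onset date, or None outside the study window."""
    for label, start, end in PERIODS:
        if start <= infector_onset <= end:
            return label
    return None

# Hypothetical (infector onset, infectee onset) pairs; NOT data from the study.
pairs = [
    (date(2020, 1, 12), date(2020, 1, 20)),
    (date(2020, 1, 25), date(2020, 1, 30)),
    (date(2020, 2, 3), date(2020, 2, 2)),
]

# Group serial intervals by the period of the infector's onset.
by_period = {}
for infector, infectee in pairs:
    by_period.setdefault(period_of(infector), []).append(serial_interval(infector, infectee))
print(by_period)
```

Pairs are grouped by the infector's onset date, not the infectee's, which matches the stratification used in the text.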
We then estimated the serial interval distribution during each non-overlapping period by fitting a normal distribution to the corresponding serial interval data (see Supplementary Materials section 3). In an analysis of the entire dataset of 677 transmission pairs, we estimated that the serial interval distribution had a mean of 5.1 (95% credibility interval, CrI: 4.7, 5.5) days and a standard deviation of 5.3 (95% CrI: 5.0, 5.6) days (table S2), which is consistent with recent studies (16, 21, 25). However, fitting to non-overlapping subsets of the data revealed considerable variation in serial interval distributions over time (Fig. 1B). The mean and standard deviation of serial intervals were estimated to be 7.8 (7.0, 8.6) days and 5.2 (4.7, 5.9) days during the pre-peak period, decreased to 5.1 (4.6, 5.7) days and 5.0 (4.6, 5.4) days during the peak-week period, and decreased further to 2.6 (1.9, 3.2) days and 4.6 (4.2, 5.1) days during the post-peak period, respectively (table S2).
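As a rough illustration of fitting a normal distribution to serial intervals, the sketch below uses a closed-form maximum-likelihood fit. This is a swapped-in simplification: the estimates in the text are Bayesian (hence the credibility intervals), and the fitting procedure in Supplementary Materials section 3 is not reproduced here. The function names and the sample values are hypothetical.

```python
import math
import statistics

def fit_normal_mle(intervals):
    """Maximum-likelihood normal fit: sample mean and population standard deviation."""
    mu = statistics.fmean(intervals)
    sigma = statistics.pstdev(intervals, mu)
    return mu, sigma

def normal_log_likelihood(intervals, mu, sigma):
    """Log-likelihood of the data under Normal(mu, sigma)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in intervals
    )

# Hypothetical serial intervals in days; negative values are allowed by a normal model.
sample = [-2, 1, 3, 4, 5, 6, 8, 11, 14]
mu, sigma = fit_normal_mle(sample)
print(f"mean = {mu:.2f} days, sd = {sigma:.2f} days")
```

A normal distribution, unlike a gamma or lognormal, can accommodate pairs in which the infectee developed symptoms before the infector.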
Next, we examined the evolution of transmission pairs using a series of running time windows with a fixed length of 10, 14, or 18 days (fig. S3). In stark contrast to the use of a single stable distribution of serial intervals, our analysis suggests that serial intervals gradually shortened over the study period (Fig. 1C), a finding that is robust to alternative specifications of the time windows (fig. S3). By fitting the transmission-pair data in each running time window using a Markov chain Monte Carlo (MCMC) method (Fig. 1C and table S3), we estimated that during the first 14-day period (January 9–22, 2020) the serial intervals were longer on average (mean: 7.8 (95% CrI: 7.0, 8.6) days, and standard deviation (sd): 5.2 (95% CrI: 4.7, 5.9) days), whereas during the last 14 days (January 31–February 13, 2020) the serial intervals were much shorter on average (mean: 2.2 (1.5, 2.9) days, and sd: 4.6 (4.1, 5.1) days). Notably, the mean serial interval shortened by more than threefold over the 36-day period.
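The running-window stratification can be sketched as below. This is an illustrative outline only: it reports the empirical mean per window instead of an MCMC fit, and the function name and example records are hypothetical.

```python
from datetime import date, timedelta
from statistics import fmean

def running_window_means(records, first_day, last_day, window_days=14):
    """Slide a fixed-length window one day at a time over infector onset dates.

    records: list of (infector_onset_date, serial_interval_days).
    Returns a list of (window_start, window_end, n_pairs, mean_or_None).
    """
    results = []
    start = first_day
    while start + timedelta(days=window_days - 1) <= last_day:
        end = start + timedelta(days=window_days - 1)
        values = [si for onset, si in records if start <= onset <= end]
        results.append((start, end, len(values), fmean(values) if values else None))
        start += timedelta(days=1)
    return results

# Hypothetical records; NOT data from the study.
records = [
    (date(2020, 1, 10), 9),
    (date(2020, 1, 20), 7),
    (date(2020, 1, 27), 5),
    (date(2020, 2, 5), 2),
    (date(2020, 2, 12), 1),
]
windows = running_window_means(records, date(2020, 1, 9), date(2020, 2, 13), window_days=14)
for start, end, n, mean_si in windows[:3]:
    print(start, end, n, mean_si)
```

Changing `window_days` to 10 or 18 gives the alternative window lengths mentioned in the text.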
In our data of 677 transmission pairs, information on age, sex, household, and isolation delay (i.e., the time from symptom onset to isolation) is available for most infectors, which allows a granular stratification. Using either non-overlapping or running time windows for data stratified by each of these factors, we find a similar decreasing pattern of serial intervals over time (Fig. 1, B and C, and tables S2 and S3). We therefore termed this time-evolving serial interval the "effective serial interval", which accounts for temporal changes due to the effects of its potential driving factors. Notably, the length of effective serial intervals is positively associated with the length of isolation delay (Figs. 1C and 2A to C, tables S3 and S4), accounting for the decreasing pattern of isolation delay over time (fig. S4). Considering all 677 transmission pairs together, early isolation (shorter than the median isolation delay) is associated with shorter serial intervals (mean: 3.3 (2.7, 3.8) days, and sd: 4.5 (4.1, 4.9) days), and delayed isolation (longer than the median isolation delay) is associated with longer serial intervals (mean: 6.8 (6.2, 7.3) days, and sd: 5.3 (4.9, 5.7) days) (table S2). Stratification by age, sex, or household shows no clear difference in serial interval estimates. Our findings are robust to the use of alternative distributions (e.g., the Gumbel distribution) for model fitting (fig. S5).
Effect of non-pharmaceutical interventions on shortening effective serial intervals over time
To understand the influence of isolation delay, we first used a probabilistic model of transmission pairs to analyze how reducing the delay between illness onset and isolation of the infector affects the serial interval distribution. This model assumes that the infector has a time-varying infectiousness that starts Ci days before symptom onset, peaks around the time of illness onset, and declines thereafter (Supplementary Materials section 4). We parameterized the start time Ci and the infectiousness profile with published data (20). This simple model suggests that serial intervals tend to shorten as the delay in isolating the infector is reduced, regardless of when the infector becomes infectious before illness onset (Fig. 2A).
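The direction of this effect can be seen from a simple truncation argument. The following is a sketch under stated assumptions, not the model of Supplementary Materials section 4: f is an assumed infectiousness density over time τ relative to the infector's symptom onset (zero for τ < −Ci), d is the isolation delay, isolation is assumed to stop all onward transmission, and the infectee's incubation period T_inc is assumed independent of τ. The symbols f, F, d, S, and T_inc are introduced here for illustration.

```latex
% Transmission time relative to infector onset, truncated at isolation delay d
\[
f_d(\tau) = \frac{f(\tau)}{F(d)}, \qquad -C_i \le \tau \le d,
\qquad F(d) = \int_{-C_i}^{d} f(u)\,\mathrm{d}u .
\]
% Serial interval S = \tau + T_{\mathrm{inc}}, so its mean under isolation delay d is
\[
\mathbb{E}[S \mid d]
  = \frac{1}{F(d)} \int_{-C_i}^{d} \tau\, f(\tau)\,\mathrm{d}\tau
  + \mathbb{E}[T_{\mathrm{inc}}] .
\]
% The truncated mean is non-decreasing in d:
\[
\frac{\partial}{\partial d}\,\mathbb{E}[\tau \mid \tau \le d]
  = \frac{f(d)}{F(d)} \bigl( d - \mathbb{E}[\tau \mid \tau \le d] \bigr) \;\ge\; 0 .
\]
```

Under these assumptions the mean serial interval cannot increase when d is reduced, whatever the value of Ci, because truncation only removes the latest transmission events.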
To further examine the association between serial interval and isolation delay, we simulated serial intervals using an individual-based model that tracks the infection and symptom onset times of each case (Supplementary Materials section 5, Fig. 2, B and C, and table S4). Given a mean generation time of 7.8 days (as per our estimate for the pre-peak period of COVID-19 in mainland China), the simulated mean serial interval decreases from ~8.0 to ~1.2 days when the isolation delay decreases from 10 to 0 days. Given a mean generation time of 5.1 days (as per our estimate for the peak-week period of COVID-19 in mainland China, which is similar to the estimate by Zhang et al. (25)), the simulated mean serial interval decreases from ~5.1 to ~1.5 days when the isolation delay decreases from 10 to 0 days. Similar outcomes were obtained with alternative generation times with a mean of 2.6 days (our estimate for the post-peak period) and 8.4 days (as estimated for the 2003 SARS epidemic (26)) (table S4). The evolution of serial intervals is less sensitive to changes in the initial effective reproduction number, Re (a measure of initial transmission accounting for the effect of control measures). Both the analytical model and the simulations support a positive association between serial interval and isolation delay.
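A toy pair-level simulation illustrates the mechanism. It is a sketch under assumptions that are not from the source: gamma-distributed generation times and incubation periods with shape 2, a mean incubation period of 5 days, and isolation that blocks all later transmission. It is not the individual-based model of Supplementary Materials section 5 and is not expected to reproduce the values in table S4.

```python
import random

def simulate_mean_serial_interval(isolation_delay, mean_generation_time=7.8,
                                  mean_incubation=5.0, shape=2.0,
                                  n_pairs=5000, seed=1):
    """Mean serial interval among transmissions that occur before the infector is isolated.

    All distributional choices are illustrative assumptions. Times are in days,
    measured from the moment the infector was infected.
    """
    rng = random.Random(seed)
    gen_scale = mean_generation_time / shape
    inc_scale = mean_incubation / shape
    total, kept = 0.0, 0
    while kept < n_pairs:
        infector_onset = rng.gammavariate(shape, inc_scale)
        transmission = rng.gammavariate(shape, gen_scale)
        if transmission > infector_onset + isolation_delay:
            continue  # the infector was already isolated, so no transmission
        infectee_onset = transmission + rng.gammavariate(shape, inc_scale)
        total += infectee_onset - infector_onset
        kept += 1
    return total / kept

means = {d: simulate_mean_serial_interval(d) for d in (0, 5, 10)}
for delay, value in means.items():
    print(f"isolation delay {delay:>2} days -> mean serial interval {value:.1f} days")
```

Because isolation removes late transmission events, the surviving pairs are those with short onset-to-transmission times, which pulls the mean serial interval down as the delay shrinks.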
Since a cordon sanitaire was imposed around Wuhan on January 23, 2020, multiple NPI strategies have been implemented in more than 260 Chinese cities, including the isolation of confirmed and suspected cases, suspension of intra-city public transport, suspension of travel between cities, social distancing through the closure of entertainment and public gathering venues (e.g., bars, cinemas, parks) as well as public services (e.g., shopping malls, restaurants), and recruitment of governmental staff and volunteers to enforce quarantine (fig. S6). As the pandemic unfolds, the accumulation of population immunity may also alter the risk of infection over time (27–29). To study the influence of these factors on COVID-19 transmission, we developed a series of multivariable linear regression models to predict the empirical serial intervals of infectors who developed symptoms on each day (Supplementary Materials section 6). The basic regression model, which accounts only for isolation delay, can explain up to 51.5% of the variability in daily empirical serial intervals, indicating that isolation delay is the prime factor. The improved models, which combine the basic model with one of the additional factors (NPI strategy or accumulation of population immunity), can explain at most a further 15.6% to 20.3% of the variability in daily empirical serial intervals (table S5). The model fitting further suggests a potential explanation of how serial intervals can be modulated by the respective interventions over the span of the outbreak (Fig. 2D to F). We found that, for each day by which isolation occurred earlier, the predicted serial interval decreased by 0.7 (95% confidence interval, CI: 0.4, 0.9) days on average. Although the effects of these additional factors in combination with isolation delay were identified specifically in non-household settings, we were not able to detect their individual contributions to changes in serial intervals (table S5).
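The basic single-predictor model can be illustrated with ordinary least squares. This is a minimal sketch, assuming a plain regression of daily mean serial interval on daily mean isolation delay; the exact model specification in Supplementary Materials section 6 is not reproduced, and the function name and data values below are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical daily averages; NOT data from the study.
isolation_delay_days = [8.0, 7.0, 6.5, 5.0, 4.0, 3.5, 2.0, 1.0]
serial_interval_days = [7.5, 7.9, 6.0, 5.8, 4.1, 4.9, 3.0, 2.6]

slope, intercept, r_squared = ols_fit(isolation_delay_days, serial_interval_days)
print(f"slope = {slope:.2f} days per day of delay, R^2 = {r_squared:.2f}")
```

Here the slope is read as the change in predicted serial interval per additional day of isolation delay, and R² as the share of variability explained by the predictor.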
Real-time transmissibility estimated with a single stable serial interval distribution versus effective serial interval distributions
The real-time transmissibility of an infectious disease is often characterized by the instantaneous reproduction number (Rt), which is defined as the expected number of secondary infections caused by an infector on day t. The epidemic can spread when Rt>1 and is under control when Rt<1. To estimate Rt, a routine protocol is to approximate the generation time distribution with a single stable serial interval distribution. Let wi be the serial interval distribution that approximates the infectiousness profile of an infected individual on the i-th day since infection. Then, the daily estimate of Rt is calculated as the ratio between the number of cases It on day t and the weighted average of infectiousness caused by cases infected before day t,
$$R_t = \frac{I_t}{\sum_{i=1}^{t} w_i \, I_{t-i}}.$$
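The ratio described above can be computed directly. The sketch below gives only the point ratio of cases on day t to the weighted sum of earlier cases; it omits the uncertainty quantification and smoothing of a full statistical estimator. The function name and the toy inputs are hypothetical.

```python
def instantaneous_rt(incidence, weights):
    """Point estimate of R_t for each day.

    incidence[t] is the number of cases on day t; weights[i - 1] is w_i, the
    serial interval weight at lag i (i = 1, 2, ...). Returns None for days on
    which the denominator is zero.
    """
    estimates = []
    for t, cases in enumerate(incidence):
        denominator = sum(
            w * incidence[t - i]
            for i, w in enumerate(weights, start=1)
            if t - i >= 0
        )
        estimates.append(cases / denominator if denominator > 0 else None)
    return estimates

# Toy example: doubling incidence and a two-day weight vector (both hypothetical).
rt = instantaneous_rt([1, 2, 4, 8], [0.5, 0.5])
print(rt)
```

The first day has no earlier cases to weight, so no estimate is returned for it.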
To examine the effect of serial intervals, we first obtained the daily number of cases based on the onset dates of infectors and infectees in our entire database of 1,407 transmission pairs (Fig. 3A). Then, using the statistical method developed by Cori et al. (30), we estimated Rt for each day between January 20 and February 13, 2020. We noticed a substantial difference between the estimates of Rt obtained with a single stable serial interval distribution and those obtained with time-varying effective serial interval distributions. The magnitude of this difference depends on the discrepancy between the single fixed serial interval distribution and the time-varying serial interval distributions, and is more prominent during the pre-peak and post-peak periods than during the peak week, when Rt≈1 (Fig. 3 and fig. S7).
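To illustrate why the choice of serial interval distribution matters, the sketch below computes the same case-ratio estimate with either a fixed weight vector or a day-specific one. It is a toy comparison under assumptions: the weight vectors, the incidence series, and the switch day are hypothetical, and the full estimation method cited in the text is not reproduced.

```python
def rt_with_weights(incidence, weights_for_day):
    """Case-ratio estimate of R_t where the serial interval weights may depend on day t.

    weights_for_day(t) returns the list [w_1, w_2, ...] to use on day t.
    Returns None for days on which the denominator is zero.
    """
    estimates = []
    for t, cases in enumerate(incidence):
        weights = weights_for_day(t)
        denominator = sum(
            w * incidence[t - i]
            for i, w in enumerate(weights, start=1)
            if t - i >= 0
        )
        estimates.append(cases / denominator if denominator > 0 else None)
    return estimates

# Hypothetical degenerate weight vectors: all weight at a lag of 4 days or of 2 days.
LONG_INTERVAL = [0.0, 0.0, 0.0, 1.0]
SHORT_INTERVAL = [0.0, 1.0]

incidence = [2 ** t for t in range(8)]  # toy series that doubles daily

fixed = rt_with_weights(incidence, lambda t: LONG_INTERVAL)
varying = rt_with_weights(incidence, lambda t: LONG_INTERVAL if t < 5 else SHORT_INTERVAL)
print(fixed[4:], varying[4:])
```

In this growing toy series, switching to the shorter interval lowers the estimate from day 5 onward, because each day's cases are compared with more recent, larger case counts.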