The observed SIs of all 21 samples have a mean of 4.3 days, a median of 4 days, an interquartile range (IQR) of 2 to 5 days, and a range of 1 to 13 days. For the 12 ‘infector-infectee’ pairs, the observed SIs have a mean of 3 days, a median of 2 days, an IQR of 2 to 4 days, and a range of 1 to 8 days. Fig 1 shows the likelihood profiles with respect to the μ and σ of the SI. In Table 1, for the non-truncated scenario (i.e., using Eqn (1)), the three distributions have almost equivalent fitting performance in terms of the AICc. The Lognormal distribution has the lowest AICc, and thus it is presented as the main result for the SI estimation. Using all 21 samples, we estimated the mean SI at 3.9 days (95%CI: 2.8−7.2) and the SD of the SI at 2.6 days (95%CI: 1.6−9.3). Between the observed and the fitted distributions, the Pearson’s correlation is 0.98 and the R-squared is 0.97. These estimates largely match the results in the existing literature (27, 30, 31). Restricting the analysis to the 12 ‘infector-infectee’ pairs, the Lognormal distribution again had the lowest AICc, and we estimated the mean SI at 3.0 days (95%CI: 1.9−6.8) and the SD of the SI at 2.0 days (95%CI: 1.0−10.5). In this case, the Pearson’s correlation is 0.96 and the R-squared is 0.92. The fitted Lognormal distributions are shown in Fig 2.
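To make the link between the reported mean and SD of the SI and the Lognormal parameters μ and σ concrete, the following minimal sketch converts between the two parameterisations and evaluates the small-sample corrected AIC. It assumes the standard Lognormal moment formulas and the usual AICc definition. The function names are illustrative and are not taken from the original analysis.

```python
import math


def lognormal_params(mean, sd):
    """Convert a natural-scale mean and SD to log-scale (mu, sigma)."""
    sigma_sq = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma_sq
    return mu, math.sqrt(sigma_sq)


def lognormal_moments(mu, sigma):
    """Convert log-scale (mu, sigma) back to the natural-scale mean and SD."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, sd


def aicc(log_lik, k, n):
    """AIC with small-sample correction for k parameters and n observations."""
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)


# Point estimates reported for all 21 samples: mean 3.9 days, SD 2.6 days.
mu, sigma = lognormal_params(3.9, 2.6)
print(mu, sigma)
print(lognormal_moments(mu, sigma))  # recovers the mean and SD
```

With n = 21 observations and k = 2 parameters, the correction term adds 2k(k+1)/(n−k−1) = 12/18 ≈ 0.67 to the ordinary AIC, which is why the corrected criterion is preferred for a sample of this size.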
For the right-truncated scenario (i.e., using Eqn (2)), the Lognormal distribution again had the lowest AICc (see Table 1). Using all 21 samples, we estimated the mean SI at 4.9 days (95%CI: 3.6−6.2) and the SD of the SI at 4.4 days (95%CI: 2.9−8.3). Using only the 12 ‘infector-infectee’ pairs, we estimated the mean SI at 3.0 days (95%CI: 2.1−3.9) and the SD of the SI at 2.0 days (95%CI: 1.2−4.6). The Pearson’s correlation and the coefficient of determination are no longer applicable here, since the likelihood function was adjusted and thus no longer depends solely on the SI observations.
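The effect of a right-truncation adjustment can be illustrated with a generic form in which each SI density is divided by the probability that an SI is short enough to have been observed by the data cut-off. This is a minimal sketch under that assumption and is not necessarily identical to Eqn (2). The function names and the idea of a per-pair maximum observable SI are hypothetical.

```python
import math


def lognorm_pdf(x, mu, sigma):
    """Lognormal density; zero for non-positive x."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognorm_cdf(x, mu, sigma):
    """Lognormal cumulative distribution; zero for non-positive x."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def truncated_loglik(si_values, max_observable, mu, sigma):
    """Right-truncated log-likelihood (assumed generic form).

    si_values[i] is an observed SI; max_observable[i] is the longest SI that
    could have been observed for that pair before the cut-off (assumption).
    Each observed SI is assumed not to exceed its maximum observable value.
    """
    total = 0.0
    for s, t in zip(si_values, max_observable):
        total += math.log(lognorm_pdf(s, mu, sigma))
        total -= math.log(lognorm_cdf(t, mu, sigma))
    return total
```

Because the divisor is a probability no greater than one, the adjustment gives relatively more weight to parameter values that imply longer SIs, which is consistent with the larger mean SI estimated under truncation for the full sample.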
Compared with the SI of SARS, which had a mean of 8.4 days and an SD of 3.4 days (32), the estimated 4.9-day SI for COVID-19 indicates rapid cycles of generation replacement in the transmission chain. Hence, highly efficient public health control measures, including contact tracing, isolation and screening, are strongly recommended to limit the epidemic size. The timely supply and delivery of healthcare resources, e.g., facemasks, alcohol sterilizer, and manpower and equipment for treatment, are required in response to the rapidly growing incidence of COVID-19 (4, 33). In places with less developed healthcare systems and limited medical resources, such rapid growth of the epidemic may place a huge burden on the public health system. Therefore, preparedness and precaution against the risk of COVID-19 are crucial to minimize its impact (34, 35).
As also pointed out by recent works (27, 30, 31), the mean SI of 4.9 days is slightly smaller than the mean incubation period of roughly 5 days estimated by many previous studies (36-39). Pre-symptomatic transmission may occur when the SI is shorter than the incubation period. If isolation can be conducted immediately after symptom onset, pre-symptomatic transmission is likely to account for most SARS-CoV-2 infections. This situation has been evidenced by a recent epidemiological investigation (40) and incorporated in mechanistic modelling studies of the COVID-19 epidemic (4, 41), in which pre-symptomatic cases were treated as contagious. As such, merely isolating the symptomatic cases will still allow a considerable proportion of secondary cases, and thus contact tracing and immediate quarantine are crucial to reduce the risk of infection. In addition, we note that a minority of the SI observations reported in recent studies were negative (30, 31, 42-44). A negative SI may occur when the incubation period is short with a large variance. However, no negative value was observed in our dataset, which may be due to the small sample size. We further remark that this is unlikely to bias the estimate of the mean SI, but may lead to a slight underestimation of the SD of the SI. The purpose of estimating the SI is to approximate the generation interval (the time lag between infections of successive cases), which is strictly positive. Caution should be taken when dealing with negative SIs.
A recent epidemiological study used 5 ‘infector-infectee’ pairs from contact tracing data in Wuhan, China during the early outbreak to estimate the mean SI at 7.5 days (95%CI: 5.3−19.0) (37), which appears larger than our SI estimate of 4.9 days. Although the 95%CIs of the SI estimate in this study, which is consistent with previous studies (27, 29-31), and of that in Li et al (37) are not significantly separated, a real difference between the SI estimates might exist. If this difference is not due to sampling chance, one possible explanation is the enhanced public awareness and swift control measures, including contact tracing and isolation, implemented in Hong Kong. Since Hong Kong was the hardest hit in the SARS outbreaks in 2003 (18, 19), its local public health control is one of the most effective in the world. In the initial phase of the outbreak in Wuhan, transmission occurred without sufficient awareness and effective intervention, and thus the SI estimate in Li et al (37) may be regarded as the intrinsic (wild) SI of COVID-19. In contrast, the SI estimate in Hong Kong may be regarded as the effective SI in a more practical situation where timely action (quarantining cases and their close contacts) is in place (45), such that a case could be isolated before having the chance to infect others. If timely action is not in place, infections with longer serial intervals may occur. Thus, shorter SI observations might be an outcome of effective control in a location. The practice in Hong Kong is an example for other regions, including less developed countries.
The SI estimate can benefit from a larger sample size, and the estimates in our study were based on 21 identified transmission events including 12 ‘infector-infectee’ pairs. Although the sample size is smaller than the 28 transmission events in Nishiura et al (27), 71 in You et al (31) and 468 in Du et al (30), an advantage of this analysis is that all 21 transmission events were identified in Hong Kong. Hence, the surveillance data were under consistent reporting and recording standards, which reduces the heterogeneity in the observations. Our analysis can be improved if more records of local transmission events become available. Furthermore, a comparison between different localities is important, as it would shed light on the effects of different external factors on the SI.
Accurate and consistent records of dates of illness onset are essential to the estimation of the SI. All samples used in this analysis were identified in Hong Kong and collected consistently from the CHP (16, 17). Hence, the reporting criteria were most likely the same for all COVID-19 cases, which potentially makes our findings more robust.
The clusters of cases can occur by person-to-person transmission within the cluster, e.g.,
- scenario (I): person A infected B, C and D, or
- scenario (II): A to B to C to D, or
- scenario (III): a mixture of (I) and (II), e.g., A to B, B to C and D,
or they can occur through a common exposure to an unrecognized source of infection, e.g.,
- scenario (IV): unknown person X infected A, B, C and D; or
- scenario (V): a mixture of (IV) and (I) or (II), e.g., X to A and B, B to C and D.
The lack of information in the publicly available dataset made it difficult to disentangle such complicated situations. Scenarios (I) and (II) can be covered by ‘infector-infectee’ pairs, for which we could identify the link between two unique consecutive infections. Under scenario (III), we cannot clearly identify the pairwise match between the infector and infectee, which means there were multiple candidate infectors for one infectee. As such, we employed the PDF h(∙) in Eqn (1) to account for the possible time of exposure ranging from Tlow to Tup. No information is available on the SI for scenarios (IV) and (V) because the onset date of person X is unknown, and thus our analysis was limited to scenarios (I)-(III). We note that extra caution is needed when interpreting the clusters of cases because of this potential limitation. Although we used an interval-censored likelihood to deal with the multiple-infector matching issue, more detailed information on the exposure history and clues on ‘who acquires infection from whom’ (WAIFW) would improve our estimates.
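The interval-censoring idea can be sketched as follows. This is one common construction, assuming that the relevant reference date of the infector is only known to lie in a window [t_low, t_up] and is uniformly distributed within it; it may differ from the exact form of h(∙) in Eqn (1). All function and argument names are hypothetical.

```python
import math


def lognorm_pdf(x, mu, sigma):
    """Lognormal density; zero for non-positive x."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognorm_cdf(x, mu, sigma):
    """Lognormal cumulative distribution; zero for non-positive x."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def censored_contribution(onset_infectee, t_low, t_up, mu, sigma):
    """Likelihood contribution of one infectee with an uncertain infector date.

    If the window collapses to a single date (a unique 'infector-infectee'
    pair), the contribution is the density at the exact SI. Otherwise the
    density is averaged over the window (uniform assumption).
    """
    if t_up == t_low:
        return lognorm_pdf(onset_infectee - t_low, mu, sigma)
    longest_si = onset_infectee - t_low
    shortest_si = onset_infectee - t_up
    mass = lognorm_cdf(longest_si, mu, sigma) - lognorm_cdf(shortest_si, mu, sigma)
    return mass / (t_up - t_low)
```

As the window narrows, the averaged contribution approaches the exact-pair density, so the same likelihood handles both the 12 uniquely matched pairs and the events with several candidate infectors.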
Longer SIs might be less likely to occur in reality due to the isolation of confirmed infections, or harder to identify and link together due to less accurate information associated with recall error in the backward contact tracing exercise. The issue associated with isolation could bias the SI estimates and lead to underestimation (27). It is possible that the SI is longer at the initial stage than later, when strict isolation takes place. Nevertheless, a comparison of the estimated SIs for SARS and COVID-19 in Hong Kong is still meaningful. We found that the estimated SI of COVID-19 appears shorter than that of SARS. It is hard to imagine that isolation is responsible for the difference: isolation is unlikely to have been more rapid for COVID-19 cases than for SARS cases in Hong Kong, and the other limitations would have applied to both diseases. Thus, the difference we observed between COVID-19 and SARS is likely intrinsic. In conclusion, given the rapid spread of COVID-19, effective contact tracing and quarantine/isolation are even more crucial for successful control.