Genetic diversity of SARS-CoV-2 in the SBCVIC was similar to that of sequences in GISAID during the same period
According to CoVariants.org[21], Nextstrain[22] clade 21K, corresponding to PANGO lineage[23] BA.1, was initially prevalent in the sixth wave of COVID-19 in Japan in early 2022, and 21L/BA.2 dominated the late phase. The sixth wave concluded in June 2022, at the beginning of the study period. From then on, BA.2 was gradually replaced by BA.5 (Fig. 1). The seventh wave peaked in July–August 2022. After September, the prevalence of BA.5 decreased while more evolved lineages were being detected (Fig. 1). The genetic diversity of the full-length viral sequences from SARS-CoV-2-positive cases diagnosed at the SBCVIC remained constant throughout the study period. It resembled the diversity of sequences from Japan registered in GISAID between June and September; the exception was the GISAID-registered sequences from October, which were approximately twice as diverse as the other monthly groups (Fig. 2). The lineages contributing to this genetic diversity were initially BA.2.3, BA.2.24, and other BA.2 sublineages in June; from July, they were gradually replaced by BA.5.2, BF.5, and other BA.5 sublineages (Fig. 3A). This composition was similar between the SBCVIC and GISAID. However, 46% of the GISAID sequences collected in October belonged to BA.2.3.20 or BA.2.75, which were absent from the SBCVIC data.
Number of Omicron variants diagnosed in the SBCVIC differed from that of GISAID and was consistent with the number of reported cases in Japan
Although the number of viral full-genome sequences collected in Japan and registered in GISAID was as high as 887 in June, the number of sequences from the SBCVIC in that month was much lower. The number of sequences from the SBCVIC increased after July, peaked at 762 in August, and subsequently decreased to 69 in October (Fig. 3B). This change in the number of SBCVIC sequences was consistent with the number of COVID-19 cases reported in Japan during the same period (Fig. 3B). Conversely, the number of GISAID entries from Japan decreased after June, with only 98 entries in August, the peak of the seventh wave. Bayesian skyline plot analysis of the sequence data revealed that the relative population size trends inferred from SBCVIC-derived sequences were consistent with COVID-19 prevalence since the outbreak of Omicron variants in January. In contrast, the inference using GISAID-derived sequences showed only a single wave, corresponding to the sixth wave, and no seventh wave (Supplementary Fig. 1).
Genetic differences between sequences from the SBCVIC and GISAID were greater across sampling months than across collection routes
The gross genetic distance between groups of viral sequences collected by the SBCVIC and GISAID in their respective months was small, at most 0.096% substitutions per site (Supplementary Table 1). The maximum net genetic distance between the groups was 0.054%, indicating that 48% of the divergence was attributed to differences in months and collection routes. The net genetic distances between collection routes in the same month were often smaller than those between sampling months along the same route (Supplementary Table 1). A dendrogram illustrating the similarity between the groups showed that they were divided into two clusters (Fig. 4). One cluster consisted of populations primarily infected with BA.2 sublineages and included the June SBCVIC, June GISAID, and July GISAID groups. The second cluster was primarily composed of populations infected with BA.5 sublineages and included the majority of the seventh-wave groups. The October GISAID group was at the top of this cluster, reflecting the high rate of BA.2.75 registrations during this period.
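The relationship between gross and net distances can be illustrated with a minimal sketch. It assumes the usual definition of net between-group distance, namely the mean between-group distance minus the average of the two mean within-group distances, and it uses the uncorrected p-distance on toy aligned sequences. The substitution model, the function names, and the sequences are illustrative assumptions, not the settings or data used in this study.

```python
from itertools import combinations, product

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    The p-distance is an assumption made for illustration only.
    """
    valid = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not valid:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in valid) / len(valid)

def mean_within(group):
    """Mean pairwise distance among sequences of one group (0.0 if < 2 sequences)."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def mean_between(group_x, group_y):
    """Gross distance: mean distance over all cross-group sequence pairs."""
    pairs = list(product(group_x, group_y))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def net_distance(group_x, group_y):
    """Net distance: gross distance minus the average within-group diversity."""
    within = (mean_within(group_x) + mean_within(group_y)) / 2
    return mean_between(group_x, group_y) - within

# Hypothetical toy groups standing in for two month/route groups.
group_x = ["ACGTACGTAC", "ACGTACGTAT"]
group_y = ["ACGTTCGTAC", "ACGTTCGTAT"]

gross = mean_between(group_x, group_y)
net = net_distance(group_x, group_y)
print(f"gross = {gross:.3f}, net = {net:.3f}")
```

In this toy case, part of the gross distance is explained by diversity within each group, so the net distance is smaller than the gross distance, which is the same comparison made between the values in Supplementary Table 1.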
Viral variants from GISAID and the SBCVIC were mixed in the transmission network for BA.2 but formed separate subclusters for BA.5
Transmission cluster and network analyses identified at least 60 components and 157 singleton cases of BA.2-related (27 components and 64 singletons) and BA.5-related (27 components and 93 singletons) lineages detected during the study period in Japan. Among these components, 12 were pairs of cases with BA.2 lineages, and 14 were pairs of cases with BA.5 lineages (Supplementary Fig. 2A). The BA.2-related components were primarily composed of GISAID entries (Supplementary Fig. 2B), and the SBCVIC cases were scattered across different positions in both the phylogenetic clusters (Supplementary Fig. 3) and the components (Fig. 5A). The network graphs of BA.2 exhibited lower densities (median = 0.202) than those of BA.5 (median = 0.603); however, the difference was not significant by the Mann–Whitney U test (p = 0.157). Among the 12 BA.2 components containing > 5 cases, the largest, comprising BA.2.3–BA.2.3.20, consisted of two main clusters and several smaller ones. One of the main clusters included GISAID-derived BA.2.3 cases from August onwards, BA.2.3.20 cases in October, and earlier cases from the SBCVIC. In another large component, BA.2.3.13, two clusters of GISAID entries from different months were observed, with SBCVIC-derived cases diverging individually. The BA.5-related components of the seventh wave featured numerous cases from the SBCVIC (Supplementary Fig. 2B), with GISAID entries often forming a subcluster separate from the SBCVIC cases in certain network components (Fig. 5B). Of the 11 BA.5 components containing > 5 cases, most appeared to form separate clusters of SBCVIC- and GISAID-derived cases, except for BA.5.2.1. The BA.5.2 component primarily consisted of three clusters of SBCVIC-derived cases, divided by central GISAID entries from August to September. The BA.5.1 component had more GISAID entries than the other components, and its GISAID cluster was divided into two areas: one collected in July and the other in September. In the BF.5 component, SBCVIC and GISAID cases were distinctly separated without exhibiting clear temporal characteristics.
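The component and density comparison can be sketched as follows. This is a minimal illustration under several assumptions: edges are taken as already-defined links between case identifiers (the rule used to link cases is not restated here), density is the standard undirected graph density 2m / (n(n - 1)), and only the Mann–Whitney U statistic is computed, not its p-value. The edge list, case names, and function names are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def connected_components(adj):
    """Connected components (sets of nodes) found by depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def density(comp, adj):
    """Fraction of possible node pairs in the component that are linked."""
    n = len(comp)
    m = sum(len(adj[u] & comp) for u in comp) // 2  # each edge counted twice
    return 2 * m / (n * (n - 1))

def mann_whitney_u(xs, ys):
    """U statistic for xs: count of (x, y) pairs with x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)

# Hypothetical links between cases: one fully linked triangle and one chain.
edges = [("case_a", "case_b"), ("case_b", "case_c"), ("case_a", "case_c"),
         ("case_d", "case_e"), ("case_e", "case_f")]
adj = build_adjacency(edges)
densities = sorted(density(c, adj) for c in connected_components(adj))
print(densities)
```

Singleton cases have no edges and therefore do not appear in an edge list; they would have to be counted separately. For a p-value, a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) could be applied to the two lists of per-component densities.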