Using GBMTM, we identified, for the first time, four distinct trajectory groups from patient-level time series of long-term pneumonia lesion percentage and lymphocyte count, providing a comprehensive depiction of the COVID-19 disease course from admission to post-discharge. For lesion percentage, all four trajectories were positively skewed, hill-shaped curves whose peak value increased progressively from Group 1 to Group 4. For lymphocyte counts, two main patterns were observed: Groups 1 and 2 started at a low level and then increased rapidly, whereas Groups 3 and 4 decreased for a short time and then recovered slowly. Patients in the higher-order groups were older and more likely to have hypertension and diabetes, and disease severity increased sequentially from Group 1 to Group 4. Furthermore, fatal cases fared worse than patients in Group 4, with significantly more pneumonia lesions and lower lymphocyte counts. In contrast, mild cases showed no pneumonia lesions and mostly normal lymphocyte counts.
Characterising the long-term trajectories enabled us to assess population heterogeneity over the COVID-19 disease course. Although longitudinal evaluation of chest CT and laboratory results is a standard of clinical care, follow-up reports have been assessed only in a simple, qualitative, and subjective manner in clinical practice. In research, a few studies explored the temporal changes of chest CT findings and reported that lung abnormalities peaked 6–11 days post-onset(2, 6, 19), which was consistent with our finding (Table 2). Chen et al. observed that decreased lymphocyte counts recovered to normal in convalescent patients, whereas they remained low among the deceased(5). This finding was confirmed by our results. However, these previous studies ignored heterogeneity between patients because each analysis estimated cross-sectional, population-level summaries of a single biomarker. For example, Wang et al. used the average proportion of ground-glass opacity at intervals of around 5 days(2). We took a step forward by applying GBMTM to the patient-level time series of pneumonia lesion percentage and lymphocyte count, which also captured the interrelationship between the two biomarkers. Our results showed that COVID-19 patients were heterogeneous and that convalescent patients could be further divided into four subgroups.
Clustering COVID-19 patients into four trajectory groups could be a better method for understanding disease severity across patients and could thus be useful for triage and tailored treatment. As shown in the results, the trajectory grouping was a stable indicator of overall disease severity, whether compared with lesion percentage, lymphocyte count, or clinical subtype as a single measure. This could be because the trajectory groups capture the dynamic patterns of several key biomarkers rather than a single biomarker at a single time point. As shown in Fig. 2A, from Group 1 to Group 4 the peak lesion level and the time from onset to peak both increased, and the time for lymphocyte counts to rebound became progressively longer. The severe or critical cases on admission were mostly in Groups 3 and 4, and more patients in Groups 3 and 4 progressed to a more severe status during hospitalisation. These findings suggest that patients who require a long time to resolve pneumonia and restore lymphocyte counts, especially those with declining lymphocyte counts, should be assigned to intensified care.
Importantly, we observed that lymphocyte counts were replenished much earlier than lung lesions resolved, indicating that immune system restoration precedes lung tissue repair. Lymphocyte counts started to recover before symptom onset among the patients in Groups 1 and 2, and at 1 day and 4 days post-onset among the patients in Groups 3 and 4, respectively. The durations for lymphocyte counts to recover to the normal level (at least 1.1 × 10⁹/L) were 1 day, 2 days, 13 days, and 19 days in Groups 1 to 4, respectively. By comparison, the durations before pneumonia began to resolve were 6.5, 9, 10, and 12 days in Groups 1 to 4, respectively. In light of these findings, monitoring trends in lymphocyte counts, and in particular the turning point at which they begin to recover, may identify patients who are at high risk of adverse outcomes or who are recovering more rapidly than monitoring pneumonia lesion volume on CT, which reflects organic injury and therefore lags behind the trend in disease progression. As a result, optimised treatment could be assigned promptly to help reduce complications and mortality.
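To make the notion of a "turning point" concrete, the following minimal Python sketch locates the trough of a lymphocyte series and the peak of a lesion-percentage series for one patient and reports the lag between them. The function name `turning_point` and all measurement values are hypothetical and chosen only for illustration; they are not study data, and this is not the GBMTM procedure used in the analysis.

```python
from typing import Sequence


def turning_point(days: Sequence[float], values: Sequence[float], kind: str) -> float:
    """Return the day on which a series reaches its trough or its peak."""
    if kind not in ("trough", "peak"):
        raise ValueError("kind must be 'trough' or 'peak'")
    if len(days) != len(values) or len(days) == 0:
        raise ValueError("days and values must be non-empty and of equal length")
    pick = min if kind == "trough" else max
    # Index of the extreme value; ties resolve to the earliest measurement.
    idx = pick(range(len(values)), key=lambda i: values[i])
    return days[idx]


# Hypothetical measurements for a single patient (illustration only).
days = [2, 5, 8, 12, 16, 25]                      # days since symptom onset
lymphocytes = [0.9, 0.7, 1.0, 1.3, 1.6, 1.8]      # x 10^9/L
lesion_pct = [3.0, 8.0, 14.0, 17.0, 12.0, 6.0]    # % of lung volume

lymph_turn = turning_point(days, lymphocytes, "trough")   # lymphocytes start to recover
lesion_turn = turning_point(days, lesion_pct, "peak")     # lesions start to resolve
lag_days = lesion_turn - lymph_turn                       # how far CT lags behind

print(f"lymphocyte trough: day {lymph_turn}, lesion peak: day {lesion_turn}, lag: {lag_days} days")
```

In this made-up series the lymphocyte trough precedes the lesion peak, which is the ordering described above; with sparse or noisy measurements a smoothed or model-based estimate would be more appropriate than a raw minimum or maximum.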
The heterogeneity in the longitudinal trajectories is also valuable for optimising post-discharge follow-up strategies, such as choosing appropriate follow-up intervals for different patients. The lesion percentages immediately before discharge were 0.6%, 2.7%, 6.0%, and 24.6% in Groups 1 to 4, and the lymphocyte counts immediately before discharge were 2.5 × 10⁹/L, 1.9 × 10⁹/L, 1.2 × 10⁹/L, and 1.5 × 10⁹/L in Groups 1 to 4, respectively (Table S2). This demonstrated that the biomarkers had not fully recovered at discharge and that their values varied across COVID-19 patients, indicating that the follow-up strategy should take such heterogeneity into account. For example, the follow-up interval after discharge might be chosen with time 0 as the day of symptom onset. Apart from Group 1, whose pneumonia lesion percentage always remained below 5%, the durations from onset until the lesion percentage decreased to 5% were estimated to be 26, 45, and 93 days in Groups 2 to 4, respectively. The durations from onset until the lymphocyte count returned to 1.1 × 10⁹/L were estimated to be 1 day, 2 days, 13 days, and 19 days in Groups 1 to 4, respectively. Using these results, the follow-up intervals can be adjusted for each subgroup. In general, patients in Groups 3 and 4 needed a more frequent follow-up plan to provide more intensive surveillance during their longer convalescence period.
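The group-level durations quoted above can be combined into a simple per-group horizon, sketched here in Python. The numbers are those reported in the text; taking the later of the two recovery times as a "recovery horizon" is an illustrative assumption of this sketch, not a follow-up rule proposed by the study, and the names used are hypothetical.

```python
# Estimated days from symptom onset, as quoted in the text.
# None marks Group 1, whose lesion percentage stayed below 5% throughout.
DAYS_TO_LESION_BELOW_5PCT = {1: None, 2: 26, 3: 45, 4: 93}
DAYS_TO_LYMPHOCYTE_NORMAL = {1: 1, 2: 2, 3: 13, 4: 19}


def days_until_both_recovered(group: int) -> int:
    """Later of the two recovery times for a trajectory group (illustrative assumption)."""
    lesion = DAYS_TO_LESION_BELOW_5PCT[group]
    lymph = DAYS_TO_LYMPHOCYTE_NORMAL[group]
    # A missing lesion duration means the 5% threshold was never exceeded.
    return lymph if lesion is None else max(lesion, lymph)


recovery_horizon = {
    group: days_until_both_recovered(group)
    for group in sorted(DAYS_TO_LYMPHOCYTE_NORMAL)
}

for group, horizon in recovery_horizon.items():
    print(f"Group {group}: both biomarkers expected within {horizon} days of onset")
```

In every group except Group 1 the horizon is set by the lesion percentage rather than the lymphocyte count, which is consistent with the longer convalescence noted for Groups 3 and 4.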
Our study has several limitations. First, the trajectory analysis focused on pneumonia lesion volume and lymphocyte counts and ignored other clinical indicators, such as leukocyte counts and D-dimer, so some pieces of the complete picture of COVID-19 disease evolution might have been lost. However, CT imaging and lymphocyte counts are usually among the easiest biomarkers to access and interpret at most hospitals, which makes this study more reproducible. Second, to avoid the transmission risk caused by coughing and droplet formation during testing(20), no pulmonary function test data were collected during follow-up to assess the survivors' recovery of lung function. Third, although every patient was followed for at least two months, longer follow-up is needed to depict the extended evolution of the disease. Finally, future studies should collect denser data to predict a patient's trajectory group membership at an early stage of the disease course, which would help to guide therapeutic strategies, as a patient predicted to belong to a higher-order group should undergo more intensive surveillance to prevent undesirable outcomes.