Description of dataset
The study flowchart is shown in Figure 1. Thirteen studies (six randomised controlled trials and seven retrospective non-randomised comparative studies, covering a total of 1467 participants) met our final inclusion criteria. All the studies were conducted in China: seven in Wuhan, Hubei; one (25) in Qiandongnan, Guizhou; one (26) in Beijing; one (27) in Changsha, Hunan; one (28) in Shiyan and one (29) in Xiangyang, both in Hubei; and one (30) was a large-scale study recruiting patients from 23 hospitals in nine provinces of mainland China. The studies covered three medicines (LHQW, JHQG and XBJ) and one formulation (QFPD decoction). No relevant study was identified from the websites of China CDC, NHC or SATCM.
The key characteristics of the included studies are shown in Table 1. A table of excluded studies, with reasons for exclusion, is given in Supplementary Material 2.
Eleven studies referenced China’s national guideline (4th to 7th revisions) to select study participants. The diagnostic criteria evolved across these revisions. The 5th revision, published in early February, allowed a clinical diagnosis without laboratory confirmation for patients from high-risk areas (Hubei Province) if chest imaging was typical. This provision was withdrawn in the 6th revision. The 7th revision, published in early March, added antibody testing as an option for laboratory confirmation. Two studies (30, 31) followed the 4th guideline to select patients, one of which (31) involved only suspected cases. These suspected cases would be considered “clinically diagnosed” if the 5th guideline criteria were applied. Eight studies followed the 5th or 6th guideline (25, 28, 29, 32-36) with laboratory confirmation, and one of them (32) had an additional inclusion requirement of hospitalization for more than 6 days. One study (35) recruited both suspected and diagnosed cases according to the 7th treatment guideline, using epidemiological history, clinical symptoms, CT images and etiological evidence as criteria. Two studies (26, 27) did not mention guideline-based diagnosis. Two studies captured post-acute COVID data (26, 29), but none followed patients long enough to observe potential chronic COVID symptoms.
Eight studies (26, 31-33, 36, 37) provided a breakdown of participants’ underlying conditions, most commonly hypertension (ranging from 12.2% to 33.3%), coronary heart disease (2.1% to 16.2%), stroke (5.9% to 15.9%) and diabetes (7.8% to 25.6%). Two studies (35, 37) reported a small number of patients with COPD (1.1% to 4.9%). One study (37) included a small number of patients with pre-existing respiratory disease (chronic obstructive pulmonary disease and tuberculosis, about 3%). Other underlying conditions reported in small proportions included chronic kidney and liver disease, cirrhosis, bronchial asthma, hyperlipidaemia and diseases not specified.
In all studies except one arm in (35), CHM medicines or formulations were used in conjunction with usual care (as recommended in the current version of the Chinese national guideline) and compared with usual care alone. ‘Usual care’ in all the studies comprised three main approaches: nutritional and supportive treatment, symptomatic treatment, and antiviral and antibacterial treatment.
Quality appraisal of included studies
The results of quality appraisal of the included studies are shown in Figure 2.
Results of the quality assessment of the RCTs are shown in Figure 2a. All six trials had some concerns or were considered to be at high risk of bias. Regarding the randomization process, three trials (30, 35, 37) generated random sequences using SPSS or SAS software, whilst three trials (25, 27, 36) used a random number table. Two trials (30, 37) concealed the allocation until the completion of enrolment. Three studies (25, 27, 36) did not report allocation concealment. One study (35) was designed as unblinded, with patients grouped using a block randomisation method, and was assessed as being at high risk of bias in the randomization process. This trial (35) was also judged to be at high risk of bias in measurement of the outcome, since assessors’ and patients’ knowledge of highly promoted interventions could influence the assessment of outcomes such as symptom improvement. Four studies (30, 35-37) were appraised as high risk in this domain. The other two RCTs were also open label; however, because their main outcomes were laboratory tests, they were judged to be at low risk of bias. Three studies (25, 27, 36) did not report whether patients were aware of their allocation. In addition, four studies (25, 27, 35, 37) reported no trial registration information in their manuscripts, and we could not match them to the protocols we retrieved from the Chinese Clinical Trial Registry; we therefore judged them as having some concerns in the domain of ‘selection of reported result’. Study (30) was the only one reporting a registration number (Chi CTR-TRC-2000029434), but it did not include an intention-to-treat analysis, which we considered inappropriate for estimating the effect of assignment to intervention.
Of the non-randomised studies (all of which were retrospective cohort studies), three (31, 32, 34) were found to be of fair quality and the other four (26, 28, 29, 33) were of outstanding quality (Figure 2b). All studies had extensive exclusion criteria for major diseases (including renal disease, cancer and immunodeficiency), and all but one study (37) excluded comorbid respiratory diseases. However, because the prevalence of these comorbidities is low among Chinese COVID-19 patients, the study populations are likely to be representative of patients with COVID-19 (38). The exposed and non-exposed cohorts were drawn from the same community. Two studies (31, 34) did not achieve comparability on the basis of study design; the other studies generally controlled for patients' age or disease severity. All the studies were completed, but only two were considered to have sufficient follow-up: one study (26) lasted for 25 days, with clearly recorded data on nucleic acid testing and pneumonia recovery collected up to the 15th day of hospitalization; another (29) lasted for 22 days. The others were finished within 7 to 10 days. All studies used medical records to ascertain exposure and did not stipulate the outcome of interest at the beginning of the study, suggesting a potentially significant source of bias.
Effects of interventions on outcome measures
The included trials featured four comparison groups: LHQW (plus usual care) versus usual care (six studies); XBJ plus usual care versus usual care (three studies); JHQG plus usual care versus usual care (two studies), and QFPD plus usual care versus usual care (two studies) (Table 1).
Primary outcome
Our primary outcome measure (change in disease severity category according to clinical guidelines) was adequately reported in only one (non-randomised) study. This study (32) claimed that a significantly lower proportion of patients became severe in the treatment group than in the comparator group, as judged by a p value of less than 0.05 (see Table 2 for numbers). However, our own calculation gives p=0.0503 (see Supplementary Material 3 for details).
One randomised controlled trial (36) reported changes in disease severity, but we chose not to include these findings because the definition of the categories used as the treatment outcome was not clear. There was also inconsistency in the numbers presented in this study (see Supplementary Material 4 for details). Moreover, the study included both mild and moderate patients but only presented data on progression to severe disease or death, omitting progression from mild to moderate and progression to critical. We wrote to the corresponding author for clarification but received no response.
One retrospective analysis (33) of QFPD decoction showed no significant difference in the numbers of patients being cured (as defined by the Chinese national guideline).
Secondary outcomes
Improvement in symptoms
Primary studies measured symptom resolution differently. Fever resolution, for example, was measured in three ways: time taken for fever to resolve, whether fever had resolved by the end of treatment, and change in symptom score. Assigning a score to a symptom is common practice in CHM studies, although it has been criticised for systematic errors, non-standardized use across studies and statistical inappropriateness (39). We therefore do not report on the Traditional Chinese Medicine (TCM) scoring of symptoms, but have included additional information in Supplementary Material 5.
Figures 3a to 3o show the results of the meta-analysis of studies that tested the effectiveness of 3M3F on 13 reported COVID-19 symptoms. Limited findings suggested that 3M3F may reduce the time to fever recovery (SMD -0.98 days, 95% CI -1.78 to -0.17; participants = 163; studies = 3; I2 = 83%). A larger proportion of COVID-19 patients receiving 3M3F recovered from fever, cough, fatigue/tiredness, phlegm, shortness of breath and muscle pain, but not from the other seven symptoms reported (Table 2).
One RCT (35) comparing LHQW granule as an add-on to antiviral and antimicrobial treatment, in line with the 7th edition of the national guideline, with usual care failed to show a difference in the proportion of patients with improvement in fever (RR 1.00 [0.91, 1.10]), cough (RR 0.86 [0.69, 1.06]), fatigue (RR 1.05 [0.84, 1.33]), diarrhoea (RR 1.00 [0.80, 1.25]), nausea/vomiting (RR 0.98 [0.75, 1.26]) or loss of appetite (RR 1.00 [0.80, 1.25]).
Data from three retrospective cohort studies (32-34) showed a statistically significant effect in favour of 3M3F in reducing time to fever resolution (SMD -0.98 days, 95% CI -1.78 to -0.17; participants = 163; I2 = 83%) (Figure 3a). Three retrospective cohort studies (31, 32, 34) and a single RCT (37) tended to suggest that a larger proportion of patients had their fever resolved when taking LHQW (granule) or JHQG together with usual care (RR 1.38, 95% CI 1.19 to 1.61; participants = 318; I2 = 0%) (Figure 3b).
There was large heterogeneity among studies reporting the proportion of patients with cough resolved, and their findings conflicted. Three retrospective cohort studies (31, 32, 34) favoured the LHQW group (RR 1.90, 95% CI 1.24 to 2.90; participants = 199; I2 = 18%), while an RCT (37) failed to demonstrate a favourable effect of JHQG plus usual care versus usual care (RR 1.54, 95% CI 0.97 to 2.45) (Figure 3c).
Similar positive findings from RCTs or retrospective cohort studies were observed for the proportion of patients with resolution of fatigue/tiredness (RR 1.48, 95% CI 1.18 to 1.86; participants = 219; studies = 3; I2 = 0%, Figure 3d), phlegm (RR 1.97, 95% CI 1.08 to 3.61; participants = 176; studies = 4; I2 = 52%, Figure 3e), shortness of breath (RR 3.93, 95% CI 1.89 to 8.17; participants = 83; studies = 3; I2 = 0%, Figure 3f), and muscle pain (RR 1.83, 95% CI 1.02 to 3.27; participants = 49; studies = 3; I2 = 2%, Figure 3g). In contrast, studies with small samples failed to show a favourable effect of 3M3F on the resolution of chest tightness (RR 2.00, 95% CI 0.81 to 4.96; participants = 89; studies = 3; I2 = 64%), diarrhoea (RR 1.09, 95% CI 0.65 to 1.82; participants = 35; studies = 3; I2 = 0%), nausea/vomiting (RR 1.25, 95% CI 0.82 to 1.90; participants = 43; studies = 3; I2 = 0%), loss of appetite (RR 0.63, 95% CI 0.14 to 2.84; participants = 33; studies = 3; I2 = 55%), sore throat (RR 1.35, 95% CI 0.68 to 2.70; participants = 26; studies = 3; I2 = 0%), headache (RR 1.21, 95% CI 0.83 to 1.77; participants = 47; studies = 3; I2 = 0%), or blocked/runny nose (RR 1.00, 95% CI 0.64 to 1.57; participants = 23; studies = 3; I2 = 0%).
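For readers interpreting the I2 values quoted above, the statistic is conventionally defined as below. This is a sketch of the standard Higgins and Thompson definition; the source does not state how I2 was computed, and the symbols Q, k, w_i and the pooled estimate are notation introduced here for illustration only.

```latex
% Cochran's Q over k studies, with inverse-variance weights w_i = 1/v_i
% and \hat{\theta} the fixed-effect pooled estimate (notation assumed here)
Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}\right)^2,
\qquad
I^2 = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%
```

Under this definition, I2 = 0% means the observed variation between study estimates is no larger than would be expected by chance, whereas a value such as 83% indicates that most of the observed variation reflects between-study differences.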
Table 3 shows the impact on symptom resolution in studies which were not amenable to meta-analysis. Statistically significant differences were shown for LHQW capsule (time to resolution of fever, cough, and fatigue), LHQW granule (time to resolution of cough, shortness of breath, symptom scores for fever, dry and sore throat), and QFPD decoction (time to resolution of cough).
Recovery or improvement of chest CT manifestations
Two retrospective cohort studies showed significant changes in time to reduction of lung lesions on CT scan, for QFPD (decoction) -4.80 days, 95% CI -5.82 to -3.77, and JHQG (decoction) -0.53 days, 95% CI -0.98 to -0.08 at day 15, as add-ons to usual care. In addition, a larger proportion of patients experienced recovery/improvement of chest CT manifestations (RR 1.16, 95% CI 1.03 to 1.30; participants = 521; 3 retrospective cohort studies; I2 = 0%, Figure 3o).
Other secondary outcome measures
Inconclusive findings on blood test results, length of hospital stay, viral conversion, and medication used are reported narratively (Table 4). One non-randomised study found statistically significant differences in favour of LHQW in four laboratory tests (white cell count, lymphocyte count, C-reactive protein and procalcitonin). The clinical significance of these results is not clear and the authors do not discuss them. Findings on length of stay were inconsistent: one small, non-randomised study (33) showed a statistically significant reduction in length of stay in those who received QFPD decoction, while another (29) failed to show the same.
Adverse events
No study reported any serious adverse events (AEs). Four studies did not discuss AEs in their results (29, 32, 34, 35). Among those that discussed AEs, three reported that no AE was observed in either the 3M3F or the comparator groups (25, 27, 31) and one reported no serious side effects (36). One RCT (30) reported AEs in 45.8% (65/142) of cases in the add-on LHQW capsule group and in 54.2% (77/142) of cases in the control group; in both groups these included abnormal liver function, renal dysfunction, headache, nausea, vomiting, diarrhoea and loss of appetite. This difference was not statistically significant (RR 0.84, 95% CI 0.67 to 1.07). The RCT using JHQG (37) reported diarrhoea in 27 out of 82 (33%) participants in the treatment group versus none in the control group, a statistically significant difference.
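The risk ratio reported for study (30) can be checked from the counts given above. The following is a minimal sketch, assuming the usual large-sample confidence interval computed on the log scale; the helper name `risk_ratio_ci` is hypothetical, and the software used for the review may have applied a different method.

```python
import math


def risk_ratio_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Risk ratio with a large-sample CI on the log scale (hypothetical helper).

    events_t / n_t: events and group size in the treatment arm
    events_c / n_c: events and group size in the control arm
    z: normal quantile for the desired confidence level (1.96 for 95%)
    """
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR) for two independent binomial proportions
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Adverse event counts as reported for study (30): 65/142 versus 77/142
rr, lower, upper = risk_ratio_ci(65, 142, 77, 142)
print(f"RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# prints "RR 0.84, 95% CI 0.67 to 1.07"
```

Because the interval includes 1, the comparison is consistent with no difference in AE rates between the two groups. This formula cannot be applied directly when one arm has zero events, as in the diarrhoea comparison in (37), since the standard error is then undefined.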