Overview of dataset
The study flowchart is shown in Figure 2. From approximately 900 potentially relevant articles, we identified 47 relevant papers of which 15 were empirical studies and 32 were narrative reviews, editorials and commentaries. Of the empirical studies, only two (published in 2020) related to patients with Covid-19; neither was a validation study (both considered whether exertional desaturation predicted outcome). Two small studies (published in 1988 and 1994) considered exertional desaturation as a predictive test for pneumocystis pneumonia in people with HIV; we include them because of physiological parallels with Covid-19 discussed below. The other 11 studies were attempts at validation of one or more exercise desaturation tests in chronic lung diseases other than Covid-19; they were published between 2007 and 2020 and included between 15 and 107 participants.
The empirical studies are summarised briefly in Table 1 and in more detail in Supplementary Tables S1-S5, which include a detailed risk of bias and applicability assessment. Overall, the methodological quality of most studies as assessed by the QUADAS-2 tool was uncertain. Of the 11 validation studies, none scored ‘high quality’ on all 7 dimensions in the QUADAS-2 tool. One (Briand et al 2018 [27]) scored ‘high quality’ in 6 of the 7 dimensions but across the other 8 studies there were many aspects of methodological quality that scored poorly or could not be assessed confidently from the information supplied in the paper. In the sections below, we have placed more emphasis on the studies scoring higher on the QUADAS-2 tool and on those which were undertaken on participants with perfusion defects.
The remainder of our results section is divided into four sub-sections. First, we set the context for our review with a narrative summary of the use of exertional desaturation tests in general lung disease. Next, we describe a sparse literature (two studies) in which the results of exertional desaturation tests were correlated with clinical outcomes in Covid-19, followed by an equally sparse literature in which exertional desaturation tests were correlated with the type of pneumonia in HIV positive patients. Finally, we describe and critique a somewhat larger literature on the validation of selected tests for exertional desaturation in various chronic lung diseases.
Use of exertional desaturation tests in general lung disease
Tests of exertion in lung disease have mostly addressed the monitoring over time of chronic lung disease and have been oriented to measuring exercise capacity. A helpful narrative review by Lee et al (which drew on an earlier systematic review by European Respiratory Society and American Thoracic Society [28]) lists, for example, a 30-minute walk test, a 4-minute walk test, a stair climb power test (10 flights), a more moderate stair climb test (un-standardised but based on the patient’s own home stairs), 6-minute and 3-minute step tests, a 15-step test (step up and down on a 25 cm platform 15 times as fast as possible), Chester step test (an incremental protocol on a 20 cm platform with 2-minute phases commencing with 15 steps per minute and increasing by 2 per minute till terminated by dyspnoea or fatigue), and modified Chester step test (starting at 10 steps per minute) [29]. These authors also describe three different sit-to-stand tests: the five-repetition sit-to-stand test (5STST: the patient stands up fully and sits down 5 times as quickly as they can); the 1-minute sit-to-stand test (1MSTST: the patient stands up fully and sits down as many times as they can in one minute); and the 30-second and 2-minute variants of this [29]. They also review tasks based on activities of daily living such as a semi-standardised “grocery shelving task” [29].
All the exercises described by Lee et al in the above review were designed mainly for longitudinal monitoring of the severity of chronic lung disease, and several have been shown to correlate with survival [29]. The tests combine an assessment of lung function with that of general physical fitness and muscle strength – a useful composite measure in patients with (for example) chronic obstructive pulmonary disease. They were not originally designed with assessment of acute breathlessness in mind, but as described below, some have subsequently been evaluated for that purpose.
A systematic review considered the validity of the 1MSTST in measuring exercise capacity in patients with chronic lung disease [30]. The main focus of that review was on (a) whether the test correlated with severity of lung disease (broadly, it did), (b) the test-retest reliability of the test (it was high), and (c) whether the test score correlated with the gold standard 6-minute walk test (it did). The reviewers concluded that “The 1-min STS appears to be a practical, reliable, valid, and responsive alternative for measuring exercise capacity, particularly where space and time are limited” [30]. However, these authors did not look at the 1MSTST in the assessment of exertional desaturation.
The cardiopulmonary exercise test (CPET) has long been used to derive important variables that are known to be good predictors of prognosis in many cardiorespiratory conditions (including chronic obstructive pulmonary disease, interstitial lung disease, pulmonary arterial hypertension, congestive heart failure, cystic fibrosis and chronic thromboembolic pulmonary hypertension) [31]. Several studies have confirmed that peak VO2 is the preferred method for risk stratification and for the prognostic evaluation of patients with end-stage lung disease such as COPD and cystic fibrosis [31]. However, field tests (such as the 6MWT and 1MSTST) are more commonly used in clinical practice since they do not require specialist lung function facilities [32].
Fox et al explored the use of oximetry along with step climbing tests in the detailed assessment of pulmonary capacity, using area under the curve of a continuous oximetry reading [33]. Whilst these authors found that oximetry thus measured correlated with severity of disease and survival, such an approach is not relevant to the remote assessment of the acutely breathless or hypoxic patient. Similarly, a study of oximetry in the 6MWT showed strong correlation with disease severity and survival [34], but the test does not transfer to the current Covid-19 situation.
Exertional tests for measuring desaturation in Covid-19
Our search identified no validation studies of exertional tests for hypoxia in patients with Covid-19. We found two studies which sought to correlate the results of an exertional desaturation test with clinical outcomes in Covid-19.
In a small study of 26 Covid-19 patients assessed prior to discharge from hospital, Fuglebjerg et al used the 6MWT to assess the degree of exertional hypoxia; symptoms of subjective dyspnoea were noted [35]. Thirteen patients developed exercise-induced hypoxia (defined as SpO2 < 90%) during the 6MWT, of whom four had pulmonary embolism (a perfusion defect). Covid-19 patients experienced less hypoxia-related dyspnoea during the 6MWT compared with historical idiopathic pulmonary fibrosis controls (none of whom were documented as having pulmonary embolism). The authors concluded that the 6MWT is a potentially useful tool in the diagnosis of asymptomatic exercise-induced hypoxia in hospitalized Covid-19 patients prior to discharge. Whilst interesting, the study does not have direct relevance to the question of exertional desaturation tests in a less select Covid-19 population in the acute phase, nor does it tell us anything about the briefer tests currently in use in community settings.
Goodacre et al conducted a retrospective observational cohort study (a methodologically weak study design) across 70 emergency departments in the UK during the first wave of the Covid-19 pandemic [36]. Of the 22000 patients who were assessed, 817 had an exertional test recorded on their record. Of these, 30 had an adverse outcome (defined narrowly as requiring organ support in intensive care) and 9 died. Whilst the positive likelihood ratio of 1.78 (1.25 to 2.53) and the negative likelihood ratio of 0.67 (0.46 to 0.98) for a desaturation of 3% or more just achieved statistical significance, the authors concluded that exertional desaturation was not a significant predictor of adverse outcome when baseline clinical assessment was taken into account (p=0.37). The study specifically did not assess whether patients with exertional desaturation alone would otherwise have fulfilled criteria for hospital admission. It is possible that if adverse outcome had been more broadly defined (e.g. the need to be admitted to hospital, receive supplemental oxygen or in terms of subsequent healthcare usage), the test may have proved a useful predictor. It is noteworthy that only 3% of the cohort had an exertional test, and that these patients were not randomly selected. Additional information would have been gained if all patients with a particular baseline oximeter reading had been tested for exertional desaturation and followed up for adverse outcomes. In short, little can be concluded from this retrospective study of a highly selected sample.
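As a rough illustration of what likelihood ratios of this size mean in practice, the sketch below converts a pre-test probability into a post-test probability via odds. It assumes, purely for illustration, that the pre-test probability is the proportion of tested patients with an adverse outcome (30 of 817); that choice and the function name are illustrative assumptions and are not taken from Goodacre et al.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability.

    Standard odds form of Bayes' theorem:
    post-test odds = pre-test odds x likelihood ratio.
    """
    if not 0 < pre_test_prob < 1:
        raise ValueError("pre_test_prob must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Assumption (illustrative only): pre-test probability taken as the
# proportion of tested patients who had an adverse outcome.
pre_test = 30 / 817

# Point estimates of the likelihood ratios quoted in the text.
LR_POSITIVE = 1.78
LR_NEGATIVE = 0.67

print(f"Pre-test probability:        {pre_test:.1%}")
print(f"After desaturation of >=3%:  {post_test_probability(pre_test, LR_POSITIVE):.1%}")
print(f"After no such desaturation:  {post_test_probability(pre_test, LR_NEGATIVE):.1%}")
```

Under these assumptions, a positive test moves the probability of an adverse outcome from about 3.7% to about 6.4%, and a negative test moves it to about 2.5%. Shifts of this size are consistent with the modest predictive value described above.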
Exertional desaturation as a predictor of acute lung disease
There are some important clinical parallels between pneumocystis pneumonia and the respiratory manifestations of acute Covid-19. Like those with acute Covid-19, patients with pneumocystis pneumonia may be hypoxic (and even cyanotic) on initial presentation, but less severe cases can be normoxaemic initially and become more hypoxic as the disease progresses [37, 38]. Pneumocystis pneumonia is known to cause the alveolar-arterial gradient to widen (i.e. a diffusion defect). Initially, normoxaemia is still reached, though it takes longer than usual (but still within the pulmonary capillary transit time). But as the disease progresses, the blood does not have time to equilibrate with the alveolar oxygen levels by the time it leaves the alveoli, so hypoxaemia results. When the patient exercises, oxygen consumption by the tissues increases and blood passes through the pulmonary capillaries more quickly, causing desaturation at an earlier stage in the disease. Patients often do not feel short of breath as lung compliance and airways are normal. Hence, an exercise desaturation test may help to raise the clinical suspicion of pneumocystis pneumonia, assess its severity and inform a different treatment plan. Whilst Covid-19 can also cause a more typical pneumonia picture (often due to secondary infection), the diffusion defect pattern is commoner.
In two studies in people with HIV, the value of an exertional desaturation test to discriminate pneumocystis pneumonia from other causes of acute pneumonia was tested. Here, desaturation during the exercise tests was correlated with subsequent results from a bronchoalveolar lavage (and in some cases biopsy) which confirmed or rejected the diagnosis. We describe these studies briefly below.
In a study we rated as high quality, Sauleda and colleagues assessed 45 HIV-positive subjects with pneumonia who had been admitted to the emergency department; participants performed pedalling motions in the air for 2 minutes on the stretcher bed [37]. Oxygen saturations were monitored throughout the test. During exercise, the mean SaO2 fell in patients with pneumocystis pneumonia from 88% to 84% (p<0.01), whilst it improved slightly in patients with non-pneumocystis pneumonia from 91% to 93% (p<0.05).
In a similar study (which we rated as lower methodological quality) from 1988 of 39 patients with pneumocystis pneumonia (all HIV-positive young men), exertional desaturation was demonstrated in most of them (including many who had normal saturation at rest) using a 10-minute cycling test, whereas patients who presented with other acute lung conditions including bacterial pneumonia, tuberculosis and pulmonary candidiasis were significantly less likely to show exertional desaturation [38].
Validation of tests for exertional hypoxia in conditions other than Covid-19
We found 11 studies which described attempts to validate the use of exercise tests to assess exertional hypoxia in various chronic lung conditions (including chronic obstructive pulmonary disease, interstitial lung disease, advanced lung disease requiring a lung transplant and pulmonary hypertension). In these studies, measured physiological variables from various exercise tests (such as the 1MSTST, 5STST, 2MWT, IST) were compared to those measured with the accepted gold standard of 6MWT and/or CPET. Variables such as SpO2, heart rate and respiratory rate were measured during (and occasionally after) the various exercise tests. Five studies looked at the 1MSTST and showed that it correlated well with the 6MWT and/or the CPET. We describe below the studies in this group which we scored as high quality.
In the study we rated as highest quality, Briand et al compared the nadir SpO2 measured by oximetry on the 6MWT and 1MSTST (performed on the same day) in a clinic population of 107 patients with chronic interstitial lung disease [27]. There was high correlation between the two tests (r = 0.9; p < 0.0001). The authors also found that the correlation between the tests in terms of desaturation appeared to hold at lower levels of SpO2. No adverse events were described in this study. Table 2 shows the distribution of findings in Briand et al’s study.
Using the data in Table 2, and taking the 6MWT as the gold standard, the 1MSTST appears to have a sensitivity of 88%, a specificity of 81%, a positive predictive value of 79% and a negative predictive value of 89%. Because of the relatively small numbers, however, the confidence intervals around these values are wide [27].
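For readers less familiar with these measures, the following minimal sketch shows how sensitivity, specificity and predictive values are derived from a 2x2 table, and how wide the accompanying intervals can be at sample sizes of this order. The counts in the example are hypothetical and are not the data from Briand et al; the function names are illustrative, and the Wilson score interval is one common choice of interval, not necessarily the method used in the original paper.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def diagnostic_accuracy(tp, fp, fn, tn):
    """Return each measure as (point estimate, (lower, upper)).

    tp/fp/fn/tn are counts of the index test result against the
    reference standard (here, conceptually, the 6MWT).
    """
    measures = {
        "sensitivity": (tp, tp + fn),  # positives correctly detected
        "specificity": (tn, tn + fp),  # negatives correctly excluded
        "ppv": (tp, tp + fp),          # positive results that are true
        "npv": (tn, tn + fn),          # negative results that are true
    }
    return {
        name: (k / n, wilson_interval(k, n))
        for name, (k, n) in measures.items()
    }


# Hypothetical counts for illustration only (NOT taken from Table 2).
results = diagnostic_accuracy(tp=40, fp=5, fn=10, tn=45)
for name, (estimate, (low, high)) in results.items():
    print(f"{name}: {estimate:.0%} ({low:.0%} to {high:.0%})")
```

With around 50 participants in each row or column of the table, the interval around each estimate spans roughly 20 percentage points, which illustrates why point estimates from a single study of this size should be interpreted cautiously.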
Gephine et al (2020) compared the 1MSTST with the CPET in 14 people with severe COPD (a defect of ventilation) and 12 healthy participants [39]. In the COPD group, the fall in SpO2 from pre-exercise to peak exercise was similar with the 1MSTST (mean -5%, SD 4%) and the CPET (mean -6%, SD 6%); the difference was not statistically significant. In the healthy control group, there was very little fall in SpO2 with either the 1MSTST (-1%, SD 2%) or the CPET (-1%, SD 1%).
During the 1MSTST, a ≥4% SpO2 fall was seen in seven people with COPD, among whom nadir SpO2 values were reached during the recovery period in five. For these patients, the lowest saturation was reached 33 ± 12 s after the end of the exercise. In comparison, 10 people with COPD exhibited an SpO2 fall of similar magnitude during the CPET. In five of them, the nadir SpO2 values occurred during the recovery period, 51 ± 16 s after the end of exercise.
The authors concluded that (i) the 1MSTST elicited a similar peak physiological response to the CPET; (ii) people with COPD showed a nadir SpO2 during the recovery period of the 1MSTST, therefore highlighting the relevance of monitoring this crucial phase of exercise. This study did not, however, report a formal validation exercise of the kind shown in Table 2 (perhaps because numbers were small).
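The practical point about monitoring the recovery period can be made concrete with a short sketch. It takes a series of timed oximetry readings, treats the first reading as the pre-exercise baseline (an assumption), and reports the nadir, the size of the fall, and whether the nadir occurred after exercise ended. The function name, the data layout and the example readings are hypothetical; only the 4% threshold is taken from the description above.

```python
def summarise_desaturation(samples, exercise_end_s, threshold_pct=4.0):
    """Summarise SpO2 readings taken during and after an exercise test.

    samples: list of (time_in_seconds, spo2_percent) tuples in time order.
             The first reading is assumed to be the pre-exercise baseline.
    exercise_end_s: time in seconds at which exercise stopped.
    threshold_pct: fall in SpO2 (percentage points) counted as desaturation.
    """
    if not samples:
        raise ValueError("at least one reading is required")
    baseline = samples[0][1]
    # Earliest reading with the lowest SpO2 value.
    nadir_time, nadir = min(samples, key=lambda reading: reading[1])
    fall = baseline - nadir
    return {
        "baseline": baseline,
        "nadir": nadir,
        "fall": fall,
        "desaturated": fall >= threshold_pct,
        "nadir_in_recovery": nadir_time > exercise_end_s,
        "seconds_after_exercise": max(0, nadir_time - exercise_end_s),
    }


# Hypothetical readings: a 60-second test followed by 60 seconds of recovery.
readings = [(0, 95), (30, 94), (60, 92), (75, 90), (90, 89), (120, 93)]
summary = summarise_desaturation(readings, exercise_end_s=60)
print(summary)
```

In this made-up example, stopping oximetry when exercise ends would record a fall of only 3 percentage points, whereas continuing into recovery reveals a fall of 6 percentage points with the nadir 30 seconds after the end of exercise.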
Gloeckl et al found fairly high correlation (r = 0.81) between desaturation levels on the 6MWT and those on a shorter 2MWT in a small sample of 26 patients with COPD [40]. Oxygen saturation fell from a mean of 93.8% (95% confidence interval 92.8-94.7) to 83.2% (80.8-85.5) on the 2MWT compared with 93.3% (92.4-94.3) to 82.0% (79.8-84.3) on the 6MWT. Differences in nadir and percentage drop were not statistically significant between the two tests. The authors concluded that “the decline in oxygen saturation [is] very similar during the 2MWT and the 6MWT [and] that the short duration of a 2MWT is sufficient to induce a similar oxygen desaturation under room air conditions in patients with severe COPD as the 6MWT” (page 260) [40].
Rusanov et al [41] compared the 15SCT against the 6MWT in 51 patients with pulmonary fibrosis (a defect of perfusion), along with a CPET. SpO2 fell from 95% (SD 3) to 86% (SD 7) in the 15SCT and from 94% (SD 3) to 86% (SD 8) in the 6MWT. The nadir of hypoxaemia was very similar on the CPET (88%, SD 6) and showed high correlation with the 15SCT (r = 0.85; p < 0.0001). The fall in SpO2 and nadir SpO2 were also highly correlated between the 15SCT and the 6MWT. The authors concluded that the desaturation measured by the 15SCT is comparable to the desaturation measured by the CPET and the 6MWT, making the 15SCT a reliable tool for monitoring disease progression in IPF and for evaluating the need for oxygen supplementation. Another paper by the same authors reports duplicate findings [42]. Again, however, this study only measured correlation, without validation against the 6MWT and the CPET.
Another study done in patients with a perfusion defect was by Vieira et al, who looked at the usefulness of the IST in 20 patients with pulmonary hypertension [43]. They found a high correlation between desaturation levels on the IST and CPET in patients walking on a treadmill: in the CPET, SpO2 fell from 96% (SD 3) to 92% (SD 6); in the IST it fell from 96% (SD 3) to 89% (SD 8), a difference that was not statistically significant but which may have been clinically significant. The authors concluded that the IST is a useful tool in the evaluation of patients with pre-capillary pulmonary hypertension.
In a further small study of 15 participants with pulmonary fibrosis (again, a defect of perfusion), Labrecque et al compared the 1MSTST (done twice to assess reproducibility), 6MWT and CPET [44]. The main aim of the study was to validate the 1MSTST not as a test of exertional desaturation but as a test of exertion which consistently produces a cardiorespiratory stress. 1MSTST, 6MWT and CPET all produced a similar fall in SpO2 (10% SD 5, 12% SD 4, and 8% SD 4 respectively). There was no significant difference between the nadir SpO2 reached (88% SD 4, 85% SD 4, and 87% SD 4 respectively), though these differences may have reflected clinically significant differences in desaturation. Perhaps the most important observation from the Labrecque study for this review was that the 1MSTST (an intensive burst of exercise over one minute) was shown to be considerably more strenuous from a cardiorespiratory perspective than the 6MWT and the CPET (which was a longer but less intensive period of exercise lasting 8-9 minutes). As the authors noted, “Coping with such a surge in physiological demand during the 1STS [test] was demanding for people with ILD [interstitial lung disease]” (page 15).
We placed less weight on the final five studies, either because the sample was people with ventilation rather than perfusion defects and the methodological quality was uncertain, or because methodological quality was considered to be poor on our risk of bias and applicability tool. The findings from all these additional studies were nevertheless similar to those described above. These studies were: Gruet et al (a prospective correlation study from France in 25 patients with cystic fibrosis, which concluded that the 1MSTST may be used as an alternative to the 6MWT and CPET for assessing exertional desaturation) [45]; Morita et al (a study in 23 patients with COPD which showed good correlation between various exercise desaturation tests) [46]; Kohlbrenner et al (a retrospective study from Switzerland in 38 lung transplant candidates which concluded that the 1MSTST is a safe alternative to 6MWT in such patients despite lower desaturation nadirs) [47]; Crook et al (a prospective study in 21 COPD patients which concluded that 1MSTST is a safe and accurate alternative to 6MWT); and Azzi et al (a retrospective study in 36 lung cancer patients whose main aim was to compare a 3-minute chair-to-rise test, 3MCTRT, with the CPET in terms of maximal exercise capacity but which also found high correlation with the level of oxygen desaturation achieved) [48].
The test that is most used in acute practice is the 40-step walk test, in which the patient is asked to walk 40 steps on the flat and oximetry is then repeated. We found no research studies on this at all.