Data processing
Figure 1 shows a summary flow diagram of the number of observations and patients at each stage of processing. The raw telemonitoring dataset contained 64,029 telemonitored BP observations from 905 patients. This was reduced to 63,840 observations after applying the exclusion criteria and deleting presumed erroneous observations. Restricting to BP readings within one year of the index observation, and to patients with at least a full year of follow-up, reduced this to 39,286 observations from 430 patients. Further restricting to patients with a second Florence reading and another reading 6–12 months later reduced the number of patients to 399.
The raw database of comparator patients contained 53,571 observations from 16,149 patients. After applying the same exclusion criteria and restrictions as for the telemonitoring group, this was reduced to 20,415 observations from 7,670 patients (see Fig. 1). After further excluding patients younger than 18 or older than 90 years, and patients whose first and last BP readings were not more than six months apart, 3,484 patients remained.
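The age and follow-up restrictions above can be expressed as a simple patient-level filter. The sketch below is illustrative only. It assumes a hypothetical layout of (patient ID, reading date) pairs plus a dictionary of ages, and it reads "more than six months apart" as a gap of more than 182 days. That cut-off is our assumption, not a definition given in the text.

```python
from datetime import date

# Assumption: "more than six months apart" is taken as a gap of more than 182 days.
MIN_GAP_DAYS = 182


def eligible_patients(readings, ages, min_age=18, max_age=90, min_gap_days=MIN_GAP_DAYS):
    """Return IDs of patients aged min_age..max_age (inclusive) whose first and
    last readings are more than min_gap_days apart.

    readings: iterable of (patient_id, reading_date) pairs (hypothetical layout)
    ages: dict mapping patient_id -> age in years (hypothetical layout)
    """
    first, last = {}, {}
    for pid, when in readings:
        first[pid] = min(first.get(pid, when), when)
        last[pid] = max(last.get(pid, when), when)

    keep = set()
    for pid in first:
        age = ages.get(pid)
        if age is None or not (min_age <= age <= max_age):
            continue  # excludes patients under 18, over 90, or with unknown age
        if (last[pid] - first[pid]).days > min_gap_days:
            keep.add(pid)
    return keep


readings = [
    ("a", date(2016, 1, 1)), ("a", date(2016, 9, 1)),   # 244 days apart
    ("b", date(2016, 1, 1)), ("b", date(2016, 3, 1)),   # only 60 days apart
    ("c", date(2016, 1, 1)), ("c", date(2016, 12, 1)),  # aged over 90 below
]
print(eligible_patients(readings, {"a": 60, "b": 70, "c": 95}))  # {'a'}
```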
End digit preference
A cross-tabulation of surgery-measured systolic BP end digits against diastolic BP end digits is shown in Table S1 in the supplementary file. We observed a very strong double-zero preference in surgery-measured BP. In 11% (5,877/54,073) of readings, both the systolic and diastolic values ended in zero. This is much higher than the 1% expected by chance (1/10 × 1/10, if end digits were uniform and independent) and the 1.7% (761/44,150) we observed in telemonitored BP readings [10]. For systolic BP individually, Fig. 2 shows a markedly higher percentage of readings ending in zero. A similar pattern was observed for diastolic BP (see Figure S1 in the supplementary file). There is also a suggestion of a preference for even end digits, since every odd end digit is less frequent than every even end digit in both bar charts.
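The double-zero and end-digit percentages above can be computed directly from paired readings. This is a minimal sketch, assuming readings are held as (systolic, diastolic) integer pairs in mmHg. The function name `end_digit_summary` is hypothetical. The expected double-zero rate of 1% assumes that each end digit is uniformly distributed and that systolic and diastolic end digits are independent.

```python
from collections import Counter


def end_digit_summary(readings):
    """Summarise end-digit preference in paired BP readings.

    readings: list of (systolic, diastolic) integer pairs in mmHg.
    Returns (percentage of pairs where both values end in 0,
             dict of percentage of systolic readings ending in each digit 0-9).
    """
    n = len(readings)
    double_zero = sum(1 for sbp, dbp in readings if sbp % 10 == 0 and dbp % 10 == 0)
    digit_counts = Counter(sbp % 10 for sbp, _ in readings)
    sbp_pct = {d: 100 * digit_counts[d] / n for d in range(10)}  # Counter gives 0 if absent
    return 100 * double_zero / n, sbp_pct


# With uniform, independent end digits, P(both end in 0) = (1/10) * (1/10) = 1%.
EXPECTED_DOUBLE_ZERO_PCT = 1.0

pct, sbp_pct = end_digit_summary([(140, 90), (138, 82), (120, 80), (135, 85)])
print(pct, sbp_pct[0], sbp_pct[5])  # 50.0 50.0 25.0
```

Applied to the surgery-measured data, the first return value would be the 11% figure quoted above, to be compared against `EXPECTED_DOUBLE_ZERO_PCT`.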
Standardisation with stratification
Table 2 shows the characteristics of patients in the telemonitoring and comparator groups who had at least two BP readings 6–12 months apart and at least one year of follow-up. Follow-up was restricted to 12 months for all patients.
Table 2
Characteristics of patients in the telemonitoring and comparator groups, used for the stratification and matched analyses
Characteristic | Telemonitoring (n = 399) | Comparator group (n = 3,484) |
Female sex | 182/399 (46%) | 1845/3484 (53%) |
Age | Mean 62.5 (SD 9.7) Median 64 (IQR 56 to 70) Range 29 to 89 | Mean 69.7 (SD 12.3) Median 71 (IQR 62 to 79) Range 20 to 90 |
SIMD 2012 decile | Mean 7.9 (SD 2.5) Median 9 (IQR 6 to 10) Range 2 to 10 | Mean 7.0 (SD 12.3) Median 7 (IQR 5 to 10) Range 1 to 10 |
Index systolic BP reading* | Mean 139.6 (SD 16.4) Median 138 (IQR 128 to 150) Range 100 to 188 | Mean 140.0 (SD 18.1) Median 138 (IQR 130 to 150) Range 71 to 240 |
*Index systolic BP values were not adjusted for ‘white coat effect’.
Comparator patients were older on average, with a slightly higher percentage of females, and lower SIMD (i.e. more deprived). Index systolic BP readings were similar.
Table 3 shows the percentage of patients with raised systolic and diastolic BP at baseline and follow-up (final reading 6–12 months later) for the subgroup of patients with valid BP values at both baseline and follow-up.
Table 3
Percentage with raised SBP and DBP
Threshold (mmHg) | Telemonitoring: index reading | Telemonitoring: 6–12 months later | Telemonitoring: relative risk reduction | Comparator: index reading | Comparator: 6–12 months later | Comparator: relative risk reduction |
SBP 135+ | 190/399 (48%) | 94/399 (24%) | 51% | 2119/3484 (61%) | 1879/3484 (54%) | 11% |
SBP 140+ | 138/399 (35%) | 51/399 (13%) | 63% | 1658/3484 (48%) | 1414/3484 (41%) | 15% |
SBP 145+ | 92/399 (23%) | 37/399 (9%) | 60% | 1132/3484 (32%) | 854/3484 (25%) | 25% |
SBP 150+ | 62/399 (16%) | 20/399 (5%) | 68% | 894/3484 (26%) | 555/3484 (16%) | 38% |
DBP 85+ | 138/399 (35%) | 66/399 (17%) | 52% | 1080/3484 (31%) | 799/3484 (23%) | 26% |
DBP 90+ | 90/399 (23%) | 23/399 (6%) | 74% | 672/3484 (19%) | 411/3484 (12%) | 39% |
SBP: systolic BP; DBP: diastolic BP.
The observed improvements in BP control over time were larger in the telemonitoring group. For example, in the telemonitoring group, the percentage of patients with systolic BP of 145 mmHg or above fell by 14 percentage points between baseline and 6–12 months follow-up (relative risk reduction 60%, 95% CI 46 to 72). In the comparator group, it fell by only 7 percentage points (relative risk reduction 25%, 95% CI 19 to 29). The relative risk reduction in the telemonitoring group was therefore more than double that in the comparator group (relative risk reduction ratio 2.43, 95% CI 1.77 to 3.27). We also allowed for ‘white coat effect’ by comparing telemonitoring patients at 145+ mmHg with comparator patients at 150+ mmHg (i.e. assuming surgery readings are about 5 mmHg higher). The relative risk reduction was still greater in the telemonitoring group (relative risk reduction ratio 1.58, 95% CI 1.17 to 2.00).
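Because the same patients are counted at baseline and at follow-up, the within-group relative risk reduction is one minus the ratio of the two counts. The sketch below reproduces the point estimates quoted above from the Table 3 counts. It does not reproduce the confidence intervals, since the method used for them is not stated here (a bootstrap over patients would be one option). The function name is hypothetical.

```python
def relative_risk_reduction(n_before, n_after):
    """Within-group relative risk reduction from index to follow-up.

    Both counts share the same denominator (the same patients at both time
    points), so RRR = 1 - (n_after / N) / (n_before / N) = 1 - n_after / n_before.
    """
    return 1 - n_after / n_before


# Counts of patients above threshold, taken from Table 3
tele_145 = relative_risk_reduction(92, 37)     # telemonitoring, SBP 145+
comp_145 = relative_risk_reduction(1132, 854)  # comparator, SBP 145+
comp_150 = relative_risk_reduction(894, 555)   # comparator, SBP 150+

print(f"{tele_145:.2f} {comp_145:.2f} {comp_150:.2f}")        # 0.60 0.25 0.38
print(f"RRR ratio (145+ vs 145+): {tele_145 / comp_145:.2f}")  # 2.43
print(f"RRR ratio (145+ vs 150+): {tele_145 / comp_150:.2f}")  # 1.58
```

The second ratio encodes the ‘white coat effect’ assumption by shifting the comparator threshold up by 5 mmHg instead of adjusting individual readings.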
Table 4 shows descriptive statistics for the change in systolic BP (baseline – follow-up) in the telemonitoring group, stratified by baseline variables. The corresponding values for the comparator group are given in square brackets. No adjustment for ‘white coat effect’ has been made to the data in this table. Stratifying the results in this way showed that the largest differences in BP change between the telemonitoring and comparator groups were for males, older patients (aged 65 and over), and those with relatively low systolic BP at baseline. However, these variables may confound one another, which we investigated further in the linear mixed effects analysis below. A similar table for diastolic BP differences is shown in the supplementary file (Table S2).
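A stratified summary like Table 4 can be sketched with the Python standard library. The record layout (dictionaries with a `change` field plus baseline covariates) and the function names are hypothetical. Quantile definitions differ between software packages, so IQRs from `statistics.quantiles` may not match SAS output exactly.

```python
import statistics


def describe(changes):
    """Summary statistics for SBP changes (baseline - final, mmHg)."""
    q1, _, q3 = statistics.quantiles(changes, n=4)  # default 'exclusive' method
    return {
        "N": len(changes),
        "Mean": statistics.mean(changes),
        "SD": statistics.stdev(changes),
        "Median": statistics.median(changes),
        "IQR": (q1, q3),
        "Range": (min(changes), max(changes)),
    }


def stratify(patients, key):
    """Summarise SBP change within each stratum defined by key(patient)."""
    groups = {}
    for p in patients:
        groups.setdefault(key(p), []).append(p["change"])
    return {label: describe(values) for label, values in groups.items()}


patients = [
    {"age": 60, "change": 10}, {"age": 70, "change": -2}, {"age": 72, "change": 6},
    {"age": 50, "change": 4}, {"age": 68, "change": 8}, {"age": 55, "change": 0},
]
by_age = stratify(patients, lambda p: "Age < 65" if p["age"] < 65 else "Age 65+")
print(by_age["Age 65+"])
```

Running the same function once per group and placing comparator results in brackets would give the layout used in Table 4.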
Table 4
Systolic BP differences in mmHg (baseline – final readings)
Stratification | N | Mean | SD | Median | IQR | Range |
None (Overall) | 399 [3484] | 6.5 [3.5] | 15.2 [19.5] | 6 [2] | -3 to 15 [-8 to 14] | -37 to 63 [-87 to 88] |
Age < 65 | 211 [1049] | 6.4 [4.5] | 14.6 [19.0] | 6 [4] | -3 to 16 [-8 to 15] | -28 to 55 [-65 to 88] |
Age 65+ | 188 [2435] | 6.7 [3.1] | 15.8 [19.7] | 6.5 [2] | -3.5 to 13.5 [-9 to 14] | -37 to 63 [-87 to 88] |
Male | 217 [1639] | 6.9 [2.7] | 15.2 [18.7] | 7 [2] | -3 to 15 [-9 to 14] | -37 to 63 [-87 to 88] |
Female | 182 [1845] | 6.1 [4.2] | 15.2 [20.1] | 5 [3] | -3 to 15 [-8 to 15] | -34 to 53 [-75 to 88] |
SIMD < 5 (more deprived) | 70 [811] | 7.8 [4.1] | 13.4 [19.2] | 6.5 [3] | 0 to 16 [-8 to 16] | -25 to 50 [-55 to 88] |
SIMD 5+ (more affluent) | 329 [2673] | 6.3 [3.3] | 15.5 [19.5] | 6 [2] | -3 to 15 [-9 to 14] | -37 to 63 [-87 to 88] |
Systolic BP < 135 | 209 [1365] | -1.2 [-7.9] | 11.8 [14.9] | 0 [-7] | -7 to 7 [-16 to 2] | -37 to 28 [-72 to 44] |
Systolic BP 135 or above | 190 [2119] | 15.1 [10.8] | 13.9 [18.5] | 13 [9] | 6 to 23 [0 to 21] | -17 to 63 [-87 to 88] |
Systolic BP 140 or above | 138 [1658] | 17.7 [13.6] | 14.0 [18.8] | 16.5 [12] | 9 to 25 [2 to 24] | -17 to 63 [-75 to 88] |
Systolic BP 145 or above | 92 [1132] | 20.9 [18.3] | 14.0 [18.9] | 21 [18] | 11 to 27.5 [7 to 29] | -17 to 63 [-75 to 88] |
Systolic BP 150 or above | 62 [894] | 23.8 [21.4] | 14.8 [18.8] | 22.5 [21] | 12 to 34 [10 to 32] | -10 to 63 [-75 to 88] |
Values are shown as Telemonitoring [Comparator].
Full interaction linear mixed effects regression models were fitted to the systolic BP and diastolic BP difference outcomes. The models adjusted for SIMD (< 5 or 5+), female sex, initial systolic BP (< 135, 135 to < 145, 145+), and age (< 65, 65+), together with the interaction of each of these covariates with group (telemonitoring or comparator). GP practice was included as a random effect. For the systolic BP outcome, only the interaction of group with systolic BP (145+ versus < 135) was statistically significant at the 5% level (mean difference -5.1, 95% CI -9.1 to -1.1, p = 0.01). The interaction of group with age 65+ was significant at the 10% level (3.1, 95% CI -0.2 to 6.4, p = 0.07). Interestingly, after adjusting for age, baseline systolic BP, and the other covariates, the group effect no longer differed between the sexes (0.3, 95% CI -2.9 to 3.6, p = 0.85). For diastolic BP, only the age interaction was significant at the 5% level (2.9, 95% CI 0.7 to 5.2, p = 0.01). Therefore, on average, the effect of telemonitoring on BP control appears to be greater in older patients and in those with lower baseline systolic BP. However, this observation may have been affected by drop-out, since in previous research we found that people with higher BP were more likely to drop out [6].
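One way to write the full interaction model for the systolic BP difference of patient $i$ in practice $j$ is shown below. This is an illustrative formulation. It assumes indicator coding with the first-listed category of each covariate as the reference, and a random intercept for practice. The exact parameterisation used in the analysis is not stated here.

$$
\Delta_{ij} = \beta_0 + \beta_1 T_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + T_{ij}\,\boldsymbol{\delta}^{\top}\mathbf{x}_{ij} + u_j + \varepsilon_{ij}, \qquad u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
$$

Here $\Delta_{ij}$ is baseline minus final systolic BP, and $T_{ij} = 1$ for telemonitoring and 0 for comparator. The vector $\mathbf{x}_{ij}$ holds indicators for SIMD 5+, female sex, initial systolic BP 135 to < 145 and 145+, and age 65+. The telemonitoring effect for a patient with covariates $\mathbf{x}_{ij}$ is $\beta_1 + \boldsymbol{\delta}^{\top}\mathbf{x}_{ij}$. Each element of $\boldsymbol{\delta}$ is therefore an interaction estimate of the kind reported above (e.g. -5.1 mmHg for systolic BP 145+ versus < 135).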
We then fitted the same linear mixed model without interaction terms, stratified by the variables that appeared to show a significant interaction with the treatment effect (age and systolic BP). The results for the group variable (telemonitoring – comparator) are shown in Table S3. All of these results were obtained after applying a -5 mmHg ‘white coat effect’ adjustment to surgery-measured readings.
The improvement in BP control was significantly greater overall for telemonitoring patients than for comparator patients (mean difference 3.4 mmHg, 95% CI 1.7 to 5.1, p < 0.001). This was particularly true for the over-65 age group and for those with low systolic BP at baseline (< 135 mmHg) (see Table S3). Telemonitoring appears to protect against increases in systolic BP over time in patients whose systolic BP was already fairly low at baseline.
Standardisation with matched cohort analysis
The mean differences in final systolic BP and diastolic BP (comparator patients – telemonitoring patients) were 5.96 mmHg (95% CI 3.55 to 8.36, p < 0.001) and -0.10 mmHg (95% CI -1.81 to 1.60, p = 0.904), respectively.
Therefore, in the matched analysis, final systolic BP after 6–12 months was lower for telemonitoring patients than for comparator patients. This held even after reducing the systolic BP of comparator patients by 5 mmHg to allow for ‘white coat effect’.
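A matched comparison of this kind can be sketched as greedy 1:1 nearest-neighbour matching on index systolic BP, followed by a paired mean difference. This is a simplified illustration, not the matching procedure used here. It matches on systolic BP only, uses a hypothetical (index, final) tuple layout, and uses a normal approximation for the 95% confidence interval. Following the footnote to Table 5, the ‘white coat effect’ adjustment is added to both the matching and final comparator values.

```python
import math


def match_and_compare(tele, comp, adjustment=-5.0):
    """Greedy 1:1 nearest-neighbour matching on index SBP, then a paired comparison.

    tele, comp: lists of (index_sbp, final_sbp) tuples in mmHg (hypothetical layout).
    adjustment: white coat adjustment added to comparator (surgery) values, applied
    to both the matching value and the final value.
    Returns (mean of comparator - telemonitoring final SBP, (lower, upper), n_pairs).
    """
    pool = [(i + adjustment, f + adjustment) for i, f in comp]
    diffs = []
    for t_index, t_final in tele:
        if not pool:
            break  # no comparators left to match
        # choose the unused comparator whose adjusted index SBP is closest
        k = min(range(len(pool)), key=lambda j: abs(pool[j][0] - t_index))
        _, c_final = pool.pop(k)  # matching without replacement
        diffs.append(c_final - t_final)

    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    half_width = 1.96 * sd / math.sqrt(n)  # normal approximation to the 95% CI
    return mean, (mean - half_width, mean + half_width), n


tele = [(140, 130), (150, 138), (130, 128)]
comp = [(145, 140), (155, 150), (135, 133), (200, 190)]
print(match_and_compare(tele, comp, adjustment=-5.0))
```

Re-running with different `adjustment` values (0, -5, -7, -10) mirrors the structure of the sensitivity analyses in Table 5.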
We also performed detailed sensitivity analyses, varying the matching criteria and the size of the adjustment applied to surgery systolic BP readings (see Table 5).
Table 5
Sensitivity analyses for standardisation with matched analysis (Systolic BP)
Analysis | Matching criterion for systolic BP | Adjustment to surgery systolic BP readings (mmHg)* | N | Systolic BP mean difference (mmHg) | 95% confidence interval | P-value |
1 | Nearest SBP with end digit 0 or 5 | 0 | 212 | 7.11 | 5.03 to 9.19 | < 0.001 |
2 | Nearest SBP with end digit 0 or 5 | -7 | 201 | 4.01 | 1.47 to 6.56 | 0.002 |
3 | Nearest SBP with end digit 0 or 5 | -10 | 211 | 1.83 | -0.55 to 4.21 | 0.131 |
4 | Exact SBP matching | 0 | 119 | 5.70 | 2.78 to 8.61 | < 0.001 |
5 | Exact SBP matching | -5 | 120 | 5.67 | 2.24 to 9.09 | 0.001 |
6 | Exact SBP matching | -7 | 128 | 2.25 | -0.67 to 5.17 | 0.130 |
7 | Exact SBP matching | -10 | 123 | 2.23 | -0.74 to 5.19 | 0.140 |
8 | Nearest SBP with end digit 0 | -5 | 208 | 3.90 | 1.39 to 6.42 | 0.003 |
9 | Nearest SBP with end digit 0 | -7 | 209 | 2.90 | 0.53 to 5.28 | 0.017 |
*The adjustment was applied to the matching values as well as to the final values.
The sensitivity analyses suggested that the results were quite sensitive to our assumption about the size of the ‘white coat effect’. However, the surgery systolic BP readings had to be reduced by a fairly large amount before the systolic BP difference in favour of telemonitoring was no longer statistically significant: 7 mmHg with exact matching, and 10 mmHg when matching to the nearest SBP with end digit 0 or 5. If no ‘white coat effect’ adjustment was made to diastolic BP, the mean difference was 3.07 mmHg (95% CI 1.43 to 4.71), which was also statistically significant. The sensitivity analyses for diastolic BP are shown in Table S4 in the supplementary file.
Random coefficients model analysis
The random coefficients model analysis had two advantages. It used all available BP outcome data for each patient, and it took into account the time of each measurement after the patient first started using telemonitoring (or, in the comparator group, first started recording readings after September 2015). Table 6 shows the characteristics of the patients in this sample.
Table 6
Patient characteristics of all patients in the telemonitoring and comparator groups
Characteristic | Telemonitoring (n = 882) | Comparator group (n = 7,806) |
Female sex | 413/882 (47%) | 4115/7806 (53%) |
Age | Mean 62.5 (SD 10.2) Median 64 (IQR 56 to 70) Range 22 to 89 | Mean 68.7 (SD 12.7) Median 70 (IQR 60 to 79) Range 19 to 90 |
SIMD 2012 decile | Mean 7.7 (SD 2.5) Median 8 (IQR 6 to 10) Range 2 to 10 | Mean 7.0 (SD 2.5) Median 7 (IQR 5 to 10) Range 1 to 10 |
Index systolic BP reading | Mean 134.4 (SD 16.4) Median 134 (IQR 124 to 144) Range 90 to 205 | Mean 140.1 (SD 18.2) Median 139 (IQR 129 to 150) Range 71 to 240 |
As in Table 2, comparator patients were older on average, with a slightly higher percentage of females and lower SIMD. Interestingly, Table 2 showed no clear difference in baseline systolic BP. In this sample, by contrast, baseline systolic BP was higher on average among comparator patients than in the telemonitoring group.
Figure 3 shows the mean differences of systolic BP change per week (with 95% confidence intervals) for telemonitored BP in telemonitoring patients versus surgery measured BP in comparator patients in each practice, with a summary effect size computed using random effects meta-analysis.
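The pooling step can be illustrated with the DerSimonian-Laird random-effects estimator, which is one common choice. The text does not name the estimator used, so this is an assumption. Practice-level standard errors are backed out from symmetric 95% confidence intervals, again as an illustrative assumption.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Back out a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def dersimonian_laird(estimates, ses):
    """Random-effects pooling (DerSimonian-Laird) of practice-level estimates.

    estimates: per-practice mean differences in weekly SBP change
    ses: their standard errors
    Returns (pooled estimate, its standard error, tau^2).
    """
    w = [1 / s ** 2 for s in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    df = len(estimates) - 1
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0  # between-practice variance
    w_re = [1 / (s ** 2 + tau2) for s in ses]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2


pooled, se, tau2 = dersimonian_laird([0.0, 2.0], [1.0, 1.0])
print(pooled, se, tau2)
```

A large `tau2` relative to the within-practice variances corresponds to the between-practice heterogeneity visible in the forest plots.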
Systolic BP fell significantly faster in the telemonitoring group. The difference in weekly systolic BP change under telemonitoring, relative to the comparator group, was estimated to be -0.06 mmHg per week (95% CI -0.10 to -0.03), or -3.37 mmHg per year (95% CI -5.41 to -1.33). The overall analysis across all sites, unadjusted for site, gave a very similar but more precise result of -0.06 mmHg per week (95% CI -0.08 to -0.04), or -3.19 mmHg per year (95% CI -4.16 to -2.23).
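The weekly and annual figures can be cross-checked by dividing the annual estimates by the number of weeks per year. The sketch assumes 365.25/7 weeks per year, which is our assumption. Rounding the results to two decimals recovers the reported weekly estimates and limits. Multiplying the rounded weekly value back up (-0.06 × 52.18 ≈ -3.13) does not give -3.37, which suggests that the annual figures were computed from unrounded weekly slopes.

```python
# Assumption: 365.25 / 7 weeks per year; the factor used in the analysis is not stated.
WEEKS_PER_YEAR = 365.25 / 7


def per_week(per_year):
    """Convert an annual change in SBP (mmHg/year) to a weekly change (mmHg/week)."""
    return per_year / WEEKS_PER_YEAR


site_adjusted = [-3.37, -5.41, -1.33]  # estimate, lower, upper (mmHg/year)
unadjusted = [-3.19, -4.16, -2.23]
print([round(per_week(v), 2) for v in site_adjusted])  # [-0.06, -0.1, -0.03]
print([round(per_week(v), 2) for v in unadjusted])     # [-0.06, -0.08, -0.04]
```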
Note that, through the group main effect term in the random coefficients model, this analysis adjusts for ‘white coat effect’. This holds provided that the magnitude of this potential bias remained constant over time, which is a plausible assumption.
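A simplified two-group form of the random coefficients model shows why a constant white coat offset is absorbed by the group main effect. This is a sketch under stated assumptions. The model described in Table 7 also placed surgery readings from telemonitoring patients in a separate category, and estimates were produced per practice; both features are omitted here. For patient $i$ at time $t$ since the start of monitoring,

$$ y_{it} = \beta_0 + \beta_1 G_i + \beta_2 t + \beta_3 G_i t + b_{0i} + b_{1i} t + \varepsilon_{it} $$

where $G_i = 1$ for telemonitored readings and 0 for comparator surgery readings, and $(b_{0i}, b_{1i})$ are patient-level random intercepts and slopes. If surgery readings exceed true BP by a constant $\delta$, the comparator mean becomes $(\beta_0 + \delta) + \beta_2 t$, so refitting gives

$$ \beta_0^{*} = \beta_0 + \delta, \quad \beta_1^{*} = \beta_1 - \delta, \quad \beta_2^{*} = \beta_2, \quad \beta_3^{*} = \beta_3 $$

This leaves the slope difference $\beta_3$, the quantity of interest, unchanged. If the offset instead drifted over time as $\delta_0 + \delta_1 t$, then $\beta_2^{*} = \beta_2 + \delta_1$ and $\beta_3^{*} = \beta_3 - \delta_1$. This is why the constant-offset assumption matters.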
The figures show considerable variation in results across practices, with a few practices (especially small ones) showing large effects of telemonitoring.
Figure S2 in the supplementary file shows a similar plot for change in diastolic BP.
Additionally, Figures S3 and S4 show forest plots for the comparison of surgery-measured BP between telemonitoring and comparator patients for systolic and diastolic BP, respectively. Because telemonitored readings were widely entered into GP surgery systems, these results should be interpreted with caution.
Comparison of analyses
In Table 7, we consider how well each of the analyses addresses the biases outlined in the Introduction section. All analyses were conducted using SAS software version 9.4 (SAS Institute Inc., Cary, NC, USA) except where indicated above.
Table 7
Assessment of how well the analyses control for key potential biases
Challenge | Standardisation with stratification | Standardisation with matching | Mixed effects analysis |
(1) Non-randomised design | Limited. Stratification helps to some extent to control confounding, but covariate adjustment in linear mixed effects models provides better adjustment. There still may be residual confounders. | Excellent control of confounding due to matching, but there still may be residual confounding by underlying variables not used to match on. | Good control via covariate adjustment in linear mixed effects models. Again, there still may be residual confounders. |
(2) White coat effect | Depends on the validity of the assumption for the difference due to ‘white coat effect’. | Depends on the validity of the assumption for the difference due to ‘white coat effect’. Results were found to be fairly sensitive to this assumption. | Fully adjusted by means of adjusting for group at baseline in a random coefficients model, although we make the reasonable assumption that the degree of ‘white coat effect’ did not change over time. |
(3) High variability in the frequency of readings | Partially. Standardisation meant that frequencies of readings were the same between groups, but the subgroup selection needed to achieve this may have produced a biased subgroup. Covariate adjustment in linear mixed effects models may have only partially addressed this bias by controlling for differences between groups. | Partially. Standardisation meant that frequencies of readings were the same between groups, but the subgroup selection needed to achieve this may have produced a biased subgroup. However, matching may have partially addressed this bias by controlling for differences between groups. | Partially. Mixed effects models make a missing-at-random assumption for missing data. If this assumption holds when estimating the change in BP over time, then the difference in frequency of readings would have had no effect on the estimated treatment effect, because the change in BP would be correctly modelled in each group. However, the reason for missing data (or for different frequencies) may have been more or less informative in one group than in the other (e.g. indicating low BP in comparator patients). In that case, this could have biased the results. |
(4) Contamination of readings | Not an issue. Surgery BP readings measured in telemonitoring patients were excluded from this analysis. | Not an issue. Surgery BP readings measured in telemonitoring patients were excluded from this analysis. | The “group variable” in the model placed surgery-measured BP values from telemonitoring patients in a separate category, and a separate telemonitoring effect was estimated for each category relative to the comparator group. It is assumed that the probability of such contamination did not change over time, but this assumption is highly questionable. Nevertheless, this potential bias would not have affected the comparison of telemonitored BP with surgery-measured BP from comparator patients. |
(5) Regression to the mean | At least partially. We might expect this to be fully controlled by comparison with the comparator group. However, it is conceivable that between-group differences in the inclusion probabilities of patients with a greater propensity for regression to the mean (e.g. those with intermittently high or unstable BP) might contribute to confounding bias. | At least partially. This will be controlled to some extent by comparison with the comparator group and by matching, but the strength of regression to the mean may differ between the treatment and comparator groups. | At least partially. This will be controlled to some extent by comparison with the comparator group, but the strength of regression to the mean may differ between the treatment and comparator groups. |
(6) End digit preference | Partially addressed through analysis of change over time followed by comparison between groups. However, the degree of specific value or end digit preference might conceivably have changed over time, and any differential change between subgroups may have biased the results. | If end digit preference or specific value preference changed differently over time in one group compared with the other, this may have caused confounding bias. Moreover, patients may not have been matched correctly by systolic BP because of differential end digit preference between groups. Again, we are relying on the validity of the assumption about the true BP in each group. | If end digit preference or specific value preference changed differently over time in one group compared with the other, this may have caused confounding bias. However, adjustment for group at baseline in a random coefficients model should in theory have adjusted for differences in the strength of digit preference. |
(7) Withdrawal bias | In this analysis we used subgroup selection to retain only patients with at least two readings, at baseline and at follow-up. Telemonitoring patients who withdrew, and comparator patients whose BP was measured less frequently, were more likely to be excluded from the analysis. This problem therefore reduces to the problem of incomparable groups (issue (1)). | Again, this issue is equivalent to issue (1). There may be residual confounding by underlying variables not used in matching. | The model assumes that any missing data are missing at random, conditional on the covariates used in the adjustment. The reasons for missing data, or the missing data mechanisms, may have differed by treatment group. If these differences were not taken into account in the statistical model, the results may have been biased. |