3.1 Study population
Between April 2019 and January 2021, a total of 300 patients with T2DM were enrolled from the North-West London GP network. Of these, 287 completed the full screening procedure, while 13 did not complete screening and were excluded (Supplementary figure 1). The study population was ethnically diverse, the most represented ethnicities being White-Caucasian (102/287=32%), Arab (74/287=28%) and South-Asian (47/287=17%). It was also heterogeneous in terms of T2DM severity and anti-diabetic treatment. Demographic and clinical characteristics of the study population are shown in Table 1 and Table 2.
The overall prevalence of NAFLD on ultrasound was 64% (186/287), while the prevalence of other liver diseases was 9% (28/287: 27 BAFLD and 1 chronic hepatitis B). There were no cases of secondary NAFLD (i.e. NAFLD secondary to medications). The prevalence of significant liver disease, defined as LSM ≥8.1 kPa, was 17% (50/287) in the whole diabetic population and 26% (50/186) in the NAFLD subgroup. The prevalence of advanced fibrosis, defined as LSM ≥12.1 kPa, was 10% (31/287) in the whole population and 16% (31/186) in the NAFLD subgroup. The prevalence of newly diagnosed cirrhosis secondary to NAFLD (either histologically proven or clinically diagnosed) was 3% (8/287) in the whole diabetic population and 4% (8/186) in the NAFLD subgroup (Supplementary figure 1). The number needed to treat/screen (NNT) in this population was 4.56 (3.38-7). Owing to COVID-19-related restrictions, only 11 patients with elevated LSM underwent liver biopsy (as per SOC); all biopsied cases had fibrosis stage ≥2 according to the CRN scoring system.
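The prevalences above are reported as point estimates with their counts. As a minimal sketch, a 95% confidence interval could be attached to each proportion as below; the Wilson score interval is our choice for illustration and is not necessarily the method used in the study, and `wilson_ci` is a hypothetical helper.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.959964) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) for a binomial proportion using the Wilson score interval."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Counts reported in the text above.
for label, k, n in [
    ("NAFLD, whole population", 186, 287),
    ("LSM >= 8.1 kPa, whole population", 50, 287),
    ("LSM >= 8.1 kPa, NAFLD subgroup", 50, 186),
    ("LSM >= 12.1 kPa, whole population", 31, 287),
]:
    p, lo, hi = wilson_ci(k, n)
    print(f"{label}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Unlike the simple Wald interval, the Wilson interval stays within 0-1 and behaves sensibly for small counts such as the 8 cirrhosis cases.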
Compared with patients with NAFLD and normal LSM (n=136), those with NAFLD and significant fibrosis (LSM ≥8.1 kPa, n=50) had a higher body mass index (BMI) (36.8 vs 30.3 kg/m², p=0.0001) and larger hip (123 vs 110 cm, p=0.0001) and waist circumferences (120 vs 105 cm, p=0.0001). They also had higher alanine aminotransferase (ALT, 46 vs 30 IU/L, p=0.0001), aspartate aminotransferase (AST, 37 vs 26 IU/L, p=0.0001) and gamma-glutamyl transferase (GGT, 62 vs 27 IU/L, p=0.0001) values. Of note, 42% of patients with 8.1 ≤ LSM < 12.1 kPa and 38% of patients with LSM ≥12.1 kPa had normal LFTs at screening. In terms of metabolic control, patients with NAFLD and significant fibrosis had higher median HbA1c (71 vs 59 mmol/mol, p=0.0001), fasting glucose (9.4 vs 6.7 mmol/l, p=0.001), insulin (21 vs 12.4 µU/ml, p=0.001) and HOMA index (8.1 vs 3.3, p=0.001). There were no differences in diabetes duration, anti-diabetic medications or diabetic complications (Table 2). In terms of socio-economic status, patients with NAFLD and significant fibrosis lived in more deprived neighbourhoods, as reflected by their median education rank (18789 vs 23148, p=0.03) (Supplementary Table 9).
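The two LSM thresholds partition patients into non-overlapping bands: since advanced fibrosis is defined as LSM ≥12.1 kPa, a value of exactly 12.1 kPa belongs to the upper band, not the intermediate one. A minimal sketch, where `lsm_band` is a hypothetical helper and the LSM values are invented for illustration:

```python
from collections import Counter


def lsm_band(lsm_kpa: float) -> str:
    """Map a liver stiffness measurement (kPa) to a non-overlapping band.

    Thresholds follow the text: significant fibrosis is LSM >= 8.1 kPa and
    advanced fibrosis is LSM >= 12.1 kPa, so 12.1 kPa itself is in the top band.
    """
    if lsm_kpa < 8.1:
        return "<8.1 kPa"
    if lsm_kpa < 12.1:
        return "8.1 to <12.1 kPa"
    return ">=12.1 kPa"


# Hypothetical LSM values, for illustration only.
print(Counter(lsm_band(v) for v in [5.2, 8.1, 9.7, 12.1, 15.3]))
```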
Overall, waist circumference (crude OR 1.086, 95%CI 1.021-1.154, p=0.008), BMI (crude OR 1.17, 95%CI 1.008-1.358, p=0.04), AST (crude OR 1.071, 95%CI 1.07-1.01, p=0.022) and education rank (crude OR 0.857, 95%CI 0.744-0.987) were independent predictors of significant liver disease in the whole diabetic population (Table 3).
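Odds ratios from a logistic model are expressed per one-unit increase in the predictor. Assuming the waist-circumference OR above is per 1 cm and its CI is a Wald interval (neither is stated in this section), the OR and both CI limits could be re-expressed per 10 cm by raising each to the 10th power, because they scale linearly on the log-odds scale. `rescale_or` is a hypothetical helper.

```python
import math


def rescale_or(odds_ratio: float, lower: float, upper: float, k: float) -> tuple[float, ...]:
    """Re-express a per-unit odds ratio and its Wald CI per k units.

    log(OR) and both log CI limits scale linearly with the unit size,
    so each value is raised to the power k.
    """
    return tuple(math.exp(k * math.log(v)) for v in (odds_ratio, lower, upper))


# Waist circumference OR from the text, ASSUMED to be per 1 cm, re-expressed per 10 cm.
or10, lo10, hi10 = rescale_or(1.086, 1.021, 1.154, k=10)
print(f"OR per 10 cm: {or10:.2f} (95% CI {lo10:.2f} to {hi10:.2f})")
```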
3.2 Derivation of the BIMAST score
The whole study population was randomly split 2:1 into a derivation cohort (n=194) and a validation cohort (n=93). The two cohorts had similar clinical features (Supplementary table 10). Based on the clinical predictors of significant fibrosis identified on multivariable analysis, the BIMAST score was computed as:
BIMAST = 0.17 × BMI (kg/m²) + 0.054 × AST (IU/L) − 8.771
The BIMAST score can be calculated online at https://callbuddy.eu/BIMAST/index.html.
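The formula above gives a linear predictor, whereas the reported cut-offs (0.063 and 0.102) lie on a 0-1 scale and the model is assessed with Brier and Hosmer-Lemeshow statistics, both of which operate on predicted probabilities. The sketch below therefore ASSUMES that the published score is the inverse-logit of the linear predictor, and that a score at or above a cut-off counts as positive; neither step is stated in this section, so the online calculator remains the reference. All function names are hypothetical.

```python
import math

CUTOFF_SIGNIFICANT = 0.063  # reported cut-off for LSM >= 8.1 kPa
CUTOFF_ADVANCED = 0.102     # reported cut-off for LSM >= 12.1 kPa


def bimast_linear_predictor(bmi_kg_m2: float, ast_iu_l: float) -> float:
    """Linear predictor exactly as printed in the text."""
    return 0.17 * bmi_kg_m2 + 0.054 * ast_iu_l - 8.771


def bimast_score(bmi_kg_m2: float, ast_iu_l: float) -> float:
    """ASSUMPTION: the published score is the inverse-logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-bimast_linear_predictor(bmi_kg_m2, ast_iu_l)))


def bimast_flags(bmi_kg_m2: float, ast_iu_l: float) -> dict:
    """ASSUMPTION: a score at or above a cut-off is treated as test-positive."""
    score = bimast_score(bmi_kg_m2, ast_iu_l)
    return {
        "score": round(score, 3),
        "positive_for_significant": score >= CUTOFF_SIGNIFICANT,
        "positive_for_advanced": score >= CUTOFF_ADVANCED,
    }


# Hypothetical patient, for illustration only.
print(bimast_flags(bmi_kg_m2=30.0, ast_iu_l=30.0))
```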
Waist circumference and education rank were omitted a priori to make the score easier to use. The Hosmer‐Lemeshow test and the Brier score for the BIMAST score were 0.9 and 0.12, respectively, confirming that the model fitted the derivation cohort well and was well calibrated. In the derivation cohort, the BIMAST score predicted significant fibrosis (LSM ≥8.1 kPa, n=33) with an AUROC of 0.81 (95%CI: 0.72-0.9, p<0.0001) (Figure 2A). A cut-off of 0.063 gave 94% sensitivity, 44% specificity, a positive predictive value (PPV) of 22% and a negative predictive value (NPV) of 97% for significant fibrosis. The BIMAST score also predicted advanced fibrosis (LSM ≥12.1 kPa, n=17) with an AUROC of 0.84 (95%CI: 0.72-0.95, p<0.0001) (Figure 2B). A cut-off of 0.102 gave 94% sensitivity, 50% specificity, PPV 20% and NPV 99% for advanced fibrosis. The BIMAST score performed similarly in the internal validation cohort (Supplementary material).
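The AUROC and cut-off statistics above can be reproduced from per-patient scores and LSM-based labels. A minimal sketch with hypothetical function names and toy data; the AUROC is computed as the Mann-Whitney probability that a patient with fibrosis scores higher than one without, and a score at or above the cut-off is assumed to be a positive test.

```python
def auroc(scores, labels):
    """AUROC as the Mann-Whitney probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ratio(num, den):
    """Safe division: NaN when a 2x2 margin is empty."""
    return num / den if den else float("nan")


def cutoff_metrics(scores, labels, cutoff):
    """Sensitivity, specificity, PPV and NPV, calling score >= cutoff a positive test."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        if s >= cutoff and y == 1:
            tp += 1
        elif s >= cutoff:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Toy data (hypothetical): predicted scores and 1 = LSM >= 8.1 kPa.
scores = [0.02, 0.05, 0.07, 0.20, 0.50]
labels = [0, 0, 1, 0, 1]
print(auroc(scores, labels))
print(cutoff_metrics(scores, labels, cutoff=0.063))
```

Because PPV and NPV depend on the prevalence of fibrosis as well as on sensitivity and specificity, the same cut-off can yield very different predictive values in cohorts with a different case mix, such as the secondary-care validation cohorts.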
In the whole population, the BIMAST score outperformed the other screening strategies. The AUROCs for diagnosing significant fibrosis were 0.74 (95%CI: 0.66-0.83, p<0.0001) for US plus LFTs, 0.72 (95%CI: 0.65-0.8, p<0.0001) for the NAFLD fibrosis score and 0.62 (95%CI: 0.53-0.7, p=0.008) for FIB-4. Pairwise comparison of the ROC curves (DeLong method) showed that the BIMAST score was superior to US plus LFTs (p=0.01), the NAFLD fibrosis score (p=0.009) and FIB-4 (p<0.0001) in diagnosing significant fibrosis in the community (Figure 3A). Similarly, the BIMAST score outperformed US plus abnormal LFTs (p=0.01), the NAFLD fibrosis score (p<0.0001) and FIB-4 (p<0.0001) for diagnosing advanced fibrosis (DeLong method) (Figure 3B).
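The DeLong method compares two AUROCs computed on the same patients while accounting for their correlation. A compact, dependency-free sketch of the test based on the DeLong et al. (1988) variance estimator follows; it is an illustration rather than the software used in the study (which this section does not name), and the function names and toy data are hypothetical.

```python
import math
from statistics import NormalDist


def _psi(pos_score, neg_score):
    """Mann-Whitney kernel: 1 if correctly ordered, 1/2 for ties, else 0."""
    if pos_score > neg_score:
        return 1.0
    return 0.5 if pos_score == neg_score else 0.0


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def delong_paired(scores_a, scores_b, labels):
    """Compare two correlated AUROCs; returns (auc_a, auc_b, z, two-sided p).

    Needs at least two positives and two negatives.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos), len(neg)
    aucs, v10, v01 = [], [], []
    for s in (scores_a, scores_b):
        # Structural components: each positive vs all negatives, and each negative vs all positives.
        v10.append([sum(_psi(s[i], s[j]) for j in neg) / n for i in pos])
        v01.append([sum(_psi(s[i], s[j]) for i in pos) / m for j in neg])
        aucs.append(sum(v10[-1]) / m)
    var = ((_cov(v10[0], v10[0]) + _cov(v10[1], v10[1]) - 2 * _cov(v10[0], v10[1])) / m
           + (_cov(v01[0], v01[0]) + _cov(v01[1], v01[1]) - 2 * _cov(v01[0], v01[1])) / n)
    if var <= 0:
        return aucs[0], aucs[1], 0.0, 1.0
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return aucs[0], aucs[1], z, p


# Toy paired scores (hypothetical) for four patients; 1 = LSM >= 8.1 kPa.
labels = [0, 0, 1, 1]
score_a = [0.1, 0.2, 0.8, 0.9]
score_b = [0.1, 0.9, 0.2, 0.8]
print(delong_paired(score_a, score_b, labels))
```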
3.3 External validation of the BIMAST score
Both the Royal Free cohort (n=218) and the Sicilian cohort (n=168) had higher LFTs and fibroscan values than the primary care cohort (Supplementary table 11 and supplementary figure 3).
In the Royal Free cohort, the Hosmer‐Lemeshow test and the Brier score for the BIMAST score were 0.67 and 0.31, respectively, suggesting moderate goodness-of-fit and calibration. Specifically, the BIMAST score predicted significant fibrosis (LSM ≥8.1 kPa, n=105) with an AUROC of 0.7 (95%CI: 0.63-0.77, p<0.0001), and a cut-off of 0.063 gave sensitivity 34%, specificity 91%, PPV 76% and NPV 40% (Supplementary Figure 4A). Furthermore, the BIMAST score predicted advanced fibrosis (LSM ≥12.1 kPa, n=66) with an AUROC of 0.72 (95%CI: 0.65-0.8, p<0.0001) vs 0.68 (95%CI: 0.6-0.76, p<0.0001) for FIB-4 (Supplementary Figure 4B), and a cut-off of 0.102 gave sensitivity 43%, specificity 89%, PPV 62% and NPV 23%. Pairwise comparison of the ROC curves (DeLong method) confirmed that the BIMAST score and FIB-4 performed similarly in this cohort.
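Assuming the Hosmer-Lemeshow values reported here are p-values, the two calibration measures could be computed as sketched below. Grouping patients into ten equal-size groups of sorted predicted risk is a common convention, assumed here; statistical packages differ in how they form groups and handle ties. The chi-square tail uses a closed form that is valid for even degrees of freedom, which avoids a SciPy dependency. Function names and data are hypothetical.

```python
import math
import random


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, outcomes, groups=10):
    """HL statistic and p-value (df = groups - 2) over equal-size groups of sorted risk.

    `groups` must be even for the closed-form chi-square tail used here.
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / len(chunk)
        variance = len(chunk) * mean_p * (1 - mean_p)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat, chi2_sf_even_df(stat, groups - 2)


# Hypothetical predicted probabilities and simulated outcomes, for illustration only.
rng = random.Random(1)
probs = [rng.uniform(0.01, 0.6) for _ in range(200)]
outcomes = [1 if rng.random() < p else 0 for p in probs]
print(brier_score(probs, outcomes))
print(hosmer_lemeshow(probs, outcomes))
```

A high Hosmer-Lemeshow p-value means no evidence of miscalibration, while a lower Brier score is better; since the Brier score also depends on outcome prevalence, values are most informative when compared within the same cohort.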
In the Sicilian cohort, the Hosmer‐Lemeshow test and the Brier score for the BIMAST score were 0.6 and 0.38, respectively, suggesting moderate goodness-of-fit and calibration. Specifically, the BIMAST score predicted LSM ≥8.1 kPa (n=114) with an AUROC of 0.608 (95%CI: 0.5-0.71, p=0.037), and a cut-off of 0.063 gave sensitivity 27%, specificity 86%, PPV 83% and NPV 30% (Supplementary Figure 5A). Moreover, the BIMAST score predicted LSM ≥12.1 kPa (n=65) with an AUROC of 0.602 (95%CI: 0.51-0.69, p=0.0001) vs 0.69 (95%CI: 0.609-0.77, p=0.0001) for FIB-4 (Supplementary Figure 5B), and a cut-off of 0.102 gave sensitivity 20%, specificity 85%, PPV 48% and NPV 40%. Pairwise comparison of the ROC curves (DeLong method) showed that FIB-4 performed better than the BIMAST score in this cohort.
3.4 Cost-effectiveness analysis
The cost-effectiveness analysis was based on the identification rates of the screening strategies in the study population (Supplementary figures 6-10). Of note, FIB-4 missed 19 of the 50 patients with LSM ≥8.1 kPa (19/50=38%, false negatives), whereas BIMAST missed only 5 (5/50=10%). The patients misclassified as low-risk by FIB-4 were significantly younger (57 vs 62 years, p=0.03) and had lower AST (35 vs 41 IU/L, p=0.034) than those correctly classified as high-risk.
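FIB-4 is computed as (age × AST) / (platelet count × √ALT), so younger age and lower AST both push it down, which is consistent with the profile of the patients FIB-4 missed. In the sketch below, the 1.3 low-risk threshold is an ASSUMPTION (a commonly used lower FIB-4 cut-off that this section does not state), the function names are hypothetical, and the ALT and platelet values are invented; they cancel in the ratio printed at the end.

```python
import math

LOW_RISK_CUTOFF = 1.3  # ASSUMPTION: common lower FIB-4 threshold; not stated in this section


def fib4(age_years: float, ast_iu_l: float, alt_iu_l: float, platelets_x1e9_per_l: float) -> float:
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_iu_l) / (platelets_x1e9_per_l * math.sqrt(alt_iu_l))


def fib4_low_risk(age_years, ast_iu_l, alt_iu_l, platelets_x1e9_per_l) -> bool:
    """True if FIB-4 falls below the (assumed) low-risk threshold."""
    return fib4(age_years, ast_iu_l, alt_iu_l, platelets_x1e9_per_l) < LOW_RISK_CUTOFF


# Ages and AST are the group medians reported above; ALT = 36 IU/L and
# platelets = 250 x 10^9/L are hypothetical and cancel in the ratio.
missed = fib4(57, 35, 36, 250)
correctly_classified = fib4(62, 41, 36, 250)
print(f"relative FIB-4: {missed / correctly_classified:.2f}")
```

With ALT and platelets held fixed, the median age and AST of the missed patients alone lower FIB-4 by roughly a fifth.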
Overall, screening for NAFLD with any of the strategies analysed improved the diagnosis rate by 8-15%. All screening strategies were associated with QALY gains, ranging from 121 to 149 years, with fibroscan yielding the largest gain (148.73 years), followed by the BIMAST score (141.01 years), FIB-4 (134.07 years) and the NAFLD fibrosis score (121.25 years). The ICERs of the BIMAST score and fibroscan compared with SOC were £2,337.92 and £2,480 per QALY gained, respectively (Table 4).
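An ICER is the incremental cost of a strategy divided by its incremental QALYs versus the comparator (here SOC). A minimal sketch with hypothetical inputs; the incremental net monetary benefit is included alongside because it avoids dividing by a small QALY difference and has an unambiguous sign. The £20,000 per QALY willingness-to-pay, the lower end of the usual NICE range, is used only as an example value.

```python
def icer(cost_new: float, qaly_new: float, cost_ref: float, qaly_ref: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY versus the comparator."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    return (cost_new - cost_ref) / delta_qaly


def incremental_nmb(cost_new, qaly_new, cost_ref, qaly_ref, wtp_per_qaly):
    """Incremental net monetary benefit; a positive value means cost-effective at this threshold."""
    return wtp_per_qaly * (qaly_new - qaly_ref) - (cost_new - cost_ref)


# Hypothetical cohort totals, for illustration only (GBP, QALYs).
print(icer(cost_new=13_000, qaly_new=12.0, cost_ref=10_000, qaly_ref=10.0))
print(incremental_nmb(13_000, 12.0, 10_000, 10.0, wtp_per_qaly=20_000))
```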
The ICER was most sensitive to variations in progression rates (i.e. the effect of early diagnosis on disease progression), screening-test sensitivity and specificity, and model time horizon. Nevertheless, when transition probabilities, utilities, screening treatment effect and cost inputs were varied, the probability of NAFLD screening tests being cost-effective compared with SOC was >99% in all evaluated scenarios (Figure 4, Supplementary tables 6-10). When the sensitivity and specificity of each screening test were varied between 20% and 100%, the ICER remained below £3,260 per QALY, and screening therefore remained cost-effective, in all scenarios (Supplementary tables 4-6). Whereas all screening strategies were cost-effective compared with SOC in the base case, when the time horizon was shortened from 40 years (lifetime) to 5 years only BIMAST and FIB-4 remained cost-effective within the NICE CET criteria.
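The >99% figure comes from probabilistic sensitivity analysis, in which model inputs are sampled repeatedly and the share of draws in which screening is cost-effective is recorded. The sketch below is a deliberately simplified version: it ASSUMES independent normal distributions for incremental cost and incremental QALYs (a full Markov model would instead sample transition probabilities, utilities and costs and rerun the model), every number is hypothetical, and `prob_cost_effective` is a hypothetical helper.

```python
import random


def prob_cost_effective(wtp_per_qaly, mean_dcost, sd_dcost, mean_dqaly, sd_dqaly,
                        n_draws=10_000, seed=0):
    """Share of Monte Carlo draws with positive incremental net monetary benefit.

    ASSUMPTION: incremental cost and incremental QALYs are independent normal draws.
    """
    rng = random.Random(seed)
    positive = 0
    for _ in range(n_draws):
        d_cost = rng.gauss(mean_dcost, sd_dcost)
        d_qaly = rng.gauss(mean_dqaly, sd_dqaly)
        if wtp_per_qaly * d_qaly - d_cost > 0:
            positive += 1
    return positive / n_draws


# Cost-effectiveness acceptability curve with hypothetical inputs.
for wtp in (0, 5_000, 20_000, 30_000):
    print(wtp, prob_cost_effective(wtp, mean_dcost=300, sd_dcost=100,
                                   mean_dqaly=0.1, sd_dqaly=0.05))
```

Plotting this probability against willingness-to-pay gives a cost-effectiveness acceptability curve, the usual way of presenting such results.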