Of the 135 respondents who started the online questionnaire, 98 participants (73%) completed the full questionnaire. All of these respondents successfully watched the animation of the Back-UP system. Forty-six per cent of the respondents were GPs, 25% worked as physical therapists in primary care and 29% worked as clinicians at a rehabilitation centre. Fifty-two per cent of the respondents were male and the average age was 48 years (SD ± 12.2). Most respondents worked together with 2 to 5 clinicians in their practice (38%) and had worked as clinicians for 21 years or more (48%). The majority of the respondents (69%) had a neutral level of faith in care technology. All demographic characteristics are presented in Table 2.
Table 2
Responders’ demographics (n = 98).
Gender | Male: 52%; Female: 48%
Age in years | 48.0 (SD ± 12.2)
Number of clinicians in practice | Only 1: 9%; 2 to 5: 38%; 6 to 10: 15%; 11 to 20: 7%; 21 or more: 31%
Number of years in practice | Less than 1: 1%; 1 to 5: 10%; 6 to 10: 17%; 11 to 20: 24%; 21 or more: 48%
Faith in care technology (Cronbach’s alpha = 0.8) | 2.6 (SD ± 0.7); Positive: 25%; Neutral: 69%; Negative: 6%
Measurement Model
The first step in determining the quality of our measurement model was to assess the outer loadings, where items with an outer loading > .7 were retained. All outer loadings exceeded this threshold, except for items of the perceived threat to professional autonomy scale and the perceived service risks scale. Items 1 to 5 of the perceived threat to professional autonomy scale had an outer loading < .4. On inspecting the items that assessed this factor, we decided to first remove item 6 (“I find the Back-UP system advantageous for the medical profession as a whole”), as this item could also be considered an indicator of Perceived Usefulness. Removal of this item resulted in item 1 obtaining an acceptable outer loading. Next, we removed item 5 of the scale (“Using the Back-UP system will decrease my control over the allocation of scarce resources”), as it had a negative outer loading. After this removal, items 1 and 2 of the scale had an outer loading > .7. The item with the lowest outer loading (< .4), item 4, was then removed. This resulted in a three-item scale, in which items 1 and 2 had an outer loading > .7 and item 3 had an outer loading > .5. Since deleting item 3 did not increase the scale’s Average Variance Extracted (AVE) or composite reliability, we retained this item.
Then, we focused on the perceived service risks scale. Two items had an outer loading > .7, while the other two had an outer loading > .5. Hence, the effect of removing either of the latter two items on AVE and composite reliability determined whether they were retained. First, we removed the item with the lowest outer loading (item 3). Its removal slightly improved the construct’s composite reliability and greatly improved its AVE. The remaining items still included one item with an outer loading of .6 (item 1), which we removed next. Again, this had a small effect on composite reliability and a large effect on AVE. Therefore, we retained only items 2 and 4 for this construct.
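The retention decisions above hinge on two indices that can be computed directly from standardized outer loadings: composite reliability and AVE. The sketch below uses invented loadings purely for illustration, not values from this study.

```python
# Sketch: composite reliability (CR) and average variance extracted (AVE)
# computed from standardized outer loadings. The loadings are invented
# for illustration and are not the study's values.

def composite_reliability(loadings):
    # CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error)

def ave(loadings):
    # AVE = mean of the squared standardized loadings
    return sum(l ** 2 for l in loadings) / len(loadings)

# Dropping a weak item (loading .35) raises both indices:
print(round(ave([0.85, 0.80, 0.35]), 3), round(ave([0.85, 0.80]), 3))
print(round(composite_reliability([0.85, 0.80, 0.35]), 3),
      round(composite_reliability([0.85, 0.80]), 3))
```

This mirrors the trade-off described above: a weak item drags AVE down sharply while composite reliability changes only modestly.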
For the remaining items, we assessed the cross-loadings (see Table 3). All items load higher on the scale they are intended to measure than on any other scale, which supports the discriminant validity of the measurement model.
Table 3
Cross-loadings of the measurement items on the latent variables.

Item | IU | PU | PSB | PSR | PTA | BEN | INT | COMP
IU1 | .939 | .662 | .547 | − .581 | .222 | .436 | .401 | .436
IU2 | .875 | .479 | .401 | − .420 | .137 | .375 | .379 | .368
IU3 | .940 | .673 | .521 | − .482 | .116 | .455 | .406 | .372
PU1 | .598 | .878 | .630 | − .561 | .283 | .421 | .381 | .555
PU2 | .638 | .890 | .549 | − .540 | .264 | .378 | .377 | .458
PU3 | .534 | .900 | .678 | − .616 | .377 | .434 | .334 | .422
PU4 | .616 | .913 | .757 | − .570 | .344 | .432 | .378 | .433
PSB1 | .394 | .571 | .869 | − .393 | .180 | .505 | .399 | .531
PSB2 | .414 | .634 | .862 | − .435 | .204 | .490 | .363 | .496
PSB3 | .550 | .707 | .853 | − .536 | .272 | .539 | .470 | .530
PSB4 | .469 | .602 | .869 | − .409 | .160 | .464 | .398 | .473
PSB5 | .496 | .661 | .897 | − .486 | .226 | .538 | .504 | .573
PSR2 | − .598 | − .576 | − .534 | .860 | − .259 | − .531 | − .515 | − .555
PSR4 | − .247 | − .454 | − .306 | .762 | − .420 | − .284 | − .240 | − .410
PTA1 | − .165 | − .342 | − .194 | .378 | .960 | − .099 | − .073 | − .281
PTA2 | − .069 | − .175 | − .173 | .310 | .746 | − .020 | − .094 | − .320
PTA3 | .037 | − .013 | − .070 | .171 | .500 | .113 | .058 | − .077
BEN1 | .469 | .477 | .578 | − .556 | .192 | .932 | .663 | .507
BEN2 | .352 | .351 | .467 | − .354 | − .015 | .876 | .596 | .445
INT1 | .397 | .378 | .446 | − .444 | .070 | .662 | .959 | .632
INT2 | .430 | .410 | .501 | − .481 | .135 | .683 | .966 | .600
COMP1 | .392 | .514 | .571 | − .548 | .367 | .468 | .575 | .931
COMP2 | .407 | .459 | .548 | − .573 | .274 | .519 | .618 | .936
Subsequently, we assessed the reliability of the different measurement scales by determining the composite reliability score, the AVE, and Cronbach’s alpha (Table 4). The thresholds for these scores are > .7 for composite reliability, > .5 for AVE, and > .7 for Cronbach’s alpha. All scores are acceptable to good, except for the Cronbach’s alpha value of perceived service risks, which is slightly below .7. However, since both the AVE and the composite reliability score are acceptable for this construct, and since Cronbach’s alpha is a rather conservative reliability measure for two-item constructs, we accepted the reliability of this construct.
Table 4
Reliability scores of the measurement scales.

Scale | Composite reliability | AVE | Cronbach’s alpha
Intention to Use | .942 | .844 | .908
Perceived Usefulness | .942 | .802 | .918
Perceived Service Benefits | .940 | .757 | .757
Perceived Service Risks | .795 | .661 | .661
Perceived Threat to Professional Autonomy | .793 | .576 | .799
Benevolence | .900 | .818 | .781
Integrity | .962 | .926 | .921
Competence | .931 | .872 | .781
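As a minimal illustration of why Cronbach’s alpha is conservative for short scales, the index can be computed as follows; the item scores are invented, and the function assumes one score list per item over the same respondents.

```python
# Sketch: Cronbach's alpha for a multi-item scale. Scores are invented.
from statistics import pvariance

def cronbach_alpha(items):
    # items: one list of scores per item, aligned across respondents
    k = len(items)
    item_var = sum(pvariance(scores) for scores in items)
    total_var = pvariance([sum(scores) for scores in zip(*items)])
    return k / (k - 1) * (1 - item_var / total_var)

item1 = [3, 4, 2, 5, 4, 3]  # invented responses on a 5-point scale
item2 = [2, 4, 2, 4, 5, 3]
print(round(cronbach_alpha([item1, item2]), 2))
```

With only two items, the k/(k − 1) correction is at its smallest, which is why a two-item construct such as perceived service risks can show a depressed alpha despite acceptable AVE and composite reliability.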
Next, we verified that there was no multicollinearity by determining the outer and inner Variance Inflation Factor (VIF) values; at this stage, we switched from reflective to formative model development. These values were all below the threshold of 5.00, with a maximum of 3.893 for the outer VIFs and a maximum of 2.378 for the inner VIFs.
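The VIF check can be sketched as follows: each VIF equals 1/(1 − R²), where R² comes from regressing one predictor on the remaining predictors. The data below are simulated, not the study’s indicator scores.

```python
# Sketch: variance inflation factors for a predictor matrix.
# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
# on the remaining columns. The data are simulated for illustration.
import numpy as np

def vif(X):
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([others, np.ones(len(y))])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1 - (y - A @ coef).var() / y.var()
        vifs.append(1 / (1 - r2))
    return vifs

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.6 * x1 + rng.normal(size=100)  # moderately collinear with x1
X = np.column_stack([x1, x2])
print([round(v, 2) for v in vif(X)])  # both well below the 5.00 threshold
```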
To conclude our assessment of the measurement model, we assessed the significance and relevance of the individual indicators with respect to their latent variables. We used a bootstrapping procedure with 5,000 samples to determine whether the contribution of each item to its factor was significantly greater than 0. Eight outer weights were significant (p < .05) and were therefore retained in the model. For the remaining 15 items, we examined the outer loadings. For all items except perceived threat to autonomy items 2 and 3, the outer loading was > .5. For those two items, we examined the significance of their outer loadings, but neither was significant (p > .05). Therefore, they were removed from the model.
Appreciation of factors
A boxplot (Fig. 3) presents the median scores, quartiles and complete range of the factors of the measurement model. The average scores of these factors were 3.51 (SD .81) for Intention to Use, 3.47 (SD .70) for Perceived Usefulness, 3.43 (SD .72) for Perceived Service Benefits, 3.47 (SD .66) for Perceived Service Risks, 3.35 (SD .74) for Perceived Threat to Autonomy, 3.53 (SD .66) for Benevolence, 3.53 (SD .69) for Integrity and 3.42 (SD .64) for Competence.
Causal Model
We assessed the causal model via a bootstrapping procedure with 5,000 bootstrap samples. The results can be found in Fig. 4.
* p < .05
** p < .01
*** p < .001
Then, we determined the effect sizes (f²) of the significant relations in the model (29). These scores are as follows:
- Perceived Usefulness → Intention to Use: f² = .926 (large effect size)
- Perceived Service Benefits → Perceived Usefulness: f² = .778 (large effect size)
- Perceived Service Risks → Perceived Usefulness: f² = .159 (medium effect size)
- Perceived Threat to Professional Autonomy → Perceived Service Risks: f² = .063 (small effect size)
- Benevolence → Perceived Service Risks: f² = .115 (small effect size)
- Competence → Perceived Service Risks: f² = .142 (small effect size)
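These effect sizes follow the standard formula used in PLS-SEM, f² = (R²included − R²excluded)/(1 − R²included), with conventional thresholds of .02 (small), .15 (medium) and .35 (large). A minimal sketch with invented R² values:

```python
# Sketch: f-squared effect size from the R^2 of the endogenous construct
# with and without the predictor. The R^2 values are invented.

def f_squared(r2_included, r2_excluded):
    return (r2_included - r2_excluded) / (1 - r2_included)

def label(f2):
    # conventional thresholds: .02 small, .15 medium, .35 large
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"

f2 = f_squared(0.60, 0.45)
print(round(f2, 3), label(f2))  # 0.375 large
```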
An overview of reasons to use and not to use a complex CDSS
In line with the results of the quantitative study, the reason most often mentioned by respondents (n = 21) for using a complex CDSS was to improve the care for their patients, especially the assessment (e.g., “Better streamlining of the right care”). Second (n = 19), participants expressed curiosity to test and use the CDSS and to see for themselves what the value of the system is (e.g., “Curiosity, I would like to experience whether such a system can contribute to the treatment”). As the third most mentioned reason (n = 18), respondents expected an increase in efficiency as a result of a reduction in workload and time (e.g., “Workload reduction, as the CDSS could also be used by the practice nurse”). The use of a complex CDSS could help them reorganize their work; for instance, supporting staff could ask the patient to complete the stratification questionnaire before a consultation. The fourth reason (n = 16) was support during decision making (e.g., “The CDSS can support and sharpen me in my own diagnostic thinking”). Patient empowerment was the fifth most often mentioned reason (n = 14) to use a complex CDSS (e.g., “A nice way to see together with the patient whether the decision is wise” and “Clear policy information towards the patient. The patient has a clear picture of the possibilities and the patient can monitor himself.”). In addition, clinicians would use the tool to work consistently with evidence-based medicine (n = 8) and because they perceived the technology as user-friendly (n = 3).
As barriers to using the complex CDSS, respondents mentioned being worried about their own clinical practice and autonomy; they are reluctant to use a CDSS when it interferes too much with clinical practice (n = 18) (e.g., “When the CDSS becomes leading and the clinical view of the practitioner is subordinated”, “When my role as a care provider is undermined or becomes more complicated.”, and “I would like to keep my own clinical reasoning without a CDSS.”). A large number of respondents also do not want to use a CDSS when it increases time and costs (n = 18) (e.g., “Because it will take extra time that would be deducted from the time I have for my patient.” and “Using the CDSS will cost more time in the beginning and learning to use the CDSS will cost time as well”). The fear that the CDSS does not work correctly (n = 17) is also a reason not to use it (e.g., “Too complicated to use for clinician and patient”). The final reasons for not wanting to use the CDSS were an overly generic approach (n = 15) (e.g., “Cookbook medicine” and “No eye for specific patient characteristics”), a lack of effectiveness and added value (n = 11) (e.g., “The quality of the CDSS appears to be insufficient and not convincing of added value for the doctor and patient.”), hampered personal contact with the patient (n = 8) (e.g., “Patients need attention and actual face-to-face contact”), privacy and data security concerns (n = 8) (e.g., “The privacy of the patient is not guaranteed”), concerns about capitalizing on healthcare (n = 4), a lack of trust (n = 3), and use of the CDSS being imposed by external parties, such as healthcare insurance companies (n = 3).