Pilot Test Baseline: Three-Step Test-Interview (TSTI)
Twenty-five older hospitalised patients were approached for the TSTI of the baseline test and fourteen (56%) agreed to participate. Characteristics of the participants are displayed in the second column of Table 5.
Table 5. Participants of the Three-Step Test-Interview (TSTI), baseline and follow-up
| Characteristic | Baseline (n=14), n | Follow-up (n=13), n |
|---|---|---|
| Gender | | |
| Male | 12 | 8 |
| Female | 2 | 5 |
| Age (years) | | |
| 70-79 | 9 | 9 |
| 80-89 | 4 | 4 |
| 90-99 | 1 | 0 |
| Educational level* | | |
| Low | 8 | 3 |
| Middle | 2 | 5 |
| High | 3 | 5 |
| Unknown | 1 | 0 |
| Admission reason† | | |
| Pulmonary problems | 5 | 0 |
| Cardiac problems | 4 | 8 |
| Bowel problems | 2 | 3 |
| Kidney problems | 2 | 0 |
| Stroke | 0 | 1 |
| General malaise | 1 | 1 |
* Educational level: Low= no education, primary school, basic vocational training; Middle = secondary education, vocational training; High = bachelor, master
† Reason according to the patient
Comprehension of the format varied widely between participants. Some participants needed only a short introduction and an example and were then able to place the cards independently in the boxes with only a little guidance from the interviewer, whereas others needed constant guidance and reminders of the aim. Some of the latter tended to elaborate on how they were coping with the subject on the card without specifying whether it was a goal or not. This emphasised the need for adequate interviewer instructions and an instruction guide for the interviewers.
An example of guiding by the interviewer:
Washing and dressing. Is that a goal for you with this hospitalisation?
Yes, that you can still do that yourself.
Yes. And, uh.
I find that somewhat important.
Somewhat important. Because how is it going with washing and dressing now?
It is still fine, but with difficulty.
Yes. Satisfactory or good?
Let’s say satisfactory.
And is it your goal to preserve it that way or to improve it?
To preserve it that way, yes. (P1)
Example of a participant with the tendency to elaborate on how he was coping with the subject, without specifying whether it was a goal:
Well, the next one depicts a figure with a lot of pain, I can say that I have a lot of pain, but I have, I don't feel pain that easily. But when I move, I do feel pain now. And you have to indicate that here in points, I have always found that very difficult to interpret, but eh, I, I say, I, I, when I lay down it is not too bad and when I move, it hurts, I'm in the midst of it. So I have pain.
Yes. Yes. And is it a goal for you to reduce this?
Yes! Otherwise I will die. (laughing) (P7)
Adaptations
The following adaptations were made to the format:
Originally we had four answer options: ‘bad, mediocre, satisfactory, good’, but this appeared insufficient for some participants. We therefore added ‘very bad’ and ‘very good’.
Initially, we had one card for each goal and one answering sheet with a horizontal axis for status and a vertical axis for importance where the participant could place the card. Since this appeared complicated for participants, we made two different answer sheets: one for status, and one for importance. The participant had two cards per goal and placed one card on each sheet, as explained in the methods.
Since most interviews were conducted after a few days of hospitalisation, the question about status was sometimes problematic. As some participants indicated that complaints had already improved compared to the day they were admitted, they had the goal ‘improving’, but their status was ‘good’ or ‘very good’ at the moment of interview. We therefore decided to ask in these cases how the status was on the day of admission.
We changed the text on the card ‘visiting’. Originally we had the text ‘visiting family or friends’, but some participants started elaborating about family and friends that had passed away. To help them focus only on visiting, we changed the text into ‘visiting’.
A user guide was written for the interviewers.
Opinions of participants
Participants’ evaluations varied widely and included: pleasant diversion; nice pictures; emotional to be confronted with own impairments; subjects were considered very relevant; it was helpful to express wishes and concerns; interesting; and somewhat tiring.
Content validity
The goals the participants mentioned in their own words were indicated in the tool as at least ‘somewhat important’ in all cases. For example:
If you had to say in your own words what you hope to accomplish with this hospitalisation, what would you say?
Ehm, first of all, the solution to the pain that I was talking about, which is what I actually come here for. And so, as far as the physiotherapy is concerned, also that eh, the moving around, the moving around. (…) Then I just mean, uh, walking. Not those short distances like in the house, that, like this here, that’s okay. Like this short distance, I could do that without a walker. But walking further, I mean that way. (…) Also in terms of breathing and so on. In regards to breathing, I will have to wait and see. I’m dreading it, but, uh, it's at least worth a try. (P11)
This participant indicated: Pain: very important, mediocre, improvement; Walking: very important, mediocre, improvement; Shortness of breath: very important, bad, improvement.
Field Test Baseline
Research assistants practised with Version 2 of the P-BAS-P in a new group of 62 consecutive hospitalised older patients. Sample characteristics are shown in the second column of Table 6. Observation and feedback revealed that one instruction was not clear to all research assistants: to ask for the status on the day of admission when the interview was conducted after a few days and the participant’s status had changed. We therefore changed the instructions to always ask for the status on the day of admission.
Table 6. Sample characteristics of the field test baseline and of the evaluation of reliability, validity, responsiveness, and interpretability
| Characteristic | Field test baseline (n=62), n (%) | Evaluation of reliability, validity, responsiveness, interpretability (n=169), n (%) |
|---|---|---|
| Gender | | |
| Male | 37 (60) | 96 (57) |
| Female | 25 (40) | 73 (43) |
| Age (years), median (range) | 74 (70-96) | 75 (70-98) |
| Living situation | | |
| Independent | 61 (98) | 162 (96) |
| Sheltered accommodation | 0 | 4 (2) |
| Senior home | 1 (2) | 2 (1) |
| Nursing home | 0 | 1 (1) |
| Educational level* | | |
| Low | 10 (16) | 61 (36) |
| Middle | 37 (60) | 65 (38) |
| High | 15 (24) | 43 (25) |
| Specialty | | |
| Medical | 19 (31) | 78 (46) |
| Surgical | 21 (34) | 36 (21) |
| Intervention cardiology | 18 (29) | 48 (28) |
| Unknown | 4 (6) | 7 (4) |
| Admission type | | |
| Acute | 32 (52) | 75 (44) |
| Elective | 26 (45) | 87 (52) |
| Unknown | 4 (6) | 7 (4) |
| Admission time (days), median (range) | 4 (1-28) | 5 (1-35) |
| Number of days the interview took place after admission | | |
| 1 | 1 (2) | 10 (6) |
| 2 | 28 (45) | 64 (38) |
| 3 | 24 (40) | 69 (41) |
| 4 | 8 (13) | 26 (15) |
* Educational level: Low= no education, primary school, prevocational education; Middle = secondary or vocational education; High = bachelor, master
Pilot Test Follow-up: TSTI
Eighteen participants were approached in the hospital, and 17 gave permission to be contacted after discharge. Afterward, two participants refused, one participant appeared too confused for informed consent, and for one participant the opportunity to make an appointment fell outside the timeframe; this resulted in thirteen (72%) participants for the TSTI of the follow-up. Characteristics of participants are displayed in the last column of Table 5.
As described in the Methods section, two formats were tested. Format one: the participant was asked what the status per item was at that moment. Format two started with ‘Because of the hospitalisation…’ and the rest of the sentence depended on the goal being ‘prevention’, ‘preservation’, or ‘improvement’.
Format one was well understood, and answers were given easily. The only confusion was caused by the question ‘How is your disease or condition now?’; as many participants had multiple diseases or conditions, it was not clear to all which one was meant. We therefore added: ‘as concerning the disease or condition you were admitted for’, which was well understood.
Format two was more problematic. Participants often had to read the sentence several times before it was understood. Some other problems:
Originally the follow-up question in the case of preservation was formulated as: ‘Because of the hospitalisation, I still…’ This caused confusion when the participant already had a low level of baseline function. We therefore changed the sentence to ‘Because of the hospitalisation I maintained…’. Although this sometimes produced complicated sentences, it was better understood.
The question ‘Because of the hospitalisation I remained alive’ was often considered complicated. For example:
Yes, so I don’t know, if I had surgery or not, if I would not have had surgery, whether I then, whether that was life-threatening. I don’t know. (P8)
The use of the answer option ‘somewhat’ was interesting, since it was used with many intentions. Often it meant ‘I don’t know’, or ‘actually it is going really badly, but I stay hopeful’, or ‘I don’t want to be too negative’. Only in a minority of the cases was it used to indicate ‘I accomplished my goal to some extent’.
The main concern with format two was that it was unclear which timeframes the participant had to compare. Sometimes participants were evaluating the period in hospital, sometimes the period shortly after discharge. But even when participants evaluated their current situation, it was not clear which period they had to compare it with: this could be long before hospitalisation, shortly before hospitalisation, or during hospitalisation. Changing the words ‘because of’ to ‘thanks to’ did not help.
Example:
Thanks to hospitalisation I have more energy. Well, no more energy. When I compare that with the situation before that time. Well, well, not more. Well, somewhat maybe. Yes… It depends on what I'm comparing it to, by the way. If it was right before my hospital admission, then of course it is eh… then it is completely. But if I go a little further back in time, it is not even that big of a difference. So I find it a little bit difficult to answer, this question. Let me answer it as quite. But more because I don't know at what point in time I have to compare it to. (P13)
Because of the difficulties with format two, we decided to continue only with format one.
Evaluating Reliability, Validity, Responsiveness, Interpretability
Version 3 of the P-BAS-P consisted of the baseline version with changed instructions and follow-up version format one.
Sample
Of the 699 eligible patients, 363 were approached for informed consent and 179 gave it. After exclusion of ten cases, we had 169 baseline cases. We lost 29 to follow-up, and for an additional four the P-BAS-P was not administered at follow-up, which resulted in 136 follow-up cases. Full details are shown in Figure 3. The largest share of baseline interviews (41%) took place on the third day after admission.
Sample characteristics are shown in the last column of Table 6, and Additional File 3 shows the scores of the other questionnaires applied for evaluating construct validity.
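The participant flow described above can be re-derived with a few lines of arithmetic. The sketch below is only a consistency check on the counts reported in the text; the variable names are illustrative and not part of the study.

```python
# Counts as reported in the Sample paragraph (Figure 3 holds the full details).
eligible = 699
approached = 363
consented = 179
excluded = 10
lost_to_follow_up = 29
not_administered = 4  # P-BAS-P not administered at follow-up

# Baseline cases: consenting patients minus exclusions.
baseline_cases = consented - excluded

# Follow-up cases: baseline cases minus both kinds of attrition.
follow_up_cases = baseline_cases - lost_to_follow_up - not_administered

print(f"baseline cases:  {baseline_cases}")
print(f"follow-up cases: {follow_up_cases}")
```

Both results match the 169 baseline and 136 follow-up cases given in the text.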
Descriptive statistics P-BAS-P
In two cases the pictures were not used. One case was because there was no table available, and the other because the participant had to lie down completely flat. In both of these cases, the interviewer asked them all questions without using the cards. The time to conduct the P-BAS-P varied from six to 21 minutes, with a mean of eleven minutes.
Table 7 shows the baseline descriptive statistics of the P-BAS-P. The number of goals rated at least ‘somewhat important’ varied from 3 to 21 per person, with a median of 12 and a mean of 11.7. Twenty-eight participants mentioned an extra goal. Examples were: going on holiday, resuming (volunteer) work, or shopping. The missing values at baseline are all due to the interviewer accidentally forgetting to indicate the option on the answer sheet after the interview, except for one participant who did not know how important the goal ‘enjoying life’ was to him.
Table 8 shows the follow-up descriptive statistics and change scores of the P-BAS-P. From the end of March 2020, the Corona pandemic influenced answers on the items groceries, sports, hobbies, outings, visiting, and ‘extra’. In cases where participants could not answer the question or indicated that the answer was influenced by the Corona measures, the answer was replaced by ‘missing due to Corona’. A closer look at the other missing values revealed some patterns. In four cases the participant did not know the answer, for example because the bowel movements were too irregular, or because the participant was still under treatment and did not know what the disease status was. There were some seasonal problems: at the follow-up moment, it was not the right season for gardening (1x) or the hobby ‘fishing’ (3x). More difficult to interpret were participants stating they did not do an activity, for example: ‘my husband does the groceries’ (1x), ‘I don’t work in the garden’ (3x), ‘I don’t sport’ (4x), ‘we don’t go on outings’ (2x), ‘I have no hobbies anymore’ (1x), ‘I am not allowed to drive’ (1x). It is unclear how such answers should be scored. A participant who indicated at baseline that gardening was important and who does not garden at follow-up may not have reached the goal, in which case the answer should be ‘very bad’, but there may also have been another reason not to garden. In three cases it is doubtful whether the goal selected at baseline was appropriate, since the participant stated at follow-up ‘I have never sported/made outings/visited’. One extra goal had a missing value since the goal was ‘becoming 100’, while the participant was 73 years old at baseline.
Table 7. P-BAS-P baseline descriptive statistics (n=169)
| Item | Importance: not at all/not applicable, n (%) | Importance: somewhat, n (%) | Importance: quite, n (%) | Importance: very, n (%) | Importance: missing | Prevention/preservation/improvement: improvement, n (%) | Prevention/preservation/improvement: missing | Status: very bad | Status: bad | Status: mediocre | Status: satisfactory | Status: good | Status: very good | Status: missing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Better | 9 (5) | 4 (2) | 39 (23) | 117 (69) | 0 | 148 (88) | 0 | 15 (9) | 49 (31) | 58 (37) | 17 (11) | 18 (11) | 2 (1) | 1 |
| Energy | 29 (17) | 11 (7) | 52 (31) | 77 (46) | 0 | 121 (86) | 0 | 19 (14) | 44 (31) | 47 (34) | 18 (13) | 8 (6) | 4 (3) | 0 |
| Pain | 100 (59) | 7 (4) | 19 (11) | 43 (35) | 0 | 59 (86) | 0 | 12 (17) | 26 (38) | 20 (29) | 2 (3) | 6 (9) | 3 (4) | 0 |
| Bowel movements | 141 (83) | 5 (3) | 11 (7) | 12 (7) | 0 | 19 (68) | 0 | 1 (4) | 11 (39) | 5 (18) | 1 (4) | 8 (5) | 2 (1) | 0 |
| Shortness of breath | 71 (42) | 3 (2) | 27 (16) | 68 (40) | 0 | 91 (93) | 0 | 12 (13) | 42 (43) | 30 (31) | 4 (4) | 9 (9) | 1 (1) | 0 |
| Walking | 60 (36) | 5 (3) | 37 (22) | 66 (39) | 1 | 93 (85) | 0 | 14 (13) | 45 (42) | 28 (26) | 9 (8) | 8 (7) | 4 (4) | 1 |
| Appetite | 126 (75) | 5 (3) | 24 (14) | 14 (8) | 0 | 27 (63) | 0 | 3 (7) | 9 (21) | 14 (33) | 4 (9) | 12 (28) | 1 (2) | 0 |
| Knowing what is wrong | 111 (66) | 2 (1) | 14 (8) | 41 (24) | 1 | n.a. | n.a. | 8 (14) | 11 (19) | 18 (32) | 8 (14) | 9 (16) | 3 (5) | 1 |
| Curing | 10 (6) | 8 (5) | 39 (23) | 111 (66) | 1 | 123 (79) | 4 | 15 (10) | 46 (29) | 69 (44) | 15 (10) | 10 (6) | 3 (2) | 1 |
| Alive | 21 (12) | 12 (7) | 21 (12) | 115 (68) | 0 | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. |
| Enjoy | 52 (31) | 11 (7) | 41 (24) | 64 (38) | 1* | 76 (65) | 0 | 6 (5) | 20 (17) | 33 (28) | 23 (20) | 24 (21) | 10 (9) | 1 |
| Groceries | 81 (48) | 11 (7) | 44 (26) | 33 (20) | 0 | 53 (60) | 0 | 11 (13) | 20 (23) | 25 (28) | 4 (5) | 23 (26) | 5 (6) | 0 |
| Wash and dress | 108 (64) | 2 (1) | 28 (17) | 31 (18) | 0 | 27 (44) | 0 | 4 (7) | 11 (18) | 11 (18) | 7 (12) | 21 (34) | 7 (12) | 0 |
| Garden | 109 (65) | 11 (7) | 23 (14) | 26 (15) | 0 | 46 (77) | 0 | 3 (5) | 21 (35) | 13 (22) | 8 (13) | 14 (23) | 1 (2) | 0 |
| Sports | 98 (58) | 13 (8) | 29 (17) | 29 (17) | 0 | 54 (76) | 0 | 14 (20) | 21 (30) | 17 (24) | 5 (7) | 9 (13) | 5 (7) | 0 |
| Hobbies | 91 (54) | 11 (7) | 23 (14) | 44 (26) | 0 | 45 (58) | 0 | 7 (9) | 21 (16) | 27 (35) | 8 (10) | 19 (25) | 4 (5) | 1 |
| Driving | 89 (53) | 2 (1) | 29 (17) | 49 (29) | 0 | 25 (31) | 0 | 9 (11) | 7 (9) | 4 (5) | 11 (14) | 41 (51) | 8 (10) | 0 |
| Outings | 87 (51) | 16 (9) | 30 (18) | 36 (21) | 0 | 57 (70) | 0 | 9 (11) | 26 (32) | 19 (23) | 4 (5) | 20 (24) | 4 (5) | 0 |
| Visiting | 92 (54) | 12 (7) | 34 (20) | 31 (18) | 0 | 37 (48) | 0 | 10 (13) | 11 (14) | 13 (17) | 10 (13) | 29 (38) | 4 (5) | 0 |
| Home | 63 (37) | 1 (1) | 8 (5) | 96 (57) | 1 | 11 (10) | 0 | 4 (4) | 3 (3) | 5 (5) | 6 (6) | 63 (59) | 25 (24) | 0 |
| Independence | 52 (31) | 2 (1) | 25 (15) | 89 (53) | 1 | 43 (37) | 0 | 10 (9) | 18 (15) | 15 (13) | 14 (12) | 45 (39) | 15 (13) | 0 |
| Extra | 141 (83) | 1 (1) | 9 (5) | 18 (11) | 0 | 22 (79) | 0 | 2 (7) | 10 (36) | 5 (18) | 4 (14) | 3 (11) | 4 (14) | 0 |
* Missing values at baseline are all due to the interviewer accidentally forgetting to indicate the option on the answer sheet after the interview, except for ‘enjoying life’, which was unknown by the participant.
Table 8. P-BAS-P follow-up and change (n=136)
| Item | n | Very bad | Bad | Mediocre | Satisfactory | Good | Very good | Missing | Missing Corona | Change (F-B), mean (SD); range | Score, mean (SD); range |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Better | 130 | 0 | 3 (2) | 31 (24) | 30 (23) | 58 (43) | 8 (6) | 0 | | 1.35 (1.54); -2 to 5 | 0.44 (1.48); -3 to 4 |
| Energy | 116 | 1 (1) | 7 (6) | 40 (35) | 25 (22) | 40 (35) | 3 (3) | 0 | | 1.15 (1.44); -2 to 4 | 0.28 (1.36); -3 to 3 |
| Pain | 57 | 1 (2) | 0 | 15 (27) | 8 (14) | 24 (45) | 7 (13) | 1 | | 1.76 (1.65); -2 to 5 | 0.93 (1.49); -3 to 4 |
| Bowel movements | 28 | 1 (4) | 1 (4) | 2 (7) | 6 (22) | 14 (52) | 3 (11) | 1 | | 0.90 (1.67); -1 to 4 | 0.24 (1.48); -2 to 3 |
| Shortness of breath | 78 | 0 | 4 (5) | 24 (31) | 13 (17) | 32 (41) | 5 (6) | 0 | | 1.52 (1.46); -2 to 4 | 0.59 (1.42); -3 to 3 |
| Walking | 89 | 2 (2) | 5 (6) | 27 (31) | 21 (24) | 30 (34) | 3 (3) | 1 | | 1.19 (1.57); -3 to 4 | 0.33 (1.49); -4 to 3 |
| Appetite | 40 | 1 (3) | 2 (5) | 5 (13) | 5 (13) | 21 (53) | 6 (15) | 0 | | 1.00 (1.41); -1 to 4 | 0.38 (1.23); -2 to 3 |
| Knowing what is wrong | 58 | 1 (2) | 4 (7) | 2 (4) | 12 (21) | 26 (46) | 12 (21) | 1 | | 1.32 (1.74); -2 to 5 | 0.32 (1.74); -3 to 4 |
| Curing | 132 | 0 | 6 (5) | 26 (20) | 25 (19) | 53 (41) | 19 (15) | 3 | | 1.63 (1.46); -2 to 5 | 0.84 (1.49); -3 to 5 |
| Alive | 117 | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | 0 (0); 0 |
| Enjoy | 105 | 1 (1) | 1 (1) | 15 (14) | 19 (18) | 47 (45) | 21 (20) | 1 | | 1.05 (1.68); -4 to 5 | 0.39 (1.54); -4 to 4 |
| Groceries | 74 | 4 (6) | 6 (9) | 9 (13) | 8 (11) | 38 (54) | 6 (9) | 1 | 3 | 0.89 (1.67); -4 to 5 | 0.29 (1.50); -5 to 4 |
| Wash and dress | 55 | 0 | 4 (7) | 5 (9) | 5 (9) | 32 (58) | 9 (16) | 0 | | 0.85 (1.46); -2 to 5 | 0.36 (1.34); -2 to 5 |
| Garden | 54 | 3 (6) | 7 (14) | 8 (16) | 10 (7) | 17 (34) | 5 (10) | 4 | | 0.85 (1.67); -3 to 5 | 0.06 (1.51); -3 to 4 |
| Sports | 62 | 3 (7) | 8 (17) | 8 (17) | 8 (17) | 14 (30) | 5 (11) | 4 | 12 | 0.78 (1.93); -4 to 4 | 0.05 (1.73); -4 to 3 |
| Hobbies | 72 | 2 (3) | 2 (3) | 11 (18) | 8 (13) | 30 (48) | 9 (15) | 5 | 5 | 0.64 (1.68); -2 to 5 | 0.45 (1.55); -3 to 4 |
| Driving | 74 | 5 (7) | 4 (6) | 3 (4) | 3 (4) | 42 (58) | 16 (22) | 1 | | 0.51 (1.69); -3 to 5 | 0.18 (1.49); -3 to 4 |
| Outings | 71 | 0 | 2 (4) | 13 (29) | 6 (13) | 19 (42) | 5 (11) | 2 | 24 | 1.05 (1.90); -3 to 4 | 0.33 (1.68); -3 to 3 |
| Visiting | 69 | 1 (2) | 1 (2) | 4 (8) | 8 (17) | 31 (65) | 3 (6) | 1 | 20 | 0.98 (1.66); -3 to 4 | 0.49 (1.38); -3 to 3 |
| Home | 96 | 1 (1) | 0 | 1 (1) | 7 (7) | 68 (71) | 19 (20) | 0 | | 0.18 (1.17); -4 to 4 | 0.11 (1.12); -4 to 4 |
| Independence | 106 | 0 | 2 (2) | 8 (8) | 14 (13) | 65 (61) | 17 (16) | 0 | | 0.79 (2.45); -2 to 5 | 0.45 (1.39); -3 to 4 |
| Extra | 20 | 2 (14) | 1 (7) | 1 (7) | 3 (21) | 7 (50) | 0 | 1 | 5 | 0.50 (1.65); -2 to 3 | -0.29 (1.74); -2 to 2 |
SD= standard deviation
Reliability
Baseline Questions. For the test-retest reliability, 62 participants were approached. In twelve cases, the participant refused the retest, resulting in 50 participants performing a baseline test-retest reliability. In 45 cases the retest was performed by another interviewer and in five cases by the same interviewer.
Of the 21 specified goals from which participants could select, the number of discrepancies between test and retest per participant ranged from zero to a maximum of eleven (52% of the number of goals) with a median of 5 (24%). Of the total of 242 discrepancies, the goal was selected only during the test 87 (36%) times, and in 155 (64%) cases only during the retest.
Item level agreement is included in Additional File 4.
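The percentages reported for the baseline test-retest discrepancies follow directly from the counts. A minimal sketch of that arithmetic, using only the numbers given in the text (the helper `pct` and the variable names are illustrative):

```python
N_GOALS = 21          # specified goals a participant could select
ONLY_TEST = 87        # goal selected during the test only
ONLY_RETEST = 155     # goal selected during the retest only


def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to a whole number."""
    return round(100 * part / whole)


total_discrepancies = ONLY_TEST + ONLY_RETEST

share_only_test = pct(ONLY_TEST, total_discrepancies)
share_only_retest = pct(ONLY_RETEST, total_discrepancies)

# Per-participant discrepancies expressed as a share of the 21 goals.
max_share = pct(11, N_GOALS)     # maximum of eleven discrepancies
median_share = pct(5, N_GOALS)   # median of five discrepancies

print(total_discrepancies, share_only_test, share_only_retest)
print(max_share, median_share)
```

The results reproduce the reported 242 discrepancies, the 36%/64% split, and the 52% and 24% shares of the number of goals.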
Forty-one retest participants had a follow-up. The PBI1 test of the participants who had a baseline retest ranged from -1.12 to 2.60, with a mean of 0.55 and a standard deviation (SD) of 0.83. The PBI1 of the retest ranged from -1.05 to 2.45, with a mean of 0.46 and an SD of 0.82. The ICC between PBI1 of test and retest was 0.76 (95% CI 0.59;0.86).
The PBI2 test of the participants who had a baseline retest ranged from -1.13 to 2.62, with a mean of 0.56 and an SD of 0.84. The PBI2 of the retest ranged from -1.00 to 2.45, with a mean of 0.49 and an SD of 0.84. The ICC between PBI2 of test and retest was 0.73 (95% CI 0.54;0.85).
Follow-up Questionnaire. For the follow-up test-retest reliability, 90 participants were approached. In seventeen cases the participant refused the retest, six times the participant could not be reached, one participant was sick at the moment of retest and for six it was unknown why the retest was not performed. Finally, 60 participants performed a test-retest of the follow-up questionnaire. Nine participants indicated their situation had changed between test and retest and were removed from analysis, resulting in 51 retests. Median time between test and retest was seven days. In 36 cases the retest was performed by another interviewer and in fifteen cases by the same interviewer.
The agreement on item level is included in Additional File 5.
For the calculation of the PBI, we excluded one case, because only one out of eighteen answers of the retest was saved in the computer system. The PBI1 test of the participants who had a follow-up retest ranged from -1.04 to 2.87, with a mean of 0.26 and an SD of 0.70. The PBI1 of the retest follow-up ranged from -1.26 to 2.59, with a mean of 0.27 and an SD of 0.72. The ICC between the PBI1 of the test and retest follow-up was 0.86 (95% CI 0.77;0.92).
The PBI2 of the participants with a follow-up retest ranged from -1.00 to 2.91, with a mean of 0.27 and an SD of 0.69. The PBI2 of the retest follow-up ranged from -1.25 to 2.58, with a mean of 0.27 and an SD of 0.71. The ICC between the PBI2 of the test and retest follow-up was 0.85 (95% CI 0.76;0.92).
Validity
Importance of Goals. All hypotheses, except for hypothesis 11, were confirmed. Table 3 shows the test statistics and complete descriptive information is shown in Additional File 6. The hypothesis for ‘bowel movement’ could not be tested because the cell frequencies were too low.
The 50 cases selected for the open question mentioned 98 goals in total. Of these, 13 goals could not be coded as an item in the P-BAS-P because they did not exist in the P-BAS-P, and were therefore coded as ‘other’. We consequently analysed the agreement between the codes and the answers given in the P-BAS-P of 85 goals and found an agreement of 89%. An overview of the number of items coded and the amount of agreement is given in Table 9.
The number of confirmed hypotheses regarding importance of goals exceeded the threshold for validity.
Table 9. Coding of open questions and agreement with P-BAS-P in descending order of frequency
| Code | Frequency coded | Agreement n (%) | No agreement n (%) |
|---|---|---|---|
| Curing | 18 | 17 (94) | 1 (6) |
| Other | 13 | n.a. | n.a. |
| Alive | 8 | 8 (100) | 0 |
| Walking | 7 | 7 (100) | 0 |
| Energy | 7 | 7 (100) | 0 |
| Hobbies | 7 | 7 (100) | 0 |
| Sports | 6 | 5 (83) | 1 (17) |
| Outings | 6 | 4 (67) | 2 (33) |
| Pain | 5 | 3 (60) | 2 (40) |
| Shortness of breath | 4 | 4 (100) | 0 |
| Home | 4 | 3 (75) | 1 (25) |
| Independence | 3 | 1 (33) | 2 (67) |
| Knowing what is wrong | 3 | 3 (100) | 0 |
| Groceries | 2 | 2 (100) | 0 |
| Enjoy | 2 | 2 (100) | 0 |
| Better | 1 | 1 (100) | 0 |
| Garden | 1 | 1 (100) | 0 |
| Visiting | 1 | 1 (100) | 0 |
| Driving | 0 | n.a. | n.a. |
| Bowel movements | 0 | n.a. | n.a. |
| Appetite | 0 | n.a. | n.a. |
| Wash and dress | 0 | n.a. | n.a. |
| Total | 98 | 76 (89) | 9 (11) |
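The totals in Table 9 can be checked by summing its rows. The sketch below copies the per-item counts from the table and recomputes the overall agreement; the dictionary layout and names are illustrative, and goals coded as ‘other’ are left out of the agreement calculation, as described in the text.

```python
# (agreement, no agreement) per P-BAS-P item, copied from Table 9.
# Items that were coded zero times are omitted; they add nothing to the sums.
TABLE_9 = {
    "Curing": (17, 1),
    "Alive": (8, 0),
    "Walking": (7, 0),
    "Energy": (7, 0),
    "Hobbies": (7, 0),
    "Sports": (5, 1),
    "Outings": (4, 2),
    "Pain": (3, 2),
    "Shortness of breath": (4, 0),
    "Home": (3, 1),
    "Independence": (1, 2),
    "Knowing what is wrong": (3, 0),
    "Groceries": (2, 0),
    "Enjoy": (2, 0),
    "Better": (1, 0),
    "Garden": (1, 0),
    "Visiting": (1, 0),
}
OTHER = 13  # goals without a matching P-BAS-P item

agreed = sum(a for a, _ in TABLE_9.values())
disagreed = sum(d for _, d in TABLE_9.values())
compared = agreed + disagreed      # goals that could be compared
coded_total = compared + OTHER     # all goals mentioned
agreement_pct = round(100 * agreed / compared)

print(coded_total, compared, agreed, disagreed, agreement_pct)
```

The sums give 98 coded goals, of which 85 could be compared, with 76 agreements (89%) and 9 disagreements, in line with the Total row.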
Status Baseline, Follow-up, and Change. As seen in Table 4, all correlations between baseline and follow-up status and the related constructs were in the direction hypothesised, but only four of the baseline correlations and six of the follow-up correlations were strong enough to confirm the hypotheses. Of the correlations between change scores, two were strong enough to confirm the hypotheses. For the item ‘sports’, the correlation was in the opposite direction to that hypothesised. As the minimum of six confirmed hypotheses was reached only for the follow-up status, this was the only moment at which the status question was considered valid.
Responsiveness
Of the 50 cases selected at baseline for comparing open questions, 46 had a follow-up. This resulted in 61 dyads of coded open goals and P-BAS-P items with follow-ups. The correlation between the answers on the open question and the corresponding P-BAS-P score was 0.26 and therefore the hypothesis was rejected.
PBI1 ranged from -1.63 to 2.87, with a mean of 0.31 and an SD of 0.80, and PBI2 ranged from -1.94 to 2.91, with a mean of 0.32 and an SD of 0.81. For the anchor question ‘How much have you benefited from the admission?’ ten (7%) participants did not know what to answer. Of the valid responses, ten (8%) of the participants answered ‘not at all’, five (4%) ‘a little bit’, twenty (15%) ‘somewhat’, 45 (36%) ‘much’, and 46 (37%) ‘very much’. The Spearman’s correlation coefficient between PBI1 and the anchor question was 0.267, and between PBI2 and the anchor question 0.272.
After coding the explanations participants gave to the anchor question as ‘based on outcomes’ or ‘based on other grounds’, we found that 101 participants (83%) based their judgements on outcomes, for example ‘I have no longer chest pain’, and 21 (17%) on other grounds, for example ‘top nurses, they were very correct’. Seven participants gave no explanation. In the selection of participants who based their judgement on outcomes, PBI1 ranged from -1.63 to 2.60, with a mean of 0.38 and an SD of 0.79, and PBI2 ranged from -1.94 to 2.62, with a mean of 0.38 and an SD of 0.80. The correlation between PBI1 and the anchor question was 0.376 and the correlation between PBI2 and the anchor question was 0.389.
Interpretability
The visual anchor-based minimal important change distribution method was based on the selection of participants basing their judgement on outcomes.
The upper half of Figure 4 shows the ROC curves of PBI1, with the ROC curve of ‘no important benefit’ on the left side, which has an area under the curve (AUC) of 0.61. The optimal cut-off point for ‘no important benefit’ was set at a sensitivity of 84% and a specificity of 46%, resulting in an MIC of -0.3 points on the PBI1. The right side shows the ROC curve of ‘important benefit’, with an AUC of 0.63. The optimal cut-off point for ‘important benefit’ was set at a sensitivity of 36% and a specificity of 95%, resulting in an MIC of 0.9 points on the PBI1. This means PBI1 values between -0.3 and 0.9 points are considered ‘borderline benefit’.
The lower half of Figure 4 shows the ROC curves of PBI2, with the ROC curve of ‘no important benefit’ on the left side, which has an AUC of 0.62. The optimal cut-off point for ‘no important benefit’ was set at a sensitivity of 84% and a specificity of 54%, resulting in an MIC of -0.3 points on the PBI2. The right side shows the ROC curve of ‘important benefit’, with an AUC of 0.63. The optimal cut-off point for ‘important benefit’ was set at a sensitivity of 66% and a specificity of 66%, resulting in an MIC of 0.2 points on the PBI2. This means PBI2 values between -0.3 and 0.2 points are considered ‘borderline benefit’.
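To make the three benefit categories concrete, the sketch below applies the reported MIC cut-offs to a PBI value. It is only an illustration: the names `MicThresholds` and `classify_benefit` are hypothetical, and treating a value exactly equal to a cut-off as belonging to the outer category is an assumption, since the text does not specify how boundary values are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicThresholds:
    """MIC cut-offs for one index, taken from the reported ROC analyses."""
    no_benefit: float  # at or below: 'no important benefit'
    benefit: float     # at or above: 'important benefit'


PBI1 = MicThresholds(no_benefit=-0.3, benefit=0.9)
PBI2 = MicThresholds(no_benefit=-0.3, benefit=0.2)


def classify_benefit(pbi: float, thresholds: MicThresholds) -> str:
    """Map a PBI value to one of the three categories.

    Boundary handling (<= and >=) is an assumption of this sketch.
    """
    if pbi <= thresholds.no_benefit:
        return "no important benefit"
    if pbi >= thresholds.benefit:
        return "important benefit"
    return "borderline benefit"


# The same value can fall in different categories for PBI1 and PBI2,
# because the borderline band of PBI2 is much narrower.
print(classify_benefit(0.5, PBI1))
print(classify_benefit(0.5, PBI2))
```

A value of 0.5, for instance, lies inside the borderline band of PBI1 (-0.3 to 0.9) but above the ‘important benefit’ cut-off of PBI2 (0.2).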
The SEM for PBI1 was 0.41, resulting in an SDC of 1.1. The SEM for PBI2 was 0.44, resulting in an SDC of 1.2.
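The relation between these SEM and SDC values can be sketched with the commonly used formulas SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM. That these exact formulas were used, and that the SD and ICC of the baseline test-retest were the inputs, are assumptions of this sketch rather than statements from the text; the function names are illustrative.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from an SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def sdc_from_sem(sem: float, z: float = 1.96) -> float:
    """Smallest detectable change at the individual level (95% confidence)."""
    return z * math.sqrt(2.0) * sem


# Assumed inputs: SD of the baseline test and ICC of the baseline test-retest.
sem_pbi1 = sem_from_icc(sd=0.83, icc=0.76)
sem_pbi2 = sem_from_icc(sd=0.84, icc=0.73)

# SDC computed from the SEM values as reported in the text.
sdc_pbi1 = sdc_from_sem(0.41)
sdc_pbi2 = sdc_from_sem(0.44)

print(f"SEM PBI1 {sem_pbi1:.2f}, SDC PBI1 {sdc_pbi1:.1f}")
print(f"SEM PBI2 {sem_pbi2:.2f}, SDC PBI2 {sdc_pbi2:.1f}")
```

Under these assumptions the computed values round to the reported SEMs of 0.41 and 0.44 and SDCs of 1.1 and 1.2.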
The anchor-based MIC distribution is displayed in Figure 5, which shows that the SDC is larger than the MIC, especially for PBI2. The curves overlap considerably, leading to substantial misclassification.