This section presents the participant characteristics and results in relation to the three key aims: the feasibility of conducting a definitive CRT; the acceptability of RECALL; and the compliance and fidelity of the intervention delivery.
Participant characteristics
Table 3 provides details of the number and characteristics of the schools (clusters) and the individual participants recruited to the study compared to the recruitment targets.
| Participants | Recruitment targets | Number recruited | Characteristics |
| --- | --- | --- | --- |
| Health professionals | n= 8 | n= 8 | Professional background: SLT (n= 4), OT (n= 2), PT (n= 1), SEB (n= 1) |
| Schools (clusters) | n= 6 | n= 6 | Social disadvantage ranking (based on data from the NIMDM 2017 [29]): within lowest decile for their HSCT area (n= 3); within lowest quintile for their HSCT area (n= 3) |
| Children recruited for outcome measurement | n= 60 | n= 60 | Gender: girls (n= 26, 43%); boys (n= 34, 57%). Age at baseline: 56 months to 67 months (mean = 61 months) |
| | n= 30 (50% of sample) | n= 22 (37%) | 1) children about whom teachers had concerns around listening and communication skills |
| | n= 12 (20%) | n= 12 (20%) | 2) children with diagnosed developmental or learning difficulties |
| | n= 18 (30%) | n= 26 (43%) | 3) typically developing children who did not have any identified listening and communication problems as recognised by the teachers |

Table 3. Participant characteristics
Feasibility of conducting a definitive CRT
Recruitment, consent and sampling
Figure 1 shows the study flow chart including the response, recruitment and retention rates throughout the study. The recruitment targets were met in terms of: HPs (n= 8); schools (n= 6); and the total number of children (n= 60) (Table 3). Due to staff absence (maternity leave/sick leave), the RISE teams could only facilitate the study in particular geographical sectors within their HSCT areas. Consequently, from the list of schools identified in areas of SD (n= 43), a considerable number (n= 17) had to be excluded on the basis of their location. As a result, the criterion in respect of SD was widened to include schools ranked within the lowest quintile within the HSCT (rather than the lowest decile). The overall rate of parental consent (72%) was good. However, some parents of children about whom teachers had concerns did not consent, and the desired proportion of children in this sub-group was not achieved (n= 22, 37% compared to the target of n= 30, 50%). It was also apparent during the sampling process that teachers did not always know whether children had a diagnosis.
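The comparison between targets and achieved recruitment in each sub-group can be checked with simple arithmetic. The following is a minimal sketch using the counts reported in Table 3; the variable and function names are illustrative and are not part of the study materials.

```python
# Recruitment targets and achieved counts per sub-group, taken from Table 3.
TOTAL_CHILDREN = 60

subgroups = {
    "teacher concerns": {"target": 30, "recruited": 22},
    "diagnosed difficulties": {"target": 12, "recruited": 12},
    "typically developing": {"target": 18, "recruited": 26},
}

def percent_of_sample(count, total=TOTAL_CHILDREN):
    """Return a count as a whole-number percentage of the sample."""
    return round(100 * count / total)

for name, counts in subgroups.items():
    # A positive difference means the sub-group fell short of its target.
    shortfall = counts["target"] - counts["recruited"]
    print(
        f"{name}: target {percent_of_sample(counts['target'])}%, "
        f"recruited {percent_of_sample(counts['recruited'])}%, "
        f"shortfall {shortfall}"
    )
```

The shortfall of eight children in the teacher-concerns sub-group is matched by eight additional typically developing children, so the overall target of 60 was still met.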
Blinding
The outcome assessors (RAs) remained blind to the intervention groups but, due to the nature of the intervention and materials provided, the teachers in RECALL became aware of their allocation.
Attendance and loss to follow up
No schools or individual participants dropped out of the study. Two children did not complete post-intervention assessments as they were absent from school, indicating minimal loss to follow up (3%).
Figure 1. RECALL cluster randomised feasibility trial flow chart (following CONSORT guidance, 2010) [27]
Appropriateness of outcome measures
The acceptability of the outcome measures used with the children was considered from two perspectives: i) the ease of administration and scoring; and ii) the appropriateness of the tests in terms of their psychometric properties for assessing WM, attention and language in the population of interest (4- to 5-year-olds in areas of SD).
Ease of administration
Two methods were used to monitor the children’s progress from week to week. For the odd one out task and some of the phoneme awareness tasks, the children each had an individual booklet and they marked their response using stampers. The data gathered through the observations of RECALL in the classroom and via the semi-structured interviews with the HPs and teachers indicated that this approach was not acceptable. The children needed help to turn the pages of the booklets, so the HPs or teachers had to repeatedly pause the task to ensure all of the children were on the right page. The children were distracted by the stampers and tended to stamp ad hoc in their booklets. Hence, not only were the data collected unreliable, but this method also interfered with the task delivery.
For tasks that required a verbal response (listening recall and some of the phoneme awareness tasks), individual digital voice recorders were trialled with five children. The devices were small and unobtrusive and did not interfere with the delivery of the task. However, this method did not yield usable data. The microphones picked up too much background noise from the classroom, meaning the child’s voice could not be distinguished. It was also difficult to hear the facilitator’s voice when presenting the trial items, so the accuracy of the child’s response could not be judged.
Regarding the pre- and post-intervention outcome measures, administering the full battery of assessments with each child was time-consuming and this may have impacted negatively on the children’s motivation and performance. In particular, the NRDLS took a considerable amount of time to complete, whereas the CELF-P (trialled in one school) was much quicker to administer. With regard to the NEPSY-II’s auditory attention and statue subtests, all of the RAs found it difficult to observe and simultaneously record the children’s performance. Therefore, they doubted the accuracy of their scoring. If this test were used in a full trial, thorough training and practice should be provided to those administering it and inter-rater reliability must be measured.
Regarding the proxy measures of children’s functional skills, all of the teacher rating scales (BRIEF-P) of attention in the classroom were completed. This suggests that the checklist was acceptable to teachers. Children’s communication skills at home were measured using the FOCUS-34. This tool looks at change/improvement in the child’s communication skills over time (rather than providing a direct measure of their ability). It should be completed by the same parent at each time point with support from an SLT [43]. Due to the classroom-based nature of this trial, the forms were sent home for parents to complete and return to the school. Therefore, the parents completed this measure without support. Completed checklists were returned at both time points for 35 children (58% of the sample), but examination of the raw data indicated that for 8 children the forms were not completed by the same parent at the two time points. This raises questions about the reliability of the data. Furthermore, two outlying scores were apparent, indicating possible misunderstanding of the scoring (a Likert scale) by the parents. In a future trial, greater support would need to be provided to parents (as outlined in the protocol for the FOCUS-34) to avoid these potential issues.
Psychometric properties
One of the key aims of this feasibility study was to determine whether the outcome measures are sensitive enough to detect change as a result of intervention. To examine the sensitivity of the outcome measures, we looked at the distribution of the children’s scores at baseline for each assessment. If many children score at floor levels (low scores) at baseline, a measure may still be sensitive to change, provided the floor effect is not so severe that it masks subsequent improvement. Conversely, if many children score at ceiling levels (high scores) at baseline, the measure may not detect change at the post-intervention time point.
To identify potential floor or ceiling effects, the direction of the scores was explored in terms of skewness: negative skewness values indicate a clustering of scores at the high end that could indicate ceiling effects; and positive skewness values indicate a clustering of scores at the low end that may indicate floor effects [48]. First, the skewness of the children’s scores for the full sample (n= 60) at baseline was explored. This sample included children considered to be typically developing (n= 26), as well as those with identified difficulties (n= 12) and those about whom teachers had concerns around listening and communication skills (n= 22). This means that any clustering of scores at the high end (negative skewness) could be attributed to high performance on the part of the typically developing children, which could mask the true sensitivity of a measure to detect improvements in children with poorer baseline skills. Consequently, further assessments of normality were conducted by splitting the sample into two groups: typically developing children (TD group) (n= 26); and children about whom there were concerns or recognised difficulties with listening and communication skills (concerns group) (n= 34).
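The link between skewness and floor or ceiling effects can be illustrated with a short calculation. The sketch below is illustrative only: the source does not state which skewness estimator was used, so the adjusted Fisher-Pearson coefficient (the form commonly reported by statistical packages) is an assumption, and the scores are invented rather than taken from the study data.

```python
import math

def sample_skewness(scores):
    """Adjusted Fisher-Pearson skewness (an assumed estimator, not confirmed by the study)."""
    n = len(scores)
    if n < 3:
        raise ValueError("at least three scores are needed")
    mean = sum(scores) / n
    # Sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    if sd == 0:
        raise ValueError("skewness is undefined when all scores are identical")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in scores)

# Hypothetical raw scores, invented for illustration only.
floor_like = [0, 0, 0, 0, 0, 1, 1, 2, 5]          # most children score near zero
ceiling_like = [12, 12, 12, 12, 11, 11, 10, 6]    # most children score near the maximum

print(sample_skewness(floor_like))    # positive value: clustering at the low end
print(sample_skewness(ceiling_like))  # negative value: clustering at the high end
```

Skewness is undefined when every score is identical, which is why the function raises an error in that case instead of returning a number.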
Table 4 presents the descriptive statistics for the baseline data (mean, standard deviation and skewness) for all of the outcome measures for the total and the split sample, along with a brief interpretation of these results. Degrees of skewness were interpreted as follows: less than −1 or greater than +1 = highly skewed; between −1 and −.5 or between +.5 and +1 = moderately skewed; and between −.5 and +.5 = approximately symmetric distribution [49]. Moderate skewness values were considered to be acceptable, but high skewness values (denoted by shaded cells in Table 4) were taken as an indication that a test may not be appropriate for a full trial. The table clearly shows the difference in the distribution of scores between the TD group and the concerns group. For the TD group, potential ceiling effects were found for the phoneme isolation subtest of the PIPA (skewness -1.99), and the BRIEF-P global executive and WM scales (skewness 1.67 and 1.32 respectively[1]). The overall direction of the scores for the children with concerns was the same for these measures (skewed towards better performance) but only to moderate levels, indicating that these tools should be appropriate for use in a full trial of RECALL.
The pattern of results presented in Table 4 highlighted some issues that required further investigation. For the listening recall subtest of the AWMA, scores were clustered at the lower end for both groups (TD skewness 1.08, concerns group skewness 2.21), indicating potential floor effects that may mask children’s improvement in a large-scale trial. In addition, there was a need to clarify the optimal measures for phoneme awareness (the phoneme isolation or segmentation subtest) and language skills (the NRDLS or the CELF-P). Given the issues highlighted previously about the parents’ scoring of the FOCUS-34, the results of this measure also needed further examination.
Baseline and post-intervention scores (mean and standard deviations) for the three intervention groups (RECALL, RISE and no-intervention control) for the full sample (n=60) were compared (Table 5) to provide an indication as to whether the outcome measures detected change. As the difference score between pre- and post-intervention assessment on the FOCUS-34 is the measure of interest for that assessment, only the change score was analysed. The data were not tested for statistical significance of treatment effects because this study aimed to assess the feasibility of a future full trial and, consequently, the sample obtained was not statistically powered to support this type of analysis [27]. The findings are summarised below.
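The interpretation bands described above can be expressed as a small classification rule. This is a sketch, not the authors’ procedure: the function names are illustrative, and treating values of exactly ±.5 as moderate and exactly ±1 as moderate is an assumption, since the text does not say how boundary values were handled. The example values are taken from Table 4.

```python
def classify_skewness(value):
    """Map a skewness value onto the bands described in the text [49]."""
    magnitude = abs(value)
    if magnitude > 1:
        return "highly skewed"
    if magnitude >= 0.5:  # boundary handling is an assumption
        return "moderately skewed"
    return "approximately symmetric"

def clustering_direction(value):
    """Negative skewness = scores cluster at the high end; positive = at the low end."""
    if value < 0:
        return "high end"
    if value > 0:
        return "low end"
    return "none"

# Skewness values reported in Table 4.
examples = {
    "PIPA phoneme isolation, TD group": -1.99,
    "PIPA phoneme isolation, concerns group": -0.60,
    "AWMA listening recall, concerns group": 2.21,
    "AWMA odd one out, full sample": 0.28,
}

for label, skew in examples.items():
    print(f"{label}: {classify_skewness(skew)}, clustering at {clustering_direction(skew)}")
```

On the BRIEF-P, higher scores indicate greater executive dysfunction, so clustering at the low end corresponds to better performance on that measure.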
Listening recall: Table 5 shows that this test detected differences between the means for the three intervention groups at baseline and at the post-intervention time points. Therefore, despite children’s scores being highly skewed towards the lower end, this test should be appropriate for a large-scale trial of RECALL.
Phoneme awareness: The phoneme isolation subtest was originally favoured for this study because it relates directly to the tasks trained in RECALL (identifying the first sound in a word). Due to concerns about potential ceiling effects at baseline, the phoneme segmentation subtest was trialled as an alternative in one school. Post-intervention results suggest that the phoneme isolation subtest is sensitive to change in the population of interest because differences across the intervention groups were apparent. Table 5 shows that the RECALL and RISE (active control) groups improved on the phoneme isolation task, but the no-intervention control group did not. Since the RAs reported that these tasks were quick and easy to administer, it may be acceptable to include both the phoneme isolation and segmentation subtests of the PIPA in a full trial.
Language: The direction of the distribution of scores for the NRDLS and the CELF-P (Table 4) indicated that the NRDLS scores were moderately skewed towards high scores (-.57) and the cumulative raw scores for the three subtests of the CELF-P were moderately skewed towards low scores (.56). Taken together with the RAs’ report that the NRDLS was time-consuming to complete, this suggests that the CELF-P may be a more appropriate language measure for a definitive trial of RECALL.
FOCUS-34: For the analysis of the parent rating scale, two outliers (where it was apparent that the parents had misinterpreted the form) were removed. The FOCUS-34 measures change over time, with a difference of more than 11 points indicating significant clinical change [43]. Table 5 shows a clear difference between the mean change score (the post-intervention score minus the baseline score) for the no-intervention control group (x̄= 2.12, SD = 10.23) in comparison to the RECALL (x̄= 13.46, SD= 21.70) and RISE groups (x̄= 12.58, SD= 18.38). This suggests that the measure would be sensitive to change over time.
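The change-score comparisons summarised above amount to subtracting baseline from post-intervention means and, for the FOCUS-34, comparing the result with the 11-point threshold. The following sketch uses the group means reported in Table 5; the names are illustrative. It is descriptive only: a group mean above the threshold does not show that each child changed by that amount, and no significance testing is implied.

```python
FOCUS_CLINICAL_THRESHOLD = 11  # points; change above this is treated as clinically significant [43]

# Mean FOCUS-34 change scores per group, from Table 5.
focus_mean_change = {"RECALL": 13.46, "RISE": 12.58, "No intervention": 2.12}

# PIPA phoneme isolation means per group as (baseline, post-intervention), from Table 5.
phoneme_isolation = {
    "RECALL": (6.33, 7.56),
    "RISE": (9.47, 9.63),
    "No intervention": (9.05, 7.16),
}

def mean_change(baseline, post):
    """Post-intervention mean minus baseline mean, rounded to two decimals."""
    return round(post - baseline, 2)

def exceeds_threshold(change, threshold=FOCUS_CLINICAL_THRESHOLD):
    """True when a change score is strictly greater than the threshold."""
    return change > threshold

for group, change in focus_mean_change.items():
    print(f"FOCUS-34, {group}: mean change {change}, above threshold: {exceeds_threshold(change)}")

for group, (baseline, post) in phoneme_isolation.items():
    print(f"Phoneme isolation, {group}: mean change {mean_change(baseline, post)}")
```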
| Outcome | Task | Test | Full sample (n= 60): Mean (SD) | Full sample: Skewness[2] | Typically developing group (n= 26): Mean (SD) | Typically developing group: Skewness | Concerns group (n= 34): Mean (SD) | Concerns group: Skewness | Interpretation of results |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Trained task | Listening recall | AWMA[3] | 1.16 (1.68) | 1.57 | 1.58 (1.98) | 1.08 | .81 (1.33) | 2.21 | Both groups: scores highly skewed towards the low end; potential floor effects |
| | Odd one out | AWMA | 7.16 (3.54) | .28 | 7.88 (3.98) | -.13 | 6.56 (3.07) | .61 | Both groups: distribution approximates normality |
| | Phoneme awareness | PIPA Phoneme isolation* | 8.90 (4.04) | -1.08 | 10.08 (3.75) | -1.99 | 7.81 (4.05) | -.60 | Full sample and TD group: highly skewed towards high scores; potential ceiling effects. Children with concerns: moderately skewed |
| | | PIPA Phoneme segmentation† | .30 (.675) | 2.28 | .00 | - | .38 (.74) | 1.95 | Both groups: highly skewed towards the low end; potential floor effects |
| Near-transfer (untrained WM) | Digit recall | AWMA | 18.24 (4.96) | -.46 | 18.69 (6.41) | -.80 | 17.88 (3.42) | .66 | Children with concerns: moderate skewness towards high scores for digit recall and counting recall |
| | Block recall | AWMA | 10.74 (3.24) | -.28 | 11.65 (3.90) | -.73 | 10.00 (2.41) | -.39 | |
| | Counting recall | AWMA | 6.21 (3.19) | -.42 | 7.00 (3.60) | -.53 | 5.56 (2.71) | -.90 | |
| | Nonword recall | AWMA | 4.52 (3.56) | .15 | 3.96 (3.14) | .03 | 4.97 (3.86) | .08 | |
| Far-transfer | Auditory Attention | NEPSY-II | 19.54 (6.22) | -.69 | 20.62 (6.47) | -.94 | 18.70 (5.98) | -.62 | Both groups: moderate skewness towards high scores |
| | Statue | NEPSY-II | 22.64 (5.58) | -.81 | 25.69 (2.95) | -.61 | 20.24 (6.02) | -.23 | Full sample: moderate skewness towards high end. Concerns group: approximates normality |
| | Language | NRDLS* | 61.06 (4.58) | -.74 | 62.75 (3.25) | .29 | 59.50 (5.11) | -.57 | Both groups: NRDLS scores moderately skewed towards high performance; CELF-P scores moderately skewed towards lower end |
| | | CELF-P† (Cumulative Raw Scores) | 55.40 (8.53) | .44 | 61.5 (10.61) | - | 53.80 (8.01) | .56 | |
| | Behaviour in the classroom | BRIEF-P[4] Global Executive Composite | 99.57 (30.21) | .90 | 88.73 (32.13) | 1.67 | 107.85 (26.20) | .76 | For both scales of this measure: scores are highly skewed to lower end (indicating better performance) for the TD group but not for the concerns group |
| | | BRIEF-P WM scale | 62.20 (15.56) | .52 | 25.27 (10.48) | 1.32 | 31.7 (8.34) | .26 | |
| | Communication skills at home | FOCUS 34 baseline | 189.39 (39.76) | -1.52 | 204.05 (36.76) | -2.51 | 179.13 (39.11) | -1.35 | Highly skewed for full sample and both groups but to a greater degree for TD |
Table 4. Descriptive statistics for raw scores at baseline for the full and stratified samples
| Outcome | Task | Test used | Time point | RECALL (n= 20): Mean (SD) | RISE Active Control (n= 20): Mean (SD) | No Intervention (n= 20): Mean (SD) |
| --- | --- | --- | --- | --- | --- | --- |
| Trained task | Listening recall (ELWM) | AWMA | Baseline | .47 (.77) | 1.22 (1.83) | 1.41 (1.66) |
| | | | Post-intervention | 4.11 (3.12) | 5.28 (4.51) | 2.35 (3.74) |
| | Odd one out (ELWM) | AWMA | Baseline | 7.00 (3.13) | 5.94 (3.11) | 8.06 (4.13) |
| | | | Post-intervention | 8.42 (3.16) | 10.44 (3.09) | 9.24 (4.49) |
| | Phoneme awareness | PIPA Phoneme isolation subtest* | Baseline | 6.33 (5.32) | 9.47 (2.97) | 9.05 (4.21) |
| | | | Post-intervention | 7.56 (3.64) | 9.63 (3.40) | 7.16 (3.85) |
| | | PIPA Phoneme segmentation subtest† | Baseline | .30 (.21) | - | - |
| | | | Post-intervention | 2.10 (.31) | - | - |
| Near-transfer (untrained WM) | Digit recall | AWMA | Baseline | 16.58 (5.78) | 19.78 (4.25) | 17.59 (4.32) |
| | | | Post-intervention | 19.37 (4.04) | 18.61 (4.64) | 18.29 (4.95) |
| | Block recall | AWMA | Baseline | 11.05 (2.80) | 10.28 (3.48) | 10.76 (3.7) |
| | | | Post-intervention | 11.05 (2.55) | 10.56 (5.22) | 9.41 (3.97) |
| | Counting recall | AWMA | Baseline | 16.58 (5.78) | 19.78 (4.25) | 17.59 (4.32) |
| | | | Post-intervention | 19.37 (4.04) | 18.61 (4.64) | 18.29 (4.95) |
| | Nonword recall | AWMA | Baseline | 3.58 (3.61) | 6.39 (2.97) | 3.35 (3.37) |
| | | | Post-intervention | 7.26 (2.88) | 8.61 (4.13) | 6.65 (3.23) |
| Far-transfer | Auditory Attention | NEPSY-II | Baseline | 18.00 (6.29) | 20.05 (6.69) | 21.59 (5.83) |
| | | | Post-intervention | 17.47 (6.78) | 21.68 (5.45) | 19.82 (5.33) |
| | Statue | NEPSY-II | Baseline | 21.32 (5.82) | 22.47 (6.01) | 23.72 (5.13) |
| | | | Post-intervention | 26.37 (4.14) | 26.47 (6.60) | 23.72 (5.10) |
| | Language | NRDLS Comprehension Scale* | Baseline | 60.56 (5.72) | 61.47 (3.79) | 60.35 (4.76) |
| | | | Post-intervention | 62.33 (2.74) | 62.95 (2.70) | 61.35 (5.15) |
| | | CELF-P† (Cumulative Raw Scores) | Baseline | 55.4 (8.53) | - | - |
| | | | Post-intervention | 57.00 (7.24) | - | - |
| | Behaviour in the classroom | BRIEF-P[5] Global Executive Composite | Baseline | 60.20 (12.61) | 57.55 (14.40) | 68.85 (17.66) |
| | | | Post-intervention | 57.45 (11.68) | 50.70 (9.75) | 63.45 (15.40) |
| | | BRIEF-P Working memory scale | Baseline | 27.9 (7.37) | 25.85 (9.63) | 33.00 (11.11) |
| | | | Post-intervention | 25.55 (6.97) | 22.95 (6.02) | 29.80 (9.62) |
| | Communication skills at home | FOCUS-34 (Change score) | Post-intervention minus baseline | 13.46 (21.70) | 12.58 (18.38) | 2.12 (10.23) |
Table 5. Means and standard deviations of raw scores at baseline and post-intervention (per group) for the full sample (n= 60): rows are shaded to ease reading
Acceptability of RECALL
Figure 2 shows that three major themes were identified in the qualitative data gathered through the semi-structured interviews with the HPs and teachers who delivered RECALL.
Some RECALL components are acceptable
All of the HPs and teachers liked the fantastical play component of RECALL, reporting that the puppet, fantastical themes and props were appropriate and fun for year 1 children. The phoneme awareness tasks were easy to administer due to their similarity to usual classroom practice. The listening recall task was also quick and easy to administer. It was at an appropriate level of difficulty (with both the teachers and the HPs reporting that the children seemed to improve across the 6-week intervention period) and engaged the class. The fact that the sentences tied in with the fantastical themes and were funny seemed to appeal to even the most inattentive children. One of the teachers reported:
“I think, the listening recall one benefitted and involved every child…..It was actually boys I noticed who probably stick out with the listening recall and the boys who like imaginative play and who like a giggle. So, I actually found that really related to boys. It related to everybody, but they stood out. It surprised me that they were interested. It was just because they thought it was funny, so it just hooked them in and they wanted to be part of it.”
Odd one out is challenging
None of the HPs or teachers liked the odd one out task in its current format. The teachers were uncertain about the nature of this specific task and how to deliver it e.g., whether it was acceptable for children to place their fingers on the location of the odd one out picture in their booklets. As outlined earlier, the booklets were hard to manage and the stampers were distracting, meaning that the facilitators had to pause frequently to help the children, thereby disrupting the flow of the activity and elongating it. The participants all reported that there were too many trial items per session so the children became unmotivated, especially those with existing attention difficulties who tended to copy their peers’ responses. The HPs and teachers all stated that the difficulty level increased too quickly and the children would have benefitted from additional practice at the 2-to-be-remembered item level. One of the teachers stated:
“I found it was a very big challenge for a lot of them [the children]. At the start it wasn’t too bad, but then as it progressed and maybe you were at three odd one out on the one page, then four- it was really, really difficult. Again, those few [children] in the top group would have been trying to focus really well but so many just lost it and a lot of them were randomly stamping. The wee weaker groups, they just weren’t focused at all.”
Figure 2. Qualitative data themes identified in semi-structured interviews with the HPs and teachers who delivered RECALL
Groups are too big
Whilst the use of booklets and stampers to record children’s responses impacted on the acceptability of RECALL, the size of the groups was also identified as a barrier to the intervention delivery. The number of children in the class (divided into groups of 9 or 10) made it difficult to deliver the tasks and to monitor children’s progress. This was noted by all of the HPs and teachers during the semi-structured interviews (even for the listening recall task which was universally liked by the participants).
This was summed up by one of the HPs:
“…..if there was less children it would be so much easier to guide and judge how they were doing. Because you were only getting a general idea [of how they were doing].”
Compliance and fidelity to the intervention delivery
There was good compliance with the implementation of RECALL regarding the total number of sessions completed (95%) and the number of trials delivered (11 practice items of listening recall and odd one out, and 10-15 minutes of phoneme awareness training). In terms of the quality of delivery, fidelity to the intervention protocol varied between the two schools: school 1 (76%) and school 2 (45%). There was a high degree of inter-rater consistency on the fidelity measure across the research team.
This discrepancy related to the delivery of the odd one out task during the teacher-delivered RECALL sessions. In school 1, the teacher divided the class into three groups, as specified in the intervention protocol. In school 2, the teacher presented the task to all of the children at the same time, holding up the picture stimuli and walking around the classroom until each child had seen them. Then the children all stamped the location of the odd one out picture in their booklets. This lengthened the time that the children had to hold the information in their WM, both changing the nature of the task and making it too difficult. The overall duration of the session also increased and the children, especially those who were inattentive, became unmotivated and restless.
[1] On the BRIEF-P, higher scores indicate higher executive dysfunction, meaning that the child presents with greater difficulty coping in the classroom, so positive skewness values indicate potential ceiling effects.
[2] Skewness: 0 = perfectly symmetric distribution; negative skewness values indicate a clustering of scores at the high end; positive skewness values indicate clustering at the low end (except on the BRIEF-P (Gioia et al., 2003), where higher scores indicate greater degrees of executive dysfunction, so positive skewness = clustering of scores at the better-performing end). Shaded cells = highly skewed values (>1 or <-1)
[3] Raw scores on AWMA subtests represent the number of trials correct (rather than memory span)
[5] Note: higher scores on the BRIEF-P [42] indicate greater degrees of executive dysfunction. A reduction in scores over time indicates improvement. For tests marked *, sample n= 50; for tests marked †, sample n= 10. For the FOCUS-34 [43], change scores of >11 points indicate significant clinical change.