Study characteristics
Of the 15 included studies (Table 1), only one (Laugeson et al., 2014) was a non-randomised study; the remaining 14 were RCTs. Collectively, 1244 participants were enrolled, approximately 80% of whom were male. Sample sizes ranged from 20 to 110 participants. Studies were conducted in the United States of America (Gantman et al., 2012; Hume et al., 2022; Ko et al., 2019; Laugeson et al., 2009; Laugeson et al., 2014; Laugeson et al., 2015; Matthews et al., 2018; Matthews et al., 2020; Vernon et al., 2018), Israel (Rabin et al., 2018; Rabin et al., 2020), Korea (Yoo et al., 2014), Taiwan (Chou, 2020), Australia (Afsharnejad et al., 2022) and the Netherlands (Idris et al., 2022). Twelve studies included youth aged 12 to 18 years (Afsharnejad et al., 2022; Chou, 2020; Idris et al., 2022; Ko et al., 2019; Laugeson et al., 2009; Laugeson et al., 2014; Matthews et al., 2018; Matthews et al., 2020; Rabin et al., 2018; Rabin et al., 2020; Vernon et al., 2018; Yoo et al., 2014), two included youth aged 18 to 23 years (Gantman et al., 2012; Laugeson et al., 2015), and one reported only a mean age of 16.2 years (Hume et al., 2022). One study was clinician-led (Afsharnejad et al., 2022), three were teacher-led (Chou, 2020; Hume et al., 2022; Laugeson et al., 2014), nine were parent-led (Gantman et al., 2012; Idris et al., 2022; Laugeson et al., 2009; Laugeson et al., 2015; Matthews et al., 2018; Matthews et al., 2020; Rabin et al., 2018; Rabin et al., 2020; Yoo et al., 2014), and two were peer-led (Ko et al., 2019; Vernon et al., 2018).
(Insert Table 1)
All participants had a diagnosis of ASD. Eight studies (Afsharnejad et al., 2022; Idris et al., 2022; Ko et al., 2019; Laugeson et al., 2014; Matthews et al., 2018; Matthews et al., 2020; Rabin et al., 2018; Rabin et al., 2020) reported a formal diagnosis of ASD according to the Diagnostic and Statistical Manual of Mental Disorders, fourth or fifth edition (DSM-IV/DSM-5; APA), or the Autism Diagnostic Observation Schedule (ADOS); two studies (Chou, 2020; Hume et al., 2022) included participants who were receiving education and/or services for ASD; four studies (Gantman et al., 2012; Laugeson et al., 2009; Laugeson et al., 2015; Yoo et al., 2014) included participants with a previous diagnosis of ASD from a reliable mental health clinician; and one study (Vernon et al., 2018) did not describe how the diagnosis was obtained. Seven studies (Afsharnejad et al., 2022; Gantman et al., 2012; Laugeson et al., 2015; Matthews et al., 2020; Rabin et al., 2018; Rabin et al., 2020; Yoo et al., 2014) confirmed the ASD diagnosis at study entry by administering the ADOS or the Autism-Spectrum Quotient (AQ; Baron-Cohen et al., 2001).
All studies were included in the quantitative analysis and were grouped into those that compared the intervention with a waitlist control (or services-as-usual) group and those that compared it with an active control group. In the active control condition, participants were assigned (randomly, in the RCTs) to a manualised, structured group, which controlled for exposure to a social group.
The mean intervention duration was 21.1 weeks (SD = 23; range 12 to 104 weeks), and the mean weekly session time was 92.3 minutes (SD = 29.91; range 30 to 180 minutes per week). All included studies targeted social skills as the primary outcome, and none targeted restricted interests or behaviours. A summary of findings is presented in Supplementary Table S2.
Quality assessment and risk of bias
Quality assessment and risk of bias for each study are outlined in Tables 2 and 3. Most studies were rated fair (Chou, 2020; Gantman et al., 2012; Hume et al., 2022; Ko et al., 2019; Laugeson et al., 2009; Rabin et al., 2018; Yoo et al., 2014) or good (Afsharnejad et al., 2022; Idris et al., 2022; Laugeson et al., 2014; Rabin et al., 2020), whereas four were rated poor quality (Laugeson et al., 2015; Matthews et al., 2018; Matthews et al., 2020; Vernon et al., 2018). The quality of the evidence was assessed with the GRADE approach (Supplementary Table S2) and was rated low to very low across outcome measures.
Table 2
Quality assessment of randomised controlled studies using the Joanna Briggs Institute (JBI) critical appraisal checklist. Y, yes; N, no; U, unclear; NA, not applicable.
Author and Year | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Q12 | Q13 | Quality |
Afsharnejad 2022 | Y | U | N | NA | NA | Y | N | Y | Y | Y | Y | Y | Y | Good |
Chou 2020 | Y | U | N | NA | NA | NA | Y | U | Y | Y | Y | Y | Y | Fair |
Gantman 2012 | Y | U | N | NA | NA | NA | Y | Y | Y | Y | N | Y | Y | Fair |
Hume 2022 | Y | U | N | NA | NA | NA | Y | N | N | Y | Y | Y | Y | Fair |
Idris 2022 | Y | U | Y | NA | NA | Y | N | Y | N | Y | Y | Y | Y | Good |
Ko 2019 | Y | U | U | NA | NA | Y | Y | N | N | Y | Y | Y | Y | Fair |
Laugeson 2009 | U | U | N | NA | NA | NA | Y | N | Y | Y | N | Y | Y | Fair |
Laugeson 2015 | U | U | Y | NA | NA | NA | Y | N | N | Y | N | Y | Y | Poor |
Matthews 2018 | U | U | N | NA | NA | N | Y | N | N | Y | N | Y | Y | Poor |
Matthews 2020 | U | U | N | NA | NA | N | N | Y | Y | Y | N | Y | Y | Poor |
Rabin 2018 | U | U | N | NA | NA | Y | Y | N | N | Y | Y | Y | Y | Fair |
Rabin 2020 | Y | Y | Y | NA | NA | N | Y | Y | N | Y | Y | Y | Y | Good |
Vernon 2018 | U | U | N | NA | NA | NA | Y | Y | N | Y | N | Y | Y | Poor |
Yoo 2014 | Y | U | N | NA | NA | Y | Y | N | N | Y | Y | Y | Y | Fair |
Table 3
Quality assessment of non-randomised controlled studies using the JBI critical appraisal checklist. Y, yes; N, no.
Author and Year | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Quality |
Laugeson 2014 | Y | Y | Y | Y | N | Y | Y | N | Y | Good |
(Insert Table 2)
(Insert Table 3)
Effects of intervention
Meta-analysis of four studies (Ko et al., 2019; Rabin et al., 2018; Rabin et al., 2020; Yoo et al., 2014) showed that youth with ASD who received an SSI had a significant improvement in observed social skills compared with the waitlist group, with a standardised mean difference (SMD) of 0.49 (95% CI [0.20, 0.78], p = 0.0008) (Fig. 2). The GRADE rating of evidence for the observed measure was low (Supplementary Table S2) because of imprecision arising from small sample sizes.
(Insert Fig. 2)
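When checking pooled results such as the one above, it can help to back-calculate the standard error and p-value implied by an SMD and its 95% CI. The sketch below assumes a symmetric normal-theory (Wald) interval of the form SMD ± 1.96 × SE; the section does not state how its intervals were constructed, and the function name `se_z_p_from_ci` is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # upper 97.5% quantile of the standard normal


def se_z_p_from_ci(estimate: float, lower: float, upper: float) -> tuple[float, float, float]:
    """Back-calculate SE, z and a two-sided p-value from an estimate and its 95% CI.

    Assumption: the interval is a symmetric Wald interval, estimate +/- 1.96 * SE.
    """
    se = (upper - lower) / (2 * Z_975)      # CI half-width divided by 1.96
    z = estimate / se                       # Wald z statistic
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal p-value
    return se, z, p


# Observed-measure result reported above: SMD 0.49, 95% CI [0.20, 0.78]
se, z, p = se_z_p_from_ci(0.49, 0.20, 0.78)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

For the observed-measure result this gives SE ≈ 0.148, z ≈ 3.31 and p ≈ 0.0009, close to the reported p = 0.0008 once rounding of the CI bounds is allowed for.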
Meta-analysis of seven studies (Gantman et al., 2012; Laugeson et al., 2009; Laugeson et al., 2015; Matthews et al., 2018; Rabin et al., 2018; Rabin et al., 2020; Yoo et al., 2014) of adolescent-reported social knowledge favoured the SSI (SMD = 2.03, 95% CI [1.15, 2.91], p = 0.0001) (Fig. 3). However, heterogeneity among the included studies was substantial (I² = 86%). In sensitivity analyses, no single factor (such as age, gender distribution, geographical location or outcome measure) fully accounted for this heterogeneity. Given this degree of heterogeneity, the GRADE rating of evidence was very low (Supplementary Table S2).
(Insert Fig. 3)
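The I² value quoted above summarises how much of the variation between study estimates exceeds what sampling error alone would produce. The section does not name its pooling model or between-study variance estimator, so the following is a minimal sketch assuming an inverse-variance DerSimonian-Laird random-effects model; the function name and the inputs are hypothetical and are not the review's data.

```python
import math


def dersimonian_laird(effects: list[float], ses: list[float]) -> dict[str, float]:
    """Random-effects pooling with Cochran's Q, I^2 and the DerSimonian-Laird tau^2.

    Assumption: DerSimonian-Laird is used for illustration; the review does not
    state which between-study variance estimator it applied.
    """
    k = len(effects)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two studies with matching SEs")
    w = [1 / s**2 for s in ses]                                    # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw        # fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0            # I^2 in percent
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                                  # between-study variance
    w_re = [1 / (s**2 + tau2) for s in ses]                        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {"Q": q, "I2": i2, "tau2": tau2,
            "pooled": pooled, "se": math.sqrt(1 / sum(w_re))}


# Two hypothetical studies with very different SMDs and equal precision
print(dersimonian_laird([0.0, 1.0], [0.1, 0.1]))
```

For these two hypothetical studies, Q ≈ 50 on 1 degree of freedom, so I² ≈ 98%, τ² ≈ 0.49 and the pooled SMD is 0.5 with SE 0.5: the large between-study variance widens the pooled interval considerably, which is why high I² lowers confidence in a pooled estimate.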
A further meta-analysis of seven studies (Chou, 2020; Laugeson et al., 2009; Laugeson et al., 2015; Matthews et al., 2018; Rabin et al., 2018; Vernon et al., 2018; Yoo et al., 2014) of adolescent-reported social performance also favoured the intervention group (SMD = 0.56, 95% CI [0.19, 0.93], p = 0.003) (Fig. 4).
(Insert Fig. 4)
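The SMDs reported in this section express between-group differences in pooled-standard-deviation units. The section does not specify which SMD variant was used; the sketch below assumes Hedges' g (Cohen's d with a small-sample correction) computed from hypothetical post-intervention group means, SDs and sample sizes, with a common large-sample approximation for its standard error. The function name `hedges_g` is illustrative.

```python
import math


def hedges_g(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> tuple[float, float]:
    """Hedges' g and its approximate SE for group 1 (intervention) vs group 2 (control).

    Assumption: the review's SMDs are Hedges' g; the section does not say so explicitly.
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled                    # Cohen's d
    j = 1 - 3 / (4 * df - 1)                     # small-sample correction factor J
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, math.sqrt(j**2 * var_d)        # g = J * d, Var(g) = J^2 * Var(d)


g, se_g = hedges_g(12.0, 4.0, 20, 10.0, 4.0, 20)  # hypothetical group summaries
print(f"g = {g:.3f}, SE = {se_g:.3f}")
```

With these hypothetical groups of 20, a 2-point difference on a scale with SD 4 gives d = 0.5 and g ≈ 0.49 after correction; the correction matters most for small trials such as several of those included here.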
Seven studies (Gantman et al., 2012; Laugeson et al., 2009; Laugeson et al., 2015; Matthews et al., 2018; Rabin et al., 2020; Vernon et al., 2018; Yoo et al., 2014) reported parent-rated outcomes, which showed a significant improvement in social engagement in the intervention group (SMD = 0.42, 95% CI [0.13, 0.71], p = 0.004) (Fig. 5). The GRADE rating of evidence was low because of risk of bias in the included studies (unclear randomisation and allocation concealment, lack of blinding, and per-protocol analysis) (Supplementary Table S2).
(Insert Fig. 5)
The pooled estimate from three studies (Chou, 2020; Hume et al., 2022; Rabin et al., 2020) showed that teacher reports favoured the SSI, but the effect was not statistically significant (Supplementary Figure S1). Heterogeneity between these studies was also substantial, reflecting differences in geographical location, type of intervention (two were teacher-led and one was parent-led) and outcome measures. The GRADE rating of evidence was therefore very low (Supplementary Table S2).
Four studies compared the SSI with an active control group (Afsharnejad et al., 2022; Idris et al., 2022; Laugeson et al., 2014; Matthews et al., 2020). On an observed measure, one study (Idris et al., 2022) reported a non-significant effect that did not favour the intervention group. Adolescent-reported social knowledge (Laugeson et al., 2014; Matthews et al., 2020) and social performance (Afsharnejad et al., 2022; Idris et al., 2022; Laugeson et al., 2014; Matthews et al., 2020), as well as parent reports (Idris et al., 2022; Matthews et al., 2020), favoured the social intervention, but none reached statistical significance (Supplementary Figures S2, S3 and S4). Teacher reports from two studies (Idris et al., 2022; Laugeson et al., 2014) did not favour the intervention group (Supplementary Figure S5).
Follow-up measure
Seven studies reported a follow-up measure, but only two (Afsharnejad et al., 2022; Idris et al., 2022) permitted a between-group analysis. The waitlist control studies compared follow-up with baseline scores across all participants rather than between groups; a between-group comparison was not feasible because the waitlist control group also received the intervention after the waiting period. Both studies included in this analysis therefore used active controls. The adolescent-reported social-performance measure favoured the SSI, but the effect was not statistically significant (Supplementary Figure S6).
Publication bias
Outcomes pooled from six or more studies were inspected for publication bias visually, using funnel plots, and statistically, using Egger's regression test for funnel-plot asymmetry (Egger et al., 1997). The funnel plots showed no obvious asymmetry (Supplementary Figures S7, S8 and S9), and Egger's test was non-significant for all outcomes (p > 0.05), consistent with funnel-plot symmetry (Supplementary Table S2).
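Egger's test, as cited above, regresses each study's standardised effect (effect divided by its SE) on its precision (1/SE); an intercept that departs from zero indicates funnel-plot asymmetry. The pure-Python sketch below returns the intercept and its t statistic; the function name and the example inputs are hypothetical, and converting t to a p-value requires the t distribution with k − 2 degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), df)`).

```python
import math


def egger_test(effects: list[float], ses: list[float]) -> tuple[float, float, float, int]:
    """Egger's regression test for funnel-plot asymmetry (Egger et al., 1997).

    Regresses the standard normal deviate (effect / SE) on precision (1 / SE) by
    ordinary least squares and returns (intercept, SE of intercept, t, df).
    """
    k = len(effects)
    if k < 3 or k != len(ses):
        raise ValueError("need at least three studies with matching SEs")
    x = [1 / s for s in ses]                          # precision
    y = [e / s for e, s in zip(effects, ses)]         # standard normal deviate
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar                 # asymmetry coefficient
    df = k - 2
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(rss / df * (1 / k + x_bar**2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else math.nan
    return intercept, se_int, t_stat, df


# Hypothetical SMDs and SEs for six studies (not the review's data)
b0, se_b0, t, df = egger_test([0.3, 0.5, 0.4, 0.6, 0.2, 0.7],
                              [0.10, 0.20, 0.15, 0.30, 0.25, 0.35])
print(f"intercept = {b0:.2f}, t = {t:.2f} on {df} df")
```

With only six or seven studies per outcome, as here, the test has few degrees of freedom and limited power, so a non-significant intercept is weak evidence of symmetry rather than proof of it.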
Implementation fidelity
Thirteen studies commented on implementation fidelity, which was checked either daily or weekly depending on the structure of the study. Adherence was assessed using a variety of methods, including observed role-play scenarios during weekly group sessions (Gantman et al., 2012; Laugeson et al., 2015; Rabin et al., 2020; Yoo et al., 2014), checklists (Afsharnejad et al., 2022; Hume et al., 2022; Idris et al., 2022; Laugeson et al., 2009; Matthews et al., 2018; Matthews et al., 2020) and self-assessment (Laugeson et al., 2014). Three studies (Idris et al., 2022; Ko et al., 2019; Vernon et al., 2018) reported that procedures were implemented accurately between 91% and 97% of the time.