This is the first study to examine fidelity to FBT from the perspectives of multiple raters, including expert, therapist, parent, and peer. Although these data are exploratory in nature, the peer rater demonstrated the highest concordance with the expert rater, while therapist raters and parent raters demonstrated low concordance with the expert on items from sessions 1, 2, and 3. Scale-level analysis indicated that expert fidelity ratings for phase 1 treatment sessions were significantly higher than the peer ratings, and that parent fidelity ratings tended to be significantly higher than those of the other raters across all three treatment phases. Interestingly, there were no significant differences between the mean therapist ratings and the expert ratings. The item-level data suggest that peers might present the best method for fidelity assessment, whereas the scale-level analysis indicates that therapist ratings do not differ significantly from expert ratings, and therefore therapist self-rating may be an acceptable option for fidelity rating in these types of settings. Therapist self-rating would also be much more cost-efficient.
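The contrast between the two levels of analysis can be illustrated with a minimal sketch. It is not the analysis used in this study, which relied on ICCs and significance testing. The ratings below are hypothetical values on a 1 to 7 scale, and the function names are illustrative only. The sketch shows how two raters can have identical mean scores (no scale-level difference) while still disagreeing on most individual items.

```python
from statistics import mean

# Hypothetical 1-7 item ratings for a single session (NOT study data).
expert = [6, 5, 7, 4, 6, 5]
therapist = [4, 7, 5, 6, 5, 6]


def scale_level_difference(a, b):
    """Difference between the two raters' mean scores (scale level)."""
    return mean(a) - mean(b)


def item_level_disagreement(a, b):
    """Mean absolute difference across paired items (item level)."""
    return mean(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    print(scale_level_difference(expert, therapist))
    print(item_level_disagreement(expert, therapist))
```

In this hypothetical case the mean scores are equal, yet the raters differ by one to two points on every item, which is the kind of divergence an item-level agreement statistic is designed to detect and a comparison of means is not.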
One possible reason for poor agreement in rating is that parents, therapists and the peer did not receive training in fidelity rating. This was intentional, as we wanted to see how fidelity rating would occur in real-world clinical settings and how raters would compare to the expert. Although the peer rater was not a novice in the field of FBT, she had not received any formal training in FBT session fidelity rating. There also seemed to be a pattern of the greatest agreement in session 1, which is the most prescriptive, and lower levels of agreement as treatment progressed. It is likely easier to rate a session which has several well-described interventions. We theorize that rating may become more difficult and more divergent as the treatment becomes less prescriptive. This pattern has been seen in other studies as well (19, 20). In addition, peer ratings were significantly lower compared to the expert on the scale-level analysis. We postulate that peers may be more vigilant and critical with respect to treatment interventions, whereas an expert may be more accepting of variations in intervention delivery as they have more experience in this regard.
It is also difficult to determine which analysis is more helpful in rating fidelity: the item-level or the scale-level analysis. In a laboratory setting in which interrater agreement is critical on each item, and raters are trained in fidelity rating, the item-level analysis may be most appropriate. The broader approach of scale-level analysis might be most helpful when looking at significant differences rather than agreement. However, when learning an intervention such as this in the real world, it is important to achieve acceptable levels of fidelity on all key interventions of the treatment (all items). This is where a threshold approach could be considered with respect to examining treatment fidelity. What level of fidelity is good enough? This question remains to be answered in terms of the level of fidelity related to treatment outcomes. Without these data, it is difficult to determine what cut-off is adequate. Our previous study (25) used an a priori threshold of 80% (5.6/7) as adequate fidelity on each session (as opposed to each item), as this level is considered “considerable”; however, how this threshold relates to outcome is unknown. In this previous study, only one therapist out of eight met this bar, whereas the mean scores of all eight therapists were actually quite similar to fidelity ratings in other FBT trials (20, 30), suggesting that perhaps this threshold is too high. In fact, a recent study by Dimitropoulos and colleagues (19) suggests that 4/7 or higher (set a priori) is adequate fidelity. However, this study found that adherence was not related to outcome in terms of percent ideal body weight (19), emphasizing the complexity of how fidelity and adherence relate to outcome. Further work is needed in this area.
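The difference between applying a threshold to each session and applying it to each item can be sketched as follows. This is an illustration under stated assumptions rather than a procedure from either cited study: the item scores are hypothetical, the function names are invented for this example, and only the cut-offs of 5.6/7 and 4/7 come from the text above.

```python
from statistics import mean

# 80% of the 7-point scale, written as a literal to avoid floating-point
# rounding (0.8 * 7 does not evaluate to exactly 5.6).
THRESHOLD_80_PERCENT = 5.6
# The lower a priori cut-off of 4/7 mentioned in the text.
THRESHOLD_4_OF_7 = 4.0


def meets_session_threshold(item_scores, threshold=THRESHOLD_80_PERCENT):
    """Session-level rule: the mean of all item scores reaches the cut-off."""
    return mean(item_scores) >= threshold


def meets_item_threshold(item_scores, threshold=THRESHOLD_80_PERCENT):
    """Item-level rule: every individual item reaches the cut-off."""
    return all(score >= threshold for score in item_scores)


# Hypothetical item ratings for one session (NOT study data).
session = [7, 7, 6, 6, 4]

if __name__ == "__main__":
    print(meets_session_threshold(session))
    print(meets_item_threshold(session))
    print(meets_item_threshold(session, threshold=THRESHOLD_4_OF_7))
```

In this hypothetical session the mean score clears the 80% bar even though one key intervention was rated well below it, so the session-level rule can mask a weak item. Under the 4/7 cut-off the same session passes on both rules, which shows how strongly the conclusion depends on the chosen threshold.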
Some of our findings align with those of Chapman and colleagues (21), who reported that youth and caregiver fidelity raters for adherence to a substance abuse treatment protocol for adolescents were highly inaccurate compared to treatment experts. Therapists and trained raters were generally consistent with ratings of treatment experts (21). The trained raters were bachelor’s- to master’s-level research assistants with no experience in delivering the treatment. Parents and youth were much more likely to indiscriminately endorse the occurrence of key treatment components (21).
Also, somewhat in alignment with our findings, the reliability of therapist self-report has been called into question in prior research. Modest to weak agreement between therapist and observer fidelity ratings has been demonstrated in two studies of motivational interviewing with adults (23, 24). These raters were trained coders, most with a master’s degree and experience in treatment delivery. An additional study of treatment fidelity for children with conduct problems reported that trained observers (research students pursuing a master’s degree in social work, marriage and family therapy, or psychology) rated less frequent use of evidence-based interventions compared to therapists’ fidelity ratings (22).
Interestingly, the degree of interrater agreement between the fidelity ratings of therapists and observers may vary depending on the type of treatment being delivered, as seen in a study by Hogue and colleagues (33). These researchers examined the fidelity of evidence-based practices for adolescent behaviour problems by comparing therapist to observer fidelity ratings. Observers were trained observational raters and had a master’s degree in social work or psychology. Findings showed that therapists providing a family therapy intervention consistently provided fidelity ratings that aligned with those of expert raters. However, when these therapists rated their fidelity for motivational interviewing along with cognitive behaviour treatment, their interrater concordance with expert ratings was poor (33). The underlying mechanisms for this difference in alignment based on intervention remain unclear (33).
Innovative methods to capture and rate fidelity are currently being studied. Caperton and colleagues (34) examined “thin slices” of motivational interviewing sessions in order to determine the shortest session fragment for which interrater agreement on fidelity was similar between the thin slice and the full session. The thin slices were captured at random. These authors determined that approximately one third of a session, or about 9 minutes, had sufficient agreement to approach interrater levels for the full session. Although these authors caution that these results apply to motivational interviewing only, this could be a valid method to be studied for FBT to determine what length of session would be sufficient to be rated for fidelity. There may be unique features of FBT that would not be captured by this method, for example, charging the parents with the task of refeeding, which only occurs at the end of session one.
As an alternate measure of fidelity, innovative ways of measuring therapist competence in FBT are also being developed. Lock and colleagues (35) recently used a new method to evaluate therapists’ behavioural skills acquisition of FBT using online training. Forty-six therapists were randomized to receive regular online training or enhanced FBT online training with extra modules focused on the two key elements of agnosticism and externalization. These authors recorded participants’ responses to video vignettes and then rated them on competence using a newly developed measure. Although this method does not examine fidelity directly, it is an innovative method of evaluating therapists’ acquisition of these skills. The findings indicated that the group receiving enhanced training performed better in terms of skill in the area of agnosticism.
There are several important limitations to this study. The first limitation is the small sample size and the exploratory nature of the data. Future studies should aim to include more therapists, peers and experts. Caution should be used in interpreting our findings, as the statistical tests were based on this small sample. In addition, there may have been bias present in the way the fidelity data were collected. For example, parents knew that the therapist was collecting the data and sending it to the researchers. Thus, parents may have felt pressure to provide inflated scores, as they would not want their scores to negatively affect their relationship with their therapist. Although efforts were made to ensure that parent reports were confidential and would be faxed directly to the researchers without being seen by the therapists, parents may still have been subject to this bias. It is important to note that the ICC data are based on 13 items from the first three sessions of FBT. It is quite possible that later sessions of Phase 1, along with Phases 2 and 3, would have produced different results, and that the level of agreement may depend on the phase of treatment. However, evidence suggests that fidelity within these sessions is sufficient for predicting end-of-treatment outcomes (19, 20), and that early weight gain by the start of session 4 predicts end-of-treatment remission with 80% probability (36, 37). Thus, we felt fidelity to these first three sessions was likely the most important and could be a way to assess fidelity efficiently in clinical practice settings.
Although concordance was highest between the peer and expert rater in our study using ICC data at the item level, it is important to note that no significant differences were seen on mean session scores between the expert and the therapists. Therefore, when thinking of the most pragmatic and cost-efficient method of fidelity measurement, therapist rating would likely meet these requirements. However, an important caveat to our findings is that even when a pragmatic measure of treatment fidelity is identified, some authors have found that it is difficult for practitioners to sustain fidelity assessment in the long term unless an organizational culture is present that recognizes fidelity assessment as essential for optimal clinical outcomes (8). Therapist drift can occur over time, and thus ongoing attention to fidelity is important well after a new treatment has been implemented and maintained.