Tool Development
The study Principal Investigator (PI; CCL) and the first author (CWB) reviewed available literature to locate examples of tracking tools regardless of substantive focus (e.g., journaling prompts, brainstorming activities, Bunger et al.'s activity log (8), and Boyd and colleagues' coding system (9)). We developed three tools in this pilot that varied in the degree to which they used open-ended or structured/forced response options. There are tradeoffs to the degree of structure used (i.e., open versus forced), which may affect the completeness, quality, and actionability of the data collected. Highly structured tools are associated with greater reliability because response options are standardized (15). Less structured tools can allow for greater content validity and more comprehensive qualitative coverage of a construct because participants are not limited in their responses, which can extend content coverage beyond responses considered by the measure developer (16). However, in addition to allowing for a more diverse collection of responses, Reja et al. (17) note that open-ended questions require more extensive coding and are more prone to missing data. Edwards (15) indicated that open-ended questions may increase participant burden, while closed-ended questions may be subject to bias, whether imposed by the investigator's lens or by participant avoidance of extreme response options.
Despite the variability in degree of structure, all three tools sought to capture the same categories of information (Table 1). The tools were informed and evaluated by three implementation frameworks and compilations (Supplemental File 1). First, Proctor and colleagues (10) offered detailed recommendations for implementation strategy reporting and specification to enable replication in research and practice (Figure 1); tools varied in the degree to which responses aligned with these recommendations. Second, Lyon and colleagues (18, 19) created the SISTER compilation, in which existing implementation strategy labels and definitions were revised to improve fit with school settings, several new strategies were created, and strategies not relevant to school settings were removed. Only the most structured tool offered these standardized strategy labels; for the other two tools, the SISTER compilation was applied to evaluate the quality of responses. Third, Wiltsey Stirman and colleagues (13) put forth a system for classifying and reporting treatment adaptations that illuminates whether modifications made to an EBP are fidelity consistent or constitute drift. The framework identifies who made the modification, at what level of delivery the modification was made, whether the modification was made to the content, context, or training and evaluation of the intervention, and the nature of the context or content modifications. As with the SISTER compilation, these treatment adaptation categories were embedded in the most structured tool and used to evaluate responses to the other two tools. Tool content and design were developed through an iterative process informed by research team meetings with implementation practitioners and the developer of the EBP. The piloted tools are available in Supplemental Files 2-4 and described below.
- Brainstorming Log: This tool was the most open-ended (Table 1). The Brainstorming Log was informed by a vocational education trainer's log (20) and literature on journaling as a qualitative data collection method (21). It consisted of six questions: one multiple choice ("What is your role?") and five free text. Before describing their activities, participants indicated the range of dates for which they were reporting. First, participants reported treatment adaptations made, describing content and context modifications in separate questions. Second, participants reported on barriers they encountered and strategies deployed (or proposed) to address those barriers.
- Activity Log: This tool, based on Bunger et al. (8), was moderately structured and open-ended (Table 1) and asked five questions: participant role, date of the activity, time spent on the activity, the purpose of the activity, and the attendees. Unique to this tool, we asked about the intended outcome of the activity. This tool did not require participants to specify whether their activity was an implementation strategy or a treatment adaptation.
- Detailed Tracking Log: This tool was the most structured and detailed (Table 1). In addition to the questions from the Activity Log, participants were prompted to categorize each activity as an implementation strategy or treatment adaptation through pre-populated response options. Implementation strategies were organized into nine categories delineated by Waltz et al. (22) (see Supplemental File 3) and assigned a label from the SISTER compilation (23, 24). Participants reported on treatment adaptations according to Wiltsey Stirman and colleagues' framework (12).
Measures
Three measures of implementation outcomes, the Acceptability of Intervention Measure (AIM), Intervention Appropriateness Measure (IAM), and Feasibility of Intervention Measure (FIM) (25), were used to assess the likelihood that stakeholders might adopt the three tools. Each contains four items rated on a 5-point scale ranging from 1=completely disagree to 5=completely agree. Summary scores for each measure were created by averaging item responses, with higher values reflecting more favorable perceptions. The AIM measured the degree to which each tracking tool developed in this study was satisfactory to stakeholders (current sample Cronbach's alpha of 0.97). The IAM measured the relevance or perceived fit of each tracking tool (current sample Cronbach's alpha of 0.97). The FIM measured the degree to which each tracking tool could be successfully used (current sample Cronbach's alpha of 0.96). These scales were followed by an open-ended question: "Please tell us why you rated this tracking method the way you did. What did you like/not like about it?"
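For illustration, the scoring approach described above (item averaging plus an internal consistency estimate) can be expressed as a brief sketch; the data frame and column names below are hypothetical and are not the study's actual data.

```python
import pandas as pd

def cronbachs_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of item columns (one row per response)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses to the four AIM items, each rated 1-5 (columns are illustrative).
responses = pd.DataFrame({
    "aim_1": [5, 4, 3, 5, 4],
    "aim_2": [4, 4, 3, 5, 5],
    "aim_3": [5, 3, 2, 4, 4],
    "aim_4": [5, 4, 3, 5, 4],
})

aim_items = responses[["aim_1", "aim_2", "aim_3", "aim_4"]]
# Summary score: average of the items, higher values = more favorable perceptions.
responses["aim_score"] = aim_items.mean(axis=1)
print(cronbachs_alpha(aim_items))
```

The same scoring logic would apply to the IAM, FIM, and AES subscales by swapping in the relevant item columns.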
The 6-item Adaptations to Evidence-Based Practices Scale (AES; (26)) was added to explore treatment adaptations with an established quantitative measure, serving as a concurrent validity assessment for the tools in development. Items are rated on a 5-point scale ranging from 0=not at all to 4=a very great extent. The AES contains two subscales: "augmenting" adaptations (e.g., "I modify how I present or discuss components of the EBP") and "reducing/reordering" adaptations (e.g., "I shorten/condense pacing of the EBP"). In the current sample, Cronbach's alpha was 0.94 for the total score, 0.90 for the augmenting subscale, and 0.93 for the reducing/reordering subscale. Mean scores were calculated for each AES subscale, with higher scores indicating more adaptation.
Pilot testing
Setting and Participants. Our study capitalized on a pilot implementation of the Blues Program, an evidence-based cognitive behavioral group depression prevention program. This EBP is intended to promote engagement in pleasant activities and reduce negative cognitions among teens at risk of developing major depression (27, 28). The New York Foundling, a non-profit that offers services to children and families, includes an Implementation Support Center that provided oversight of the Blues Program implementation in New York state high schools. School-based mental health providers were trained by the Blues Program developer (PR) to facilitate group sessions. These three stakeholder groups (the Blues Program developer, implementation practitioners from The New York Foundling, and school-based mental health providers) participated in our pilot by reporting on their Blues Program related activities via the three tracking tools. The Blues Program developer (N=1) was a PhD-trained investigator with 30 years of post-training experience. The implementation practitioners (N=3) all held master's degrees and had an average of 5.5 years of professional experience. The school-based mental health providers (N=7) also all had master's level training and had worked in their profession for an average of 4.3 years.
Data Collection. The tools were administered to the three participant groups across two cycles of the Blues Program, each of which lasted six weeks. Each week, all groups were randomly assigned one tracking tool, distributed via an email that included a link to a web-based survey, such that each tool was administered twice per cycle in random order. Participants were instructed to complete the tracking tool by reflecting on the prior week's activities; they were given six days to record activities so as not to overlap with the next week's tool administration. At the end of data collection with each tool, participants completed the IAM, FIM, AIM, and AES. A $10 per survey incentive was offered during the first cycle of data collection; this incentive was increased to $20 for the second cycle. Response rates for each round are reported in Table 2.
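As a rough illustration of this administration schedule, the sketch below generates a six-week ordering in which each of the three tools appears twice in random order; this is an assumption for illustration only, not the study's actual randomization procedure.

```python
import random

TOOLS = ["Brainstorming Log", "Activity Log", "Detailed Tracking Log"]

def cycle_schedule(seed=None):
    """Return a six-week ordering in which each tool is administered twice."""
    rng = random.Random(seed)
    schedule = TOOLS * 2  # two administrations of each tool per six-week cycle
    rng.shuffle(schedule)
    return schedule

# One hypothetical cycle's week-by-week tool assignment.
print(cycle_schedule())
```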
After data collection concluded, the first author conducted semi-structured interviews with participants who had responded to the surveys: the treatment developer (N=1), implementation practitioners from The New York Foundling (N=2), and school-based mental health providers (N=5). One implementation practitioner was on leave when interviews were conducted, and two mental health providers left their positions prior to the interviews and could not be contacted for follow-up. Among the remaining participants, the interview response rate was 100 percent (N=8). Interviews allowed for an in-depth exploration of stakeholders' experiences with the tools. A semi-structured interview guide was prepared to capture information on: 1) perceived benefits of tracking, 2) tracking method preferences, 3) tracking process, 4) background or training for tracking, 5) tracking execution and completion, 6) general utility of tracking, and 7) contextual information (see Supplemental File 4 for the full interview guide). Participants were emailed each tool so they could refer to them during the interview. Participants received a $40 incentive upon interview completion. Interviews were recorded and transcribed.
Data Analysis
To assess participant perceptions of the tools, we compared scores on the AIM, IAM, and FIM using generalized estimating equations (GEE, a type of multilevel model). Analyses were completed at the response level rather than at the individual participant level. Scores on the measures were nested within weeks and roles, and we examined fixed effects for role (treatment developer, implementation practitioners, and mental health providers) and tool (Detailed Tracking Log, Activity Log, and Brainstorming Log). Because the measures did not show substantial skew or kurtosis, we used a linear model that assumes a normal distribution. A similar analysis examined whether AES scores differed by role and tool. Because each predictor of interest had three categories, we ran each GEE twice, changing the reference group so that all pairwise comparisons could be examined (i.e., Activity Log versus Detailed Tracking Log, Detailed Tracking Log versus Brainstorming Log, and Brainstorming Log versus Activity Log). Scores on the measures were used only if participants had responded to at least 50% of the items. Three records were excluded because of missing data on the tools and one record was excluded due to missing date information, yielding a final sample of 59 responses.
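A minimal sketch of this type of GEE analysis, using the statsmodels library, is shown below; the file, variable names, clustering unit, and working correlation structure are assumptions for illustration rather than the authors' actual specification.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical file: one row per weekly tool response with an AIM summary score.
df = pd.read_csv("tool_ratings.csv")

# Keep responses with at least 50% of the four items completed (assumed column).
df = df[df["items_completed"] >= 2]

# Linear (Gaussian) GEE with fixed effects for tool and role; responses are
# assumed here to be clustered by participant with an exchangeable structure.
model = smf.gee(
    "aim_score ~ C(tool, Treatment(reference='Activity Log')) + C(role)",
    groups="participant_id",
    data=df,
    family=sm.families.Gaussian(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
print(model.fit().summary())

# Re-running with a different reference category yields the remaining pairwise contrast.
model2 = smf.gee(
    "aim_score ~ C(tool, Treatment(reference='Detailed Tracking Log')) + C(role)",
    groups="participant_id",
    data=df,
    family=sm.families.Gaussian(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
print(model2.fit().summary())
```

Analogous models, substituting the IAM, FIM, or AES scores as the outcome, would complete the set of comparisons described above.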
We entered tool responses into an Access database for coding and created a codebook using established implementation frameworks and compilations. We categorized implementation strategies using the Waltz et al. (22) and SISTER (23, 24) compilations; Proctor et al.'s (10) reporting recommendations for strategy specification; and Wiltsey Stirman et al.'s (13) framework for treatment adaptations. Two research specialists (CWB, KM) trained by the study PI conducted dual independent coding and met weekly to resolve discrepancies. If consensus could not be reached between the two coders, the PI made a final decision. The resulting coded data characterized the reported activities based on alignment with implementation strategy reporting recommendations (10) and the treatment adaptation framework (12) (Table 3).
The first author and the second author (LP), a sociologist with expertise in qualitative methods, conducted dual independent coding of all interview transcripts. A codebook containing a priori codes based on the semi-structured interview guide was developed and iteratively refined, with emergent codes added throughout the process. Code domains mapped onto the interview guide (Supplemental File 4). The coders held weekly consensus meetings to resolve discrepancies and reach agreement on additional emergent codes (29). Assigned codes were entered into ATLAS.ti (Version 7.1; (30)), and code reports were obtained and analyzed for main themes and illustrative quotes.