Setting and Participants
Setting. Schools implementing one of two evidence-based interventions (n = 39 School-Wide Positive Behavioral Interventions and Supports [SW-PBIS]; n = 13 Promoting Alternative Thinking Strategies [PATHS]) were eligible and recruited for participation, resulting in 441 teachers from 52 elementary schools in 6 school districts in Washington, Ohio, and Illinois. The average racial/ethnic and socioeconomic composition of students across schools was 66% non-White (range 21% to 100%) and 57% low-income status (range 4% to 100%), respectively.
Teacher-level demographics. On average, 9 teachers per school were recruited to complete measures. Most teachers were female, held at least a master’s degree, and were predominantly White; teachers had an average of 11.6 years of experience (see Table 1 for complete demographic information). The number of participants included in analyses was sometimes less than 441 due to missing data (< 5% overall).
Procedures
This study was part of a large-scale, federally funded measure adaptation and development project with the aim of creating school-adapted organizational assessments. Prior to conducting the current study, the original SILS was adapted for use in schools through (1) input from research and practice experts during a structured in-person convening and (2) mixed-methods focus group sessions with key educator stakeholder groups (central district administrators, principals, teachers) [67]. Adaptations included changing item wording to ensure construct equivalence for the target respondents (i.e., school-based practitioners) and deleting or expanding items and item content to ensure contextual appropriateness to schools [70]. An effort was made to preserve the integrity of the original items and constructs as much as possible [71]. Expansion included developing items to address additional constructs in subscales focused on leaders’ (a) Communication, (b) Organizational Vision/Mission, and (c) Availability to support EBP implementation.
Human subjects approval was obtained from the University of Washington Institutional Review Board and, when applicable, from participating school districts’ research and evaluation departments. School recruitment was conducted in collaboration with central administrators, and project benefits and data collection procedures were communicated to site-based administrators. Teachers from each school (n = 4-10) were then recruited by school administrators or a site-based liaison. Teachers provided contact information, which research staff used to establish and maintain project communications (e.g., to send survey links).
Data were collected in the fall of the 2017 academic year. In November, teachers were sent an initial email to provide a project overview, obtain informed consent, and provide a link to the online survey. Upon receiving the initial email, teachers had one month to complete the online survey. Weekly email reminders were sent to increase response rates at each school. Across all participating schools, an average of 88% of teachers who were sent survey invitations (n = 500) subsequently completed the online survey.
Measures
School Implementation Leadership Scales (SILS). The original ILS [9] and the original SILS adaptation [66] are 12-item instruments developed to assess strategic leadership for EBP implementation. All ILS items are scored on a five-point Likert scale ranging from 0 (“not at all”) to 4 (“to a very great extent”). Both versions have previously supported a factor structure with four first-order factors (proactive leadership, knowledgeable leadership, supportive leadership, perseverant leadership) – each with three items – loading onto an overarching Implementation Leadership latent factor [26,66]. As described above, the present study adapted the original SILS based on expert feedback, adding items for three new subscales (Communication, Vision, Available). Eighteen additional items were initially developed for the new subscales and to augment the existing subscales with contextually appropriate items, resulting in an initial 30-item revised SILS measure. Item reduction procedures, along with reliability and validity data, are reported in the Results. In addition, two versions of the adapted SILS were created, which included different referents. In one version, items referenced EBP generally (e.g., “Our principal is knowledgeable about evidence-based practice”). In the other, items referenced the specific EBP being implemented (e.g., “Our principal is knowledgeable about SW-PBIS”). Multigroup models were examined to determine whether the underlying factor structure was invariant across these two referents (see Results).
Multifactor Leadership Questionnaire (MLQ). The MLQ, a widely used measure of organizational leadership [20], was included to assess SILS convergent validity. Only the transformational and transactional leadership subscales were used in the present study. Transformational leadership is measured via five subscales: intellectual stimulation, inspirational motivation, individualized consideration, idealized behaviors, and idealized attributes. Two subscales comprise transactional leadership (contingent reward, management-by-exception active). The MLQ and its subscales have previously demonstrated strong psychometric properties [72,73]. Internal consistency for subscale and scale scores in the current study was acceptable and as follows: intellectual stimulation (α = .88), inspirational motivation (α = .89), individualized consideration (α = .80), idealized behaviors (α = .84), idealized attributes (α = .84), transformational leadership total score (α = .91), contingent reward (α = .78), and management-by-exception active (α = .79).
Public School Teacher Questionnaire. The Public School Teacher Questionnaire (PSTQ), administered for decades as part of the Schools and Staffing Survey conducted by the National Center for Education Statistics [74], was included in the present study for purposes of divergent validity as a measure of teachers’ general attitudes toward teaching. Respondents used a four-point Likert scale ranging from strongly disagree to strongly agree to rate nine items that assess different attitudes toward the teaching profession (e.g., “The teaching profession is something that I enjoy and feel competent doing”). The PSTQ has demonstrated acceptable psychometric properties in extant research [75], as well as in the present study (α = .81).
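The internal consistency coefficients reported for the measures above follow the standard Cronbach’s alpha formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores). A minimal pure-Python sketch, using hypothetical item responses rather than any study data:

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / total-score variance)."""
    k = len(items)
    total_scores = [sum(resp) for resp in zip(*items)]  # per-respondent totals
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(total_scores))

# Hypothetical responses: 3 items (rows) rated by 5 teachers (columns).
items = [
    [3, 2, 4, 3, 1],
    [3, 3, 4, 2, 1],
    [2, 3, 4, 3, 2],
]
print(round(cronbach_alpha(items), 2))  # -> 0.88
```

Either sample or population variance may be used, as long as the choice is applied consistently to both the item and total-score variances.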
Data Analytic Approach
Several methodological approaches were employed to establish construct validity. Although this study did not have sufficient higher-level units (i.e., schools) to examine a multilevel CFA, intraclass correlation coefficients (ICCs) for SILS subscales provide evidence that 30%-45% of the variability existed between schools, which is the level at which the construct theoretically resides. A series of confirmatory factor analyses (CFAs) was examined in Mplus [76], specifying weighted least squares means and variances (WLSMV) estimation with delta parameterization for the ordered-categorical scale items. Model fit was assessed using several indices, including chi-square test statistics, the comparative fit index (CFI) [77], the Tucker-Lewis index (TLI) [78], the root mean square error of approximation (RMSEA) [79,80], and the weighted root mean square residual (WRMR) [81]. CFI and TLI values greater than .95 and RMSEA values less than or equal to .05 indicate good model fit. WRMR is a more recently developed fit index that has the benefit of measuring the reduction of residuals in measurement and structural models that leverage ordinal data. Despite limited research on empirically justifiable cutoffs, there is consensus that values below .90 are preferable [81]. Standardized factor loadings (β) less than .55 were considered low and flagged for further examination [82].
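The between-school variability summarized by the ICC can be computed as ICC(1) from a one-way ANOVA decomposition: (MSB − MSW) / (MSB + (k − 1) × MSW) for a balanced design with k teachers per school. A minimal sketch with entirely hypothetical ratings (the study’s actual ICCs were estimated from the full sample):

```python
def icc1(groups):
    """ICC(1) from a balanced one-way ANOVA: (MSB - MSW) / (MSB + (k - 1) * MSW)."""
    k = len(groups[0])                 # teachers per school (balanced design assumed)
    n = len(groups)                    # number of schools
    grand = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]
    ssb = k * sum((m - grand) ** 2 for m in means)                       # between schools
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)    # within schools
    msb = ssb / (n - 1)
    msw = ssw / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical subscale scores for 3 teachers in each of 3 schools.
schools = [
    [3, 4, 2],
    [2, 3, 1],
    [4, 5, 3],
]
print(round(icc1(schools), 2))  # -> 0.4, i.e., 40% of variability between schools
```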
Two measurement models were examined. The first included only first-order factors, modeling the SILS subscales as exogenous but correlated factors. The second model tested a second-order factor structure in which all first-order factors were assumed to load onto a higher-order Implementation Leadership factor. Each of these models was tested twice – once prior to and once following item reduction (see description below). If the first-order factors appreciably loaded onto the higher-order factor, the second-order factor structure would be prioritized, in alignment with this study’s driving theory, measurement development process, and goal of producing a brief yet comprehensive measure of a school’s strategic implementation leadership supportive of EBP implementation.
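After item reduction, the intended second-order structure amounts to seven three-item first-order factors loading on one higher-order factor. The structure can be sketched and sanity-checked as follows; the item labels are hypothetical placeholders, not the actual SILS item numbering:

```python
# Hypothetical item assignments for the reduced 21-item SILS (labels are placeholders).
first_order = {
    "Proactive":     ["sils01", "sils02", "sils03"],
    "Knowledgeable": ["sils04", "sils05", "sils06"],
    "Supportive":    ["sils07", "sils08", "sils09"],
    "Perseverant":   ["sils10", "sils11", "sils12"],
    "Communication": ["sils13", "sils14", "sils15"],
    "Vision":        ["sils16", "sils17", "sils18"],
    "Available":     ["sils19", "sils20", "sils21"],
}
# All seven first-order factors load on a single higher-order factor.
second_order = {"ImplementationLeadership": list(first_order)}

# Sanity checks: 7 factors x 3 items = 21 items, with no item cross-loading.
all_items = [i for items in first_order.values() for i in items]
assert len(all_items) == 21 and len(set(all_items)) == 21
assert all(len(v) == 3 for v in first_order.values())
```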
The initial CFAs were intended to provide evidence of the underlying measurement structure of the SILS. Once established, item characteristic curves were evaluated to narrow SILS items to those most representative of each subscale [83]. Item coverage and redundancy of information were assessed to reduce the number of items for each subscale to three, as using the fewest items necessary is a recommended criterion for pragmatic measures [29]. Note that one subscale (Proactive) included only three items and so was not subjected to item reduction. Using the reduced-item version of the SILS, we then tested both CFA models again and recalculated internal consistency estimates. Next, multigroup modeling was used to determine whether the underlying factor structure of the SILS was invariant across versions of the scale employing general versus specific EBP item referents. Because the chi-square difference test is heavily influenced by sample size [84], two additional statistics were used to examine invariance across survey type. Cochran’s Q statistic [85] was used to determine the difference in magnitude between factor loadings of the two survey types, whereas d(Cox) was used to assess the difference in magnitude between thresholds. Q statistics that cluster around zero indicate no substantive difference between factor loadings. There are no agreed-upon cutpoints for d(Cox). Because d(Cox) ranges from 0 to 1, we employed a decision rule in line with similar effect sizes [86] such that values greater than .50 would be flagged as a moderate difference between thresholds of the two survey types requiring more thorough investigation.
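The two invariance checks can be sketched as follows. The precision-weighted form of Cochran’s Q shown here is one common formulation and is illustrative only, and the loadings, standard errors, and d(Cox) values are hypothetical, not study estimates:

```python
def cochran_q(estimates, ses):
    """Precision-weighted Cochran's Q across group-specific estimates:
    Q = sum(w_i * (b_i - b_bar)^2), with weights w_i = 1 / SE_i^2."""
    w = [1 / se ** 2 for se in ses]
    b_bar = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    return sum(wi * (b - b_bar) ** 2 for wi, b in zip(w, estimates))

# Hypothetical loadings for one item under the two referent versions
# (general EBP vs. specific EBP); Q near zero suggests no substantive difference.
q = cochran_q([0.80, 0.82], [0.05, 0.05])
print(round(q, 2))  # -> 0.08

# Decision rule for thresholds: flag any d(Cox) > .50 for closer inspection.
d_cox_values = [0.12, 0.31, 0.56]  # hypothetical threshold differences
flagged = [d for d in d_cox_values if d > 0.50]
print(flagged)  # -> [0.56]
```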
Convergent and divergent validity were assessed via correlations between SILS subscales and select measures that were theoretically hypothesized to yield small-to-moderate (convergent) or no (divergent) associations. Specifically, correlations among SILS subscales and correlations between SILS and MLQ subscales were examined to establish convergent validity. The SILS subscales theoretically measure a unitary construct, and as such the inter-subscale correlations were anticipated to be moderate-to-large. Correlations between SILS subscales and all MLQ subscales except for Management-by-Exception were also expected to be moderate-to-large, but smaller than the SILS inter-subscale correlations. Management-by-Exception was anticipated to be minimally correlated or uncorrelated with SILS subscales. Divergent validity was similarly assessed via correlations, but between SILS subscales and both the PSTQ total score and school-level demographic characteristics. While the SILS and PSTQ are intended to measure different traits, they share the same assessment method (teacher reports), which makes it likely the two measures would share low-to-moderate correlations [87]. Some school-level demographic characteristics might influence teachers’ views of, experience with, and implementation of EBPs. As such, we hypothesized null-to-low correlations between SILS subscales and school-level demographics.
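These convergent and divergent tests reduce to Pearson correlations between scale scores. A minimal sketch, with hypothetical teacher-level scores (the variable names are placeholders, not study data):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scale scores for five teachers; a moderate positive correlation
# would be consistent with the convergent-validity hypothesis above.
sils_supportive = [2.1, 3.4, 2.8, 3.9, 1.7]
mlq_inspiration = [2.9, 3.0, 2.2, 3.3, 2.6]
print(round(pearson_r(sils_supportive, mlq_inspiration), 2))  # -> 0.54
```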