This review of school-based policy implementation measures followed a protocol similar to that of a broad review of health policy implementation tools (36). Both reviews followed established procedures for conducting a systematic review of implementation measurement tools (40) and adhered to PRISMA reporting guidelines (see Figure 1 and Supplemental Table S1) (41). The review was guided by three prominent D&I frameworks: the Implementation Outcomes Framework (IOF) by Proctor and colleagues (32); the Consolidated Framework for Implementation Research (CFIR) by Damschroder and colleagues (37); and the Policy Implementation Determinants Framework by Bullock and Davis (42, 43). By combining constructs from these frameworks, we sought to gain a deeper understanding of the implementation outcomes, determinants, and processes for school health policy implementation that are assessed through measurement tools. The same definitions of public policy and policy implementation were used as in the review by Allen et al. (36). Specifically, public policy includes legislation or policies mandated by governmental agencies at the federal/national, state/province/county, regional, or local level (44, 45). Policy implementation is conceptualized as the processes by which the mandate is carried out by public or private organizations (45). For the purposes of this review, the organizations of interest comprised states/provinces, school districts, and primary and secondary pre-university schools as implementing sites.
Searches
We searched six databases in April 2019 and again in August 2020 to ensure inclusion of recent articles in the present review: MEDLINE, PsycInfo, and CINAHL Plus through EBSCO; and PAIS, Worldwide Political, and ERIC through ProQuest. We searched terms across four domains: health, public policy, implementation, and measurement; see Supplemental Table S2 for search terms and syntax. Development of the search strings and terms was based on frameworks in D&I and policy research, with details described previously (36).
Inclusion and exclusion criteria
The inclusion criteria comprised English-language peer-reviewed journal articles that were published from January 1995 through August 2020 and utilized quantitative self-report, observational, and/or archival tools to assess implementation of a government-mandated policy (36). The broad review conducted in 2019 included empirical studies from any continent on policy implementation in any clinical or non-clinical setting, covering a broad range of health policy topics. Exclusion criteria can be found in Supplemental Table S3. Specific deviations from the inclusion/exclusion criteria in the Allen et al. article were as follows: 1) research must have taken place in/with school settings serving students in primary and secondary (ages 5-18; pre-university) schools; 2) studies must have measured implementation of school policies already passed or approved that addressed overall wellness, tobacco, physical activity, nutrition, obesity prevention, or mental health/bullying/social-emotional learning; and 3) policy-specific and setting-specific measures were included in the present review but excluded in the initial broad review (which sought generalizable measures that could be applied across multiple settings and topics). Our review included multi-item measures; articles were excluded if the tool included only one relevant item.
Screening
Two members of the research team used Covidence systematic review software (46) to independently screen all abstracts against the inclusion and exclusion criteria. Full texts of all empirical studies of school-setting public policy implementation that passed abstract screening in 2019 were rescreened independently in summer 2020 by two coauthors (GMM, PA) for potential inclusion in the present review, with decisions and exclusion reasons coded in Excel. The school-setting full-text rescreening was conducted because the initial review had excluded measures worded specifically for a certain setting or policy topic, whereas such specific measures were included in the present review. The two coauthors also conducted dual independent full-text screening of newly identified 2019-2020 studies that passed abstract screening after the August 2020 updated database searches. The two coauthors met to reach consensus on any inclusion/exclusion disagreements; a third coauthor was consulted if consensus could not be reached. One of the pre-identified exclusion reasons was assigned to each excluded article (for more information, see the PRISMA chart; Figure 1).
Extraction
A comprehensive extraction procedure was implemented in which coauthor pairs (GM, PA, CWB) conducted dual non-independent extraction. A primary reviewer entered relevant information into the extraction database, and the secondary reviewer checked data entry for accuracy and completeness. The coauthors met regularly to reach consensus. Information extracted on the measurement properties included: 1) type of measurement tool (i.e., survey, archival, observation); 2) implementation setting (i.e., elementary/primary, middle, or high/secondary school, or a combination of two or more levels); 3) school policy topic (e.g., wellness, physical activity, nutrition, mental health, tobacco, sun safety); and 4) level of educational entity directing implementation of the governmental mandate (i.e., school, district, state/province, national). Following the three chosen D&I frameworks, we extracted from each measure all implementation outcomes from the Proctor framework; selected CFIR constructs that were used in the previous review article and found to be pertinent to policy implementation; and the actor relations/networks and actor context domains from the Bullock and Davis framework. Finally, following the procedures outlined by Lewis and colleagues for the Psychometric and Pragmatic Evidence Rating Scale (PAPERS) (40, 47-51), pragmatic properties (i.e., brevity, cost, readability, training, interpretation) and psychometric properties (i.e., internal consistency, validity, norms) were extracted from each measure to ascertain the quality of each tool. These scoring classifications assign scores from -1 to 4 based on the degree to which a measure meets each criterion; higher scores on each construct reflect higher quality of the measurement tool (Supplemental Tables S4, S5).
Data synthesis
Upon achieving consensus on all measures, descriptive analyses were run to calculate the frequency of items in each school health policy topic. A subset of tools was widely used and/or based on national samples: the Centers for Disease Control and Prevention School Health Policies and Practices Study (school, district, state) (52); the Wellness School Assessment Tool (53); the Maryland Wellness Policies and Practices Project surveys (school and district level) (54); and the Health Enhancing Physical Activity Europe policy audit (55). We term these "large-scale" tools. Less frequently reported measures with smaller sample sizes were termed "unique" tools. Where appropriate, these two groups were analyzed and presented separately when reporting characteristics, given the distinct differences in their methodology and utilization.