We followed the same process used in our previous meta-epidemiological studies of RGBs in childhood and adult obesity interventions.(5, 12) Our methods were informed by the Cochrane Handbook for Systematic Reviews of Interventions(13) and are reported, where applicable, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Scoping Review and Abstract Extension statements (Additional File 1).(14, 15) Consistent with our prior work, a preliminary health behavior intervention is an initial evaluation of a behavior-focused health intervention conducted to test the feasibility, acceptability, preliminary efficacy (or effect sizes), or other developmental features of the intervention.(5) Health behavior interventions were defined as coordinated sets of activities that aim to promote a health behavior by targeting one or more levels of influence, including interpersonal, intrapersonal, policy, community, macro-environmental, micro-environmental, and institutional levels.(16-19)
Data Sources & Search Strategy
Our team used the following procedures to identify pairs of preliminary studies and their subsequent larger trials of the same or a similar behavioral intervention addressing tobacco use disorder, alcohol use disorder, interpersonal violence, or behaviors related to increased sexually transmitted infections. In Step 1, we used controlled vocabulary terms (e.g., MeSH and Emtree), free-text terms, and Boolean operators to identify systematic reviews and/or meta-analyses across the OVID Medline/PubMed, Embase/Elsevier, EBSCOhost, and Web of Science databases.(20) The search strategy and syntax are provided in Additional File 2. In Step 2, our team uploaded the identified systematic reviews and/or meta-analyses into an EndNote library (v. X9.2), where they were reviewed by at least one trained research assistant (LV, KR) before the full-text articles of all studies included within each review were retrieved. In Step 3, we retrieved the full-text articles included within each systematic review and/or meta-analysis and uploaded them into NVivo (v. 12, Doncaster, Australia). An NVivo text search query was used to flag each study included within each systematic review and/or meta-analysis as either (1) a self-identified preliminary test of an intervention (e.g., containing the words “pilot,” “feasibility,” “preliminary,” “proof-of-concept,” “vanguard,” “novel,” or “evidentiary”)(16, 21, 22) or (2) a larger-scale trial referring to prior preliminary work (e.g., containing the words “protocol,” “previously,” “rationale,” “elsewhere described,” “prior work,” or “informed by”). In Step 4, we used forward and backward citation searches to pair studies. Studies identified as larger-scale trials were “followed back” using the references cited in the publication to identify the preliminary testing and publication of the intervention. Studies identified as preliminary studies were “followed forward” using the Web of Science reference search interface (i.e., to identify subsequently published studies citing the identified preliminary study as informative preliminary work). Successfully paired preliminary studies and larger-scale trials were catalogued in Excel (Microsoft) and are referred to as ‘study pairs.’
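As an illustration only, the keyword screen applied in NVivo can be sketched as follows. This is a minimal sketch under our assumptions: the function and variable names are hypothetical and the snippet does not reproduce the actual NVivo query, only the term lists described above.

```python
# Hypothetical sketch of the keyword screen described in Step 3 (not the actual NVivo query).
PRELIMINARY_TERMS = ["pilot", "feasibility", "preliminary", "proof-of-concept",
                     "vanguard", "novel", "evidentiary"]
LARGER_TRIAL_TERMS = ["protocol", "previously", "rationale", "elsewhere described",
                      "prior work", "informed by"]

def flag_study(full_text: str) -> dict:
    """Flag a full-text article as (1) self-identified preliminary work and/or
    (2) a larger-scale trial referencing prior preliminary work."""
    text = full_text.lower()
    return {
        "preliminary": any(term in text for term in PRELIMINARY_TERMS),
        "larger_trial": any(term in text for term in LARGER_TRIAL_TERMS),
    }
```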
Inclusion/Exclusion Criteria
Included pairs had to contain at least one self-identifying preliminary study and one larger-scale trial of the same or a refined intervention. Studies had to be published in indexed, refereed journals as verified by Ulrich's Web (http://ulrichsweb.serialssolutions.com). Studies had to be available as full-text articles in English. No participant age requirements or date boundaries were applied. To be included in our analysis, study pairs had to report point estimates and measures of variance (e.g., SD, SE, 95% CI) for the outcomes. Preliminary studies reporting only feasibility data (e.g., attendance, adherence, acceptability) could not be included in the quantitative analysis because they did not provide the data necessary to calculate an effect size. Additionally, study pairs had to present at least one shared outcome in both the preliminary study and the larger trial. For example, if a preliminary study reported intention to quit smoking and a larger trial reported quit rates, these studies could not be used because the data reported in the larger trial could not be logically combined with the preliminary study’s data to produce consistent information about the phenomenon of interest (e.g., the health behavior outcome, tobacco use). Where study pairs contained more than one eligible outcome, all outcomes were retained. Hierarchical models (see Analytic Procedures) were used to account for the lack of independence between outcomes from the same study pair.
Study Outcomes
To obtain summary statistics comparable across all studies, outcomes reporting impact on health-related behaviors were extracted by the research team (LV, KR, MS, SB, CDP). Tobacco cessation rates (i.e., quit rates) were extracted from tobacco use disorder studies (e.g., 7-day point prevalence, exhaled carbon monoxide levels, self-reported quit rate). Drinking rates were extracted from alcohol use disorder studies (e.g., ASI Alcohol Composite score, units of alcohol consumed over the previous week). Measures of interpersonal functioning were extracted from studies targeting interpersonal violence (e.g., nonviolent discipline, physical victimization, Child Abuse Potential Inventory score). Measures representative of constructs associated with reduced infection rates (e.g., transmission risk behaviors, medication adherence, psychosocial functioning) were extracted from studies on behaviors related to increased sexually transmitted infections.
Coding Risk of Generalizability Biases
At least two reviewers (LV, MB, SB) independently reviewed each study pair to identify the presence or absence of RGBs using the definitions presented in prior work (Table 1).(5, 12) Where discrepancies in coding RGBs occurred between reviewers, a third reviewer was consulted, and agreement was reached by discussion. Within each study pair, each RGB could be classified as not present, present in the preliminary study only, or present in both the preliminary study and the larger trial (i.e., carried forward). Intervention duration, intervention intensity, and measurement bias describe differences between the features of the preliminary study and those of the larger trial and, if present, were coded as present in both the preliminary study and the larger trial. Where an RGB was present in the larger trial but not in the preliminary study (for example, where implementation support was provided in the larger trial but was not mentioned in the preliminary study), the feature was assumed to have also been present in the preliminary study, even if not explicitly mentioned, and the RGB was coded in both the preliminary study and the larger trial.
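The coding rules above can be summarized schematically as follows. This is an illustrative sketch only; the coding was performed manually by reviewers, and the labels and function names here are hypothetical.

```python
from enum import Enum

class RGBStatus(Enum):
    NOT_PRESENT = "not present"
    PRELIMINARY_ONLY = "present in preliminary study only"
    CARRIED_FORWARD = "present in preliminary study and larger trial"

# Biases defined as differences between preliminary-study and larger-trial features.
DIFFERENCE_BIASES = {"intervention duration", "intervention intensity", "measurement"}

def code_rgb(bias: str, in_preliminary: bool, in_larger: bool) -> RGBStatus:
    """Apply the coding rules: difference-type biases, and biases reported only in
    the larger trial, are assumed to apply to both studies (carried forward)."""
    if not (in_preliminary or in_larger):
        return RGBStatus.NOT_PRESENT
    if bias in DIFFERENCE_BIASES or in_larger:
        return RGBStatus.CARRIED_FORWARD
    return RGBStatus.PRELIMINARY_ONLY
```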
Analytic Procedures
Consistent with previous studies,(5, 6) our research team extracted the outcomes reported across pairs and entered them into an Excel file (Microsoft). In Excel, effect sizes were corrected for differences in the direction of the scales so that positive effect sizes corresponded to improvements in health behaviors in the intervention group. This was done to simplify interpretation so that all effect sizes could be summarized and compared within and across studies. We performed all necessary data transformations in Excel (e.g., standard errors and confidence intervals were transformed into standard deviations). Next, the outcomes reported within pairs were transferred into Comprehensive Meta-Analysis software (Biostat Inc., v3.3.07) to calculate the standardized mean difference (SMD) for each study. After effects were calculated, the complete data file was exported as a .CSV file and uploaded into Stata 16 (SE, StataCorp) for analysis (LV, MB).
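For illustration, the Excel transformations and the SMD computed in Comprehensive Meta-Analysis correspond, under standard formulas, to the sketch below. The variable names are ours, a normal-approximation 95% CI is assumed, and the SMD shown is Cohen's d with a pooled standard deviation; this is not the vendor's implementation.

```python
import math

def sd_from_se(se: float, n: int) -> float:
    """Standard deviation recovered from a standard error of the mean."""
    return se * math.sqrt(n)

def sd_from_ci(lower: float, upper: float, n: int) -> float:
    """Standard deviation recovered from a 95% CI for a mean (normal approximation)."""
    return math.sqrt(n) * (upper - lower) / (2 * 1.96)

def smd(mean_tx: float, sd_tx: float, n_tx: int,
        mean_ctl: float, sd_ctl: float, n_ctl: int) -> float:
    """Standardized mean difference (Cohen's d) using the pooled standard deviation;
    signs are oriented so that positive values favor the intervention group."""
    sd_pooled = math.sqrt(((n_tx - 1) * sd_tx ** 2 + (n_ctl - 1) * sd_ctl ** 2)
                          / (n_tx + n_ctl - 2))
    return (mean_tx - mean_ctl) / sd_pooled
```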
The natural hierarchical structure of the data is effects (Level 1) nested within studies (Level 2), which are nested within pairs (Level 3). However, to the best of our knowledge, three-level meta-regression models have not yet been developed and validated, so we used two-level meta-regressions for all estimates. Random-effects meta-regression models with robust variance estimation were used to compare the change in SMD (column labeled “ΔSMD” in Additional File 3). For these models, estimates of ΔSMD were nested within study pairs because the change in effect size is an attribute of a study pair, not of a single study. Random-effects meta-regression models with robust variance estimation were also used to generate summative effect estimates for the preliminary studies and the larger trials (columns labeled “Preliminary Studies” and “Larger Trials” in Additional File 3); here, effects were nested within studies because, for the purposes of these models, an effect is a property of a study independent of the existence of a study pair. These models were repeated for all levels of each RGB, such that each row and column in Additional File 3 represents a single model.
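The random-effects pooling step underlying these models can be illustrated with a DerSimonian-Laird estimator. This is a minimal sketch under that assumption; it does not reproduce the robust-variance (clustered) meta-regressions fitted in Stata, and the function name is ours.

```python
import numpy as np

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling of two or more effect sizes.
    Returns the pooled estimate, its standard error, and tau^2 (between-study variance)."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = 1.0 / variances                            # inverse-variance (fixed-effect) weights
    theta_fe = np.sum(w * effects) / np.sum(w)     # fixed-effect mean
    q = np.sum(w * (effects - theta_fe) ** 2)      # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # DL estimate of between-study variance
    w_re = 1.0 / (variances + tau2)                # random-effects weights
    pooled = np.sum(w_re * effects) / np.sum(w_re)
    se_pooled = np.sqrt(1.0 / np.sum(w_re))
    return pooled, se_pooled, tau2
```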
The difference in SMD between the preliminary study and the larger-scale trial was quantified according to previously defined formulas for the scale-up penalty.(4, 23, 24) This was calculated as the SMD of the larger-scale trial divided by the SMD of the preliminary study, multiplied by 100. A value of 100% indicated identical SMDs in the preliminary study and larger-scale trial. A value of 50% indicated the larger-scale trial was half as effective as the preliminary study; a value above 100% indicated the larger-scale trial was more effective than the preliminary study, whereas a negative value indicated the direction of the effect in the larger-scale trial was opposite that of the preliminary study. In line with prior work, a secondary evaluation of the impact of the biases examined whether the presence/absence of biases was associated with nominally statistically significant outcomes (i.e., p ≤ 0.05) in the larger-scale trials.
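Expressed symbolically (notation ours), the scale-up penalty defined above is

\[
\text{Scale-up penalty (\%)} \;=\; \frac{\mathrm{SMD}_{\text{larger trial}}}{\mathrm{SMD}_{\text{preliminary study}}} \times 100 .
\]

For example, with purely illustrative values, a preliminary-study SMD of 0.60 and a larger-trial SMD of 0.30 yield (0.30 / 0.60) × 100 = 50%, i.e., the larger trial retained half of the effect observed in the preliminary study.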