We preregistered this review at the Open Science Framework (OSF) on April 12, 2021 (https://doi.org/10.17605/OSF.IO/4CAFQ). The review adhered to the checklist of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)41.
Eligibility criteria
Eligible studies had to meet the following criteria: (1) Population: We included studies with non-clinical populations. Studies with clinical populations were excluded. (2) Intervention: We included OLP interventions regardless of their specific application. (3) Comparison: We considered either a no-treatment control condition (NT) or a hidden placebo (HP) control condition. (4) Outcome: We included studies that measured the efficacy of OLPs on any given scale. Since the aim of this study was to investigate the effect of OLPs on a meta-level (i.e., across various outcomes), we did not apply restrictions to the types of outcomes. (5) Design: We included RCTs and excluded all other study designs.
Information sources and search strategy
We screened five electronic bibliographic databases, covering all entries from database inception to April 15, 2021. We did not apply any language restrictions. We searched for studies using Medline via PubMed (1965 to April 15, 2021), PsycINFO via EBSCO (1967 to April 15, 2021), PSYNDEX via EBSCO (1977 to April 15, 2021), Web of Science Core Collection (1945 to April 15, 2021), and The Cochrane Central Register of Controlled Trials (CENTRAL, The Cochrane Library, Wiley), Issue 4 of April 12, 2021. Due to its composite nature, CENTRAL does not have an inception date. However, we did not apply any date restrictions and used the latest issue available. In addition, we screened the Journal of interdisciplinary placebo studies DATABASE (JIPS, https://jips.online/).
We used a search strategy similar to von Wernsdorff et al.11. The search terms served the purpose of describing the OLP intervention in more detail. Therefore, in addition to terms such as "placebo", we used synonyms for "open-label", such as “non blind” or “without deception”. Since the aim of this study was to investigate the effect of OLPs on a meta-level, we did not specify outcomes and control conditions in the search strings. In addition, we used wildcards and variant forms of spelling to find as many studies as possible. The search strings are in Supplementary Tables S6-S10. Slight variations between the search strings are due to different proximity operators among the databases. We compiled all records identified in the databases in the reference management software Zotero 5.0.96.2 (Corporation for Digital Scholarship, Vienna, Virginia) and removed duplicates. We conducted both backward and forward citation searches of all included studies and important reviews on OLPs11,42 using Web of Science and PsycINFO.
Study selection and data extraction
Two researchers (LS and PDS) independently screened titles, abstracts, and full texts for inclusion. Title and abstract screening was carried out using the systematic review software Rayyan (Rayyan QCRI, Doha, Qatar). Disagreements were resolved through discussion. If no consensus could be reached, JCF and SS were consulted. The chance-corrected agreement between raters after the full-text screening was substantial (κ = 0.62). In cases where eligible studies did not report the information necessary to compute effect sizes, we contacted the study authors. If the authors did not respond or were unable to provide the data, these studies were excluded.
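For reference, the chance-corrected agreement statistic (Cohen's κ) divides the observed agreement beyond chance by the maximum possible agreement beyond chance. A minimal sketch; the rater decisions below are hypothetical and serve only to illustrate the computation, not the review's actual screening data:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters' categorical decisions."""
    assert len(a) == len(b)
    n = len(a)
    cats = set(a) | set(b)
    # Observed proportion of agreement
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each rater's marginal proportions
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical include/exclude decisions of two raters
r1 = ["in", "in", "out", "out", "in", "out", "out", "out"]
r2 = ["in", "out", "out", "out", "in", "out", "in", "out"]
kappa = cohens_kappa(r1, r2)  # about 0.47 for these illustrative data
```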
The same two researchers who selected the studies independently extracted the data. Again, disagreements were resolved through discussion. We extracted data on author, year, country of trial, study design, sample size, control condition, intervention characteristics, the exact wording of instructions given to the participants, and the type and number of outcomes into a spreadsheet. For outcomes, we extracted the means, sample sizes, and standard deviations. If reported, this was done for change scores; otherwise, for both pre- and post-intervention scores. For studies that reported only the standard error, we transformed the standard error into the standard deviation according to the procedure outlined in the Cochrane Handbook43.
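The Cochrane Handbook conversion in question is SD = SE × √n. A minimal sketch:

```python
import math

def se_to_sd(se, n):
    """Convert a standard error of the mean back to a standard
    deviation: SD = SE * sqrt(n), per the Cochrane Handbook."""
    return se * math.sqrt(n)

# e.g. a reported SE of 0.5 with n = 25 corresponds to SD = 2.5
sd = se_to_sd(0.5, 25)
```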
As stated in the preregistration, we extracted the primary outcome as specified in the individual studies. If multiple primary outcomes were specified in the individual studies, we extracted all primary outcomes. If no outcome was designated as primary, we extracted all outcomes. For studies that included multiple control conditions, we only extracted data on the OLP condition and the corresponding comparator (i.e., NT or HP).
Study risk of bias assessment
We used the revised Cochrane risk of bias tool for randomized trials (RoB 2) to assess the risk of bias in primary studies. Five domains of bias are assessed using the RoB 2, namely biases arising from (1) the randomization process, (2) deviations from intended interventions, (3) missing outcome data, (4) measurement of the outcome, and (5) selection of the reported result44. Ratings for each domain range from “low risk of bias”, to “some concerns”, to “high risk of bias”. Finally, the ratings of the individual domains are aggregated into an overall rating, which in most cases is equivalent to the worst rating in any of the domains44.
Given the specific context of OLPs, we agree with von Wernsdorff et al.11 that a lack of blinding of participants should not result in an increased risk of bias rating. They argue that knowledge of one's group assignment is imperative and cannot be separated from the placebo effect in this particular intervention. Thus, we decided to rate the risk of bias in the domains (2) and (4) (i.e., the risk of bias due to unblinding) as not worse than “some concerns”. Risk of bias assessments were carried out independently by LS and PDS, with discrepancies resolved through discussion with JCF and SS.
Data synthesis and analyses
Since knowledge of the received intervention might influence self-reported outcomes, we conducted two separate meta-analyses: one for self-reported outcomes and one for objectively recorded outcomes (i.e., physiological or behavioral variables). The meta-analyses were conducted using the meta package in R (version 4.0.3). Since all studies reported continuous data, we chose the standardized mean difference (SMD) as the summary outcome. We used Hedges’ g, which corrects for small-sample bias45. When both pre- and post-intervention values were reported, we first calculated change scores by subtracting pre- from post-intervention scores. We then standardized the difference in change scores between groups using the pooled pre-intervention SD to calculate the corresponding SMDs.
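The change-score SMD described above can be sketched as follows. This is an illustrative Python implementation of one common variant (difference in change scores divided by the pooled pre-intervention SD, with Hedges' small-sample correction); the exact formulas used internally by R's meta package may differ in detail:

```python
import math

def hedges_g_change(m_pre1, m_post1, m_pre2, m_post2,
                    sd_pre1, sd_pre2, n1, n2):
    """Hedges' g for the between-group difference in change scores,
    standardized by the pooled pre-intervention SD."""
    change_diff = (m_post1 - m_pre1) - (m_post2 - m_pre2)
    # Pool the pre-intervention SDs across the two groups
    sd_pooled = math.sqrt(((n1 - 1) * sd_pre1**2 + (n2 - 1) * sd_pre2**2)
                          / (n1 + n2 - 2))
    d = change_diff / sd_pooled
    # Small-sample correction factor J = 1 - 3 / (4*df - 1)
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    return j * d
```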
If there were multiple outcomes within one study, we calculated SMDs for all of these outcomes and averaged them46,47. This approach ensured that there was no bias due to selective choice of outcome depending on effect size and conformity to the hypothesis.
When there were multiple OLP conditions within a study, we proceeded as follows: Our primary goal was to obtain the maximum OLP effect that could be realized experimentally. Since we assumed that suggestive instructions would amplify the placebo effect, we always chose whichever condition was most suggestive. This was operationalized by selecting the condition where most of the instructional statements from Kaptchuk et al.12 were utilized. Kaptchuk et al.12 were among the first to conduct a clinical trial of OLPs and used a rationale (i.e., statements explaining the placebo effect) of four statements with positive framing to optimize placebo response. These statements imply 1) that the placebo effect is powerful, 2) that the body responds automatically to placebos, 3) that a positive attitude is not required, and 4) that taking the placebo faithfully is crucial. These or similar rationales were applied by many other researchers11.
Studies with crossover designs were not included in the meta-analyses because the parameters required to compute effect sizes were not reported. An alternative approach to analyzing crossover studies is to treat study groups as if they were parallel groups. However, this approach is not recommended by Cochrane, as it may lead to a unit-of-analysis error48.
Once the effects of the individual studies were calculated, they were aggregated into an overall SMD. We employed a random-effects model by applying the inverse-variance weighting method45. To correct for differences in the direction of the scale, the means of some studies were multiplied by −1 (ref. 48). Heterogeneity between studies was assessed using the chi-square test and the I² statistic. I² values above 25% are interpreted as low, above 50% as moderate, and above 75% as high heterogeneity49.
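The random-effects pooling and heterogeneity statistics described above can be sketched as follows. This is an illustrative DerSimonian–Laird implementation; the meta package offers this estimator among others, and the one actually used may differ:

```python
def random_effects_pool(effects, variances):
    """Inverse-variance random-effects pooling (DerSimonian-Laird),
    returning the pooled effect, tau^2, and I^2 (in percent)."""
    w = [1 / v for v in variances]
    # Fixed-effect estimate, needed for Cochran's Q
    fixed = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    q = sum(wi * (ei - fixed) ** 2 for wi, ei in zip(w, effects))
    df = len(effects) - 1
    # DerSimonian-Laird between-study variance estimate
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights include tau^2
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * ei for wi, ei in zip(w_star, effects)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, tau2, i2
```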
We conducted subgroup analyses to examine the influence of the suggestiveness of the instructions on the efficacy of OLPs. To assess the extent of the suggestiveness of the instructions in OLPs, we developed a tool based on the four statements applied by Kaptchuk et al.12. These statements are given along with the administration of the open-label placebos. However, the placebos in most experimental studies included in our review were administered only once and under the supervision of an experimenter. Therefore, we omitted the fourth statement and formed four subgroups depending on the number of statements utilized in the instructions (ranging from 0 = “no statement utilized” to 3 = “all statements utilized”), with higher values indicating greater suggestiveness. We believe this approach to be reasonable, as many studies investigating OLPs have adopted the instructions from Kaptchuk et al.12 and varied the number of statements implemented in the instructions. For the subgroup analyses, we first calculated the pooled effect for each subgroup and then used a Q-test to examine whether effect sizes differed between subgroups50.
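The between-subgroup Q-test can be sketched as follows. The statistic is the inverse-variance weighted sum of squared deviations of the subgroup estimates from the overall estimate, compared against a chi-square distribution with k − 1 degrees of freedom (illustrative; the statistic only, without the p-value lookup):

```python
def subgroup_q(pooled_effects, pooled_variances):
    """Q statistic for differences between k subgroup estimates
    (compare to a chi-square with k - 1 degrees of freedom)."""
    w = [1 / v for v in pooled_variances]
    overall = sum(wi * ei for wi, ei in zip(w, pooled_effects)) / sum(w)
    return sum(wi * (ei - overall) ** 2
               for wi, ei in zip(w, pooled_effects))
```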
We also conducted exploratory subgroup analyses to examine whether the efficacy of OLPs differed depending on the control condition used (i.e., NT or HP). We used the same statistical procedures as before. However, these analyses were specified a posteriori and therefore not reported in the preregistration.
All tests were two-tailed.
Reporting bias assessment
We assessed publication bias by visually inspecting funnel plots for asymmetry. In funnel plots, the SMDs of the individual studies are plotted against their standard error. In addition, we carried out a statistical assessment of funnel plot asymmetry using Egger's regression test, which regresses the SMDs against their standard error51. We did not assess the risk for time-lag bias, as research on OLPs is in its early stages and the interest in non-clinical, healthy populations has arisen only recently.
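Egger's test is commonly computed in an equivalent formulation that regresses the standardized effect (SMD divided by its SE) on precision (1/SE); the regression intercept captures funnel-plot asymmetry. A minimal sketch of the intercept computation, without the accompanying t-test on the intercept:

```python
def egger_intercept(effects, ses):
    """Intercept of Egger's regression: standardized effect on
    precision, via ordinary least squares."""
    z = [e / s for e, s in zip(effects, ses)]   # standardized effects
    x = [1 / s for s in ses]                    # precisions
    n = len(x)
    mx = sum(x) / n
    mz = sum(z) / n
    slope = (sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
             / sum((xi - mx) ** 2 for xi in x))
    return mz - slope * mx

# A perfectly symmetric case (identical effects at all precisions)
# yields an intercept of zero
b0 = egger_intercept([0.5, 0.5, 0.5], [0.1, 0.2, 0.4])
```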
Certainty assessment
We used the Grading of Recommendations Assessment, Development and Evaluation (GRADE)52 approach to assess the overall quality of the evidence. At the beginning of the assessment process, the overall quality of an RCT is rated as high and can subsequently be down- or upgraded based on eight dimensions: (1) risk of bias, (2) inconsistency, (3) imprecision, (4) indirectness, (5) publication bias, (6) dose response, (7) large effects, and (8) confounding. Based on the ratings of each dimension, the overall quality of evidence is rated as “high”, “moderate”, “low”, or “very low”. GRADE is performed for specific outcomes. However, due to the large number of different outcomes, we decided to form five clusters in which similar outcomes were grouped together: self-reported pain, objective pain, self-reported positive well-being, self-reported distress, and physiological outcomes. For physiological outcomes, we formed three sub-clusters, each containing a single study, to account for the heterogeneity in physiological outcomes. In our approach, a study may be represented in several clusters due to different outcome variables, but only once within each cluster. Assessments were conducted by two independent raters (LS and PDS), with discrepancies resolved through discussion.