This systematic review and meta-analysis utilized the same databases and search terms as Spille et al. (2023)7 and von Wernsdorff et al. (2021)9. The study adhered to the revised Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement14 (Supplemental digital appendix A1). We pre-registered the study with PROSPERO (#ANONYMIZED) prior to data collection.
Eligibility criteria
We included randomized controlled trials (RCTs) that investigated the effects of OLPs on various health-related outcomes in both clinical and non-clinical populations. The OLP intervention could include any form of inert substance without pharmacological activity, such as placebo pills, capsules, nasal sprays, dermatological creams, injections, acupuncture, or verbal suggestions15. It was essential that the OLP was administered transparently, with recipients fully informed that they were receiving a placebo6,15. OLPs had to be compared with one of the following control conditions: CP, NT, TAU, or WL. We included only trials reporting outcomes on a continuous scale, whether self-reported (e.g., self-rated questionnaires) or objectively measured (i.e., physiological or behavioral variables). In accordance with the Cochrane Handbook for Systematic Reviews16, we included crossover trials only if data from the initial phase of the trial (i.e., prior to the crossover) were available. We contacted authors if these data were not reported. If the authors could not provide the data or did not respond, we excluded the trial from the analyses.
Information sources and search strategy
On November 9, 2023, we conducted a comprehensive systematic search of multiple databases: EMBASE via Elsevier, MEDLINE via PubMed, APA PsycINFO and PSYNDEX Literature with PSYNDEX Tests via EBSCO, the Web of Science Core Collection, and the most recent edition of the Cochrane Central Register of Controlled Trials (CENTRAL, The Cochrane Library, Wiley). We used search terms identical to those employed by von Wernsdorff et al. (2021)9 and Spille et al. (2023)7, focusing on variants of ‘open-label placebo’ (e.g., ‘placebo’, ‘open-label’, ‘non-blind’, and ‘without deception’). The number of hits for each search term in these databases is provided in Supplemental digital appendix A2. We restricted our search to publications from January 2020 onward (i.e., the search date of von Wernsdorff et al., 2021). For non-clinical samples, we included trials published from April 15, 2021 (i.e., the search date of Spille et al., 2023). Additionally, we screened all entries in the Journal of Interdisciplinary Placebo Studies DATABASE (JIPS, https://jips.online/) from January 2020, as well as the complete database of the Program in Placebo Studies & Therapeutic Encounter (PiPS, http://programinplacebostudies.org/), using keywords in the publication titles (e.g., ‘analgesia’, ‘expectation’, ‘non-deceptive’, ‘open-label’, ‘placebo’, ‘suggestion’). No restrictions were applied regarding the language of publication or the age of participants.
Selection process
Results from the literature databases, including hits from the JIPS and PiPS databases, were exported into the systematic review management software Rayyan17. Duplicates were removed using Rayyan’s semi-automated duplicate-detection feature. Two researchers (#ANONYMIZED and #ANONYMIZED) independently assessed study eligibility. First, they screened all titles and abstracts, followed by a full-text assessment of reports deemed potentially eligible in the first stage by one of the investigators. Disagreements were resolved through discussion, with the supervision of two additional researchers (#ANONYMIZED and #ANONYMIZED).
Data items and collection process
Two researchers (#ANONYMIZED and #ANONYMIZED) independently extracted data from the included studies into a standardized Excel form, which was piloted with three records. Data extraction covered the following areas: study details (i.e., author, year, title), sample characteristics (i.e., type of population, population size, distribution of participants), intervention and control conditions (e.g., pill, cream, spray, duration), outcomes (i.e., baseline, post-intervention, or change scores), and outcome form (i.e., self-report or objective). We extracted data for the outcome defined as the primary outcome in the respective report. If a primary outcome was not specified, we extracted all outcomes related to the OLP intervention to minimize bias from selective outcome choice based on effect size and hypothesis fit18. In trials where baseline outcome scores were unavailable prior to experimental exposure, we extracted post-intervention values only.
Participants were classified into clinical populations if they had a medical condition (e.g., allergic rhinitis, cancer-related fatigue, chronic low back pain, menopausal hot flashes) or a mental disorder (e.g., major depressive disorder), as diagnosed by a clinician or psychologist19. Those classified as non-clinical populations were generally healthy individuals. Subclinical traits, such as test anxiety or low levels of well-being, did not qualify participants for the clinical population and were therefore categorized as non-clinical. The degree of suggestiveness of the treatment rationale was independently evaluated by two researchers (#ANONYMIZED and #ANONYMIZED) based on the description of the OLP administration. Rationales of OLP interventions featuring one or more elements from Kaptchuk et al. (2010, see Introduction)12 were rated as having a ‘high’ degree of suggestiveness, while those lacking elements of suggestive expectation induction were rated as having a ‘low’ degree of suggestiveness. In studies with different treatment rationales across separate intervention groups, both groups were extracted as distinct trials and coded accordingly as OLP+ (‘high’ degree of suggestiveness) and OLP- (‘low’ degree of suggestiveness). To ensure comparability across clinical trials, all control conditions (i.e., NT, TAU, WL, and CP) were checked for compliance with established definitions of comparators20. Thus, NT referred to a condition in which no alternative treatment was provided, while TAU included access to standard treatment practices for the condition. If control participants were offered the OLP intervention after the intervention period of the OLP group, the control condition was labeled a WL condition. For non-clinical trials, NT referred to no intervention, and CP involved the same physical treatment as the OLP condition, but with a rationale designed to avoid creating specific expectations regarding the outcome7. Missing values were addressed by contacting the authors.
If the authors did not respond or could not provide the data, the study was excluded. All extracted data were cross-checked using Excel's data validation feature. Discrepancies were resolved through discussion and consensus, supervised by #ANONYMIZED and #ANONYMIZED.
The study at hand is an extended update of the reviews by von Wernsdorff et al. (2021)9 and Spille et al. (2023)7. Therefore, one reviewer (#ANONYMIZED) re-evaluated and extracted data from the studies included in these reviews. In cases of discrepancies regarding selection and extraction due to methodological differences (e.g., von Wernsdorff et al.9 consistently extracted post-intervention values only), a second reviewer (#ANONYMIZED) independently re-assessed these reports. In cases where changes were necessary, both reviewers (#ANONYMIZED and #ANONYMIZED) independently extracted any additional data. The same data cross-checking process, as described above, was applied to ensure accuracy.
Risk of bias assessment
We assessed the risk of bias of the included trials using the Cochrane risk of bias tool (RoB 2)21. The RoB 2 assesses bias arising from the randomization process (domain 1), deviations from intended interventions (domain 2), missing outcome data (domain 3), measurement of the outcome (domain 4), and selection of the reported result (domain 5)21. The results of the five domains are aggregated into an overall risk of bias rating, which is equivalent to the worst rating in any of the domains. In line with previous meta-analyses on OLPs7,9,10, we applied the same special rule to the RoB 2 to account for the unblinded nature of OLPs. When a ‘high’ risk of bias rating in domain 4 resulted solely from signaling question 4.5 (‘Is it likely that the assessment of the outcome was influenced by knowledge of the intervention received?’), we overrode the algorithm’s suggestion for this domain and rated it as ‘some concerns’. Since eliciting treatment expectations is a crucial mode of action of OLPs, the effect of knowing about the group allocation cannot be separated from the placebo or nocebo effect (i.e., excitement or disappointment, respectively)9. Moreover, we would have lost all variance in the assessment, as all studies would have received a ‘high’ overall rating as a consequence. The RoB 2 for the newly included studies was assessed by two researchers independently (#ANONYMIZED and #ANONYMIZED), with discrepancies resolved through discussion and consensus. The results from assessments of the studies included in the previous versions of the review by Spille et al. (2023)7 and von Wernsdorff et al. (2021)9 are presented in Supplemental digital appendix A4.
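The aggregation rule described above can be sketched in a few lines (an illustrative Python sketch, not the authors' code; the function and variable names are our own):

```python
# Severity ordering of RoB 2 ratings, from least to most concerning.
SEVERITY = {"low": 0, "some concerns": 1, "high": 2}

def overall_rob2(domains, d4_high_only_from_q45=False):
    """Overall rating = worst domain rating, with the review's special rule:
    a 'high' rating in domain 4 driven only by signaling question 4.5
    (knowledge of the intervention received) is downgraded to 'some concerns'.
    `domains` maps domain number (1-5) to a rating string."""
    ratings = dict(domains)
    if ratings.get(4) == "high" and d4_high_only_from_q45:
        ratings[4] = "some concerns"  # override for unblinded OLP designs
    return max(ratings.values(), key=lambda r: SEVERITY[r])

# Domain 4 is 'high' solely because participants knew their allocation:
overall_rob2({1: "low", 2: "some concerns", 3: "low", 4: "high", 5: "low"},
             d4_high_only_from_q45=True)  # -> 'some concerns'
```

Without the override, any study rated ‘high’ in domain 4 would propagate a ‘high’ overall rating, which is why the review applies the special rule.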
Data synthesis and analysis
Statistical analyses were performed using R, version 4.3.222. To evaluate the effects of the OLP interventions, we calculated standardized mean differences (SMDs) by subtracting the mean pre-post change in the intervention group from the mean pre-post change in the control group and dividing the result by the pooled pre-intervention standard deviation23. For studies that reported only post-intervention scores, SMDs were calculated based on these values. Standard errors of the means were converted into SDs following the guidelines outlined in the Cochrane Handbook16. We used Hedges’ g, which corrects for bias due to small sample sizes24. We interpreted values of 0.20, 0.50, and 0.80 as small, moderate, and large effects, respectively25. In studies without a defined primary outcome26–50, we initially computed SMDs for each outcome independently and then averaged them to obtain an overall SMD estimate for the respective trial51. This was done with the aggregate function from the metafor package in R52, under the assumption of an intra-study correlation coefficient of ρ = 0.6 53. In studies where the type of administration of the OLP intervention varied (e.g., one group received an OLP nasal spray and another intervention group received OLP pills)28,37,42,54,55, the mean pre- and post-intervention values along with their SDs were aggregated prior to data analysis, in accordance with the guidelines outlined in the Cochrane Handbook16. We aggregated the intervention groups of these trials as follows: for Barnes et al. (2019)55, we combined the ‘Semi-Open Label’ and ‘Fully-Open Label’ intervention groups; for El Brihi et al. (2019)33, we combined the groups receiving one and four OLP pills per day; for Kube et al. (2020)37, we aggregated the ‘OLP-H’ (hope) and ‘OLP-E’ (expectation) groups; for Olliges et al. (2022)42, we combined the ‘OLP pain’ and ‘OLP mood’ groups; and for Winkler et al. (2023)28, we merged the ‘OLP nasal active’, ‘OLP nasal passive’, and ‘OLP capsule’ groups. In studies where the OLP rationale was manipulated across two independent samples31,41,56, the SMDs were calculated separately for each intervention group (OLP+ and OLP-, respectively) and treated as two distinct trials. To prevent unit-of-analysis error due to double counting, in these cases the control group data were divided in half for each intervention group16. Similarly, if a trial contributed to more than one comparison (i.e., assessing both self-report and objective outcomes)26,30,35,37,41,48,57, the sample size was divided by the number of comparisons to prevent unit-of-analysis error due to double counting. To ensure consistent interpretation of the direction of effects, with a positive SMD indicating a beneficial effect of the OLP intervention for recipients, the means of some studies were multiplied by -1, as outlined in the Cochrane Handbook16.
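The change-score effect size with the small-sample correction can be sketched as follows (a minimal Python sketch of the computation described above, not the authors' R code; all names are our own, and signs may subsequently be flipped so that positive values favor the OLP):

```python
import math

def hedges_g(change_int, change_ctrl, sd_pre_int, sd_pre_ctrl, n_int, n_ctrl):
    """SMD from mean pre-post changes, standardized by the pooled
    pre-intervention SD, with Hedges' small-sample bias correction."""
    # Pooled pre-intervention standard deviation across both groups.
    df = n_int + n_ctrl - 2
    sd_pooled = math.sqrt(((n_int - 1) * sd_pre_int**2 +
                           (n_ctrl - 1) * sd_pre_ctrl**2) / df)
    # Difference in mean change, as described in the text
    # (control change minus intervention change).
    d = (change_ctrl - change_int) / sd_pooled
    # Hedges' correction factor for small samples.
    j = 1 - 3 / (4 * df - 1)
    return j * d
```

For example, if the intervention group improves by 5 points, the control group by 2 points, and the pooled pre-intervention SD is 10, the uncorrected SMD is 0.30 and the corrected Hedges' g is slightly smaller.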
After aggregating all effect sizes within studies, we conducted a meta-analysis using the meta package in R58. Anticipating heterogeneity among trials, we employed a random-effects model with inverse-variance weighting59. We weighted the trials based on post-intervention sample sizes, which is more conservative than using pre-intervention sample sizes. All tests were two-tailed. Heterogeneity among studies was assessed using the Q statistic and quantified with the I2 index as well as prediction intervals. The I2 values indicate the percentage of total variance between effects that is due to true effect variation and are interpreted as follows: 0 to 40% might not be important; 30 to 60% may represent moderate heterogeneity; 50 to 90% may represent substantial heterogeneity; and 75 to 100% is considered to constitute considerable heterogeneity16. The prediction interval reflects true effect variation and indicates the range into which the true effect sizes of comparable populations are expected to fall60. If the prediction interval lies entirely on the positive side (i.e., does not include zero), favoring the OLP intervention, it suggests that, despite variation in effect sizes, the OLP is likely to be beneficial; however, it is important to note that broad prediction intervals are relatively common and reflect inherent variability in the data53.
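The inverse-variance random-effects pooling can be illustrated with a minimal sketch (the review used the R meta package, which supports several between-study variance estimators; the DerSimonian-Laird estimator shown here is one common choice, and the report does not specify which was used):

```python
import math

def random_effects_meta(effects, variances):
    """Minimal DerSimonian-Laird random-effects pooling:
    returns (pooled effect, its SE, tau^2, I^2 in percent)."""
    k = len(effects)
    w = [1 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q statistic for heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                  # between-study variance
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    # Random-effects weights incorporate tau^2.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2, i2
```

When tau² is zero the model reduces to the fixed-effect estimate; larger tau² shifts weight toward smaller studies and widens the pooled confidence and prediction intervals.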
Sensitivity analysis
We conducted four sensitivity analyses to examine the robustness of the results. First, we excluded outliers using the “non-overlapping confidence intervals” approach, which regards a comparison as an outlier if the 95% confidence interval of its effect size does not overlap with the 95% confidence interval of the pooled effect size53. Second, we excluded studies rated with a ‘high’ risk of bias according to the RoB 2 assessment. We chose the criterion ‘high’ because almost all studies were rated with at least ‘some concerns’ due to the lack of blinding of the OLP interventions and the use of self-reported outcomes. Third, we considered potential publication bias using Duval and Tweedie’s trim-and-fill procedure, which provides an estimate of the pooled effect size after adjusting for asymmetry in the funnel plot61. Fourth, we estimated the overall OLP effect using a hierarchical three-level meta-analytic model, with effect sizes nested in studies53.
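The first sensitivity criterion, the “non-overlapping confidence intervals” rule, amounts to a simple interval comparison (an illustrative Python sketch under our own naming, not the authors' implementation):

```python
def flag_outliers(effects, ses, pooled, pooled_se, z=1.96):
    """Flag a comparison as an outlier if its 95% CI lies entirely
    outside the 95% CI of the pooled effect."""
    lo_p, hi_p = pooled - z * pooled_se, pooled + z * pooled_se
    flags = []
    for y, se in zip(effects, ses):
        lo, hi = y - z * se, y + z * se
        # No overlap: the study CI sits entirely below or above the pooled CI.
        flags.append(hi < lo_p or lo > hi_p)
    return flags
```

For instance, with a pooled effect of 0.30 (SE 0.05), a study at g = 1.00 (SE 0.10) is flagged, whereas a study at g = 0.35 (SE 0.10) is not.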
Reporting bias assessment and certainty assessment
We assessed publication bias by creating a funnel plot, which plots effect estimates (SMDs) from individual studies against their standard error (SE). We visually inspected the funnel plot for asymmetry and conducted Egger’s regression test, which regresses the SMDs against their SE62.
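In its standard formulation, Egger's test regresses the standardized effect (SMD/SE) on precision (1/SE) and examines whether the intercept deviates from zero; a sketch of the intercept computation (illustrative Python, not the authors' R code, and omitting the significance test on the intercept):

```python
def egger_intercept(effects, ses):
    """Intercept of Egger's regression of standardized effect on precision.
    A non-zero intercept suggests funnel-plot asymmetry."""
    y = [e / s for e, s in zip(effects, ses)]   # standardized effects
    x = [1 / s for s in ses]                    # precision
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Ordinary least-squares slope and intercept.
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx
```

In a perfectly symmetric funnel plot, where the effect estimate does not depend on study precision, the intercept is zero; systematically larger effects in smaller (higher-SE) studies pull it away from zero.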