Ethics approval. Approvals for these studies were obtained from the Institutional Review Boards at the University of Rochester or the University of Texas at Austin.
Preregistration and efforts to curb researcher degrees of freedom. Studies 1 and 2 were preregistered (osf.io/tgysd; 55). Studies 3 and 4 limited degrees of freedom by following published and previously preregistered standard operating procedures for TSST and daily diary studies conducted by the labs carrying out this research 45 (the focus on TPR, SV, and PEP in Study 3 and the focus on the stressor intensity × treatment interaction in Study 4). Study 5, which focused on anxiety symptoms during the COVID-19 lockdown, was not preregistered because the pandemic was unanticipated. Researcher degrees of freedom were limited by following the same analysis steps (covariates, moderators, and BCF modeling) as Studies 1-4 whenever possible.
Intervention overview. The intervention consisted of a single self-administered online session lasting approximately 30 minutes. Random assignment to the intervention or control condition occurred in real-time via the web-based software, as participants were completing the online intervention materials. Participants were blinded to the presence of different conditions, and teachers or others interacting with participants were blind to the intervention content and to condition assignment. Thus, the intervention experiments used a double-blind design throughout.
Synergistic mindsets intervention. The intervention used methods for mindset interventions that are well-established in the literature and have been used successfully in national scale-up studies 26. The intervention first aimed to convey the message that stressful events are controllable and potentially helpful. It did so by reducing negative fixed mindset beliefs, or the belief that intellectual ability is fixed and cannot change, which can lead to the appraisal that negative events are uncontrollable and harmful. In particular, the fixed mindset leads to a pattern of appraisals about effort (that having to try hard or ask for help means you lack ability), about causes of failures (the attribution that failure stems from low ability), and about the desired goal in a setting (the goal of not looking stupid in front of others) 39,71. The intervention overcame these negative patterns of appraisals by conveying the growth mindset. The growth mindset promotes the appraisal that difficulties can be controlled and helpful. It argues that most people who became good at something important had to face and overcome struggles, and therefore, your own struggles should not be viewed as signs of deficient abilities but instead should be viewed as part of your path toward important skill development. To justify the controllable/helpful stressor appraisal, the intervention drew on neuroscientific information about the brain’s potential to develop more efficient (i.e., “stronger”) connections when it faces and overcomes challenges, using the analogy of muscles growing stronger when they are subjected to rigorous exercise 29.
Second, the intervention aimed to reduce the stress-is-debilitating mindset 22, which is the belief that stress is inherently negative and compromises performance, health, and wellbeing; this mindset leads to the appraisal that a given stressor is uncontrollable and harmful. Counter to the stress-is-debilitating mindset, the intervention developed here introduced the stress-can-be-enhancing mindset 22, which is the belief that stress can have beneficial effects on performance, health, and wellbeing; this more adaptive belief system leads to the appraisal that stressors can be potentially helpful and controlled. The intervention explained that when people undergo challenges, they inevitably begin to experience stress, which can manifest in a racing heart, sweaty palms, or possibly feelings of anxiety or worry. The intervention led people to perceive those signals as information that the body is preparing to overcome the challenge, for instance by providing more oxygenated blood to the brain and the muscles 33. Thus, the stress response was framed as helpful for goal pursuit, not necessarily harmful. The intervention also argued that feelings of anxiety can be a sign that you have chosen a meaningful and ambitious set of goals to work on, and therefore can indicate a positive trajectory, not a negative one.
Importantly, these two mindsets were conveyed synergistically, not independently, so that they built on one another. Participants were encouraged to view struggles as potentially positive and worth engaging with, and then they were invited to view inevitable stress coming from this engagement as a part of the body’s natural way to help them overcome the stressor.
These mindset messages were couched within a summary of scientific research on human performance and stress. Participants were not simply informed of these facts, but they were instead invited to engage with them, make them their own, and plan how they could use them in the present and future. Participants heard stories from prior participants (older students in this case) who used these ideas to have success in important performance situations, and they also completed open-ended and expressive writing exercises. For instance, participants wrote about a time when they were worried about an upcoming stressor, and then later on they wrote advice for how someone else who might be undergoing a similar experience could use the two mindsets they learned about—which has been called a “saying-is-believing” writing exercise 72.
Control group content. The control group intervention was also an online, self-administered activity lasting ~30 minutes. It was designed to be relatively indistinguishable from the intervention group by using similar visual layout, fonts, colors, and images. The content was drawn predominantly from the control condition of a prior national growth mindset experiment 26, which included basic information about the brain and human memory. It also involved open-ended writing activities and stories from older students. However, the control condition did not make any claims about the malleability of intelligence. To this standard content we added basic information about the body’s stress response system (e.g., the sympathetic and parasympathetic nervous system and the HPA-axis) to control for the possibility that simply reflecting on stress and stress responses could account for the results. The latter content did not include any evaluations of whether stress responses are good or bad, or controllable or uncontrollable.
Negative prior mindsets. At baseline, participants in all experiments except Study 2 completed standard, three-item measures of negative event-focused mindsets (fixed mindset of intelligence, e.g., “Your intelligence is something about you that you can't change very much.”) 26 and response-focused mindsets (the stress-is-debilitating mindset 14, e.g., “The overall effect of stress on my life is negative.”) (for both, 1 = Strongly disagree, 6 = Strongly agree). In the primary Bayesian analyses, the two measures and their product were entered into the covariate and moderator functions, and the machine-learning algorithm decided how best to use the mindset measures to optimize prediction or moderation. In the preliminary correlational analyses (Extended Data Table 1), we analyzed the multiplicative term of the two, for simplicity.
Analysis strategy
For all experimental analyses, we used intention-to-treat analyses, which means that data were analyzed for all individuals who were randomized to condition and who provided outcome data, regardless of their fidelity to the intervention protocol. This analysis is more conservative but also better reflects real-world effect sizes.
The present research advanced a fully-Bayesian regression approach called targeted, smooth Bayesian Causal Forest (tsBCF or BCF) 73 to calculate treatment effects and understand moderators of the treatment effects. A previous version of the BCF algorithm has won several open competitions for yielding honest and informative answers to questions about the complex, but systematic, ways in which a treatment’s effects are, or are not, heterogeneous, and it is designed to be quite conservative 51. We used the existing BCF method for Studies 1, 2, 4 and 5. The model is specified in Eq. 1:
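For orientation, Bayesian Causal Forest models are usually written in the general form sketched below. This is the generic specification from the BCF literature and not necessarily the exact Eq. 1 used here; the symbols $y_i$ (outcome), $z_i$ (treatment indicator), $x_i$ (covariates), $w_i$ (moderators), $\mu$ (covariate function), $\tau$ (moderator function), and $\sigma$ are assumed notation.

```latex
y_i = \mu(x_i) + \tau(w_i)\, z_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```

Under this form, $\tau(w_i)$ is the conditional treatment effect for a participant with moderator values $w_i$, and averaging $\tau(w_i)$ over the sample gives the average treatment effect.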
Notably, BCF uses conservative prior distributions, especially for the moderator function, to shrink toward homogeneity and to simpler functions, avoiding over-fitting. The data are used once, to move from the prior to the posterior distribution, and all analyses then summarize draws from the posterior. This approach contrasts with the classical method, which involves re-fitting the model many times to estimate simple effects or to conduct robustness analyses with different specifications. The BCF approach, therefore, reduces researcher degrees of freedom, mitigating the risk of false discoveries and other spurious findings. In this research we focused on estimation of treatment effects (i.e., how large the effect is) and not null-hypothesis testing (i.e., whether it is “significant” or not) because of well-known problems with the all-or-nothing thinking inherent in the null hypothesis significance test 74. Following convention 75, we reported the average treatment effects (ATE) and the conditional average treatment effects (CATEs) with the associated 10th and 90th percentiles from the posterior distributions (see Figures for the 2.5th and 97.5th percentiles).
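The reporting convention described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the posterior draws of a treatment effect are available as a plain list of numbers; the function name `summarize_posterior` and the simulated draws are hypothetical and are not taken from the authors' code.

```python
import random
import statistics

def summarize_posterior(draws):
    """Return the posterior mean and the 10th/90th percentiles of the draws."""
    # quantiles(n=10) returns the nine cut points at 10%, 20%, ..., 90%.
    deciles = statistics.quantiles(draws, n=10, method="inclusive")
    return {
        "mean": statistics.fmean(draws),
        "p10": deciles[0],
        "p90": deciles[-1],
    }

# Hypothetical posterior draws of a treatment effect, in SD units.
rng = random.Random(0)
draws = [rng.gauss(-0.3, 0.1) for _ in range(4000)]
summary = summarize_posterior(draws)
print(summary)
```

Because every reported quantity is a summary of the same set of draws, no model is re-fit when a different interval or subgroup is examined.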
Effect size calculations. Unless otherwise noted, effects are standardized by the raw SD in the control condition.
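As a minimal sketch of this standardization, the snippet below divides a raw-unit effect by the control-group standard deviation. The function name and example scores are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption rather than something stated in the text.

```python
import statistics

def standardize_effect(raw_effect, control_scores):
    """Express a raw-unit treatment effect in control-group SD units."""
    # Assumption: the sample (n - 1) standard deviation is used.
    control_sd = statistics.stdev(control_scores)
    return raw_effect / control_sd

# Hypothetical control-group scores and a raw effect of -1 scale point.
control_scores = [2, 4, 4, 4, 5, 5, 7, 9]
effect_sd = standardize_effect(-1.0, control_scores)
print(round(effect_sd, 3))
```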
Manipulation checks (all studies). The intervention reduced negative mindset beliefs (four items, including “Stress stops me from learning and growing” and “The effects of stress are bad and I should avoid them”, 1 = Strongly disagree, 6 = Strongly agree). Analyses revealed lower levels of negative mindsets in the intervention condition at post-test compared to the control condition, signifying a successful manipulation check: Study 1) -.293 SD [-.426, -.161]; Study 2) -.437 SD [-.567, -.310]; Study 3) -.504 SD [-.724, -.504]; Study 4) -.255 SD [-.549, .030]; Study 5) -.556 SD [-.713, -.399]. The two field experiments with high schoolers (Studies 1 and 4) had smaller manipulation check effects that were more imprecise than the others (Studies 2, 3, and 5). This was expected because the former studies were conducted in naturalistic school settings that tend to produce noisier data.
Study 1
Sample size determination. Sample size was planned to have sufficient power to detect a treatment effect in a field experiment of .10 SD or greater, with .10 SD being the minimum effect size that we would interpret as meaningful for a study focused on immediate post-test self-reports. We worked with our data collection partner, the Character Lab Research Network (CLRN) 76, to recruit as close to 3,000 participants as possible in a single semester. The final sample size was determined by the logistical constraints of data collection during the COVID-19 pandemic.
Participants. Participants were a heterogeneous national sample of adolescents who were evenly distributed across grades 8 to 12 in U.S. public schools (13 y/o: 16%; 14: 20%; 15: 20%; 16: 21%; 17: 18%; 18: 5%). Forty-nine percent identified as male, 49% as female, and 2% as gender non-binary. Participants were also racially and ethnically diverse (participants could indicate multiple racial/ethnic identities so numbers exceed 100%): Black: 20%; Latinx: 39%; White: 68%; Asian: 7%. Participants were also socioeconomically diverse: 40% received free or reduced price lunch, an indicator of low family income. Therefore, Study 1 provided a test of the hypothesis that the intervention could be widely disseminated and effectively change beliefs and appraisals in a national sample of adolescents that reflected the diversity of students in U.S. public schools. Even so, the sample was not strictly nationally representative because random sampling was not used to recruit the CLRN sample.
Procedure. Participants were recruited by CLRN 76, which administers roughly 45-minute online survey experiments three times per year to a large panel of adolescents attending 6th to 12th grade. Researchers program their studies using the Qualtrics platform and students self-administer the materials at an appointed time. Data collection continued during the modified instructional settings of Fall 2020. We note that all measures had to be short so as to keep respondent burden low and fit within the required time limit for CLRN studies. Thus, the tradeoff in Study 1, when achieving scale and reaching a large adolescent population during the COVID-19 pandemic, was estimating potentially weaker effect sizes due to statistical noise.
Measures. At the beginning of the survey, participants indicated their most stressful class (e.g., math, science, English / Language arts). Then, after the intervention (or control) experience they were asked to imagine that “later today or tomorrow your teacher [in your most stressful class] asked you to do a very hard and stressful assignment. Imagine this is the kind of assignment that will take a lot of time to finish but you only have two days to turn it in. Also pretend that you will soon have to present your work in front of the other students in your class.” Participants then reported their event-focused appraisals on three items (e.g., “How likely would you be to think that the very hard assignment is a negative threat to you?”, 5 = Not at all likely to think this, 1 = Extremely likely to think this). Next participants reported their response-focused appraisals (“Do you think your body's stress responses (your heart, your sweat, your brain) would help you do well on the assignment, hurt your performance on the assignment, or not have any effect on your performance either way?” 5 = Definitely hurt my performance, 1 = Definitely help my performance).
The end of the study also included an additional behavioral intention measure: a choice between an “easy review” extra credit assignment and a “hard challenge” assignment 61,65. The intervention increased the rate of choosing the challenging assignment by .11 SD [.028, .200]. We expected the treatment to increase engagement with stressors because it leads to the appraisal that they are opportunities for learning and growth.
Study 2
Sample size determination. All students in an introductory social science course in Fall 2019 were asked to complete the intervention or control materials. Sample size was set by the response rate.
Participants. Participants were predominantly first-year college students who attended a selective public university in the United States that draws from a wide range of socioeconomic status groups: 17 years old: 3%; 18: 49%; 19: 29%; 20: 11%; 21 or older: 8%. Sixty-four percent identified as female and the rest as male; 39% had mothers who did not have a four-year college degree or higher (an indicator of lower socioeconomic status), and 59% identified as lower class, lower middle class, or middle class (vs. upper middle or upper class).
Procedure. This experiment was conducted in a social science course in which students completed timed, challenging quizzes at the beginning of each class meeting, twice per week. In the second week of the semester, soon before the first graded quiz, students were invited to complete the intervention (or control) materials on their own time using their own computer in return for course credit, and 83% of invited students did so. The effects of the intervention were assessed via students’ appraisals of the first graded quiz of the semester one to three days later. The appraisal items were necessarily short because they were embedded at the end of the assignment and students completed them during class before the lecture. The appraisal items were then administered a second time after another quiz which occurred 3-4 weeks post-intervention.
Measures. Participants rated their agreement or disagreement with the statements “I felt like my body’s stress responses hurt my performance on today’s benchmark” (1 = Strongly disagree, 5 = Strongly agree) and “I felt like my body’s stress responses helped my performance on today’s benchmark” (5 = Strongly disagree, 1 = Strongly agree). The two ratings were averaged to provide an appraisal index, with higher values corresponding to more negative appraisals 77.
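The scoring described above can be sketched as follows. This is an illustrative sketch only: it assumes both items are stored on the same 1-5 agreement scale (5 = Strongly agree), so that the "helped" item must be reverse-coded before averaging; the function name is hypothetical.

```python
def appraisal_index(hurt_agreement, helped_agreement):
    """Average the two items so that higher values mean more negative appraisals.

    Both inputs are assumed to be on a 1-5 scale where 5 = Strongly agree.
    """
    for rating in (hurt_agreement, helped_agreement):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
    # Reverse-code the 'helped' item: agreement of 5 becomes 1, and so on.
    helped_reversed = 6 - helped_agreement
    return (hurt_agreement + helped_reversed) / 2

# A student who strongly agrees stress hurt and strongly disagrees it helped.
print(appraisal_index(5, 1))
```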
Study 3
Sample size determination. An a priori power analysis was used to determine sample size. Previous stress research that assessed cardiovascular responses in laboratory-based stress induction paradigms produced medium to large effect sizes (e.g., range: d = .59 to d = 1.44 in Yeager et al., 2016, Jamieson et al., 2012, Oveis et al., 2020). Based on a standard medium effect size, at the low end of this range (d = 0.50), G*Power indicated that 64 participants per condition (i.e., 128 total participants) would be necessary to achieve a target power level of .80 to test for basic effects of the treatment using frequentist methods. In anticipation of potential data loss, we determined a priori that we would oversample by 20%. Data collection was terminated the week after more than 150 participants were enrolled in the study.
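The arithmetic behind these numbers can be approximated with a short sketch. It uses the normal approximation for a two-group comparison and assumes a two-sided alpha of .05, which the text does not state. This approximation gives 63 per group, one fewer than the 64 that G*Power reports from the exact t-distribution calculation. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

approx_n = n_per_group(0.50)  # normal approximation, slightly below G*Power's 64

# Planned enrollment: 64 per condition (from G*Power) plus 20% oversampling.
target_total = math.ceil(2 * 64 * 1.20)
print(approx_n, target_total)
```

The oversampled target of 154 is consistent with stopping data collection after more than 150 participants had enrolled.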
Participants. Participants were prescreened and excluded for physician-diagnosed hypertension, a cardiac pacemaker, BMI > 30, and medications with cardiac side effects (e.g., Blascovich et al., 2011). A total of 166 students were recruited from a university social science subject pool (120 females, 46 males; 76 White/Caucasian, 12 Black/African-American, 17 Latinx, 65 Asian/Asian-American, 2 Pacific Islander, 4 Mixed Ethnicity, 7 Other; Mage = 19.81, SD = 1.16, range = 18–26). After data collection, two participants were excluded due to experimenter errors. Additionally, impedance cardiography data for four participants could not be analyzed due to technical issues (prevalence of noise and artefacts in the signals). Decisions about inclusion of participants were made blind to condition assignment and to levels of the outcome. Participants were compensated $20 or 2-hrs of course credit for their participation.
Procedure. After intake questions, application of sensors, and acclimation to the lab environment, participants rested for a 5-min baseline cardiovascular recording which occurred approximately 25-min after arrival at the laboratory. They were then randomly assigned to an intervention condition by the computer software in real time and completed either intervention or control materials, which took approximately 20 minutes in this sample. Participants then completed the Trier Social Stress Test (TSST) 44. The TSST asks participants to give an impromptu speech about their personal strengths and weaknesses in front of two evaluators. Evaluators are presented as members of the research team who are experts in nonverbal communication and will be monitoring and assessing the participant’s speech quality, ability to clearly communicate ideas, and nonverbal signaling. Throughout the speech (and math) epochs of the TSST, evaluators provide negative nonverbal feedback (e.g., furrowing brow, sighing, crossing arms, etc.) and no positive feedback, either nonverbal or verbal 44. At the conclusion of speeches, and without prior warning, participants are asked to do mental math (counting backwards from 996 in increments of 7) as quickly as possible in front of the same unsupportive evaluators. Incorrect answers were identified by evaluators, and participants were instructed to begin back at the start. This stress induction procedure is widely used to induce the experience of negative, threat type stress responses 45,47. After completion of the TSST task, participants rested quietly for a 3-min recovery recording, and prior to leaving the lab all participants were debriefed and comforted.
Physiological Measures. The following measures were collected during baseline and throughout the Trier task: electrocardiography (ECG), impedance cardiography (ICG), and blood pressure (BP). ECG and ICG signals were sampled at 1000 Hz, and integrated with a Biopac MP150 system. ECG sensors were affixed in a Lead II configuration. Biopac NICO100C cardiac impedance hardware with band sensors (mylar tapes wrapped around participants’ necks and torsos) was used to measure impedance magnitude (Zo) and its derivative (dZ/dt). BP readings were obtained using Colin7000 systems. Cuffs were placed on participants' non-dominant arm to measure pressure from the brachial artery. BP recordings were taken at 2-min intervals during baseline, throughout the stress task, and recovery. BP recordings were initiated from a separate control room. ECG and ICG signals were scored offline by trained personnel. First, one-minute ensemble averages were analyzed using Mindware software IMPv3.0.21. Stroke volume (SV) was calculated using the Kubicek method 78. B- and X-points in the dZ/dt wave, as well as Q- and R-points in the ECG wave, were automatically detected using the maximum slope change method. Then, trained coders blind to condition examined all placements and corrected erroneous placements when necessary.
Analyses targeted three physiological measures: pre-ejection period (PEP), stroke volume (SV), and total peripheral resistance (TPR). This suite is commonly used to index threat-type stress responses (for a review see 79). TPR is the clearest indicator of threat-type responses and was therefore the focal outcome measure in this research. TPR assesses vascular resistance, and when threatened, resistance increases from baseline 43. TPR was calculated using the following validated formula: (MAP / CO) × 80 80, where MAP is mean arterial pressure and CO is cardiac output. PEP is a measure of sympathetic arousal and indexes the contractile force of the heart. Shorter PEP intervals indicate greater contractile force and sympathetic activation. Both challenge and threat type stress responses are accompanied by decreases in PEP from rest. SV is the amount of blood ejected from the heart on each beat (on average per minute). Increases in SV index greater beat-to-beat cardiac efficiency and more blood being pumped through the cardiovascular system, and are often observed in challenge states 45. Decreases in SV, on the other hand, are more frequently observed in threat states (even though threat can also elicit little or no change in SV 81). Cardiac output (CO), which is SV multiplied by heart rate (HR), is frequently used to assess threat and challenge type stress responses as well. As in a past paper 45, we focused on SV rather than CO because the effects of the treatment on PEP (and thus HR) during the recovery period could distort effects on CO.
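The two formulas in this paragraph, CO = SV × HR and TPR = (MAP / CO) × 80, can be written out as a small sketch. The units are assumptions (SV in mL per beat, HR in beats per minute, MAP in mmHg, CO in L/min), as are the function names and example values; none of these come from the study data.

```python
def cardiac_output(sv_ml, hr_bpm):
    """Cardiac output in L/min from stroke volume (mL/beat) and heart rate (bpm)."""
    return sv_ml * hr_bpm / 1000.0

def total_peripheral_resistance(map_mmhg, co_l_min):
    """TPR = (MAP / CO) * 80, with MAP in mmHg and CO in L/min (assumed units)."""
    if co_l_min <= 0:
        raise ValueError("cardiac output must be positive")
    return map_mmhg / co_l_min * 80.0

# Hypothetical resting values for one participant.
co = cardiac_output(sv_ml=70, hr_bpm=70)
tpr = total_peripheral_resistance(map_mmhg=90, co_l_min=co)
print(round(co, 2), round(tpr, 1))
```

Because CO appears in the denominator, a treatment effect on heart rate changes TPR even when MAP is unchanged, which illustrates how HR effects can propagate into CO-based measures.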
Study 4
Sample size determination. The number of students recruited each week was constrained by the research team’s capacity to support twice-daily diary surveys and thrice-daily saliva samples in a school environment. The ultimate sample size was determined by the total number of students who could be recruited from the school in the fall semester of 2019 given these constraints.
Participants. Participants were adolescents from economically-disadvantaged families who were nearly all (95%) from black or indigenous racial/ethnic groups. Students attended a high-quality urban charter school which showed a high graduation rate (98%) relative to the urban city school district (68%). Therefore, this was a population that was expected to face social, economic, and academic stressors, and who could therefore make use of a stress optimization intervention.
Procedure. Participants were assigned to one of three data collection cohorts based on their academic schedules and available research staff. Cohorts 1, 2, and 3 completed daily diary measures across three consecutive weeks during the Fall term. The intervention was administered on a Thursday, and then students began their weekly daily diary data collection 1-3 weeks later (M = 14 days). Intervention materials (see Study 1) were completed on a tablet computer with headphones in a quiet room at the school. Randomization to conditions occurred at this time. All instructors and research staff were blind to condition assignment and specific hypotheses. Prior to intervention/control materials, participants completed baseline measures of mindsets (stress mindsets and growth mindsets) along with demographic information.
The week of daily diary data collection began on a Monday and students were surveyed twice each day for five consecutive days through Friday. Students provided their first report at lunch and the second at the conclusion of the school day but before leaving the school’s campus. Saliva samples were collected three times per day: at the two diary reports and at an additional morning collection before the first class period of the day. Thus, we targeted 10 total reports for each student and 15 total saliva samples. In addition to occasional non-response, there were two exceptions to these targeted numbers. One cohort had four days of data collection due to a school-wide event on a Friday, and the first cohort had up to three preliminary days of self-report (not saliva) data collection while the research team was refining procedures. Rather than exclude these additional self-report records, we included them, although the results were the same when excluding them.
The daily diary measures were designed to be brief (~5 min) and were completed on paper. When reporting on daily stressful events, students first indicated the categories of stressors they experienced that day (e.g., friends/social, academics, romantic relationships, daily hassles, etc.), then how intense the stressors, combined, were overall (“How negative would you say these experiences were?” 1 = Not negative at all, 5 = Extremely negative). Following published standard operating procedures for the diary studies in this lab 45, days on which no social-evaluative stressors were listed were coded as a “1” for stressor intensity (the lowest value), to avoid dropping data from analysis.
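The coding rule for days without listed stressors can be sketched as below. This is an illustrative sketch with hypothetical function and argument names; it assumes each diary record holds a list of stressor categories and a 1-5 intensity rating that may be missing when no stressors were listed.

```python
def code_stressor_intensity(stressor_categories, reported_intensity):
    """Return the 1-5 stressor intensity used in analysis for one diary record."""
    # Days with no listed stressors get the lowest value instead of being dropped.
    if not stressor_categories:
        return 1
    if reported_intensity is None or not 1 <= reported_intensity <= 5:
        raise ValueError("intensity must be between 1 and 5 when stressors are listed")
    return reported_intensity

print(code_stressor_intensity([], None))               # no stressors listed
print(code_stressor_intensity(["academics"], 4))       # one stressor category
```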
Students were compensated $10 for completing intervention materials, and $5 for each daily diary entry. Thus, the maximum compensation per participant was $60. After the conclusion of data collection, students and instructors were debriefed, and students randomly assigned to the control condition were provided with the mindset intervention.
Internalizing symptoms. On each daily survey, students reported internalizing symptoms, operationalized as overall positive or negative feelings about themselves (“Overall, how good or bad did you feel about yourself today?” 1 = Extremely good, 7 = Extremely bad).
Cortisol. Acute cortisol responses follow a specific time course (i.e. peak levels occur ~30 minutes after stress onset). However, the diary survey stressors were not calibrated to identify the timing of specific events, so the two sources of information could not be tightly yoked. Indeed, as noted in the main text, there was no association between intensity of stressors reported and cortisol in the control condition. This is in contrast to the relation between internalizing symptoms and stressor intensity in the control condition. Additionally, cortisol levels have a diurnal cycle (i.e., peak levels at wakening, rapid declines within the first waking hours, and nadir at the end of the day). Waking levels and diurnal slopes can map onto wellbeing, stress coping, and health 82. Because all sampling was conducted during the school day, waking levels and diurnal cortisol slopes could not be accurately and precisely measured. The lack of time-course specificity and diurnal cycle data meant that our reported effect sizes for global cortisol levels are likely conservative.
Study 5
Sample size determination. We recruited all students possible from an entire social science class in the spring of 2020, which, we would later learn, was a unique cohort for examining stress during the COVID-19 lockdowns.
Participants, procedure, and measures. Data were collected during the Spring semester of 2020. Participants were from the same university as Study 2 and the same intervention procedures were followed. (Due to a difference in data collection procedures relative to Study 2, quiz appraisal data could not be collected in Study 5.) The intervention was delivered at the end of January 2020. In March of 2020, students were sent home due to COVID-19 quarantines. In mid-April of 2020, students completed the Generalized Anxiety Disorder-7 (GAD-7) 59 as a part of a class activity focused on psychopathology. The GAD-7 asks “How often have you been bothered by the following over the past 2 weeks?” and offers several symptoms, including “Feeling nervous, anxious, or on edge,” “Not being able to stop or control worrying,” and “Feeling afraid as if something awful might happen.” Each symptom is rated on a scale from 0 = Not at all to 3 = Nearly every day. The seven items were averaged, producing an overall score with higher values corresponding to higher levels of general anxiety symptoms.
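The GAD-7 scoring described above amounts to a mean of seven 0-3 ratings. The sketch below is illustrative only: the function name and example responses are hypothetical, and requiring all seven items to be present is an assumption, since the text does not describe how missing items were handled.

```python
def gad7_mean(item_ratings):
    """Average the seven GAD-7 items, each rated 0 (Not at all) to 3 (Nearly every day)."""
    if len(item_ratings) != 7:
        raise ValueError("the GAD-7 has exactly seven items")
    if any(rating not in (0, 1, 2, 3) for rating in item_ratings):
        raise ValueError("each item must be rated 0, 1, 2, or 3")
    return sum(item_ratings) / 7

# Hypothetical responses from one student.
score = gad7_mean([0, 1, 2, 3, 0, 1, 0])
print(score)
```

Averaging keeps the score on the original 0-3 response scale; multiplying by seven recovers the conventional 0-21 sum score.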