Illustrative Example
To illustrate power calculations for response shift detection with SEM, we follow [16] and use the SF-36 health-related quality of life questionnaire as an example ([17]; see Figure 1). The eight subscales of the SF-36, measured at two occasions, are modelled as indicators of two underlying latent factors: general physical health and general mental health.
Appendices I-III include the lavaan syntax [18] of all H0 and H1 models used for the chi-square based power calculations of the SEM approach to response shift described below, together with details on the model specification and the model parameter values.
Step 1: Chi-square based power to detect misspecification of the measurement model
The first step of the SEM approach for the detection of response shift entails the specification of the measurement model. This model specifies the measurement structure of the data, where the scores on the observed variables (e.g. scores on questionnaire items or, in this case, scores on the subscales of the SF-36) are related to one or more underlying latent variables (e.g. general mental and physical health) (see Figure 1). A correctly specified measurement model is important because it serves as the basis of comparison for all subsequent models. When the measurement model is not correctly specified (e.g. the number of underlying factors is wrong), subsequent results with regard to the detection of response shift effects will likely be affected [4]. Therefore, it is important to calculate the statistical power to detect possible misspecification of the measurement model.
The model fit of the measurement model is usually evaluated with the chi-square test of exact fit. The null hypothesis (H0) is that the model fits the data exactly; the alternative hypothesis (H1) is that it does not. When the p-value falls below the significance criterion (α), we reject H0 in favor of H1. Incorrectly rejecting H0 is called a Type I error; its probability (α) is usually set at .05. A Type II error (β) is made when H0 should have been rejected but is incorrectly retained. The power of a statistical test is the probability of correctly rejecting H0 (1-β; see Table 1).
Table 1. Statistical power for the three tests in steps 1-3 of the SEM approach to detect response shift

Statistical test: Reject H0
  Reality H0 = true: α (Type I error)
    Step 1: Incorrectly reject measurement model
    Step 2: Incorrectly reject no response shift model
    Step 3: Incorrectly reject no response shift parameter
  Reality H1 = true: 1-β (Power)
    Step 1: Correctly reject measurement model
    Step 2: Correctly reject no response shift model
    Step 3: Correctly reject no response shift parameter

Statistical test: Not reject H0
  Reality H0 = true: 1-α (Correct inference)
    Step 1: Correctly retain measurement model
    Step 2: Correctly retain no response shift model
    Step 3: Correctly retain no response shift parameter
  Reality H1 = true: β (Type II error)
    Step 1: Fail to reject misspecified measurement model
    Step 2: Fail to reject no response shift model
    Step 3: Fail to reject no response shift parameter

Notes: H0 = null hypothesis, H1 = alternative hypothesis
Power calculations require the specification of H0 and H1. With a simple statistical test like a Student's t-test, H0 is usually that the effect is zero (e.g. there is no difference between groups) and H1 is usually set at an effect-size value that is deemed plausible or minimally relevant (e.g. a mean difference corresponding to rules of thumb for small, medium or large effects). Power calculations for the chi-square test of exact fit are based on the difference in chi-square distributions between H0 and H1, and therefore require the specification of both the H0 and the H1 model [19]. Following Oort [16], the H0 model for the SF-36 could be the measurement model as specified in Figure 1. This model works well as an illustration, because it has simple structure (i.e. each variable loads on only one underlying latent factor) and is therefore relatively easy to specify and interpret. The H1 model can be any alternative measurement model of the SF-36. Determining a plausible H1 model is complicated because model misspecification generally does not entail a specific effect of interest within the model. Many different options for the definition of H1 thus exist, e.g. a one-factor model, a three-factor model, or a model with one or multiple cross-loadings. Moreover, the calculation of an effect size for H1 requires that the values of all model parameters in the H1 model are specified. It may thus take quite some deliberation to decide what exactly the misspecification should entail. One approach is to first specify the model under H0, i.e. the model that the researcher deems plausible, including plausible values for all model parameters. Subsequently, one could think of a variation of the H0 model that includes one or more additional parameters for which, if these parameters are not zero, the H0 model should be rejected.
For example, with regard to the measurement model of the SF-36 from our illustrative example, one could think of an alternative measurement model that includes additional loadings (i.e. cross-loadings) of the indicators GH, VT and/or SF, as previously described in the SF-36 manual [17]. With regard to the values of these additional parameters, the recommendation would be to choose the minimum value that would be of interest. In general, specifying the values of model parameters in standardized metric is convenient because they can be interpreted according to general rules of thumb for small, medium, and large effects. For example, standardized factor loadings of .1, .3 and .5 can be interpreted as correlation coefficients and thus represent small, medium and large effects, respectively [7]. In addition, previous findings can be used to inform plausible model parameter values.
Specification of H0 and H1 for the Step 1 chi-square based power calculation. Using the illustrative example, the H0 model of the SF-36 is defined as depicted in Figure 1 (see also Appendix I, page 1). It is based on the 8 subscales of the SF-36 at baseline and follow-up; the number of unique elements in the variance-covariance matrix of the empirical data is thus 16*17/2 = 136. The H0 model contains the specification of 16 factor loadings, 4 underlying latent factor variances, 6 underlying latent factor covariances, 16 residual factor variances, and 8 residual factor covariances. Identification of the model requires that either the underlying latent factor variance or one factor loading for each latent factor is restricted to a fixed value [16], so that the total number of free parameters in the H0 model is 46 (see also Appendix I, page 3). Note that for reasons of conciseness the mean structure is not considered here.
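The degrees-of-freedom bookkeeping above can be verified with a few lines of arithmetic. The following Python sketch simply recomputes the counts given in the text:

```python
# Degrees of freedom of the H0 measurement model in the SF-36 example.
p = 16  # observed variables: 8 subscales measured at 2 occasions

# Unique elements in the variance-covariance matrix: p(p+1)/2.
n_statistics = p * (p + 1) // 2

# Specified parameters: 16 loadings + 4 factor variances + 6 factor
# covariances + 16 residual variances + 8 residual covariances,
# minus 4 parameters fixed for identification (one per latent factor).
n_free = 16 + 4 + 6 + 16 + 8 - 4

df = n_statistics - n_free
print(n_statistics, n_free, df)  # 136 46 90
```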
The H1 model is defined as the H0 model with the addition of two medium-sized cross-loadings of the GH and VT subscales (see Figure 2). Note that there are multiple options for defining H1. This specific H1 was considered a plausible alternative model based on previous research that has found substantial cross-loadings in the measurement model of the SF-36 (e.g. [20-21]). We specified the parameter values in standardized form, where the values of the factor loadings are chosen to be .5 (i.e. of large size [7]) and the values of the variances of the residual factors are chosen such that the total variance of each observed variable is 1. Similarly, the variances of the underlying latent factors are standardized. This entails that the values of the associations between the residual factors and between the underlying factors can also be interpreted as correlation coefficients. The additional cross-loadings in H1 were specified to be of medium size (i.e., .3 [7]). As choosing values for all parameters in the H1 model is arguably the most difficult part of chi-square based power calculations, we return to this issue in the discussion section.
Step 1 chi-square based power calculation with power4SEM. When both models are specified, and plausible values for all model parameters of H1 are provided, we can use power4SEM to calculate the probability of correctly rejecting H0. For reasons of conciseness, we only describe the steps needed to arrive at the desired result. We do not go into the (technical) details of the underlying calculations or required input values, for which the reader is referred to the tutorial paper on power4SEM [13] and/or the help files available under the question mark buttons on the webpage. In addition, Appendix I includes a more detailed visual description of the required procedure. As a first step, insert the lavaan syntax of the H1 and H0 models in the dedicated areas on the “lavaan input” page. A graphical display of both models appears at the right-hand side of the screen (see Figure 3). Use the default setting of N = 200 in the “Intended sample size” box; when the researcher has information on the intended or acquired sample size for the proposed/performed study, one could insert that specific number instead. Click on the green button “Obtain NCP” at the top of the page.
Second, go to the “Chi-square test” page and insert the following values in the box “Input” on the upper left side of the screen: the noncentrality parameter (NCP) value obtained in the first step (i.e. 13.796), the degrees of freedom (Df) of the measurement model (i.e., the number of free statistics minus the number of free model parameters, which in our illustrative example is 136 - 46 = 90), and the alpha value (α = .05). Click on the blue button “Calculate!”. The result of the power calculation is now shown both numerically and graphically at the right-hand side of the screen (see Figure 4). That is, the statistical power to correctly reject our H0 model as specified in Figure 1, when in reality the true model includes two medium-sized cross-loadings, is .261. In other words, there is a 26.1% chance of correctly rejecting H0. This is a rather disappointing result, considering that one generally wants to achieve a power of 80%.
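power4SEM carries out this calculation internally. As an illustration of what happens under the hood, the same power value can be reproduced from the NCP, Df, and α reported above, using the central chi-square distribution for the critical value under H0 and the noncentral chi-square distribution for the rejection probability under H1. This Python/SciPy sketch is not part of the power4SEM toolchain, but follows the same logic:

```python
from scipy.stats import chi2, ncx2

ncp, df, alpha = 13.796, 90, 0.05  # values from the Step 1 example

# Critical chi-square value under H0 (central chi-square distribution).
critical_value = chi2.ppf(1 - alpha, df)

# Power: probability of exceeding the critical value under H1
# (noncentral chi-square with noncentrality parameter NCP).
power = ncx2.sf(critical_value, df, ncp)
print(round(power, 3))  # about 0.261
```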
Sample size needed to acquire sufficient power. An additional feature of power4SEM is that it can also be used to calculate the minimum sample size to achieve a desired power of 80%. If we fill in the required values in the box at the bottom left of the “Chi-square test” page, we find that for our illustrative example the minimum sample size needed is 560. In other words, to increase our confidence that the chi-square test of exact fit will reject our model in Figure 1 when it is misspecified (as defined by two medium-sized cross loadings), we should fit the model to data from at least 560 participants.
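Because the NCP is (approximately) proportional to the sample size, the minimum N for 80% power can also be found by a simple search. In this sketch we assume NCP = N x F0, with F0 derived from the NCP at N = 200; power4SEM may use a slightly different convention (e.g. N - 1), so the result is approximate:

```python
from scipy.stats import chi2, ncx2

df, alpha = 90, 0.05
f0 = 13.796 / 200  # misfit per observation, assuming NCP scales with N
critical_value = chi2.ppf(1 - alpha, df)

def power_at(n):
    # Power of the exact-fit test at sample size n.
    return ncx2.sf(critical_value, df, n * f0)

n = 100
while power_at(n) < 0.80:
    n += 1
print(n)  # close to the 560 reported by power4SEM
```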
The illustrated chi-square based power calculations can thus be valuable in two situations. First, they can be helpful for studies in which the sample size is already determined or for studies that have already been completed, as they can provide confidence in the accurateness of the specified measurement model. In addition, and preferably, they can be helpful for sample-size planning at the stage of study design. A general drawback of chi-square based power calculations, however, is that they require explicitly specified models with values for all model parameters. As an alternative, power calculations for overall model fit evaluation can also be based on the root mean square error of approximation (RMSEA) fit index.
Alternative power calculation for overall model fit evaluation: RMSEA-based power
Instead of relying on chi-square based power calculations to detect possible misfit in the measurement model, one can also use RMSEA-based power [22]. The RMSEA is an alternative measure of overall model fit, where values < .05 are indicative of ‘close fit’, < .08 of ‘acceptable fit’, and > .10 of ‘poor fit’ [23]. Because the RMSEA value is derived from the chi-square value, we can also derive the chi-square distributions under H0 and H1 from RMSEA values. That is, in order to calculate statistical power for overall model fit evaluation, we only need to specify the RMSEA values of H0 and H1, instead of having to specify all model parameters in both models. So, for example, one can investigate the power to reject close fit (RMSEA value H0 = .05) when in the population there is not close fit (RMSEA value H1 = .08). This power calculation is similar to the chi-square based power calculation in that it provides the power to correctly reject a misspecified measurement model. Another advantage of the RMSEA-based power calculation is that we can also switch the direction of hypothesis testing, so that we can calculate the power to reject H1 when H0 is true. This is an advantage because with SEM we usually assume that H0 is true. That is, we believe that the model that we specify under H0 is the true model, and so we are not directly interested in the power to reject H0 when in fact H1 is true; instead, it would be more informative to know the power to reject H1 when H0 is true. So, for example, we can investigate the power to reject a model with not-close fit (RMSEA value H0 = .08) in favor of a model with close fit (RMSEA value H1 = .05), when there is ‘true’ close fit of the model. More stringently, following MacCallum et al. [22], one could calculate the power to reject a model with ‘not close fit’ using RMSEA H0 = .05 and RMSEA H1 = .01.
This gives us the probability of correctly rejecting a model with RMSEA > .05 if the population RMSEA is .01. Different values may be chosen for H0 and H1, which will of course impact the calculated power. As a general recommendation, one could use the cut-off values on which one bases the decision of whether or not the model fits the data well.
Step 1 RMSEA-based power calculation with power4SEM. RMSEA-based power calculations are also available in the power4SEM app, under the “RMSEA” page. Here, we need to provide the RMSEA values for H0 and H1. Suppose we calculate the power to reject close fit (RMSEA = .05) of the measurement model in Figure 1, when there is ‘true’ not-close fit in the population (RMSEA = .08). We also provide the intended sample size (N = 200), alpha value (.05), and the number of degrees of freedom of the model of interest (df = 90). If we click on the red button “Calculate!”, the result is shown both numerically and graphically at the right-hand side of the screen (see Figure 5). When the model in reality shows not-close fit, the power to reject the hypothesis of close fit is 0.937. If we reverse the RMSEA values, we see that the power to reject the hypothesis of not-close fit (RMSEA H0 = .08) when the model in the population shows close fit (RMSEA H1 = .05) is 0.936.
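The RMSEA-based calculation can likewise be sketched directly from the noncentral chi-square distribution. Following MacCallum et al. [22], the noncentrality parameter implied by an RMSEA value ε is (N - 1) x df x ε²; rejection is in the upper tail when H1 implies worse fit than H0, and in the lower tail otherwise. A Python/SciPy sketch (not part of power4SEM itself):

```python
from scipy.stats import ncx2

def rmsea_power(rmsea0, rmsea1, n, df, alpha=0.05):
    # Noncentrality parameters implied by the H0 and H1 RMSEA values.
    ncp0 = (n - 1) * df * rmsea0 ** 2
    ncp1 = (n - 1) * df * rmsea1 ** 2
    if rmsea1 > rmsea0:
        # Reject H0 for large chi-square values (e.g. rejecting close fit).
        critical_value = ncx2.ppf(1 - alpha, df, ncp0)
        return ncx2.sf(critical_value, df, ncp1)
    # Reject H0 for small chi-square values (e.g. rejecting not-close fit).
    critical_value = ncx2.ppf(alpha, df, ncp0)
    return ncx2.cdf(critical_value, df, ncp1)

print(round(rmsea_power(0.05, 0.08, 200, 90), 3))  # about 0.937
print(round(rmsea_power(0.08, 0.05, 200, 90), 3))  # about 0.936
```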
Step 2: Chi-square based power to detect the overall presence of response shift
The second step in the SEM approach for response shift detection entails an omnibus test of the presence of response shift. The presence of response shift is indicated by a change in the pattern of factor loadings (reconceptualization), the values of factor loadings (reprioritization) or the values of intercepts (recalibration[1]). The omnibus test is performed by comparing the so-called ‘no response shift model’, i.e. a model in which all parameters that are associated with response shift are restricted to be equal across time, to the measurement model (in which all these parameters are free to vary across time). The chi-square values of both models can be compared using a chi-square difference test, where a significant p-value indicates that H0 (no response shift) should be rejected (see also Table 1). In other words, it indicates the overall presence of (any type of) response shift. Statistical power for this chi-square difference test indicates the probability of correctly rejecting H0 (no response shift) when in reality response shift effects are present (see also Table 1). When statistical power is low, there is a high chance that the test will incorrectly indicate that there is no response shift. The difficulty for the power calculation is, as in Step 1, to define H1. Here, H1 refers to a model that includes indications of response shift, and one thus has to determine what the ‘overall presence of response shift’ looks like. That is, one has to determine the exact type, number, and size of the possible response shift effects for which H0 should be rejected.
Specification of H0 and H1 for the Step 2 chi-square based power calculation. The H0 model that is used in power calculations for the omnibus test of response shift is the ‘no response shift model’ in which all factor loadings and intercepts are restricted to be equal across baseline and follow-up (see Figure 6 and Appendix II). The number of degrees of freedom for this model is 102 (see Appendix II for more details). The number of degrees of freedom for the chi-square difference test that is used to test for the overall presence of response shift is thus 102 - 90 = 12. The H1 model is specified the same as the H0 model, but includes some response shift effects. That is, the H1 model is defined by including differences in the pattern of factor loadings, the values of factor loadings, and/or the values of intercepts across time. The choice of the type, number, and size of possible response shift effects to include in H1 is greatly facilitated when a priori hypotheses on the potential occurrence of response shift exist. Based on theory or prior research one may have an idea of what type (i.e. recalibration, reprioritization or reconceptualization), what number, and how large the possible response shift effects may be. For example, previous studies on response shift with the SF-36 indicated the presence of reconceptualization (GH subscale [24]), reprioritization (SF subscale [24], RP subscale [21]) and recalibration response shift (PF subscale [25], RP and BP subscales [24]). When there is no a priori information available, the specification of a plausible H1 is more difficult. As a general recommendation, one could include the minimum number of response shift effects that would be of interest. As the response shift effects refer to targeted parameters, generally accepted rules of thumb can be used to specify small, medium or large effects.
The choice of the H1 model specification in our illustrative example is not based on previous findings of (sizes of) effects, as the lack of context complicates using substantive considerations in our model specification. Therefore, in our illustrative example H1 is specified as a model that includes a total of three response shift effects, i.e. one medium-sized recalibration effect, one medium-sized reprioritization effect, and one medium-sized reconceptualization effect (see Figure 6 and Appendix II). Note that we now also include the mean structure, as the estimation of underlying factor means is now part of the modelling procedure.
Step 2 chi-square based power calculation with power4SEM. When both the H0 and H1 models are specified, and plausible values for all model parameters of H1 are provided, we can use power4SEM to calculate the probability of correctly rejecting the H0 of no response shift (see also Appendix II). First, the lavaan syntax of the H0 and H1 models is inserted into the designated input boxes on the “lavaan input” page (see Figure 7). The result is obtained by clicking on the green button “Obtain NCP”. Second, on the “Chi-square test” page the obtained NCP value (36.688), the Df of the chi-square difference test (12), and the appropriate alpha (.05) are provided as input to obtain the statistical power of the test. The result is shown on the right side of the page (see Figure 8), where the power to correctly reject the H0 of no response shift is .994. Thus, when three medium-sized response shifts exist in reality, the omnibus test for response shift is very likely to correctly reject the hypothesis of no response shift.
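The power of the omnibus chi-square difference test can be reproduced in the same way as for the Step 1 exact-fit test, now using the difference-test degrees of freedom. A Python/SciPy sketch (outside the power4SEM toolchain) with the values from the text:

```python
from scipy.stats import chi2, ncx2

ncp, df_diff, alpha = 36.688, 12, 0.05  # values from the Step 2 example

# Critical value of the chi-square difference test under H0,
# then the rejection probability under H1.
critical_value = chi2.ppf(1 - alpha, df_diff)
power = ncx2.sf(critical_value, df_diff, ncp)
print(round(power, 3))  # about 0.994
```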
Step 3: Chi-square based power to detect specific response shift effects
The third step in the SEM approach for response shift detection includes specific tests for response shift effects. That is, the tenability of the equality restrictions on model parameters associated with response shift is investigated one by one. Again, the chi-square difference test can be used to test the tenability of each equality restriction. The H0 of no response shift now refers to one specific response shift effect (see Table 1). When the p-value falls below the alpha criterion, the H0 of no response shift for that specific parameter is rejected. Sufficient statistical power is needed to ensure that, when the specific response shift effect that is being evaluated exists, there is a high chance that the chi-square difference test will detect it. If statistical power is low, there is a high chance that response shift effects are missed.
Specification of H0 and H1 for the Step 3 chi-square based power calculation. The H0 model that is used in power calculations for tests on specific response shift effects is, again, the ‘no response shift model’ in which all factor loadings and intercepts are restricted to be equal across baseline and follow-up (see Figure 9 and Appendix III). The difference with the power calculations for the omnibus test of response shift is that the H1 model includes only one specific response shift effect. The number of degrees of freedom for the chi-square difference test that is used to test for the presence of a single response shift is 1 (instead of 12 for the omnibus test of response shift). Using the illustrative example, we specify three different H1 models for the detection of one medium-sized recalibration, reprioritization or reconceptualization response shift, respectively (see Figure 9). There are thus three different power calculations associated with the chi-square test for specific response shift. Here, we elaborate on the power to detect a specific indication of reconceptualization response shift (see Appendix III for the syntax of all three power calculations), which is defined as a medium-sized cross-loading of VT at the follow-up measurement (H1 model A in Figure 9).
Step 3 chi-square based power calculation with power4SEM. We use power4SEM to calculate the chance to correctly reject H0 of no response shift, in favor of H1 with one indication of a medium-sized reconceptualization response shift (see also Appendix III). The NCP value that is derived by inserting the H0 and H1 model syntaxes in the “lavaan input” page is 9.013 (see Figure 10). In combination with Df = 1 and α = .05 this results in a power of .851 (see Figure 11). That is, the chance that the H0 of no reconceptualization response shift of VT will be correctly rejected (when there is a medium-sized effect present in reality) is 85.1%. This is good news, as the calculated power falls above the desired power of 80%.
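The same noncentral chi-square logic reproduces this result, now with a single degree of freedom. A Python/SciPy sketch (outside the power4SEM toolchain) with the values from the text:

```python
from scipy.stats import chi2, ncx2

ncp, df_diff, alpha = 9.013, 1, 0.05  # values from the Step 3 example

# Critical value of the 1-df chi-square difference test under H0,
# then the rejection probability under H1.
critical_value = chi2.ppf(1 - alpha, df_diff)
power = ncx2.sf(critical_value, df_diff, ncp)
print(round(power, 3))  # about 0.851
```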
Note that when the omnibus test of response shift is used in the same situation (i.e., when only one reconceptualization response shift is present in reality), the power to detect such an effect is reduced to 45.4% (see Appendix III for details). That is, the power to detect a single response shift effect will be higher for the chi-square test on a specific parameter (i.e., Step 3 of the SEM approach) than for the omnibus chi-square test (i.e., Step 2 of the SEM approach). However, as there are many specific parameters that can be tested for the presence of response shift, the increasing number of statistical tests performed on the same data will generally lead to an increased Type I error rate (see Table 1). There is thus a balance to be found between the protection against Type I errors offered by the omnibus test and the higher power to detect single indications of response shift offered by the specific tests.
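This trade-off can be made concrete by holding the effect (and thus the NCP) fixed and varying only the degrees of freedom of the test. A sketch, again assuming the noncentral chi-square logic used for the power calculations above:

```python
from scipy.stats import chi2, ncx2

ncp, alpha = 9.013, 0.05  # one medium-sized response shift effect

def power(df_diff):
    # Power of a chi-square difference test with the given df and fixed NCP.
    critical_value = chi2.ppf(1 - alpha, df_diff)
    return ncx2.sf(critical_value, df_diff, ncp)

specific = power(1)   # Step 3: targeted 1-df test
omnibus = power(12)   # Step 2: omnibus 12-df test, same single effect
print(round(specific, 3), round(omnibus, 3))  # about 0.851 vs 0.454
```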