To date, considerable attention has been paid to the presence of disease and health-related problems as indices of health status, centring mostly on illness and pathology [52]. As Seligman and Csikszentmihalyi posited, “the exclusive focus on pathology that has dominated so much of our discipline results in a model of the human being lacking the positive features that make life worth living” [54, p. 5]. Over recent decades, however, a paradigm shift has highlighted the relevance of individual virtues, strengths, and subjective well-being (SWB) [34, 39, 52].
Several social scientists and philosophers have sought to define happiness, or SWB. The construct of SWB has three distinctive features: it resides within the individual’s own experience; it comprises positive measures and is not merely the absence of negative aspects; and its measures typically include a global assessment of all aspects of a person’s life [11, 18]. Thus, although satisfaction or affect within specific life domains may also be assessed, the emphasis is on an integrated judgment of one’s life as a whole [18].
Gender comparisons are a common focal point in research on distinct psychological characteristics [3], and gender differences in SWB, along with the role of gender in SWB, have attracted much interest [3]. Over several decades, research has shown that men have significantly greater levels of SWB than women (e.g., [3, 26, 60]). Fewer studies, however, have shown the inverse (e.g., [24]). Further complicating the matter, several studies have found no significant differences between men and women in SWB, even after controlling for demographic factors such as marital status and age (e.g., [36, 31, 55, 67, 70]).
Several theoretical and methodological explanations have been offered for these inconsistent gender differences in SWB. Conflicting findings could be attributed to SWB comprising three dimensions: life satisfaction, positive affect, and negative affect [20, 17, 3]. Because the direction and magnitude of gender differences vary across these dimensions, combining them in a single analysis may cause opposing differences to cancel out, in turn reducing any observed difference [3, 21, 46].
Social construction theorists argue that men may experience poorer SWB than women because of pressure to adhere to stereotypical gender beliefs (e.g., [6]). Researchers posit that adherence to masculine norms contributes to harmful social relationships [27, 37] and heightened psychological distress [48]. More specifically, men who endorse the self-reliance norm may value independence and therefore avoid seeking mental health support, which heightens psychological distress and reduces SWB [32, 41].
Conversely, women may experience poorer SWB than men because of the power structure in society [45]. On average, women are also less financially secure than men, more likely than men to be sexually harassed at work, and more likely to feel ‘burnt out’ [45].
Various scales have been used to measure SWB in men and women, including the Positive and Negative Affect Schedule [68], the Satisfaction With Life Scale [19], the Scales of Psychological Well-Being [51], and the Short Depression-Happiness Scale [33]. Although these scales aim to measure SWB, none holistically captures the full conception of SWB, including psychological functioning, cognitive-evaluative dimensions, and affective-emotional aspects. Of late, researchers have frequently used the Warwick-Edinburgh Mental Well-Being Scale (WEMWBS) [1, 61, 63]. The WEMWBS encapsulates a holistic conception of SWB and has demonstrated sound psychometric properties, including acceptable validity and reliability [13, 38, 42, 55].
Across studies of the WEMWBS, some have found no significant differences in SWB between men and women [13, 15, 55, 64], whereas others have found significant differences [42, 63, 66]. Given these discrepancies, the WEMWBS may operate differently for male and female respondents, which warrants investigation of potential differential item functioning across gender groups. While researchers have traditionally used well-established paradigms to assess the psychometric properties of an instrument (i.e., classical test theory, CTT), recent technological developments have enabled alternative perspectives. Therefore, more research is needed to validate the psychometric properties of the WEMWBS across genders using more recent approaches, including item response theory (IRT). Moreover, the reported gender differences may partly reflect a lack of understanding of, and empirical evidence for, the robustness of this measure’s psychometric properties. Establishing the psychometric properties of the WEMWBS can inform interventions, help clinicians appraise the impact of their services on people’s lives, and identify the aspects of their lives with which people are dissatisfied. This, in turn, would allow clinicians to tailor their services to the needs of men and women. To address this aim, the current work will use two statistical approaches: measurement invariance (MI; [49]) and IRT. The following section reviews evidence on the MI of the WEMWBS across gender.
Measurement Invariance (MI)
MI is a statistical method for evaluating whether the psychometric properties of a given measure are stable (i.e., invariant) across groups of interest [5]. For example, one could evaluate whether the WEMWBS assesses SWB in the same manner in men and women. Observing non-invariant responses to WEMWBS items in men and women would indicate either that items need to be weighted differently to obtain comparable responses across groups, or that SWB is conceptualised differently across genders [58]. Specifically, multigroup confirmatory factor analysis (MCFA) can be used to evaluate MI because it enables structural comparisons at several levels: configural (i.e., factorial structure); metric (i.e., factor loadings); scalar (i.e., intercepts and thresholds); and strict (i.e., residuals) invariance [25, 44]. Establishing configural invariance suggests that the number of factors and the pattern of item-factor loadings in the WEMWBS are alike for men and women [72]. Attaining metric invariance would suggest that the item-factor relationships are measured on the same metric scale in both groups [58]. Finally, confirming scalar invariance indicates that item intercepts are equal across groups, so that group differences in observed scores can be attributed to differences in the latent trait rather than to the items themselves. Although equality of error/residual variances across groups can also be tested, this level of invariance is often overlooked [57]. Because residual variance is anticipated to be random, testing its equality across groups may produce redundant and overly strict models [7].
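The nested levels described above can be summarised with the usual multigroup linear factor model. The block below is a sketch that assumes continuous indicators and a single latent factor; the symbols x, τ, λ, η, ε, and ψ are introduced here for illustration only, and for ordinal Likert items thresholds would take the place of the intercepts τ. Each level keeps the constraints of the previous one and adds new equality constraints, so adjacent levels form nested models whose fit can be compared.

```latex
\begin{aligned}
x_{ijg} &= \tau_{jg} + \lambda_{jg}\,\eta_{ig} + \varepsilon_{ijg},
  \qquad \operatorname{Var}(\varepsilon_{ijg}) = \psi_{jg},
  \qquad g \in \{\text{men}, \text{women}\} \\
\text{configural:}\quad & \text{same number of factors and same pattern of zero/non-zero } \lambda_{jg} \text{ in both groups} \\
\text{metric:}\quad & \lambda_{j,\text{men}} = \lambda_{j,\text{women}} \quad \forall j \\
\text{scalar:}\quad & \text{metric constraints} \;+\; \tau_{j,\text{men}} = \tau_{j,\text{women}} \quad \forall j \\
\text{strict:}\quad & \text{scalar constraints} \;+\; \psi_{j,\text{men}} = \psi_{j,\text{women}} \quad \forall j
\end{aligned}
```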
Tennant and colleagues’ [63] call for further investigation of the equivalence of the WEMWBS’ psychometric properties across genders has been taken up in Australian [30], Northern Irish and Scottish [43], Danish [35], and Norwegian [56] samples. Studies evaluating WEMWBS MI across binary gender groups compared goodness-of-fit (GOF) indices, such as the comparative fit index (CFI) and the root mean square error of approximation (RMSEA), to determine whether WEMWBS items were invariant [30, 43, 56]. Additionally, a bootstrapped likelihood ratio test (BLRT; [35]) has been used to evaluate MI between gender groups [14]. These studies consistently observed gender invariance at the configural and metric levels, and sometimes at the scalar level (scalar non-invariance was observed in Australian samples [30]). Because χ² tests are sensitive to large sample sizes and can therefore be unnecessarily ‘stringent’, changes in GOF indices (i.e., ΔCFI and ΔRMSEA) have been the preferred method for evaluating invariance in SWB across gender groups [7, 30, 43, 56].
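A minimal sketch of how ΔCFI and ΔRMSEA comparisons between nested invariance models can be carried out, assuming the χ² and degrees of freedom of each fitted model (and of the baseline/independence model) are already available from SEM software. The cutoffs used here (CFI may drop by at most .010 and RMSEA may rise by at most .015) are commonly used rules of thumb, not values taken from the studies cited above; the RMSEA formula is one common single-group form (some programs use N rather than N − 1 or apply a multigroup correction); and every fit value in the usage example is invented.

```python
import math
from dataclasses import dataclass


@dataclass
class Fit:
    chisq: float  # model chi-square
    df: int       # model degrees of freedom


def rmsea(fit: Fit, n: int) -> float:
    # One common form: sqrt(max(chi2 - df, 0) / (df * (N - 1)))
    return math.sqrt(max(fit.chisq - fit.df, 0.0) / (fit.df * (n - 1)))


def cfi(fit: Fit, baseline: Fit) -> float:
    # CFI = 1 - max(chi2_M - df_M, 0) / max(chi2_B - df_B, chi2_M - df_M, 0)
    model_nc = max(fit.chisq - fit.df, 0.0)
    denom = max(baseline.chisq - baseline.df, model_nc)
    return 1.0 if denom == 0 else 1.0 - model_nc / denom


def invariance_holds(less: Fit, more: Fit, baseline: Fit, n: int,
                     max_d_cfi: float = 0.010, max_d_rmsea: float = 0.015) -> bool:
    # Assumed rule-of-thumb cutoffs applied when moving to the more constrained model
    d_cfi = cfi(less, baseline) - cfi(more, baseline)
    d_rmsea = rmsea(more, n) - rmsea(less, n)
    return d_cfi <= max_d_cfi and d_rmsea <= max_d_rmsea


def highest_supported(levels: list[tuple[str, Fit]], baseline: Fit, n: int) -> str:
    # levels are ordered from least to most constrained (configural first)
    supported = levels[0][0]
    for (_, less), (name, more) in zip(levels, levels[1:]):
        if not invariance_holds(less, more, baseline, n):
            break
        supported = name
    return supported


# Hypothetical fit statistics for illustration only
baseline = Fit(chisq=8000.0, df=182)
levels = [
    ("configural", Fit(chisq=600.0, df=154)),
    ("metric", Fit(chisq=640.0, df=167)),
    ("scalar", Fit(chisq=900.0, df=180)),
]
print(highest_supported(levels, baseline, n=1000))
```

With these invented values the scalar model's CFI drops by about .03 relative to the metric model, so the sketch reports metric invariance as the highest level supported.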
Item Response Theory (IRT)
IRT is a relatively modern technique that is often proposed to overcome some of the limitations of Classical Test Theory (CTT; [15]). First, CTT assumes that an individual’s observed score is a composite of a true score and error, and the resulting item and test statistics depend on the sample in which they are estimated [22]. This major limitation is often called sample dependency [23]. In contrast, IRT models item-person relationships, enabling inferences at different levels of the latent trait that are, in principle, sample independent. Second, unlike CTT, IRT can estimate reliability at both the test and the item level [23]. Analysing reliability at the item level can provide greater insight into measurement precision, enabling a robust evaluation of internal construct and item validity [16].
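The contrast between the two frameworks can be written compactly. The block below uses standard textbook forms and is a sketch rather than the exact estimators used in this study: the 2PL information formula is shown because it is the simplest case (the graded models considered here have analogous but longer expressions), and α and β follow the notation used for discrimination and difficulty in the next paragraph. CTT yields a single reliability coefficient for the whole sample, whereas IRT yields item and test information functions, so measurement precision varies with θ.

```latex
\begin{aligned}
\text{CTT:}\quad & X = T + E, \qquad
  \rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} \quad \text{(one coefficient for the whole sample)} \\
\text{IRT, 2PL item } j\text{:}\quad & P_j(\theta) = \frac{1}{1 + \exp[-\alpha_j(\theta - \beta_j)]}, \qquad
  I_j(\theta) = \alpha_j^2\, P_j(\theta)\,[1 - P_j(\theta)] \\
\text{IRT, test level:}\quad & I(\theta) = \sum_j I_j(\theta), \qquad
  \operatorname{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{I(\theta)}}
\end{aligned}
```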
In IRT, the item-participant relationship is represented by the probability that participants with a certain level of the latent trait (in this case SWB) will endorse a particular item [22]. For example, students with greater mathematical ability are more likely to respond correctly to a difficult mathematics item. This relationship is represented graphically by the item response function (IRF), a nonlinear (logistic) curve [22]. The exact probability that an individual will endorse an item depends on a set of item parameters, including item difficulty (β) and discrimination (α). Difficulty (β) specifies the level of the latent trait at which a participant is likely to endorse a specific item or criterion (for a dichotomous item, the level at which the probability of endorsement is .50) [25]. For example, ‘easier’ items have lower β values, and their IRFs are shifted towards the lower end of the latent-trait axis. In this context, easier items may be endorsed by most participants because little SWB is required to agree with the proposed criterion or statement. Conversely, those who endorse ‘difficult’ items are likely to have higher SWB [22]. Discrimination (α) describes how steeply the probability of endorsing an item changes with the participant’s level of the latent trait [25]. Therefore, items more strongly related to the latent variable have steeper IRFs and can more accurately discriminate between levels of the latent trait (i.e., SWB). IRT models differ according to the number of item parameters estimated in the logistic function (i.e., parameter logistic, PL, models; [16]). For example, Rasch models behave like 1PL models and assume equal α across items. Alternatively, graded response (GR) and generalised partial credit (GPC) models behave like 2PL models and freely estimate both β and α across items [22]. To maximise the information obtained from IRT, and because the WEMWBS uses a 5-point Likert scale, the GR and GPC models were assessed.
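The IRF and the two polytomous models can be sketched directly in plain Python. This is a minimal illustration, assuming responses coded 0 to 4 for a 5-point scale, no scaling constant D, and invented item parameters (a for α, thresholds or steps for β). It follows the standard forms of Samejima's graded response model (cumulative boundary curves) and Muraki's generalised partial credit model (adjacent-category steps), not any particular software's parameterisation.

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability of endorsement at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def grm_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Graded response model: category probabilities for scores 0..K-1.

    thresholds must be increasing; each defines a boundary curve P(X >= k).
    """
    cumulative = [1.0] + [irf_2pl(theta, a, b) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def gpcm_probs(theta: float, a: float, steps: list[float]) -> list[float]:
    """Generalised partial credit model: category probabilities for scores 0..K-1."""
    # Category k's log-numerator is sum_{v<=k} a * (theta - b_v); category 0 has 0.
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs: list[float]) -> float:
    """Expected item score (0..K-1) given category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical 5-point item: discrimination 1.8, four ordered thresholds/steps
a, bs = 1.8, [-2.0, -1.0, 0.0, 1.2]
for theta in (-2.0, 0.0, 2.0):
    grm = expected_score(grm_probs(theta, a, bs))
    gpcm = expected_score(gpcm_probs(theta, a, bs))
    print(f"theta={theta:+.1f}  GRM E[X]={grm:.2f}  GPCM E[X]={gpcm:.2f}")
```

Raising a steepens the curves (sharper discrimination), while shifting the thresholds to the right makes the higher categories harder to endorse. The same numeric a and b values do not carry identical meanings in the two models, so GR and GPC parameters are not directly interchangeable.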
Additionally, differential item functioning (DIF) methods can be used to determine whether men and women respond differently to specific WEMWBS items [53]. IRT methods are more suitable than CTT methods for detecting DIF for three reasons [10]: (i) IRT provides more accurate statistical properties of items than CTT, making it possible to ascertain where an item functions differently (i.e., difficulty, discrimination, or pseudo-guessing); (ii) IRT item parameter estimates are less confounded by sample-specific characteristics; and (iii) the item characteristic curve (ICC) for each group (men and women) can be displayed graphically, which makes items displaying DIF easier to interpret [10].
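The idea of comparing each group's curve can be illustrated numerically by contrasting the expected-score curves implied by separate GRM parameters for men and women. The sketch below assumes invented parameters (the names `men`, `women_shifted`, and `women_flatter` are hypothetical) and computes a standard-normal-weighted average of the gap between the two curves. This is an illustrative summary in the spirit of area-between-curves measures, not the formal DIF test (e.g., a likelihood-ratio or Wald comparison) that a full analysis would use.

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def grm_expected(theta: float, a: float, thresholds: list[float]) -> float:
    # Under the GRM, E[X | theta] = sum_k P(X >= k) for scores coded 0..K-1
    return sum(irf_2pl(theta, a, b) for b in thresholds)


def weighted_gap(ref, focal, signed=False, lo=-4.0, hi=4.0, n=801):
    """Standard-normal-weighted mean gap between two groups' expected-score curves.

    ref and focal are (a, thresholds) tuples for the same item in each group.
    """
    step = (hi - lo) / (n - 1)
    num = den = 0.0
    for i in range(n):
        t = lo + i * step
        w = math.exp(-0.5 * t * t)  # unnormalised N(0, 1) density; normalised by den
        gap = grm_expected(t, *ref) - grm_expected(t, *focal)
        num += w * (gap if signed else abs(gap))
        den += w
    return num / den


thresholds = [-1.5, -0.5, 0.5, 1.5]
men = (1.8, thresholds)
women_shifted = (1.8, [b + 0.4 for b in thresholds])  # hypothetical uniform DIF
women_flatter = (1.0, thresholds)                      # hypothetical non-uniform DIF

print(weighted_gap(men, women_shifted), weighted_gap(men, women_shifted, signed=True))
print(weighted_gap(men, women_flatter), weighted_gap(men, women_flatter, signed=True))
```

When only difficulty differs, the curves never cross and the signed and unsigned gaps coincide. When only discrimination differs (with these symmetric thresholds), the curves cross at θ = 0 and the signed gap is essentially zero even though the unsigned gap is not, which is why discrimination-related DIF can be missed by signed summaries alone.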
Present Study
While the psychometric properties of the WEMWBS have been examined with IRT models, some authors (e.g., [2, 28, 62]) were limited to Rasch models to assess item-participant relationships. Additionally, one study investigated the psychometric properties of the WEMWBS in participants under and over 65 years of age, employing GR and GPC models that freely estimate item discrimination (α; slope) and item difficulty (β; location) parameters [47]. To our knowledge, however, no other research has used these models to examine the WEMWBS in men and women. Accordingly, the present study aims to extend previous findings on the psychometric properties of the WEMWBS in two meaningful ways: a) it expands gender MI findings using less stringent criteria (i.e., ΔCFI, ΔRMSEA) in a different national sample; and b) it is the first to investigate DIF in WEMWBS items through GR and GPC models for participants with differing levels of SWB. This is noteworthy in three ways. First, it will clarify the comparability of WEMWBS scores for men and women in both clinical practice and research. Second, it will allow the WEMWBS items to be ranked according to their psychometric performance (i.e., item priority ranking). Finally, it will show which WEMWBS items provide more or less reliable information for men and women with higher and lower levels of SWB. We expect the scale to be invariant across gender and its reliability to vary across response categories and scale scores.