The availability of studies on levels of mental health of health workers has clearly increased over time: half of the included articles were published in 2015 or later, and fewer than 5% before 2005 (Fig. 2). Of note, close to 40% of studies did not report the year in which data collection took place (quality criterion 1).
88% of studies assessed only one mental health outcome, 11% two outcomes, and 1% three outcomes. As described in more detail below, 67% of studies assessed burnout, 25% general psychological wellbeing, 10% depression, and 10% other mental health outcomes. Given this relative homogeneity, the ongoing debate around whether it is clinically meaningful to differentiate burnout from depression, especially when both are measured through self-report [32], and the large heterogeneity in measurement tools used for general psychological wellbeing, we decided against organizing the results by mental health outcome, as might otherwise have been most intuitive.
Rather, in the following, we first present data related to availability and characteristics of studies on mental health levels, by study setting, study population, and study methodology and outcomes, along with findings related to the methodological quality of the included studies. The second part of the results section presents studies on correlates of mental health levels, with a focus on the investigated factors and on methodological aspects.
A comprehensive overview of all included studies, including key characteristics, can be found in Additional File 3.
Research on levels of good or poor mental health amongst health workers in LLMIC
Our broad search strategy allowed us to include both studies that explicitly aimed to assess levels of mental health and studies with a different primary aim that reported data on mental health levels in the process. 84 of the 135 included studies (62%) explicitly aimed to assess mental health levels. For simplicity, and in line with the terminology used by the overwhelming majority of authors, we will refer to these as “prevalence studies” in the following.
The remaining 51 studies (38%) comprised studies focused on assessing factors associated with mental health, studies in which mental health was one predictor variable among many for another substantive outcome variable, studies testing specific hypotheses involving mental health variables (usually from the psychology literature), and validation studies of tools measuring the mental health outcome or a different construct.
This section provides an overview of findings from studies investigating levels of mental health. Key results pertaining to study characteristics are summarized in Table 2, both overall and for the subset of explicit prevalence studies. Results regarding study quality are summarized in Table 3.
Table 2
Studies assessing levels of mental health - key results by study type
| | All studies | Prevalence study subset |
|---|---|---|
| Total number of studies | 135 | 84 |
| Study region | | |
| WHO Africa Region | 56 (41.5%) | 34 (40.5%) |
| WHO Eastern Mediterranean Region | 39 (28.9%) | 23 (27.4%) |
| WHO South-East Asia Region | 35 (25.9%) | 23 (27.4%) |
| WHO Western Pacific Region | 5 (3.7%) | 4 (4.7%) |
| Study setting* | | |
| Urban | 118 (87.4%) | 77 (91.7%) |
| Rural | 39 (28.9%) | 22 (26.2%) |
| Multi-site | 90 (66.7%) | 52 (61.9%) |
| Study population* | | |
| Primary level of care | 35 (26.3%) | 20 (23.8%) |
| Secondary and/or tertiary level of care | 126 (94.7%) | 80 (95.2%) |
| Public sector | 123 (91.1%) | 75 (89.3%) |
| Private sector | 45 (33.3%) | 28 (33.3%) |
| Medical doctors | 72 (53.3%) | 50 (59.5%) |
| Nurses | 96 (71.1%) | 55 (65.5%) |
| Other clinical staff | 38 (28.2%) | 26 (31.0%) |
| Other managerial staff | 3 (2.2%) | 3 (3.6%) |
| Sampling | | |
| Census | 50 (37.0%) | 40 (47.6%) |
| Random / stratified random sample | 24 (17.8%) | 15 (17.9%) |
| Convenience sample | 37 (27.4%) | 17 (20.2%) |
| Unclear | 24 (17.8%) | 12 (14.3%) |
| Study outcomes* | | |
| Burnout | 90 (66.7%) | 57 (67.9%) |
| Depression | 14 (10.4%) | 10 (11.9%) |
| Anxiety | 10 (7.4%) | 7 (8.3%) |
| Trauma | 5 (3.7%) | 3 (3.6%) |
| General psychological wellbeing | 33 (24.4%) | 20 (23.8%) |
| Outcome measurement* | | |
| Continuous outcome | 73 (54.1%) | 36 (42.9%) |
| Proportions / categorical outcome | 90 (66.7%) | 71 (84.5%) |
| Results reporting | | |
| By cadre: yes, by design | 87 (64.4%) | 52 (61.9%) |
| By cadre: yes | 22 (16.3%) | 17 (20.2%) |
| By cadre: no | 26 (19.3%) | 15 (17.9%) |
| By gender: yes, by design | 14 (10.4%) | 6 (7.1%) |
| By gender: yes | 47 (34.8%) | 36 (42.9%) |
| By gender: no | 74 (54.8%) | 42 (50.0%) |

*Proportions do not add up to 100% as many studies include mixed settings/study populations and/or several outcomes
Table 3
Studies assessing levels of mental health - study quality by study type
| | All studies | Prevalence study subset |
|---|---|---|
| Qual 1: Is the year of data collection mentioned? | | |
| 1 Mentioned | 82 (60.7%) | 56 (66.7%) |
| 0 Not mentioned | 53 (39.3%) | 28 (33.3%) |
| Qual 2: Is the study setting adequately described? | | |
| 2 Fully adequate | 116 (85.9%) | 75 (89.3%) |
| 1 Mostly adequate, with only some ambiguity | 10 (7.4%) | 4 (4.8%) |
| 0 Insufficient | 9 (6.7%) | 5 (5.9%) |
| Qual 3: Is the study population clearly specified and defined? | | |
| 2 Fully clear | 129 (95.6%) | 82 (97.6%) |
| 1 Mostly clear, with only some ambiguity | 5 (3.7%) | 2 (2.4%) |
| 0 Unclear | 1 (0.7%) | 0 (0%) |
| Qual 4: Is the sample likely to be representative of the intended study population? | | |
| 2 Highly likely | 30 (22.2%) | 19 (22.6%) |
| 1 Somewhat likely | 46 (34.1%) | 36 (42.9%) |
| 0 Unlikely | 59 (43.7%) | 29 (34.5%) |
| Qual 5: Are the study participant characteristics described in sufficient detail? | | |
| 2 Fully adequate | 112 (83.0%) | 72 (85.7%) |
| 1 Mostly adequate, with only some ambiguity | 15 (11.1%) | 9 (10.7%) |
| 0 Insufficient | 8 (5.9%) | 3 (3.6%) |
| Qual 6: Is the tool(s) used to measure the outcome(s) adequately reported? | | |
| 2 Fully adequate | 115 (85.2%) | 74 (88.1%) |
| 1 Mostly adequate, with only some ambiguity | 10 (7.4%) | 7 (8.3%) |
| 0 Insufficient | 10 (7.4%) | 3 (3.6%) |
| Qual 7: Is convincing information on the validity of the tool(s) used to measure the outcome(s) reported? | | |
| 2 Convincing | 11 (8.2%) | 5 (6.0%) |
| 1 Partially convincing | 36 (26.7%) | 15 (17.8%) |
| 0 Unconvincing or no information provided | 88 (65.2%) | 64 (76.2%) |
| Qual 8: Is all necessary background information to interpret numeric representations of measurements provided? | | |
| 2 Convincing | 11 (8.2%) | 5 (6.0%) |
| 1 Partially convincing | 36 (26.7%) | 15 (17.8%) |
| 0 Unconvincing or no information provided | 88 (65.2%) | 64 (76.2%) |
| Qual 9: Are results adequately displayed? | | |
| 1 Adequate | 124 (91.8%) | 76 (90.5%) |
| 0 Unclear or containing obvious errors | 11 (8.2%) | 8 (9.5%) |
| Combined quality classification | | |
| High quality | 25 (18.5%) | 18 (21.4%) |
| Moderate quality | 81 (60.0%) | 52 (61.9%) |
| Low quality | 29 (21.5%) | 14 (16.7%) |
Study countries and settings
Study country. Of the 135 studies reporting on levels of poor or good mental health, all but two reported on one country only, while two reported on three countries each. The total number of country samples therefore amounts to 139, and covers a total of 26 unique countries (Fig. 3).
Geographically, and in line with where most LLMIC are located, most studies (41%) were conducted in countries of the WHO Africa Region, with 56 studies across 12 countries, thereby covering about one third of the region’s 39 LLMIC. 39 studies (29%) were conducted in the WHO Eastern Mediterranean Region, in 6 of the region’s 10 LLMIC (60%). 35 studies (26%) were conducted in the WHO South-East Asia Region, in 5 of the region’s 9 LLMIC (56%). Finally, 5 studies (4%) were conducted in the WHO Western Pacific Region, in 3 of the region’s 10 LLMIC (33%). None of the 6 LLMIC in the WHO European Region or the 4 LLMIC in the WHO Region of the Americas is represented in the included studies.
Close to 60% of studies were conducted in just four countries, namely India (29 studies), Nigeria (25 studies), Pakistan (15 studies), and Egypt (10 studies). The number of available studies in each of the remaining 22 countries ranged between 1 and 7.
Study setting. 65% of studies were conducted in urban settings, 7% in rural settings, and 22% in both urban and rural settings; for 6%, the setting could not be judged from the reported information. 80 studies (59%) were conducted within a single city, including 45 studies (33% of all included studies) conducted within a single healthcare facility, most of which were university or other tertiary care hospitals. Among multi-site studies, the number of healthcare facilities from which respondents were sampled ranged from 2 to 89 (mean = 12.8, sd = 18.4), with 13% of studies not reporting this number and 14% of studies having sampled through other channels.
Reporting of the study setting (quality criterion 2) was largely satisfactory, with 86% of studies reporting adequate information. Among explicit prevalence studies, reporting was even slightly better.
Study populations and samples
Study population. 93% of study samples included health workers working at secondary- and/or tertiary-level health facilities, of which 42% included exclusively tertiary-level staff. Only 26% of study samples, from a total of 13 countries, included health workers at the primary level of care. 2 studies did not provide this information.
64% of studies focused exclusively on the public sector, 6% exclusively on the private (including faith-based) sector, 27% included a mix of both, and for 3% it was not possible to judge based on the information provided. Studies including private-sector health workers were conducted across a total of 14 countries.
We classified study populations into medical doctors, nurses, other clinical staff, and other managerial staff. The “other clinical staff” category contains dentists, pharmacists, auxiliary staff (i.e., “untrained” clinical staff according to the WHO/ILO definition used as inclusion criterion), laboratory and imaging personnel, physiotherapists, environmental health staff, and other paramedical staff. It must be noted that, within the medical doctor and nursing categories, many studies focused on staff with a specific specialization.
64% of studies investigated only one staff category, whereas the remaining 36% investigated two or more. 53% of studies included medical doctors, 71% nurses, 28% other clinical staff, and 2% other managerial staff.
96% of studies reported fully adequate information on the study population (quality criterion 3; 98% in prevalence studies).
Sampling and resulting samples. 85% of studies sampled respondents using multi-step procedures, first selecting health facilities, in some cases then specific departments within those facilities, and finally respondents. Some studies applied further explicit pragmatic inclusion criteria, such as including only staff who had been working at the facility for a minimum period, or inclusion criteria related to the main study aim, such as only respondents who had recently witnessed a death. Only 14% of studies sampled respondents directly, for instance through other channels such as mailing lists or meetings of professional associations, or by snowballing through the researchers’ own networks.
37% of studies described their sampling strategy as a census, 18% as a random or stratified random sample, and 27% as a convenience sample. For 18% of studies, the sampling strategy was unclear (14% among prevalence studies). The proportion of studies with a census or random/stratified random sample was substantially higher among explicit prevalence studies than among studies with a different primary aim (65% vs. 37%).
Among the 24 studies with a declared random or stratified random sample, 42% provided a rationale for the envisioned sample size (40% of prevalence studies). 19 studies (79%) provided a response rate (87% of prevalence studies), and 15 of these 19 reported a response rate above 70% (10 of the 13 prevalence studies). It is important to note, however, that 6 studies reported response rates at or close to 100%, calling into question whether they actually used a fully random sample as opposed to incorporating elements of replacement and/or convenience sampling.
Of the 37 studies with a declared convenience sample, 8 provided a rationale for the envisioned sample size, 12 did not, and 17 did not state at all which sample size they aimed to reach.
Among the 50 studies with a declared (attempted) census, 38 (76%) provided a response rate (80% of explicit prevalence studies). 23 of the 38 studies reported a response rate above 70% (18 of the 32 prevalence studies).
Resulting sample sizes ranged from 29 to 2245 respondents (mean = 274.4, sd = 273.9). The difference in sample size between prevalence studies and studies with a different primary aim was small (mean 281.3 vs. 262.8, not statistically significant).
Based on the description of the sampling strategy and resulting sample (quality criterion 4), we judged only 22% of studies to be based on a sample highly likely to be representative of the intended study population (23% of prevalence studies). 34% of studies presented a convincing description of a census or random sample, but with a response rate below 70% or not provided, or a very convincing description of a convenience sample (43% of prevalence studies). 44% of studies presented an unconvincing description or not enough information to judge (35% of prevalence studies).
Of note, as most studies sampled health workers through health facilities and relied on those present at work, they by design did not capture health workers too ill to work, making them prone to systematically underestimating severe cases of mental illness. Only a handful of studies discussed and acknowledged this as a limitation.
Reporting of key respondent characteristics (quality criterion 5), defined by us as at minimum sex, age and/or seniority in health care, and health worker type or cadre, was fully adequate in 83% of studies (85% among prevalence studies), mostly adequate with only some ambiguity or omission in 11% of studies, and insufficient in 6% of studies.
Study outcomes and measures
Study outcomes. 88% of studies assessed only one mental health outcome, 11% two outcomes, and 1% three outcomes. 67% of studies assessed burnout, 10% depression, 7% anxiety, 4% trauma, and 25% general psychological wellbeing.
Outcome measures. All burnout studies used self-reported measures rather than diagnostic interviews. Among the 90 burnout studies, the Maslach Burnout Inventory – Human Services Survey (MBI-22 HSS) was by far the most common measurement tool, used by 55 studies (61%). A further 15 studies used adaptations of the MBI (e.g., only one subscale, only selected items) or unspecified MBI versions. 7 studies (8%) used the Copenhagen Burnout Inventory, either in full or one of its subscales. The remaining 13 studies used other established or self-developed tools, including the Freudenberger Burnout Scale; the Oldenburg Burnout Inventory; the Shirom-Melamed Burnout Inventory; a two-item measure developed by Mbindyo et al. [33] as part of a motivation inventory; or a single-item direct question (“Do you feel burned out?”).
Among the 14 studies having assessed depression, only one used a clinical interview (depression component of the Structured Clinical Interview for DSM-IV), while the remaining 13 studies used self-reported measures, including the Depression, Anxiety and Stress Scale (DASS-21; 4 studies); the Aga Khan University Anxiety and Depression Scale (2 studies); the Patient Health Questionnaire (PHQ-9; 2 studies); the Zung Depression Scale (2 studies); the Beck Depression Inventory (1 study); the Standardized Hospital Anxiety and Depression Scale (1 study); and the Death Distress Scale (1 study).
Similar to depression, only 1 of the 10 studies assessing anxiety used a clinical interview (anxiety component of the Structured Clinical Interview for DSM-IV), while the remaining 9 studies used self-reported measures. The latter included the DASS-21 (4 studies); the Aga Khan University Anxiety and Depression Scale (1 study); the Spielberger State-Trait Anxiety Inventory (1 study); the Zung Anxiety Scale (1 study); the Standardized Hospital Anxiety and Depression Scale (1 study); and the Death Distress Scale (1 study).
All 5 studies assessing trauma used self-reported measures, including the Impact of Event Scale – Revised (3 studies); the PTSD Checklist (1 study); and the ProQOL (1 study).
Finally, all 33 studies assessing general (i.e., diagnosis-unspecific) psychological wellbeing used self-reported tools, including the General Health Questionnaire (GHQ) in its 12-item (11 studies), 28-item (3 studies), 30-item (5 studies), or unspecified (1 study) version; the Warwick-Edinburgh Mental Wellbeing Scale (4 studies); the WHOQOL-BREF (4 studies); the WHO-5 Wellbeing Index (1 study); the SF-36 mental health subscale (1 study); the SRQ-20 (1 study); the Reker-Wong Perceived Wellbeing Scale (1 study); and an unspecified tool (1 study).
We considered reporting of the tool(s) used to measure the outcome(s) adequate (quality criterion 6) if the name, version, language, and any modifications were clearly reported or referenced. For non-established tools, we expected a clear description including the item list and response modalities. 85% of studies met our criteria (88% of prevalence studies), while 7% reported with some ambiguity and a further 7% insufficiently.
Of note, only a few articles demonstrated awareness of the limitations and implications of using a self-reported tool rather than a clinical interview to measure the mental health outcome.
Validity considerations. Given the culture-sensitive nature of mental health and the predominant use of self-reported measures, we further assessed the extent to which studies provided convincing information on the validity of the tool used to measure the intended mental health constructs (quality criterion 7). We considered validity information convincing if the study provided self-generated evidence of content and criterion validity (e.g., a convincing combination of expert judgement or a qualitative pre-study, confirmatory factor analysis, and assessment of relationships with related constructs) that, based on the description, followed standard psychometric quality criteria and yielded adequate psychometric results, or if the study referred to an external validation paper that was accessible, provided similarly high-quality evidence, and was carried out in a similar population (i.e., at minimum the same country or cultural context, even if a different population). For studies reporting their measurements in categorical fashion (see below), we additionally required context-appropriate validity evidence for the threshold used to classify respondents into different mental health categories.
Only 8% of studies provided information which we considered convincing. 27% of studies provided some, but incomplete or not fully convincing, validity information. 65% of studies provided no or unconvincing validity information. The proportion of studies providing convincing validity information was even lower among explicit prevalence studies (convincing: 6%; somewhat convincing: 18%; insufficient: 76%), owing to the higher proportion of studies reporting categorical outcomes without providing validity evidence for the thresholds used to categorize respondents.
Of note, only 25% of studies reported having performed a pretest (both overall and among explicit prevalence studies). Irrespective of the quality or appropriateness of the information, 62% of studies provided some information on reliability (usually Cronbach’s alpha) and 49% of studies provided some information on validity (usually references to the tool manual and/or validation studies conducted in high-income settings). Among explicit prevalence studies, some information on reliability and validity (irrespective of quality of the information) was provided by 54% and 50%, respectively.
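For reference, the reliability statistic most studies reported (Cronbach’s alpha) can be computed directly from item-level responses. The short sketch below, using made-up data, shows the standard computation and is provided purely for illustration.

```python
# Cronbach's alpha computed from an items-by-respondents matrix; the response
# data below are invented for illustration only.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array with rows = respondents and columns = scale items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of respondents' sum scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Illustrative data: 5 respondents answering a 4-item Likert-type scale.
responses = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])
print(round(cronbach_alpha(responses), 2))
```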
Measurement. Beyond the measured outcomes and tools themselves, studies differed in how they reported the outcome measurements. All measurement tools used employed either Likert-type response scales or symptom counts and therefore, in a first analytical step, yielded a quasi-continuous numeric measurement. 33% of the included studies reported outcome measurements in this “crude” metric, i.e. as means of sum scores or scale means (16% of prevalence studies). 46% of studies divided respondents into categories along this quasi-continuous raw score and reported the proportion of participants in each category (57% of prevalence studies). 21% of studies reported data in both quasi-continuous and proportional form (27% of prevalence studies).
In this context, we also assessed the extent to which the authors provided all necessary background information to interpret numeric representations of measurements (quality criterion 8), including the numeric codes used for response options, information on aggregation, and, for studies reporting proportions, information on the thresholds used for categorization. 58% of studies reported sufficient information to allow interpretation and comparison with other studies using the same measurement tool (56% of prevalence studies). For 28% of studies, there was some ambiguity, although reasonable assumptions could be made (33% of prevalence studies). For 14% of studies, the information provided was insufficient (11% of prevalence studies).
Given the widespread use of the MBI-22, we wish to highlight three frequently encountered issues that complicate the comparability of results across studies. First, how to use the MBI is rather strictly prescribed by its publishers. Many studies nevertheless deviated from these prescriptions, altering individual items or using different response scales (in terms of the number of answer options, their numeric coding, which leads to different score ranges, and their labelling). Often, not enough detail was provided to understand what exactly was done, compromising the extent to which findings can be compared across studies. Second, the publishers made small updates to the MBI-22 over time, particularly in relation to the thresholds used to categorize severity of burnout. Many studies unfortunately did not report which version they used, compromising comparability even between otherwise relatively homogeneous settings. For instance, of the seven studies from India that used the MBI-22 in its recommended form and reported results categorically, two used the 2nd-edition thresholds, four did not report, nor otherwise allow to infer, whether they used the 1st- or 2nd-edition thresholds, and one appears to have used a mix of both. Finally, the MBI-22 measures three sub-constructs of burnout, one of which is “reduced personal accomplishment” (RPA). The items intended to measure RPA are, however, phrased in reverse, so that a high raw score indicates low burnout, unlike for the other two subscales. In a large number of studies, it did not become fully clear whether the authors had reversed RPA scores/proportions so that they were interpretable “in the same direction” as the other two dimensions, or whether they reported original scores. Sometimes this could be inferred from the description of results or the discussion, whereas in other cases both the numeric estimates and the description left doubt as to whether the authors had reversed scores and/or interpreted results correctly.
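To make these comparability issues concrete, the minimal sketch below shows how the reverse-phrasing of the personal accomplishment subscale and the choice of categorization thresholds affect what gets reported. The subscale item counts and the 0–6 response scale follow the standard MBI-HSS layout, but the cut-off values are purely illustrative placeholders and not the publisher’s actual 1st- or 2nd-edition thresholds.

```python
# Minimal sketch (for illustration only) of the two scoring choices discussed
# above. The subscale item counts (EE = 9, DP = 5, PA = 8) and the 0-6 response
# scale follow the standard MBI-HSS layout; the cut-offs below are ILLUSTRATIVE
# placeholders, not the publisher's actual 1st- or 2nd-edition thresholds.

EE_CUT, DP_CUT, PA_CUT = 27, 10, 33   # hypothetical "high EE", "high DP", "low PA" cut-offs

def score_mbi(ee_items, dp_items, pa_items, reverse_pa=True):
    """Return subscale scores and a categorical burnout indicator."""
    ee, dp, pa = sum(ee_items), sum(dp_items), sum(pa_items)
    # PA items are positively phrased: a HIGH raw score means LOW burnout.
    # Reversing (maximum possible PA score of 8 items * 6 = 48, minus the raw
    # sum) makes all three subscales point "in the same direction".
    rpa = (8 * 6 - pa) if reverse_pa else pa
    return {
        "emotional_exhaustion": ee,
        "depersonalization": dp,
        "personal_accomplishment": rpa,
        "high_burnout_profile": ee >= EE_CUT and dp >= DP_CUT and pa <= PA_CUT,
    }

# Same respondent, two reporting choices: the PA figure reads as 40 (raw) or
# 8 (reversed) depending on a decision that is often left unreported.
print(score_mbi([5] * 5 + [1] * 4, [2] * 5, [5] * 8))
print(score_mbi([5] * 5 + [1] * 4, [2] * 5, [5] * 8, reverse_pa=False))
```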
Results reporting
Depending on the study aim and population, studies reported estimates of levels of mental health either overall for the entire study sample, or broken down by different sample subgroups. For simplicity, we only assessed the extent to which studies broke down results by two key sample characteristics, namely cadre and gender. However, many studies also reported estimates broken down by other demographic characteristics such as marital status or age, by work-related factors such as qualification level, work experience, or place of work, or by other factors assessed in relation to mental health, as described in more detail below.
With regard to health worker cadre, 64% of studies included respondents from only one staff category and therefore reported estimates by cadre by design (62% of prevalence studies). Of the remaining 48 studies with mixed samples, 46% reported estimates separately by cadre and 54% provided only overall estimates (among prevalence studies: 53% vs. 47%, respectively).
With regard to gender, 10% of studies estimated mental health levels among only one gender (usually female nurses) or had a heavily gender-skewed sample, presumably reflecting the reality in the context (usually predominantly male medical doctors, with female doctors below 5%) (7% among prevalence studies). Of the remaining 121 studies, 39% reported estimates separately for male and female participants, whereas 61% provided only overall estimates (among prevalence studies: 46% vs. 54%, respectively).
We further assessed the extent to which results were displayed adequately, meaning that they could be read without guesswork and did not contain obvious errors (quality criterion 9). This was the case for 92% of all studies and 91% of explicit prevalence studies.
Overall study quality
From the quality judgements in the nine individual categories presented above, we further calculated an overall quality classification for each study, as outlined in detail in Additional File 2. To be classified as high quality, a study had to report results in a readable manner (quality criterion 9), provide sufficient information to allow the measurements to be interpreted (quality criteria 6 and 8), and reach satisfactory quality scores across all other criteria combined.
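Purely to illustrate the shape of such a combined classification rule (the exact scoring is specified in Additional File 2 and may differ), a sketch with hypothetical cut-offs could look as follows.

```python
# Hypothetical sketch of a combined quality rule of the shape described above;
# the actual weights and cut-offs are defined in Additional File 2 and may
# differ from the placeholder values chosen here for illustration.

def classify_quality(scores: dict) -> str:
    """scores maps 'q1'..'q9' to the points awarded on each quality criterion."""
    readable = scores["q9"] == 1                                # criterion 9: results readable
    interpretable = scores["q6"] == 2 and scores["q8"] == 2     # criteria 6 and 8: measurements interpretable
    other = sum(scores[k] for k in ("q1", "q2", "q3", "q4", "q5", "q7"))

    if readable and interpretable and other >= 8:               # placeholder cut-off
        return "high"
    if readable and other >= 5:                                 # placeholder cut-off
        return "moderate"
    return "low"

# Example: a study meeting the readability and interpretability requirements
# and scoring well on the remaining criteria.
example = {"q1": 1, "q2": 2, "q3": 2, "q4": 1, "q5": 2, "q6": 2, "q7": 0, "q8": 2, "q9": 1}
print(classify_quality(example))   # -> "high" under these illustrative cut-offs
```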
As shown at the bottom of Table 3, only 19% of all studies fulfilled our criteria for high quality. We judged the majority of studies (60%) to be of moderate quality and 21% to be of low quality. Among explicit prevalence studies, the proportions in the high and moderate categories were only marginally higher than among all studies.
We did not observe any trend in study quality over time, nor any striking differences by region.
Research on factors associated with mental health amongst health workers in LLMIC
As described above, of the 143 relevant studies identified by our search, 126 included information on associations of mental health with other factors. 118 of these studies reported both estimates of levels of good or poor mental health and relationships with one or multiple correlates of mental health. Only 8 studies focused exclusively on associations; these were mostly studies from the field of psychology in which mental health was one of multiple constructs assessed in relation to a different focal outcome. Given this large overlap, as well as the relatively lower importance of study setting when establishing associations, we do not repeat study characteristics (setting, population, sample, outcome, measurement) here, but rather focus in the following on which factors have been investigated and on methodological and reporting considerations.
Of the 126 studies reporting associations of mental health with one or more other factors, 100 (79%) reported this investigation of associations to be a key study aim, whereas the remaining 26 studies did so as a by-product of an estimation of mental health levels, as part of a validation study, or with mental health being a minor correlate of a different focal construct of interest.
Table 4 summarizes the key findings elaborated below, for all studies as well as only for the subset of studies with an explicit aim to assess associations of health worker mental health with other factors (“associations studies”).
Table 4
Studies assessing factors associated with mental health - key results by study type
| | All studies | Associations study subset |
|---|---|---|
| Total number of studies | 126 | 100 |
| Assessed factors* | | |
| Socio-demographic characteristics | 94 (75.6%) | 74 (74.0%) |
| Tangible work factors | 96 (76.2%) | 75 (75.0%) |
| Intangible work factors (perceptions, attitudes) | 79 (62.7%) | 69 (69.0%) |
| Health factors | 29 (23.0%) | 23 (23.0%) |
| Job performance | 6 (4.8%) | 4 (4.0%) |
| Is it clear which factors were considered? | | |
| Clear and unambiguous from introduction/methods | 64 (50.8%) | 57 (57.0%) |
| Inference possible from results section | 56 (44.4%) | 39 (39.0%) |
| Unclear | 6 (4.8%) | 4 (4.0%) |
| Is it clear how the factors were measured? | | |
| Largely clear | 74 (58.7%) | 55 (55.0%) |
| Partly clear, partly unclear | 46 (36.5%) | 39 (39.0%) |
| Mostly unclear | 6 (4.8%) | 6 (6.0%) |
| Is the analytical technique clearly described or inferable? | | |
| Clear | 112 (88.9%) | 89 (89.0%) |
| Unclear | 14 (11.1%) | 11 (11.0%) |
| Was bivariate/univariate analysis performed? | | |
| Yes | 114 (90.5%) | 89 (89.0%) |
| No | 12 (9.5%) | 11 (11.0%) |
| Was multivariate analysis performed? | | |
| Yes | 66 (52.4%) | 60 (60.0%) |
| No | 56 (44.4%) | 36 (36.0%) |
| Not applicable (only one associated factor investigated) | 4 (3.2%) | 4 (4.0%) |

*Proportions do not add up to 100% as many studies assess factors from multiple categories and/or use multiple analytical techniques
Of note, among the studies with an explicit focus on assessing associations of mental health with other factors, most framed their study in causal terms, in the sense of aiming to investigate determinants or consequences of mental health. However, all but a handful used cross-sectional study designs and analytical techniques that, strictly speaking, do not allow for causal inference.
Assessed correlates of mental health
The number of investigated correlates of mental health ranged from 1 to 30 (mean = 9.4, sd = 6.3), without notable differences between studies with and without explicit aim to investigate associations.
75% of studies assessed one or more socio-demographic characteristics in relation to mental health; 76% tangible work factors; 63% intangible work factors; and 23% other health factors, most frequently smoking, alcohol consumption, and chronic health issues. Finally, 6 studies assessed self-reported adverse behavior towards patients or other perceived job performance aspects in relation to mental health. Figure 4 provides an overview of all factors assessed by more than 5% of the studies.
It must be noted that, with the exception of gender, studies differed in how they captured even basic demographic characteristics such as marital status (differences in grouping) and children (any vs. number of children; number reported in continuous form vs. in various categories). Unsurprisingly, these differences became more pronounced the more complex the investigated factors were. Highly complex constructs, such as job satisfaction or work demands, were rarely measured with the same tool and reported in the same way by two or more studies.
Irrespective of the methodological challenges related to causality discussed above, the currently available literature clearly focuses on measuring potential determinants or proximal and intangible outcomes of mental health, whereas research on relationships between mental health levels and more distal, tangible outcomes such as performance, actual turnover, or absenteeism is practically non-existent. The few studies that assessed performance outcomes did so via self-report rather than objective performance measures.
Methodological and reporting considerations
As described above, we did not perform a study quality assessment analogous to that for studies reporting on levels of mental health, due to the large heterogeneity of the factors assessed, which would have necessitated a quality assessment by factor rather than by study. However, we did assess whether studies provided key information necessary for understanding the results and for potentially comparing them across studies.
Measurement. 64 studies (51%) provided a clear and unambiguous description of which factors they considered in relation to mental health, either in the introduction or methods section. For a further 56 studies (44%), inference of all considered factors was possible from the results section. Only 6 studies (5%) left doubts as to which factors the authors had considered. This was usually because a full list was absent from the introduction and/or methods sections, and only significant associations were reported. Reporting was slightly better among studies with an explicit aim of assessing associations with mental health.
For the majority of studies (74; 59%), the description of how the investigated factors were measured was largely clear, in the sense that the measurement tool and response scale/options were clearly reported and allowed for interpretation and comparison of the reported measures of strength of association. For 37% of studies, the description was clear for some of the investigated factors but not for others. 6 studies provided insufficient information on most or all investigated factors. Reporting was slightly worse among studies with an explicit aim of assessing associations with mental health. Among the 35 studies that assessed only socio-demographic characteristics or tangible work factors, i.e. factors that are comparatively straightforward to measure, 89% provided clear information on how the factors were measured.
Analysis and reporting of results. For 89% of studies, either the methods or the results section allowed us to understand which analyses were performed and how, the latter being particularly relevant for multivariate analyses and the question of whether all or only a subset of factors were included.
114 studies (91%) reported, exclusively or in a first step, the results of bivariate analyses between the measured mental health outcome and each associated factor under investigation. Of those, 55 (48%) reported the strength of association, either as correlations or as bivariate regression results (beta coefficients or odds ratios). 66 studies (58%) instead reported group differences, resulting from t-tests/ANOVAs, chi-square tests, or other non-parametric tests, as appropriate for the respective variables under investigation. In the vast majority of studies reporting group differences, group means or proportions on the mental health outcome were provided along with the results of the difference test.
66 studies (52%) reported the results of multivariate analyses between the measured mental health outcome and the associated factors under investigation, of which 12 did so exclusively, i.e. without prior bivariate analysis. All of these analyses employed multivariate regression, either including all potential associated factors under investigation or using a step-wise approach in which only factors significantly associated with mental health in a prior series of bivariate analyses were included. 33 studies provided estimates of the strength of association for all factors included in the analysis, whereas the other 33 provided details only for those coefficients that emerged as statistically significant. 55% of the studies reported beta coefficients and 32% odds ratios; for 8%, we could not infer which type of effect estimate the reported coefficient represented. The remaining 6% of studies did not report any coefficients but only stated in the text whether associations were or were not statistically significant.
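As an illustration of the step-wise pattern described above (bivariate screening followed by a multivariate regression restricted to the significant factors), a minimal sketch is given below. It assumes a continuous mental health score and candidate factors held in a pandas DataFrame; the column names and the p < 0.05 screening rule are illustrative and not taken from any included study.

```python
# Minimal sketch of the two-step analysis pattern described above, assuming a
# continuous mental health score and candidate factors held in a pandas
# DataFrame; column names and the p < 0.05 screening rule are illustrative.
import pandas as pd
import statsmodels.api as sm

def stepwise_screen(df: pd.DataFrame, outcome: str, factors: list, alpha: float = 0.05):
    # Step 1: bivariate (simple) regressions, one candidate factor at a time.
    retained = []
    for f in factors:
        X = sm.add_constant(df[[f]])
        p_value = sm.OLS(df[outcome], X).fit().pvalues[f]
        if p_value < alpha:
            retained.append(f)

    # Step 2: a single multivariate model including only the retained factors.
    if not retained:
        return None, retained
    X = sm.add_constant(df[retained])
    return sm.OLS(df[outcome], X).fit(), retained

# Usage with hypothetical variable names:
# model, kept = stepwise_screen(df, "burnout_score",
#                               ["age", "weekly_hours", "night_shifts", "job_satisfaction"])
# model.params would then hold the beta coefficients typically reported.
```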