This study was conducted in line with the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) Study Design checklist for Patient-reported outcome measurement instruments (16).
Ethical concerns
King’s College London (Ref: BDM/13/14-99) and University of Nigeria Teaching Hospital (Ref: UNTH/CSA/329/Vol.5) gave ethical approval. The World Health Organisation gave permission to adapt the measure.
Study designs
This study involved cross-cultural adaptation, test-retest measurements and a cross-sectional study of the psychometric properties of the Igbo version of the WHODAS 2.0.
Outcome measurement tools
World Health Organisation Disability Assessment Schedule (WHODAS 2.0)
The WHODAS 2.0 is a comprehensive measure that assesses disability within the ICF biopsychosocial model of disability. It emphasizes the six domains of cognition, mobility, self-care, getting along with people, life activities and participation – including work-related disability. The cognition domain measures an individual’s difficulty in understanding and communicating. The mobility domain quantifies a person’s difficulties in getting around. The self-care domain assesses a person’s difficulties in taking care of themselves. The getting along with people domain measures an individual’s difficulties in getting along with people. The life activities domain assesses the difficulty with which activities involved in maintaining an individual’s household or work/school are performed. The participation domain measures a person’s difficulty with participating in their society and the impact of the specific health problem on them and their family. These difficulties are measured within the last 30 days. The measure has good face and content validity, construct validity, internal consistency, test-retest reliability and responsiveness. The Cronbach’s alpha ranges between 0.94 and 0.98. The test-retest reliability ranges between 0.93 and 0.98. The sensitivity to change ranges between 0.46 and 1.38. The 36-item interviewer-administered version (with simple and complex scoring methods) was used due to its relevance in populations with low literacy. Simple scoring involves assigning the values “none” = 1, “mild” = 2, “moderate” = 3, “severe” = 4 and “extreme” = 5, which are simply added up without weighting of individual items. However, scores from this method may not be comparable across populations and conditions. Therefore, the complex scoring method was used in this study.
Complex scoring is an “item-response-theory” (IRT) based scoring that takes into consideration multiple levels of difficulty for each item. It involves summing recoded item scores in each domain, summing all six domain scores, and converting the summary score into a metric ranging from 0 (no disability) to 100 (full disability) (15).
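The simple scoring rule and the final 0 to 100 conversion can be sketched as follows. This is a minimal illustration rather than the official WHO scoring algorithm: the function names are hypothetical, the item-level recoding used in complex scoring is not reproduced, and the conversion assumes a plain linear rescaling of a summed recoded score against its maximum possible value.

```python
# Response labels and the values assigned to them under simple scoring.
SIMPLE_VALUES = {"none": 1, "mild": 2, "moderate": 3, "severe": 4, "extreme": 5}


def simple_score(responses):
    """Add up the unweighted item values for a list of response labels."""
    return sum(SIMPLE_VALUES[label] for label in responses)


def to_0_100_metric(recoded_sum, max_possible):
    """Rescale a summed recoded score to 0 (no disability) to 100 (full disability).

    Assumption: recoded item scores start at 0, so the minimum possible sum is 0.
    """
    if max_possible <= 0:
        raise ValueError("max_possible must be positive")
    if not 0 <= recoded_sum <= max_possible:
        raise ValueError("recoded_sum must lie between 0 and max_possible")
    return 100.0 * recoded_sum / max_possible


if __name__ == "__main__":
    answers = ["none"] * 30 + ["moderate"] * 6  # hypothetical 36-item interview
    print(simple_score(answers))
    print(to_0_100_metric(recoded_sum=25, max_possible=100))
```

With 36 items valued 1 to 5, the simple score can only range from 36 to 180, which is one reason a rescaled 0 to 100 metric is easier to compare across samples.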
The WHODAS 2.0 has been translated into 47 languages (Amharic, Arabic, Bengali, Chinese, Danish, Dari, Dutch, English, Estonian, Farsi, French, Georgian, German, Greek, Haitian Creole, Hebrew, Hindi, Indonesian, Italian, Japanese, Kannada, Kinyarwanda, Korean, Krio, Latvian, Luganda, Lusoga, Malayalam, Nepali, Norwegian, Polish, Portuguese, Russian, Slovenian, Spanish, Swedish, Tamil, Thai, Tok Pisin, Turkish, Twi, Ukrainian, Urdu, Xhosa, Xitsonga, Yoruba, and Zulu); used in 94 countries; and employed in 27 research areas (17). The WHODAS 2.0 has a Yoruba version (18), which is one of the three (Hausa, Igbo, Yoruba) major native languages in Nigeria.
Igbo Roland Morris Disability Questionnaire (Igbo-RMDQ)
Igbo-RMDQ is a valid and reliable measure of LBP disability. It is recommended as a core outcome measure for LBP clinical trials. It is simple to administer, easily understood, and is the best measure for population or primary care-based studies. Igbo-RMDQ is a twenty-four item back specific self-report measure with possible scores of 0 or 1 for each item. A score of 24 is the highest possible disability level and 0 means that there is no disability. It has good face and content validity, construct validity, internal consistency, test-retest reliability and responsiveness. It has a global Cronbach’s alpha score of 0.91; test-retest reliability of 0.84; and a 2-3-point change from baseline means clinical significance (12).
Back performance scale (BPS)
BPS is a back-specific performance-based measure of mobility-related limitation that is scored by an evaluator. The instructions are simple and involve instructing participants to perform five physical performance tests (sock test, pick-up test, roll-up test, finger-tip-to-floor test and lift test) involving trunk movements. The sock test involves simulating putting on a sock normally from the sitting position. The pick-up test involves picking up a piece of paper from the floor normally. For the roll-up test, the participant rolls up slowly from supine lying to long sitting with both arms relaxed. The finger-tip-to-floor test involves standing on the floor with both feet 10 centimetres apart, bending forward with straight knees, and attempting to touch the floor with the fingertips. The distance between the floor and the fingertips is then measured in centimetres. The lift test involves a participant repeatedly lifting a 5-kilogram box from the floor to a 76 cm table and back to the floor for one minute. The number of lifts is then recorded. Each of the five tests has scores ranging from 0 to 3 depending on the difficulty or ease with which they are performed. A total possible score of 15 signifies maximum disability while 0 means no disability (19). The BPS has good validity and reliability. It has internal consistency of 0.73; moderate correlations with self-reported back pain specific disability (r = 0.454), and test-retest reliability of 0.91 (19,20).
Eleven-point box scale (BS-11)
BS-11 is a single item eleven-point numeric scale for pain intensity (21). It consists of eleven numbers (0 to 10) in boxes. Zero means ‘no pain’ and 10 is ‘pain as bad as you can imagine’ or ‘worst pain imaginable’. The measure is easily understood. In contrast, low literate populations in Nigeria found the visual analogue scale difficult to comprehend (22).
Cross-cultural adaptation process
Translation is the linguistic paraphrasing of a questionnaire. Conversely, cross-cultural adaptation involves translation and cultural adaptation to enable the content validity of the instrument to be at similar conceptual levels in different contexts (12).
Participants involved in the cross-cultural adaptation process
One clinical physiotherapist who had practised for 16 years in Nigeria and three non-clinical translators (one native English speaker [bilingual in English and Igbo], one native Igbo speaker [bilingual in Igbo and English], and one English/Igbo linguistic expert) were the translators. An English health psychologist with expertise in research methodology and an English academic physiotherapist working in the United Kingdom, an Igbo clinical psychologist and an Igbo clinical physiotherapist working in Nigeria, made up an external expert review committee. A convenience sample of 12 adults living with non-specific CLBP in rural Nigeria who had participated in a previous study (22) and gave informed consent, were involved in piloting/pre-testing the adapted measure (qualitative assessment of content validity).
Procedure adopted for cross-cultural adaptation
The original WHODAS 2.0 was translated and culturally adapted using evidence-based guidelines (16,23,24) as illustrated in Figure 1 below.
First step – the WHODAS 2.0 was forward translated independently from English to Igbo by one clinical physiotherapist (native Igbo speaker, bilingual in Igbo and English) and one bilingual non-clinical translator (native Igbo speaker, bilingual in Igbo and English) to obtain two Igbo versions: T1 and T2 respectively. The forward translators were both fluent in English. The physiotherapist, a specialist in musculoskeletal physiotherapy, had all the items explained to her to facilitate an understanding of the construct being assessed to ensure psychometric equivalence with the original WHODAS 2.0. For the non-clinical translator, items were not defined to ensure that the language and expressions used in the translation reflected the routinely used language in the population.
Second step – a discussion between the two forward translators, mediated by the bilingual (English and Igbo) lead author, resulted in the synthesis of T1 and T2 to produce one Igbo version: T-12. The two forward translated versions of the WHODAS 2.0 were compared to the original questionnaire to inform their synthesis. The lead author compared the translations and recorded all discrepancies and discussions. Consensus between the translators was achieved by analysing the discrepancies and choosing the meaning that most closely reflected the original measure.
Third step – the synthesized Igbo version (T-12) was back translated from Igbo to English by two non-clinical back translators who were blinded to the original WHODAS 2.0 and the construct it measures, and were naïve to the condition involved. This produced two back-translated English versions: BT1 and BT2. One of the back translators was an English/Igbo linguistic expert proficient in the professional translation of tools, and the other was a native English speaker, born in England to Nigerian-born Igbo parents. This validation process ensured that the adapted measure reflected the meaning of the original WHODAS 2.0.
Fourth step – a pre-final Igbo version of the WHODAS 2.0 was produced following several meetings of the external expert review committee and translators during which all versions of the measure (T1, T2, T-12, BT1 and BT2) were discussed, mediated by the lead author.
The committee achieved semantic equivalence by exploring Igbo and English words of the same object to determine if they meant exactly the same thing; if the same terms could have several meanings; and if grammatical difficulties were encountered during the translations. The committee accomplished experiential equivalence with the original measure by ascertaining that items in both versions were experienced in the same way in the two cultures. The committee established that words in the instructions, items, and responses had comparable conceptual meanings in Igbo and English cultures (23). The Igbo words used in the translations were simple enough to be understood by anyone regardless of their educational level.
Fifth step – twelve adults living with CLBP in a rural Nigerian community (22) pre-tested the pre-final Igbo-WHODAS 2.0. This number is sufficient for the qualitative assessment of the relevance, comprehensiveness and comprehensibility of the translated WHODAS 2.0 since the COSMIN checklist recommends a sample size of at least 7 participants (16). The think-aloud cognitive interviewing procedure was used. This involved reading out each item. Participants then verbalised their thoughts aloud as they attempted to answer each question. Participants finally stated if they encountered any difficulty understanding any item, what they understood by each question, and the perceived meaning of their selected response(s). All responses were recorded verbatim. This procedure helped to maintain equivalence between the different settings, ensuring face and content validity of the Igbo-WHODAS 2.0.
Psychometric testing process
Participants (sample size calculation for test-retest reliability)
A minimum sample size of 27 is required to estimate an intra-class correlation coefficient of 0.9 with a 95% confidence interval no wider than 0.23. Test-retest reliability was examined in a convenience sample of 50 adults with CLBP who had no underlying serious pathology, radiculopathy or spinal stenosis. The participants were aged between 18 and 69 years. They were recruited from rural and urban communities in Enugu State, South-eastern Nigeria. Informed consent was obtained prior to participation in the study.
Participants (sample size calculation for construct validity)
Detecting a correlation coefficient of 0.2 at a significance level of 0.05 with a power of 80% would require a sample size of 194. In a dataset with several high factor loading scores (> 0.80), a sample size of 150 would be sufficient for exploratory factor analysis (EFA). A representative random sample of 200 adults with CLBP was recruited from rural communities in Enugu State as part of a larger population-based study (25). Participants were screened to rule out underlying serious pathology, radiculopathy or spinal stenosis. Informed consent was obtained prior to participation in the study.
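The sample size for the correlation can be checked with a short calculation. The text does not state which formula was used; the sketch below assumes the common Fisher z approximation for a two-sided test, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_correlation(r, alpha=0.05, power=0.80):
    """Participants needed to detect correlation r with a two-sided test.

    Assumption: Fisher z approximation, n = ((z_alpha + z_beta) / C)^2 + 3,
    where C = 0.5 * ln((1 + r) / (1 - r)).
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    fisher_z = 0.5 * math.log((1 + abs(r)) / (1 - abs(r)))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


if __name__ == "__main__":
    print(sample_size_for_correlation(0.2))
```

Under these assumptions, a correlation of 0.2 at alpha 0.05 and 80% power gives 194 participants, which agrees with the figure reported above.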
Procedure for psychometric testing
A significant proportion of rural dwellers in Nigeria are not literate. Therefore, community health workers (CHWs), the front line of rural Nigerian primary health care, were recruited and trained for interviewer-administration of the questionnaires. The training was daily, face-to-face, and group-based to minimise common survey errors. A representative sample of the population obtained through multistage cluster sampling prevented coverage error. An adequate sample size and gender stratification prevented sampling error. The use of validated measures and training CHWs to avoid administering the measures in ways that could bias participants’ responses reduced measurement error. Non-response error was avoided by ensuring that no items or scales were unanswered and that all recruited participants were assessed.
Collection and fidelity of data
CHWs screened participants by asking simple questions to exclude back pain due to malignancy, spinal fracture, infection, inflammation or cauda equina syndrome. Participants were then asked to describe the location of their pain with a body chart to confirm pain in the lower back. The WHODAS 2.0, Igbo-RMDQ and BS-11 were then interviewer-administered, with Likert scales presented to participants as ‘flash cards’ as each corresponding item was read out. ‘Lower back/waist pain’ was read out to participants in place of ‘illness’. The BPS was used to objectively assess performance-based disability.
For test-retest reliability, measures were completed at baseline and repeated seven to ten days post-baseline, with the same CHW collecting data on the two occasions.
To test validity, measures were completed at one time-point in a cross-sectional design.
Fidelity checks were done to avoid systematic differences in data collection. The CHWs were given post-training examinations, and only those that passed them were recruited. This facilitated adherence to data collection protocols. Additionally, each CHW was visited by the lead author during data collection without prior notice to assess their data collection and recording.
Data analyses
IBM Statistical Package for Social Sciences version 22 (SPSS, Chicago, IL) was utilised. Normality of data was assessed with visual (normal distribution curve and Q-Q plot) and statistical (Kolmogorov-Smirnov test, Shapiro-Wilk test and skewness/kurtosis scores) methods.
Reliability: Reliability is the ability of an instrument to measure consistently. Test-retest reliability evaluated how consistently the adapted WHODAS 2.0 measured disability over time using the intra-class correlation coefficient (ICC). ICC was calculated using a two-way random effects model (measurement errors arising from either raters or subjects) with an absolute agreement definition between test-retest scores. ICC values of 0.7, 0.8 and 0.9 signified good, very good and excellent reliability, respectively (26). Internal consistency (Cronbach’s alpha) depicts the extent to which all items in a test measure the same construct and was rated as weak (0–0.2), moderate (0.3–0.6) and strong (0.7–1.0) (27). Bland-Altman plots were employed to visually assess the level of agreement between test-retest measurements by plotting mean scores against the difference in total scores. These plots compensate for a weakness of the ICC, which may indicate a strong correlation between two measurements even when agreement is minimal. Standard error of measurement (SEM) and minimal detectable change (MDC) were also used to investigate reliability. MDC is a statistical estimate of the smallest change an instrument can detect that is not due to measurement error. MDC was calculated from the SEM, based on the distribution method, and the reliability of the measure (25). SEM was based on the standard deviation (SD) of the sample and the test-retest reliability (R) of the Igbo-WHODAS 2.0, and was calculated with the equation (28):
SEM = SD √(1-R)
MDC was then estimated with the equation:
MDC = 1.96 * √2 * SEM
where 1.96 reflects the 95% confidence interval of no change, and √2 reflects the two assessments used in determining change.
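The reliability statistics described above can be sketched in a few lines. This is an illustrative sketch, not the SPSS procedure used in the study: the function names are hypothetical, and the ICC shown assumes the single-measure form of the two-way random effects, absolute agreement model, since the text does not say whether single or average measures were reported.

```python
import math
from statistics import mean


def icc_absolute_agreement(test, retest):
    """Single-measure ICC, two-way random effects, absolute agreement (k = 2)."""
    n, k = len(test), 2
    if n < 2 or n != len(retest):
        raise ValueError("need two equally long lists with at least 2 subjects")
    rows = list(zip(test, retest))
    grand = mean([*test, *retest])
    row_means = [mean(row) for row in rows]
    col_means = [mean(test), mean(retest)]
    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((rm - grand) ** 2 for rm in row_means) / (n - 1)
    ms_cols = n * sum((cm - grand) ** 2 for cm in col_means) / (k - 1)
    ss_err = sum(
        (x - rm - cm + grand) ** 2
        for row, rm in zip(rows, row_means)
        for x, cm in zip(row, col_means)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - R)."""
    return sd * math.sqrt(1 - reliability)


def minimal_detectable_change(sem):
    """MDC = 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem


def bland_altman_points(test, retest):
    """(mean, difference) pairs to plot on the x and y axes respectively."""
    return [((a + b) / 2, b - a) for a, b in zip(test, retest)]


if __name__ == "__main__":
    baseline = [10, 20, 30, 40]   # hypothetical scores
    followup = [15, 25, 35, 45]   # the same subjects, shifted by 5 points
    print(icc_absolute_agreement(baseline, followup))
    sem = standard_error_of_measurement(sd=10, reliability=0.91)
    print(sem, minimal_detectable_change(sem))
```

In the hypothetical data, every retest score is exactly 5 points higher than baseline. Consistency between the two occasions is perfect, yet the absolute agreement ICC falls below 1 because the systematic shift counts as disagreement.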
Validity: Construct validity assesses the extent to which a measure evaluates the construct it was intended to measure. The domain of construct validity assessed was convergent validity, which assesses whether two measures of the same/similar construct that are assumed to be theoretically related, are in fact related. This was investigated using Spearman’s correlation (non-parametric data) and was rated as weak (0-0.2), moderate (0.3-0.6), and strong (0.7-1.0). The WHODAS 2.0 assesses self-reported disability within the ICF multiple domains of cognition, mobility, self-care, getting along with people, life activities and participation – including work-related disability. Hence, Igbo-WHODAS 2.0 is expected to correlate at least moderately with the Igbo-RMDQ (measuring self-reported back pain-related disability), the BPS (objective measure of performance-based disability), and the Igbo-BS-11 (self-reported pain intensity measure and a predictor of self-reported disability) (25,29).
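Spearman’s correlation and the rating bands can be illustrated with a small sketch. The function names are hypothetical. Because the stated bands leave gaps (for example between 0.2 and 0.3), the sketch assumes coefficients are rounded to one decimal place before being rated; the text does not specify how such values were handled.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def rate_strength(rho):
    """Rate a coefficient as weak, moderate or strong.

    Assumption: the magnitude is rounded to one decimal place first.
    """
    magnitude = round(abs(rho), 1)
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.3:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    whodas = [12, 35, 48, 60, 71]   # hypothetical disability scores
    rmdq = [3, 9, 8, 15, 20]        # hypothetical comparator scores
    rho = spearman_rho(whodas, rmdq)
    print(rho, rate_strength(rho))
```

Working on ranks rather than raw scores is what makes the coefficient suitable for non-normally distributed data, since only the ordering of participants matters.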
Exploratory factor analysis (EFA) was used to determine the number of factors influencing the Igbo-WHODAS 2.0 (the items that go together – dimensionality). EFA was applied in line with the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test, with the eigenvalue for factor retention set at ⩾1.0 (Kaiser’s rule) (30). Retained and excluded factors were also explored visually on a scree plot. Extraction was done using principal axis factoring. Promax (oblique) rotation, which assumes that factors can be related, was applied, and factor loadings less than 0.3 were suppressed. The number of factors and the fundamental relationships between the items were then compared with the factor structure of the original WHODAS 2.0 to provide insight into possible differences in population characteristics.
Floor and ceiling effects: A ceiling or floor effect occurs when a high proportion of participants obtain the highest or the lowest possible score, respectively. This implies that the measure is unable to discriminate between participants at that extreme of the scale. A ceiling or floor effect was defined as 15% or more of the total sample of 250 participants scoring 100 or 0, respectively, on the Igbo-WHODAS 2.0 (31).
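The 15% criterion can be expressed as a short check. This is a minimal sketch with a hypothetical function name and made-up scores; it assumes total scores on the 0 to 100 metric, with 0 as the floor and 100 as the ceiling.

```python
def floor_ceiling_effects(scores, minimum=0, maximum=100, threshold=0.15):
    """Report the proportion of scores at each extreme and flag effects.

    An effect is flagged when the proportion reaches the threshold (15%).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_prop = sum(1 for s in scores if s == minimum) / n
    ceiling_prop = sum(1 for s in scores if s == maximum) / n
    return {
        "floor_proportion": floor_prop,
        "ceiling_proportion": ceiling_prop,
        "floor_effect": floor_prop >= threshold,
        "ceiling_effect": ceiling_prop >= threshold,
    }


if __name__ == "__main__":
    # Hypothetical sample: 4 of 20 participants at the floor, 1 at the ceiling.
    sample = [0] * 4 + [37.5] * 15 + [100]
    print(floor_ceiling_effects(sample))
```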