2.1. Data source
We drew data from a cohort nested in the INTRO-HCV trial, which collected data on patients with SUDs in Bergen and Stavanger, Norway [20]. We recruited patients receiving opioid agonist therapy in Bergen and Stavanger and patients with SUDs receiving healthcare from the primary health clinics in the municipality of Bergen. This study included all patients in the cohort who had answered the FSS-9 and/or VAFS during the study period from May 2016 to January 2020.
2.2. Data collection
All included patients were invited to an annual health assessment, including FSS-9 and VAFS measurements and a survey of their current sociodemographic situation. We collected all data in a health register using data collection software (Checkware®) under the supervision of research nurses.
2.3. Study sample
We conducted 917 health assessments of 655 patients during the study period, comprising 916 FSS-9 measurements and 915 VAFS measurements. We counted a measurement whenever at least one item of the FSS-9 or the VAFS was answered during a health assessment. Baseline was defined as the first health assessment, in chronological order, that included an FSS-9 or VAFS measurement. Both the FSS-9 and the VAFS were completely answered in 914 health assessments. Of the remaining three health assessments, one patient completed only the VAFS and not the FSS-9, one answered only five of the nine FSS-9 items and did not complete the VAFS, and one completed the FSS-9 but not the VAFS. Of the 655 included patients, 188 completed two health assessments and 37 completed three health assessments. The time intervals between the annual health assessments varied, with a mean of 12 months (standard deviation (SD): 4 months) (Additional File 1). Due to the relatively small number of patients with three health assessments, we used two health assessments when estimating internal consistency reliability and construct validity. For patients with three annual health assessments, we included only the first (baseline) and second health assessments in these analyses.
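The selection rule described above (order each patient's health assessments chronologically, treat the first as baseline, and keep at most the first two) can be sketched as follows. This is only an illustration: the record layout, the function name `first_two_assessments`, and the example dates are hypothetical and are not taken from the study register.

```python
from collections import defaultdict
from datetime import date

def first_two_assessments(records):
    """Map each patient to their first two assessment dates, oldest first.

    `records` is an iterable of (patient_id, assessment_date) pairs.
    Both field names are hypothetical and serve this illustration only.
    """
    by_patient = defaultdict(list)
    for patient_id, assessed_on in records:
        by_patient[patient_id].append(assessed_on)
    # Sort chronologically; index 0 is baseline, index 1 the second assessment.
    return {pid: sorted(dates)[:2] for pid, dates in by_patient.items()}

# Made-up records: patient "A" has three assessments, patient "B" has one.
records = [
    ("A", date(2018, 3, 1)),
    ("A", date(2017, 2, 1)),
    ("A", date(2019, 4, 1)),
    ("B", date(2017, 6, 15)),
]
selected = first_two_assessments(records)
```

In this sketch, the third assessment of patient "A" is dropped, while patient "B" contributes a baseline assessment only.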
2.4. Measuring fatigue
We used the FSS-9 and VAFS to measure the level of fatigue. The FSS-9 measures fatigue during the past week and includes items on mental and physical functioning, motivation, exercise, carrying out certain duties, and interference with work, family, or social life. The VAFS measures the patient’s general experience of fatigue. Each FSS-9 item was answered on a Likert scale from 1 (no fatigue) to 7 (worst fatigue), and the VAFS was answered by placing a mark on a line from 0 (no fatigue) to 10 (worst fatigue) that represents the fatigue level. To minimise missing data, the data collection software allowed only valid responses for each question and prompted for responses to unanswered questions before submission. In a previous study, the US-English version of the FSS-9 was translated into Norwegian by a qualified native Norwegian-speaking translator and back-translated into US-English by an authorised native US-English-speaking translator (Additional File 2) [21].
2.5. Statistical analysis
We used Stata/SE 16.0 (StataCorp, TX, USA) for descriptive analysis and IBM SPSS version 24.0 and Mplus version 8.4 for reliability analysis (Cronbach’s α if-item-deleted and Item-Total correlation), for confirmatory factor analysis (CFA), and for linear mixed model (LMM) analysis (Mplus: TwoLevel analysis). The threshold for statistical significance was set to P < 0.05 for all analyses unless otherwise stated.
2.5.1. Internal consistency of the FSS-9 and the shortened version of FSS-9 (FSS-3), and these scales including the VAFS
We calculated the internal consistency of the FSS-9, and of this scale including the VAFS, at baseline and at the second health assessment. As part of the validation study, we explored whether adding the VAFS to the FSS-9 was of value by evaluating whether the VAFS provided information beyond that captured by the FSS-9. Internal consistency was considered good if Cronbach’s α was above 0.70 [22-24]. We then shortened the FSS-9 by deleting the item whose removal resulted in the highest Cronbach’s α for the remaining items (alpha-if-item-deleted analysis). Cronbach’s α was recalculated for the remaining items, and the next item was deleted. If removing one item or another left the remaining scale with almost equal Cronbach’s α values, we used clinical experience to decide which item to remove. We deleted items that were less applicable to patients with SUDs, for example, items about employment (unemployment was common in this population) and items with complex phrases and wordings that could be difficult to understand for patients with SUDs when they were intoxicated or going through substance withdrawal. Furthermore, we calculated Cronbach’s α for the VAFS plus the shortened version of the FSS-9 (FSS-3) at baseline and at the second health assessment. Due to strong inter-item correlations and good reliability in previous studies evaluating the FSS-9 alongside the VAFS [15, 16], we expected that the VAFS would provide little variability beyond that captured by the FSS-9 and FSS-3 questionnaires at baseline and at the second health assessment.
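To make the alpha-if-item-deleted step concrete, the following minimal sketch computes Cronbach’s α from item-level scores using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the sum score), and then recomputes α with each item left out in turn. It is not the SPSS procedure used in the study; the function names and the toy scores are hypothetical, and the sketch covers only the statistical part of the selection, not the clinical judgement applied when α values were almost equal.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list of scores per item)."""
    k = len(items)
    # Sum score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def alpha_if_item_deleted(items):
    """Alpha of the remaining items after leaving out each item in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Made-up scores for four respondents on three hypothetical items.
toy_items = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [2, 1, 4, 3],
]
alpha_all = cronbach_alpha(toy_items)
alphas_deleted = alpha_if_item_deleted(toy_items)
# Index of the item whose removal leaves the highest alpha.
drop_candidate = max(range(len(toy_items)), key=lambda i: alphas_deleted[i])
```

With these toy scores, the third item is the deletion candidate, because the two remaining items are identical and therefore perfectly consistent. Repeating the step on the reduced item set would give the iterative shortening described above.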
2.5.2. Longitudinal confirmatory factor analysis for evaluating the fit of the FSS-3 and FSS-9, and these scales including the VAFS
We used CFA models to test the structure of the items in the FSS-3 and FSS-9, and in these scales including the VAFS, at baseline and at the second health assessment in order to evaluate the relationships between the items and their underlying latent factors [22, 25-28]. We expected both the FSS-3 and the FSS-9 to support one-dimensional models. The VAFS was added to the FSS-3 and FSS-9 to examine whether it provided variability in fatigue beyond that captured by the FSS-3/FSS-9. Such added variability would be indicated by a less-than-perfect correlation between the FSS-3/FSS-9 and the VAFS. Further, we used longitudinal CFA to test measurement invariance for the FSS-3. First, we estimated a free model with all unique parameter values. Second, we constrained the factor loadings of each item to be equal at baseline and at the second health assessment. Third, we tested for equality of the residuals over time. Finally, the last model constrained the intercept values of the indicators. We used the Wald test to compare model restrictions. All CFA models were evaluated with standard fit measures: χ2, degrees of freedom, p-values, Comparative Fit Index, Tucker Lewis Index, Root Mean Square Error of Approximation with 90 % confidence interval, and the probability of close fit. A well-fitted model should have a statistically non-significant χ2, values of the Comparative Fit Index and Tucker Lewis Index above 0.95, and a Root Mean Square Error of Approximation preferably below 0.05 (close fit) [26]. A Root Mean Square Error of Approximation above 0.10 was considered to indicate a poorly fitted model [26]. We used the modification index to explore model improvements if the goodness of fit measures indicated a poorly fitted model (χ2 difference test).
We analysed all variables as continuous variables due to the relatively high number of categories in the ordinal variables (FSS-9 items ranged from 1 to 7, and the VAFS ranged from 0 to 10). The CFAs were run using the Robust Maximum Likelihood estimator. Based on previous studies showing good reliability and strong inter-item correlations of the FSS-9 and of this scale including the VAFS [15, 16], we expected stronger support for the FSS-3 than for the FSS-9 as a one-dimensional measure of fatigue, with higher and more homogeneous factor loadings, also when the VAFS was included, at baseline and at the second health assessment. This would be the consequence of reducing the scale to the most relevant measurement indicators in this population of respondents. Furthermore, we expected that the longitudinal CFA would support measurement invariance over time for the FSS-3 [26]. Measurement invariance is indicated if each measurement indicator is equally important for the underlying factor over time, with equal factor loadings and intercepts within each item over time. In addition, strict invariance would be supported if the residuals for each item are equal over time.
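The fit measures named in this subsection can be related to the model and baseline χ2 statistics through their commonly used textbook definitions, sketched below. This is an illustration only and not the Mplus computation: under the Robust Maximum Likelihood estimator the reported indices include scaling corrections that are not shown here, and some software uses N rather than N − 1 in the Root Mean Square Error of Approximation. The function name and all input values are hypothetical and are not study results.

```python
from math import sqrt

def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """Return (RMSEA, CFI, TLI) from model and baseline chi-square statistics.

    Uses the uncorrected textbook formulas; `n` is the sample size.
    """
    # Excess of chi-square over its degrees of freedom, floored at zero.
    excess_model = max(chi2_model - df_model, 0.0)
    excess_base = max(chi2_base - df_base, excess_model)
    rmsea = sqrt(excess_model / (df_model * (n - 1)))
    cfi = 1.0 - excess_model / excess_base if excess_base > 0 else 1.0
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    return rmsea, cfi, tli

# Made-up inputs chosen for easy arithmetic, not taken from the study.
rmsea, cfi, tli = fit_indices(
    chi2_model=40.0, df_model=20, chi2_base=1030.0, df_base=30, n=501
)
# Thresholds as stated in the text: both indices above 0.95, RMSEA below 0.05.
meets_thresholds = cfi > 0.95 and tli > 0.95 and rmsea < 0.05
```

For these made-up inputs the Comparative Fit Index is 0.98, the Tucker Lewis Index is 0.97, and the Root Mean Square Error of Approximation is about 0.045, so the hypothetical model would meet the stated thresholds even though its χ2 exceeds its degrees of freedom.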
2.5.3. Linear mixed model analysis for evaluating changes in the FSS-3 and FSS-9 sum scores and the VAFS score
We used an LMM analysis (Mplus multilevel modelling: TwoLevel) to evaluate linear changes from baseline in the sum scores of the FSS-3 and FSS-9, the scores of the separate FSS-9 items, and the VAFS score. We included all 917 health assessments. First, we estimated a full random intercept random slope model, which gave us the mean and individual variance in terms of both level and change, together with the relationship between level and change [29]. We re-estimated the model as a random intercept fixed slope model if the covariance between the intercept and the slope variance was statistically non-significant. We used the Mplus Maximum Likelihood Robust estimator to correct standard errors for potential deviation from normality [30]. In addition, intraclass correlations were estimated. We used full information maximum likelihood in order to use all available measurements. Full information maximum likelihood assumes that data are ‘missing at random’ [31]. Because the separate items are indicators of the same underlying construct, we expected changes over time in the individual items to be similar to those seen in the FSS-9 sum score.
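The intraclass correlation mentioned above expresses the share of total score variance that lies between patients rather than within patients over time. As a rough illustration, the sketch below uses the classical one-way ANOVA estimator for groups of unequal size. This is a swapped-in technique, not the maximum likelihood estimate produced by the Mplus TwoLevel model, and the function name and scores are hypothetical.

```python
def icc_oneway(groups):
    """One-way ANOVA intraclass correlation for groups of possibly unequal size.

    `groups` holds one list of repeated scores per patient. At least one
    patient needs two or more scores, otherwise the within-patient
    variance cannot be estimated.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective group size, which adjusts for unequal numbers of assessments.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    var_between = max((ms_between - ms_within) / n0, 0.0)
    return var_between / (var_between + ms_within)

# Made-up sum scores for three patients with two assessments each.
toy_scores = [[10, 12], [20, 22], [30, 32]]
icc = icc_oneway(toy_scores)  # 99 / 101, about 0.98
```

In this toy example almost all variance lies between patients, so the intraclass correlation is close to 1. Patients with a single assessment can be included as one-element lists; they contribute to the between-patient part only.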
2.6. Ethics approval and consent to participate
The study was reviewed and approved by the Regional Ethical Committee for Health Research (REC) West, Norway (reference number: 2017/51/REK Vest, dated 29.03.2017/20.04.2017). Each patient provided written informed consent prior to enrolling in the study.