Table 1
Inclusion and exclusion criteria
| | Inclusion criteria | Exclusion criteria |
| Population | Adults and children with acute rhinosinusitis | Chronic rhinosinusitis, allergic rhinosinusitis, allergic rhinitis |
| Study design | PROM development and/or validation study | All other study designs |
| Outcome | All patient-reported and proxy-reported outcomes | Non-patient-reported outcomes, e.g. biomarkers, laboratory data |
| Type of measurement instrument | Patient-reported outcome measurement instruments | All others |
| Publication type | Articles with available full text | Abstracts |
PROM patient-reported outcome measure
Table 2
Boxes of the COSMIN Risk of Bias checklist
| Content validity |
| Box 1 | PROM development |
| Box 2 | Content validity |
| Internal structure |
| Box 3 | Structural validity |
| Box 4 | Internal consistency |
| Box 5 | Cross-cultural validity/measurement invariance |
| Remaining measurement properties |
| Box 6 | Reliability |
| Box 7 | Measurement error |
| Box 8 | Criterion validity |
| Box 9 | Hypotheses testing for construct validity |
| Box 10 | Responsiveness |
COSMIN Consensus-based Standards for the selection of health Measurement Instruments, PROM patient-reported outcome measure
Table 3
Criteria for good measurement properties
| Measurement property | Rating | Criteria |
| Structural validity | + | CTT: CFA: CFI or comparable measure > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08ᵃ. IRT/Rasch: no violation of unidimensionalityᵇ (CFI or TLI or comparable measure > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08) AND no violation of local independence (residual correlations among the items after controlling for the dominant factor < 0.20 OR Q3’s < 0.37) AND no violation of monotonicity (adequate looking graphs OR item scalability > 0.30) AND adequate model fit (IRT: χ² > 0.001; Rasch: infit and outfit mean squares ≥ 0.5 and ≤ 1.5 OR Z-standardized values > -2 and < 2) |
| | ? | CTT: not all information for ‘+’ reported. IRT/Rasch: model fit not reported |
| | - | Criteria for ‘+’ not met |
| Internal consistency | + | At least low evidenceᶜ for sufficient structural validityᵈ AND Cronbach’s alpha(s) ≥ 0.70 for each unidimensional scale or subscaleᵉ |
| | ? | Criteria for “At least low evidenceᶜ for sufficient structural validityᵈ” not met |
| | - | At least low evidenceᶜ for sufficient structural validityᵈ AND Cronbach’s alpha(s) < 0.70 for each unidimensional scale or subscaleᵉ |
| Reliability | + | ICC or weighted Kappa ≥ 0.70 |
| | ? | ICC or weighted Kappa not reported |
| | - | ICC or weighted Kappa < 0.70 |
| Measurement error | + | SDC or LoA < MICᵈ |
| | ? | MIC not defined |
| | - | SDC or LoA > MICᵈ |
| Hypotheses testing for construct validity | + | The result is in accordance with the hypothesisᶠ |
| | ? | No hypothesis defined (by the review team) |
| | - | The result is not in accordance with the hypothesisᶠ |
| Cross-cultural validity/measurement invariance | + | No important differences found between group factors (such as age, gender, language) in multiple group factor analysis OR no important DIF for group factors (McFadden’s R² < 0.02) |
| | ? | No multiple group factor analysis OR DIF analysis performed |
| | - | Important differences between group factors OR DIF was found |
| Criterion validity | + | Correlation with gold standard ≥ 0.70 OR AUC ≥ 0.70 |
| | ? | Not all information for ‘+’ reported |
| | - | Correlation with gold standard < 0.70 OR AUC < 0.70 |
| Responsiveness | + | The result is in accordance with the hypothesisᶠ OR AUC ≥ 0.70 |
| | ? | No hypothesis defined (by the review team) |
| | - | The result is not in accordance with the hypothesisᶠ OR AUC < 0.70 |
The criteria are based on Terwee et al. and Prinsen et al.
AUC area under the curve, CFA confirmatory factor analysis, CFI comparative fit index, CTT classical test theory, DIF differential item functioning, ICC intraclass correlation coefficient, IRT item response theory, LoA limits of agreement, MIC minimal important change, RMSEA root mean square error of approximation, SEM standard error of measurement, SDC smallest detectable change, SRMR standardized root mean residuals, TLI Tucker-Lewis index
“+” = sufficient, “-” = insufficient, “?” = indeterminate
ᵃ To rate the quality of the summary score, the factor structure should be equal across studies.
ᵇ Unidimensionality refers to a factor analysis per subscale, while structural validity refers to a factor analysis of a (multidimensional) patient-reported outcome measure.
ᶜ As defined by grading the evidence according to the GRADE approach.
ᵈ This evidence may come from different studies.
ᵉ The criterion ‘Cronbach’s alpha < 0.95’ was deleted, as this is relevant in the development phase of a PROM and not when evaluating an existing PROM.
ᶠ The results of all studies should be taken together and it should then be decided if 75% of the results are in accordance with the hypotheses.
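The threshold-based criteria in Table 3 can be illustrated with a short sketch. The function names below are hypothetical and are not part of COSMIN or of the review. The sketch covers only the reliability criterion (ICC or weighted Kappa ≥ 0.70) and the 75% rule from the last footnote. It deliberately returns only whether the 75% threshold is met, because how a result below that threshold is labelled is a judgment of the review team.

```python
def rate_reliability(icc_or_kappa):
    """Rate reliability following the thresholds in Table 3.

    '+' if ICC or weighted Kappa >= 0.70, '-' if below 0.70,
    '?' if the coefficient was not reported (passed as None).
    """
    if icc_or_kappa is None:
        return "?"
    return "+" if icc_or_kappa >= 0.70 else "-"


def meets_hypothesis_threshold(confirmed, total):
    """Check whether at least 75% of the results agree with the hypotheses.

    Integer arithmetic (confirmed / total >= 3 / 4) avoids rounding issues.
    Returns None when no hypotheses were defined (total == 0).
    """
    if total == 0:
        return None
    return confirmed * 4 >= total * 3


print(rate_reliability(0.73))            # ICC at or above the 0.70 threshold
print(meets_hypothesis_threshold(6, 8))  # exactly 75% of hypotheses confirmed
```

For example, 6 confirmed hypotheses out of 8 meet the 75% threshold, whereas 2 out of 3 do not.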
Table 4
Characteristics of the included instruments
| | Sinonasal Outcome Test-16 (SNOT-16) | Measurement of Acute Rhinosinusitis (MARS) | Rhinosinusitis Quality of Life Questionnaire (RhinoQoL) | Pediatric Rhinosinusitis Symptom Scale (PRSS) | Sinus Symptom Questionnaire (S5) |
| PROMs for use in | Adults | Adults | Adults | Children | Children |
| Construct | Quality of life (including symptoms) | Quality of life (including symptoms) | Quality of life (including symptoms) | Symptoms | Symptoms |
| Target population | Adult patients with acute or chronic rhinosinusitis | Adult patients with acute rhinosinusitis | Adult patients with acute or chronic rhinosinusitis | Young children (2–12 years) with acute rhinosinusitis; completed by parents (proxy-reported outcome measure) | Children with acute rhinosinusitis; completed by parents (proxy-reported outcome measure) |
| Recall period | 2 weeks | Present (?) | 7 days | 24 hours | Last few days |
| (Sub)scales (number of items) | 0 subscales (16 items) | 0 subscales (13 items) | 3 subscales (symptom frequency, symptom bothersomeness, symptom impact); 14 items | 0 subscales (8 items) | 0 subscales (5 items) |
| Response options and range of scores/scoring | 0 to 3 (no problem; mild problem; moderate problem; severe problem); score 0–48 (sum of all items) | 0 to 3 (no problem; mild problem; moderate problem; severe problem); score 0–39 (sum of all items) | Various response options: yes/no, 1 to 5 (none of the time, a little of the time, some of the time, most of the time, all of the time), 0 (not bothered at all) to 10 (bothered a lot) | No, almost none, a little, some, a lot, an extreme amount; scoring not reported | Items A–D: 0 to 3 (not present; small problem; medium problem; large problem) and don't know; Item E: 0 (none, clear), 3 (yellow, green), don't know; score 0–15 (sum of all items) |
| Available translations | English + 90 translations (including German) | Czech + English | English, German, French, Persian | English | English |
PROM patient-reported outcome measure
Table 5
Characteristics of the included study populations
| PROM | Reference | Sample size | Age mean (SD) or median in years | Setting | Country (Language) | Measurement properties |
| Quality of life |
| SNOT-16 | Garbutt et al. (2011) | N = 166 | 32 (range 18–69) | Primary care practices | USA (English) | Internal consistency, test-retest reliability, construct validity, responsiveness |
| | Quadri et al. (2013) | N = 347 | Treatment arm: 40.1 (13.8); Placebo arm: 40.3 (13.0) | Clinical sites | USA (English) | Internal consistency, construct validity, responsiveness |
| MARS | Hornáčková et al. (2014) | N = 100 | Patient group: 40.4 (range 18–71); Control group: 22.8 | Ears, nose, throat offices and outpatient department of a university hospital | Czech Republic (Czech) | PROM development, internal consistency, construct validity, responsiveness |
| RhinoQoL | Petrat (2020) | N = 81 | ≥ 18 years | Clinical site | Germany (German) | Internal consistency, construct validity, responsiveness |
| Symptoms |
| PRSS | Shaikh et al. (2019) | Development: N = 258; Validation: N = 185 | Development: 6.4 (2.9); Validation: 5.6 (2.7) | Ambulatory pediatric clinics | USA (English) | PROM development, structural validity, internal consistency, test-retest reliability, responsiveness |
| S5 | Garbutt et al. (1999) | Development: N = 1611; Validation: N = 93 | 46% <6 years; 26% 6–12 years; 27% >12 years | Community pediatric ambulatory care practice | USA (English) | PROM development, test-retest reliability, responsiveness |
PROM patient-reported outcome measure, MARS Measurement of Acute Rhinosinusitis, PRSS Pediatric Rhinosinusitis Symptom Score, RhinoQoL Rhinosinusitis Quality-of-Life Questionnaire, SNOT-16 Sinonasal Outcome Test-16, S5 Sinusitis Symptom Questionnaire
Table 6
Quality of studies on measurement properties and methodological rating of the instruments
| PROM | Reference | Methodological quality (rating¹˒²) |
| | | Structural validity | Internal consistency | Test-retest reliability | Construct validity (comparator instrument) | Construct validity (known-groups) | Responsiveness |
| SNOT-16 | Garbutt et al. 2011 | - | Very good (?) | Doubtful (+) | - | Very good (±) | Very good (+) |
| | Quadri et al. 2013 | - | Doubtful (?) | - | Adequate (+) | - | Very good (+) |
| MARS | Hornáčková et al. 2014 | - | Doubtful (?) | - | - | Doubtful (+) | Very good (+) |
| RhinoQoL | Atlas et al. 2005 | - | Doubtful (?) | - | Adequate (±) | Doubtful (±) | - |
| | Petrat 2020 | - | Doubtful (?) | - | Adequate (+) | - | Comparator instrument: Adequate (-); Known-groups: Doubtful (+) |
| PRSS | Shaikh et al. 2019 | Adequate (?) | Doubtful (?) | Adequate (+) | - | - | Doubtful (+) |
| S5 | Garbutt et al. 1999 | - | - | Doubtful (+) | - | - | Inadequate (±) |
PROM Patient-reported outcome measure; MARS Measurement of Acute Rhinosinusitis, PRSS Pediatric Rhinosinusitis Symptom Score, RhinoQoL Rhinosinusitis Quality-of-Life Questionnaire, SNOT-16 Sinonasal Outcome Test-16, S5 Sinusitis Symptom Questionnaire
¹ No study has analyzed cross-cultural validity/measurement invariance, measurement error, or criterion validity
² Rating: (+) sufficient, (-) insufficient, (?) indeterminate, (±) inconsistent
Table 7
| PROM/Measurement property | Summary or pooled result | Overall rating | Quality of evidence |
| Quality of life (including symptoms) |
| Sinonasal Outcome Test-16 (SNOT-16) |
| Internal consistency | Alpha = 0.82, sample size: 166; alpha = 0.874, sample size: 374; no evidence for sufficient structural validity | Indeterminate | - |
| Test-retest reliability | ICC = 0.73; sample size: 166 | Sufficient | Low (due to risk of bias) |
| Construct validity (comparator instruments) | 6 out of 8 hypotheses confirmed; sample size: 374 | Sufficient | High |
| Construct validity (known-groups) | 3 out of 6 hypotheses confirmed; sample size: 166 | Inconsistent | - |
| Responsiveness | 1 of 1 hypothesis confirmed; sample size: 374 | Sufficient | High |
| Measurement of Acute Rhinosinusitis (MARS) |
| Internal consistency | Alpha = 0.679; no evidence for sufficient structural validity; sample size: 50 | Indeterminate | - |
| Construct validity (known-groups) | 1 of 1 hypothesis confirmed; sample size: 100 | Sufficient | Low (due to risk of bias) |
| Responsiveness | 1 of 1 hypothesis confirmed; sample size: 50 | Sufficient | High |
| Rhinosinusitis Quality of Life Questionnaire (RhinoQoL) |
| Internal consistency | Alpha = 0.75, sample size: 81; alpha (frequency) = 0.45, alpha (bothersomeness) = 0.28, alpha (impact) = 0.85, sample size: 47; no evidence for sufficient structural validity | Indeterminate | - |
| Construct validity (comparator instruments) | 10 out of 12 hypotheses confirmed; sample size: 128 | Sufficient | High |
| Construct validity (known-groups) | 2 out of 3 hypotheses confirmed; sample size: 47 | Inconsistent | Very low (due to risk of bias and imprecision) |
| Responsiveness (comparator instrument) | 1 of 1 hypothesis not confirmed; sample size: 81 | Insufficient | Low (due to risk of bias and imprecision) |
| Responsiveness (known-groups) | 3 out of 4 hypotheses confirmed; sample size: 81 | Sufficient | Very low (due to risk of bias and imprecision) |
| Symptoms |
| Pediatric Rhinosinusitis Symptom Scale (PRSS) |
| Structural validity | Not reported; sample size: 185 | Indeterminate | - |
| Internal consistency | Alpha = 0.79; no evidence for sufficient structural validity; sample size: 185 | Indeterminate | - |
| Test-retest reliability | ICC = 0.75; sample size: 185 | Sufficient | Moderate (due to risk of bias) |
| Responsiveness | 2 out of 2 hypotheses confirmed; sample size: 185 | Sufficient | Low (due to risk of bias) |
| Sinus Symptom Questionnaire (S5) |
| Test-retest reliability | ICC = 0.94; sample size: 26 | Sufficient | Low (due to risk of bias) |
| Responsiveness | 2 out of 3 hypotheses confirmed; sample size: 29–31 | Inconsistent | - |
ICC intraclass correlation coefficient, PROM patient-reported outcome measure
Table 8
| PROM | Category A: Sufficient content validity (any level) | Category A: At least low quality evidence for sufficient internal consistency | Category C: High quality evidence for an insufficient measurement property | Recommendation |
| SNOT-16 | ✔ | ✖ | ✖ | B |
| MARS | ✔ | ✖ | ✖ | B |
| RhinoQoL | ✔ | ✖ | ✖ | B |
| PRSS | ✔ | ✖ | ✖ | B |
| S5 | ✔ | ✖ | ✖ | B |
B COSMIN category B, PROM patient-reported outcome measure, MARS Measurement of Acute Rhinosinusitis, PRSS Pediatric Rhinosinusitis Symptom Score, RhinoQoL Rhinosinusitis Quality-of-Life Questionnaire, SNOT-16 Sinonasal Outcome Test-16, S5 Sinusitis Symptom Questionnaire, ✔ fulfilled, ✖ not fulfilled
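The logic behind the recommendation column in Table 8 can be sketched as a small decision function. This is a minimal illustration and not the review's procedure: the function and parameter names are hypothetical, and checking the Category C condition before Category A is an assumption. Category B is taken to mean that neither A nor C applies.

```python
def cosmin_category(content_validity_sufficient,
                    internal_consistency_evidence,
                    high_quality_insufficient_property):
    """Assign a recommendation category from the three criteria in Table 8.

    Assumption: the Category C condition takes precedence over Category A.
    """
    if high_quality_insufficient_property:
        return "C"
    if content_validity_sufficient and internal_consistency_evidence:
        return "A"
    # Neither the Category A nor the Category C conditions are fulfilled.
    return "B"


# Every instrument in Table 8 has the same pattern: fulfilled, not fulfilled, not fulfilled.
for prom in ["SNOT-16", "MARS", "RhinoQoL", "PRSS", "S5"]:
    print(prom, cosmin_category(True, False, False))
```

With the pattern reported for all five instruments (content validity fulfilled, the other two criteria not fulfilled), the sketch yields category B, in line with the table.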