"Applicability"-dimension of LEGEND
|
Clark et al. [74]
|
Applicability of results to treating patients
|
P1: RCTs and CCTs
P2: reviewers and clinicians
|
3 items
|
3-point-scale
|
Deductive and inductive item-generation. Tool was pilot tested among an interprofessional group of clinicians.
|
“Applicability”-dimension of Carr's evidence-grading scheme
|
Carr et al. [55]
|
Generalizability of study population
|
P1: clinical trials
P2: authors of SRs
|
1 item
|
3-point-classification-scale
|
No specific information on tool development.
|
Bornhöft's checklist
|
Bornhöft et al. [98]
|
External validity (EV) and Model validity (MV) of clinical trials
|
P1: clinical trials
P2: authors of SRs
|
4 domains with 26 items for EV and MV each
|
4-point-scale
|
Development with a comprehensive, deductive item-generation from the literature. Pilot-tests were performed, but not for the whole scales.
|
Clegg's external validity assessment
|
Clegg et al. [56]
|
Generalizability of clinical trials to England and Wales
|
P1: clinical trials
P2: authors of SRs and HTAs
|
5 items
|
3-point-scale
|
No specific information on tool development.
|
Clinical applicability
|
Haraldsson et al. [59]
|
Report quality and applicability of intervention, study population and outcomes
|
P1: RCTs
P2: reviewers
|
6 items
|
3-point-scale and 4-point-scale
|
No specific information on tool development.
|
Clinical Relevance Instrument
|
Cho & Bero [72]
|
Ethics and Generalizability of outcomes, subjects, treatment and side effects
|
P1: clinical trials
P2: reviewers
|
7 items
|
3-point-scale
|
Tool was pilot tested on 10 drug studies. Content validity was confirmed by 7 reviewers with research experience.
- interrater reliability of overall score: ICC = 0.41 (n = 10) for pilot version
|
“Clinical Relevance” according to the CCBRG
|
Van Tulder et al. [78]
|
Applicability of patients, interventions and outcomes
|
P1: RCTs
P2: authors of SRs
|
5 items
|
3-point-scale (Staal et al., 2008)
|
Deductive item-generation for Clinical Relevance. Results were discussed in a workshop. After two rounds, a final draft was circulated for comments among editors of the CCBRG.
|
Clinical Relevance Score
|
Karjalainen et al. [71]
|
Report quality and applicability of results
|
P1: RCTs
P2: reviewers
|
3 items
|
3-point-scale
|
No specific information on tool development.
|
EVAT (External Validity Assessment Tool)
|
Khorsan & Crawford [75]
|
External validity of participants, intervention, and setting
|
P1: RCTs and non-randomized studies
P2: reviewers
|
3 items
|
3-point-scale
|
Deductive item-generation. Tool developed based on the GAP-checklist [57] and the Downs and Black checklist [19]. Feasibility was tested and a rulebook was developed but not published.
|
"External validity"-dimension of the Downs & Black-Checklist
|
Downs & Black [19]
|
Representativeness of study participants, treatments and settings to source population or setting
|
P1: RCTs and non-randomised studies
P2: reviewers
|
3 items
|
3-point-scale
|
Deductive item-generation, pilot test and content validation of pilot version. Final version tested for:
- internal consistency: KR-20 = 0.54 (n = 20),
- test-retest reliability: k = -0.05 to 0.48 and 10–15% disagreement (measurement error) (n = 20),
- interrater reliability: k = -0.08 to 0.00 and 5–20% disagreement (measurement error) (n = 20) [19];
ICC = 0.76 (n = 20) [79]
|
"External validity"-dimension of Foy's quality checklist
|
Foy et al. [58]
|
External validity of patients, settings, intervention and outcomes
|
P1: intervention studies
P2: reviewers
|
6 items
|
not clearly described
|
Deductive item-generation. No further information on tool development.
|
"External validity"-dimension of Liberati's quality assessment criteria
|
Liberati et al. [62]
|
Report quality and generalizability
|
P1: RCTs
P2: reviewers
|
9 items
|
dichotomous and 3-point-scale
|
Tool is a modified version of a previously developed checklist [99] with additional inductive item-generation. No further information on tool development.
|
"External validity"-dimension of Sorg's checklist
|
Sorg et al. [64]
|
External validity of population, interventions, and endpoints
|
P1: RCTs
P2: reviewers
|
4 domains with 11 items
|
not clearly described
|
Developed based on Bornhöft et al. [98]. No further information on tool development.
|
“External validity”-criteria of the USPSTF
|
USPSTF Procedure manual [66]
|
Generalizability of study population, setting and providers for US primary care
|
P1: clinical studies
P2: USPSTF reviewers
|
3 items
|
Sum-score rating:
3-point-scale
|
Tool developed for USPSTF reviews. No specific information on tool development.
- interrater reliability:
ICC = 0.84 (n = 20) [79]
|
FAME (Feasibility, Appropriateness, Meaningfulness and Effectiveness) scale
|
Averis et al. [63]
|
Grading of recommendation for applicability and ethics of intervention
|
P1: intervention studies
P2: reviewers
|
4 items
|
5-point-scale
|
The FAME framework was created by a national group of nursing research experts. Deductive and inductive item-generation. No further information on tool development.
|
GAP (Generalizability, Applicability and
Predictability) checklist
|
Fernandez-Hermida et al. [57]
|
External validity of
population, setting, intervention and endpoints
|
P1: RCTs
P2: Reviewers
|
3 items
|
3-point-scale
|
No specific information on tool development.
|
Gartlehner's tool
|
Gartlehner et al. [68]
|
To distinguish between effectiveness and efficacy trials
|
P1: RCTs
P2: reviewers
|
7 items
|
Dichotomous
|
Deductive and inductive item-generation.
- Criterion validity testing with studies selected by 12 experts as gold standard:
Specificity = 0.83, sensitivity = 0.72 (n = 24)
- Measurement error: 78.3% agreement (n = 24)
- Interrater reliability:
k = 0.42 (n = 24) [68];
k = 0.11–0.81 (n = 151) [80]
|
Green & Glasgow's external validity quality rating criteria
|
Green & Glasgow [70]
|
Report quality for generalizability
|
P1: trials (not explicitly described)
P2: reviewers
|
4 domains with 16 items
|
Dichotomous
|
Deductive item-generation. Mainly based on the RE-AIM framework [100].
- interrater reliability:
ICC = 0.86 (n = 14) [84]
- discriminative validity: TREND studies report on 77% and non-TREND studies report on 54% of scale items (n = 14) [84]
- ratings compared across included studies (n = 31) [86]; no hypothesis was defined
|
"Indirectness"-dimension of the GRADE handbook
|
Schünemann et al. [87]
|
Differences of population, interventions, and outcome measures to research question
|
P1: intervention studies
P2: authors of SRs, clinical guidelines and HTAs
|
4 items
|
Overall:
3-point-scale (downgrading options)
|
Deductive and inductive item-generation, pilot-testing with 17 reviewers [77].
- interrater reliability:
k = 0.00–0.82 (n = 12) of pilot-version [77]
|
Modified "Indirectness" of the Checklist for GRADE
|
Meader et al. [76]
|
Differences of population, interventions, and outcome measures to research question.
|
P1: meta-analyses of RCTs
P2: authors of SRs, clinical guidelines and HTAs
|
5 items
|
Item-level: 2- and 3-point-scale
Overall:
3-point-scale (grading options)
|
Developed based on the GRADE method, with two-phase pilot-tests.
- interrater reliability:
item-level kappa ranged from poor to almost perfect [76];
k = 0.69 for overall rating of indirectness (n = 29) [81]
|
External validity checklist of the NHMRC handbook
|
NHMRC handbook [67]
|
External validity of an economic study
|
P1: clinical studies
P2: clinical guideline developers, reviewers
|
6 items
|
3-point-scale
|
No specific information on tool development.
|
Revised GATE in the NICE manual (2012)
|
NICE manual [65]
|
Generalizability of population, interventions and outcomes
|
P1: intervention studies
P2: reviewers
|
2 domains with 4 items
|
3-point-scale and 5-point-scale
|
Based on Jackson et al. [101]. No specific information on tool development.
|
RITES (Rating of Included Trials on the Efficacy-Effectiveness Spectrum)
|
Wieland et al. [69]
|
To characterize RCTs on an efficacy-effectiveness continuum.
|
P1: RCTs
P2: reviewers
|
4 items
|
5-point Likert scale
|
Deductive and inductive item-generation, modified Delphi procedure with 69–72 experts, pilot testing in 4 Cochrane reviews, content validation with Delphi procedure and core expert group (n = 14) [69].
- interrater reliability:
ICC = 0.235–0.942 (n = 18) [69]
ICC = 0.54–1.0 (n = 22) [82, 85]
- convergent validity with PRECIS 2 tool:
r = 0.55 correlation (n = 59) [82]
|
Section A (Selection Bias) of EPHPP (Effective Public Health Practice Project) tool
|
Thomas et al. [73]
|
Representativeness of population and participation rate.
|
P1: clinical trials
P2: reviewers
|
2 items
|
Item-level:
4-point-scale and 5-point-scale
Overall:
3-point-scale
|
Deductive item-generation, pilot-tests, content validation by 6 experts.
- convergent validity with the Guide to Community Preventive Services (GCPS) instrument:
52.5–87.5% agreement (n = 70) [73]
- test-retest reliability:
k = 0.61–0.74 (n = 70) [73]
k = 0.60 (n = 20) [83]
|
Section D of the CASP checklist for RCTs
|
CASP Programme [102]
|
Applicability to local population and outcomes
|
P1: RCTs
P2: participants of workshops, reviewers
|
2 items
|
3-point-scale
|
Deductive item-generation, development and pilot-tests with group of experts.
|
"Whole Systems research considerations" checklist
|
Hawk et al. [60]
|
Applicability of results to usual practice
|
P1: RCTs
P2: reviewers (developed for review)
|
7 domains with 13 items
|
Item-level: dichotomous
Overall: 3-point-scale
|
Deductive item-generation. No specific information on tool development.
|
Abbreviations: CASP = Critical Appraisal Skills Programme; CCBRG = Cochrane Collaboration Back Review Group; CCT = controlled clinical trial; GATE = Graphical Appraisal Tool for Epidemiological Studies; GRADE = Grading of Recommendations Assessment, Development and Evaluation; HTA = Health Technology Assessment; ICC = intraclass correlation coefficient; LEGEND = Let Evidence Guide Every New Decision; NHMRC = National Health and Medical Research Council; NICE = National Institute for Health and Care Excellence; PRECIS = PRagmatic Explanatory Continuum Indicator Summary; RCT = randomized controlled trial; SR = systematic review; TREND = Transparent Reporting of Evaluations with Nonrandomized Designs; USPSTF = U.S. Preventive Services Task Force
|