Study design
This protocol is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA-P) statement [33], and the corresponding checklist can be found in Additional file 1. This protocol was registered on the International Prospective Register of Systematic Reviews (PROSPERO; registered and awaiting assessment, ID 203543; Additional file 2) [34].
This study will systematically review prognostic models for the development and prognosis of KOA. The framing of the review question, study identification, data collection, critical appraisal, data synthesis, and interpretation and reporting of results will be conducted according to previous guidelines and several developments in prediction model research methodology [29–32, 35–40] (Table 1).
Table 1

| Stage of the review | Started | Completed | Resources |
|---|---|---|---|
| Protocol drafting | Yes | No | PROSPERO, PRISMA-P, CHARMS, PICOTS, PROGRESS-3 |
| Preliminary searches | Yes | No | PICOTS |
| Piloting of the study selection process | Yes | No | CHARMS, TRIPOD |
| Developing of review tools | Yes | No | Data extraction tool, critical appraisal tool |
| Formal searches | No | No | PRESS, de-duplication guideline |
| Formal screening of search results against eligibility criteria | No | No | CHARMS, PICOTS, TRIPOD |
| Data extraction | No | No | Modified data extraction tool |
| Critical appraisal | No | No | Modified critical appraisal tool |
| Data analysis | No | No | Cochrane Library |
| Reporting | No | No | GRADE, PRISMA |

Note: CHARMS: CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies; GRADE: Grades of Recommendation, Assessment, Development, and Evaluation; PICOTS: Population, Intervention, Comparison, Outcome, Timing, Setting; PRESS: Peer Review of Electronic Search Strategies; PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses; PRISMA-P: Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols; PROBAST: Prediction model Risk Of Bias ASsessment Tool; PROGRESS: Prognosis Research Strategy; PROSPERO: International Prospective Register of Systematic Reviews; TRIPOD: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis.
Key items of this review are clarified with the assistance of the CHARMS checklist [31] (Table 2). A prognostic model will be defined as a combination of two or more predictors, built with statistical, machine learning, or deep learning methods [35], that is used to predict the risk of a future outcome and may help health professionals and patients reach appropriate therapeutic decisions. Studies investigating the association between a single risk factor and the outcome will be excluded, as they are of limited utility for individual risk prediction. Notably, machine learning models in medical imaging, although usually based on a single modality, will be considered multivariable if multiple features have been extracted or deep learning methods have been employed. Studies reporting the following types of prognostic models will be eligible for inclusion in our review: prediction model development with validation, or external model validation. Studies that have developed prognostic models without validation will not be included in the analysis, but records of these studies will be kept. We plan to systematically review prognostic models aiming (1) to predict KOA risk in the general population; (2) to predict TKA risk in KOA patients; and (3) to predict TKA-related outcomes or complications in KOA patients intending to receive TKA; studies reporting prognostic models with other objectives will not be considered.
Table 2

Framing of this systematic review using CHARMS key items

| Key item | Model aim 1 | Model aim 2 | Model aim 3 |
|---|---|---|---|
| 1. Prognostic versus diagnostic prediction model | Future events: prognostic prediction models (all three aims) | | |
| 2. Intended scope of the review | Models to inform physicians' therapeutic decision making (all three aims) | | |
| 3. Type of prediction modelling studies | All study types (all three aims): (1) prediction model development studies with internal or external validation; (2) external model validation studies with or without model updating | | |
| 4. Target population to whom the prediction model applies | General population without KOA, with or without risk factors for KOA | KOA patients who have not received TKA | KOA patients who plan to receive TKA |
| 5. Outcome to be predicted | KOA risk | TKA risk | TKA outcomes |
| 6. Time span of prediction | After the predictors are collected, before the diagnosis of KOA | After the diagnosis of KOA, before TKA | After TKA |
| 7. Intended moment of using the model | To predict the risk of KOA in the general population | To predict the risk of receiving TKA after the diagnosis of KOA | To predict TKA outcomes before TKA |

Note: KOA, knee osteoarthritis; TKA, total knee arthroplasty
Study inclusion
-Eligibility criteria
The PICOTS (Population, Intervention, Comparison, Outcome, Timing, Setting) system will be used to frame the eligibility criteria and to guide the selection of models for the three different aims separately [29, 30] (Table 3). The PICOTS system is a modification of the PICO (Population, Intervention, Comparison, Outcome) system that additionally considers timing (specifically for prognostic models, when and over what time period the outcome is predicted) and setting (the intended role or setting of the prediction model).
Table 3

Eligibility criteria framed using the PICOTS system

| PICOTS item | Inclusion | Exclusion | Consideration |
|---|---|---|---|
| **Model aim 1: to predict KOA risk in the general population** | | | |
| Population | General population without KOA, with or without risk factors, asymptomatic or symptomatic | Populations with KOA diagnosed by any criteria; patients with other knee diseases unless the condition is a predictor defined by the study authors | General populations from the community, out-patient departments, or pre-collected datasets will be considered for inclusion. Studies in populations with symptoms, such as knee pain, will be considered for inclusion if a diagnosis of KOA has not been established. Patients with other knee diseases will be excluded unless the condition is defined by the study authors as a predictor of future KOA risk. |
| Index | Development and/or validation of a prognostic model to predict KOA risk in a population without KOA | Diagnostic models for KOA | Prognostic model development with or without validation, and validation with or without updating, will be considered for inclusion if intended to predict KOA risk in the general population. Diagnostic models will be excluded, as our concern is to prevent KOA. |
| Comparator | Not applicable | Not applicable | To our knowledge, no widely adopted model for predicting KOA risk has yet been established; therefore, a comparison is not possible. |
| Outcomes | Future KOA diagnosis; KOA risk within a time period defined by the study authors | Current KOA status | Most current studies define a Kellgren and Lawrence grade ≥ 2 as KOA, while other studies may identify KOA patients via diagnostic codes. The effect measures for KOA will be as defined by the study authors, and their reference standards will be recorded. |
| Timing | KOA occurring after the predictors are collected | Undiagnosed KOA before or at the moment the predictors are collected | Included studies must report prediction models for future KOA occurring after the predictors are collected. Prediction models for existing undiagnosed KOA will be excluded. |
| Setting | Prognostic models intended to be used by healthcare professionals, in any clinical setting, at any time before the KOA diagnosis is established | Prognostic models intended to be used after or at the moment a diagnosis of KOA is established | Prognostic models intended to inform clinicians' therapeutic decision-making, i.e. prevention of KOA in high-risk patients, will be included, to improve patient care. Prognostic models predicting the progression of KOA will be excluded for this sub-question. |
| **Model aim 2: to predict future TKA in KOA patients** | | | |
| Population | KOA patients who have not received TKA | KOA patients who have received TKA; undiagnosed KOA patients; general population without KOA; patients with other knee diseases | KOA patients diagnosed by any criteria and receiving any therapy except TKA will be considered for inclusion. Patients with other knee diseases or without an established KOA diagnosis will be excluded. |
| Index | Development and/or validation of a prognostic model to predict the necessity of TKA in KOA patients who have not received TKA | Prognostic models for patients with other knee diseases, or predicting the necessity of other therapeutic options, or symptoms | Prognostic model development with or without validation, and validation with or without updating, will be considered for inclusion if intended to predict the necessity of TKA in KOA patients. Prognostic models for patients with other knee diseases, or predicting the necessity of other therapeutic options, or symptoms, will be excluded. |
| Comparator | Not applicable | Not applicable | To our knowledge, no widely adopted model for predicting future TKA in KOA patients has yet been established; therefore, a comparison is not possible. |
| Outcomes | Future TKA due to KOA; TKA risk within a time period defined by the study authors | Necessity of other therapeutic options, or symptoms; TKA due to other knee diseases | As healthcare costs attributed to OA are driven largely by TKA, prognostic models identifying OA patients at high risk of future progression may be most useful for healthcare professionals. |
| Timing | TKA after or at the moment of the diagnosis of KOA | TKA before the diagnosis of KOA; TKA in KOA patients who have already received TKA | Included studies must report prediction models for future TKA after the diagnosis of KOA. Prediction models for the general population will be excluded as they are less useful in practice. Prediction models for revision of TKA will also be excluded, as our concern is to delay TKA. |
| Setting | Prognostic models intended to be used by healthcare professionals, in any clinical setting, at any time after the KOA diagnosis has been established but before TKA | Prognostic models intended to be used before a diagnosis of KOA has been established | Prognostic models intended to inform clinicians' therapeutic decision-making, i.e. management of KOA to delay TKA. |
| **Model aim 3: to predict TKA-related outcomes or complications in KOA patients intending to receive TKA** | | | |
| Population | KOA patients who plan to receive TKA | KOA patients who have received TKA; undiagnosed KOA patients; general population without KOA; patients planning to receive TKA due to other knee diseases | KOA patients diagnosed by any criteria and planning to receive TKA will be considered for inclusion. Patients planning to receive TKA due to other knee diseases or without an established KOA diagnosis will be excluded. |
| Index | Development and/or validation of a prognostic model to predict TKA-related outcomes or complications in KOA patients who plan to receive TKA | Prognostic models for patients with other knee diseases who plan to receive TKA, or predicting outcomes or complications unrelated to TKA | Prognostic model development with or without validation, and validation with or without updating, will be considered for inclusion if intended to predict TKA-related outcomes or complications in KOA patients. |
| Comparator | Not applicable | Not applicable | To our knowledge, no widely adopted model for predicting TKA-related outcomes or complications in KOA patients planning to receive TKA has yet been established; therefore, a comparison is not possible. |
| Outcomes | TKA-related outcomes or complications | Outcomes or complications unrelated to TKA | As our aim is to select KOA patients suitable for TKA, only outcomes or complications related to TKA are useful for healthcare professionals. |
| Timing | TKA-related outcomes or complications after TKA | TKA-related outcomes or complications before TKA | Psychological problems such as anxiety may occur before TKA; however, they are more likely to be recognized as predictors of poor outcomes or complications related to TKA. |
| Setting | Prognostic models intended to be used by healthcare professionals, in an orthopedics setting, before TKA | Prognostic models intended to be used after or at the moment of TKA | Prognostic models intended to inform clinicians' and patients' therapeutic decision-making, i.e. to select KOA patients suitable for TKA and to prevent poor outcomes or complications in high-risk patients. |

Note: KOA, knee osteoarthritis; TKA, total knee arthroplasty
We further established eligibility criteria beyond the PICOTS system. (1) Study design: any study design, including prospective or retrospective designs, randomized controlled trials, observational studies, and case-control studies, is acceptable. (2) Countries and regions: we will consider studies from all countries and regions. (3) Journal: we will consider studies from peer-reviewed journals of all research fields, which are representative of high-quality studies on prognostic models for KOA. (4) Publication period: we will include only studies published after 2000, to reflect the current status of prediction modeling studies for KOA; moreover, prediction model building approaches have improved substantially in the last two decades, particularly machine learning and leading-edge deep learning methods. (5) Language: we will include studies published in English, Chinese, Japanese, German, or French; one reviewer has expertise in these five languages. (6) Publication type: we will include only peer-reviewed full-text studies with original results, as they are expected to exhibit high-quality models and detailed methodology. Therefore, we will not consider abstract-only records, conference abstracts, short communications, correspondence, letters, or comments, and we do not intend to search the grey literature. Any relevant review articles identified will be used to find eligible primary studies.
-Search strategy
We will search the following seven electronic databases from inception to 31 December 2020: PubMed, Embase, the Cochrane Library, Web of Science, Scopus, SportDiscus, and the Cumulative Index to Nursing and Allied Health Literature (CINAHL) [41-47]. SportDiscus is the leading bibliographic database for sports and sports medicine research, and CINAHL is the world's largest collection of full-text nursing and allied health journals. They will be included in the electronic database search because nursing and sports medicine professionals are also interested in the management of KOA patients, and these two databases have been searched routinely in previous studies [48].
Search keywords will be selected from MeSH terms and appropriate synonyms, based on the review question clarified by the PICOTS system, covering three concepts: “knee”, “osteoarthritis”, and “prediction model”. Each concept will be searched using MeSH terms and free-text words combined with the Boolean operator OR, and the three concepts will then be combined with the Boolean operator AND. For each database, keywords will be translated into the controlled vocabulary (MeSH, Emtree, and others) and supplemented with free-text terms. We will take the search strategies of previous studies as a reference [48] and will co-design the search strategy. The search strategies will be tested for eligibility by two reviewers before the formal search. A sample search strategy is presented in Additional file 3.
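As an illustration only (the tested strategies will be provided in Additional file 3), a PubMed-style combination of the three concepts might look like the following; the specific terms and field tags here are placeholders, not the final strategy:

```
#1  "Osteoarthritis, Knee"[Mesh] OR knee[tiab]
#2  "Osteoarthritis"[Mesh] OR osteoarthrit*[tiab]
#3  "prediction model*"[tiab] OR "prognostic model*"[tiab] OR "risk score*"[tiab]
    OR "machine learning"[tiab] OR nomogram*[tiab]
#4  #1 AND #2 AND #3
```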
The formal search will be performed by the same two reviewers according to the PRESS guideline [36]. In case of uncertainty, a third reviewer will be consulted to reach a final consensus. The reference lists of included studies and relevant reviews will be hand-searched for additional potentially relevant citations. However, we do not intend to search the grey literature owing to concerns about its methodological quality.
-Data management
We will use EndNote reference manager software version X9.2 (Clarivate Analytics, Philadelphia, PA, USA) [49] to merge the retrieved studies. Duplicates will be removed using a systematic, rigorous, and reproducible method based on a sequential combination of fields including author, year, title, journal, and pages [37]. We will use the free online Tencent Document software (Tencent, Shenzhen, China) [50] to manage records throughout the review, to ensure that all reviewers can follow the latest status of the review process in a timely manner and that two senior reviewers can supervise the process remotely during the difficult period of the coronavirus disease 2019 pandemic.
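As an illustrative sketch of the field-based approach (not the actual EndNote procedure), de-duplication on a sequential combination of normalized fields could look like this; the `Record` type and normalization rule are our own simplification:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record with the fields used for matching."""
    author: str
    year: int
    title: str
    journal: str
    pages: str


def normalize(text: str) -> str:
    """Lower-case and keep only alphanumeric characters, so trivial
    punctuation and spacing differences do not hide duplicates."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def deduplicate(records: list[Record]) -> list[Record]:
    """Keep the first record for each (author, year, title, journal, pages)
    key; later records with the same normalized key are treated as duplicates."""
    seen, unique = set(), []
    for r in records:
        key = (normalize(r.author), r.year, normalize(r.title),
               normalize(r.journal), normalize(r.pages))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```

In practice the combination of fields is applied sequentially (e.g. first author + year, then title, then journal and pages) with manual checks on borderline matches; the single composite key above is the simplest variant.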
-Study selection
Two independent reviewers will screen the titles and abstracts of all potential records to identify relevant studies using the pre-defined inclusion and exclusion criteria. If an abstract is unavailable, the full-text article will be obtained unless the title is clearly irrelevant. The same two reviewers will obtain the full texts and supplementary materials of all selected records and will read them thoroughly and independently to further determine eligibility before extracting data. The corresponding authors of potential records may be contacted to request the full text if it is not otherwise available. Disagreements will be resolved by consensus to reach a final decision, with assistance from our review group, which consists of a computer engineer with experience in prediction model building, an orthopedist with experience in OA management, and musculoskeletal radiologists.
Data collection
-Data extraction
We will develop a data extraction instrument based on several previous systematic reviews of prediction models [51-53]. As the reviewers have different levels of experience and knowledge, the listed items will be reviewed and discussed to ensure that all reviewers have a clear understanding of the procedures. A training phase will be introduced before the formal extraction.
During the training phase, two articles randomly chosen from those fulfilling the inclusion criteria will be used to train the two independent reviewers. They will thoroughly read these two articles, including the supplementary materials, and will assess each study independently. A structured data collection instrument will be modified and used to help them reach agreement. Disagreements will be discussed in order to achieve a shared understanding of each parameter. This pre-defined and piloted data extraction instrument will be used in the formal data extraction phase.
During the formal extraction phase, two independent reviewers will thoroughly read all articles, including the supplementary materials, to extract the data describing the characteristics of the studies. Any disagreement will be resolved by discussion to reach a consensus, with consultation of other members of our review group if required. Missing data will be requested from the authors wherever possible; studies with insufficient information will be noted.
-Critical appraisal
We will develop a critical appraisal instrument based on the TRIPOD statement, the CHARMS checklist, and the PROBAST tool [30-32]. TRIPOD is a set of recommendations deemed essential for the transparent reporting of a prediction model study, and it allows evaluation of quality and analysis of potential usefulness. The CHARMS checklist identifies eleven domains to facilitate a structured critical appraisal of primary studies on prediction models, focusing mainly on the methodological quality of the included models. The PROBAST tool is designed to assess risk of bias and applicability across four domains (participants, predictors, outcome, and analysis) with a total of 20 signaling questions. Although these three instruments focus on different aspects of prediction model studies, they overlap in several domains and items. Therefore, we will merge them into a single critical appraisal instrument to reduce the workload during the systematic critical evaluation.
During the development of this instrument, we also considered checklists relevant to machine learning and deep learning, e.g. the radiomics quality score [54], the Checklist for Artificial Intelligence in Medical Imaging [55], and the Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research [56], all of which are specialized assessment tools for cutting-edge artificial intelligence models. However, they include many items that may not be applicable to prediction models built with traditional statistical methods based on clinical characteristics, laboratory examinations, or genetic factors. On the other hand, TRIPOD, CHARMS, and PROBAST have already proved suitable for assessing prediction models using artificial intelligence methods [53]. Thus, we chose these three more widely adopted and more extensively accepted tools to develop our critical appraisal instrument.
A similar training phase will be introduced before the formal critical appraisal, to ensure its eligibility and to achieve a shared understanding of each parameter. During the formal evaluation phase, two independent reviewers will assess all articles and the corresponding supplementary materials, measuring and rating all studies according to the established criteria. Any disagreement will be resolved as described above.
-Data pre-processing
The necessary results or performance measures, together with their precision, are needed to allow quantitative synthesis of the predictive performance of the prediction models under study [29]. However, model performance measures vary among reported prediction model studies and are sometimes unreported or inconsistent with further analysis. Where pertinent information is not reported, we will contact the study authors to request it. In case of non-response, missing performance measures and their measures of precision will, where possible, be calculated according to previously described methods [29]. If this is impossible owing to limited data, exclusion of the study will be determined by discussion among the reviewers.
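As one common case, a study may report a c-statistic with its 95% confidence interval but no standard error; the standard error can then be approximated on the log-odds (logit) scale, on which c-statistics are usually pooled. A minimal sketch, assuming the reported interval is symmetric on the logit scale:

```python
import math


def logit(p: float) -> float:
    """Log-odds transform of a probability-scale statistic such as a c-statistic."""
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    """Back-transform from the logit scale."""
    return 1 / (1 + math.exp(-x))


def se_logit_c_from_ci(lower: float, upper: float) -> float:
    """Approximate the standard error of logit(c) from a reported 95% CI,
    assuming the interval is symmetric on the logit scale:
    SE = (logit(upper) - logit(lower)) / (2 * 1.96)."""
    return (logit(upper) - logit(lower)) / (2 * 1.96)
```

For example, a c-statistic of 0.75 (95% CI 0.70–0.80) yields an approximate SE of about 0.14 on the logit scale.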
Data synthesis
The data synthesis process will be guided by several methodological reference books and guidelines [57-61]. Two reviewers of this study have substantial expertise in the statistics and meta-analysis methods that will be used in this review. In case of doubt, the reviewers will discuss the issue to reach a consensus or consult a statistician for advice.
-Qualitative synthesis
All extracted data on prediction models will be narratively summarized, and the key findings will be tabulated to facilitate comparison according to the PICOTS system [30]: in particular, which predictors were included in the different models, when and how the included variables were coded, what outcomes the models predicted, the reported predictive accuracy, and whether the model was validated internally and/or externally, and if so, how. Models relating to different aims will be considered separately. The two most common statistical measures of predictive performance, discrimination and calibration, will be reported when published or approximated using published methods [30]. Other measures such as sensitivity, specificity, positive predictive value, and negative predictive value will also be included if reported [30]. Individual results of CHARMS, TRIPOD, and PROBAST and the overall reporting transparency, methodological quality, and risk of bias will be reported [30-32].
-Quantitative synthesis
The statistical analysis will be performed with SPSS software version 26.0 (SPSS Inc., Chicago, IL, USA) [62]. A p-value < 0.05 will be considered statistically significant, unless otherwise specified. The items of TRIPOD will be treated as binary categorical variables, with inter-rater agreement assessed by Cohen's kappa statistic [63]. The items of CHARMS and PROBAST include ordinal categories with more than two possible ratings; therefore, Fleiss' kappa statistic will be used to assess their inter-rater agreement [64]. The summed TRIPOD rating will be treated as a continuous variable, and its inter-rater agreement will be assessed using the intraclass correlation coefficient (ICC) [65]. Furthermore, where possible, we will provide correlation information among these three instruments to examine whether they are complementary critiques [66].
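For transparency, Cohen's kappa can be sketched as follows: observed agreement between the two raters, corrected for the agreement expected by chance from each rater's marginal rating frequencies. The actual computation will be performed in SPSS; this minimal implementation is for illustration only:

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters over the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is chance agreement from the raters' marginal distributions."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal proportions, summed over categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    if p_e == 1:  # degenerate case: both raters used a single category
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance.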
-Meta-analysis
The meta-analysis will be conducted with Stata/SE software version 15.1 (Stata Corp., College Station, TX, USA) using the metan, midas, and metandi packages [67-70], and any other packages depending on the data we extract. The meta-analysis plan will depend on the studies identified in the systematic review. If a similar clinical question is assessed repeatedly in a large enough subset of the included studies, meta-analysis will be considered to jointly summarize calibration and discrimination statistics with their 95% confidence intervals and obtain the average model performance. Relevant forest plots and a hierarchical summary receiver operating characteristic (HSROC) curve will be produced to visually present the model performance [71].
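If pooling proves feasible, a common approach is to pool the (possibly logit-transformed) performance statistics under a DerSimonian–Laird random-effects model, which Stata's metan implements among other estimators. A minimal sketch of the estimator, for illustration only:

```python
import math


def dersimonian_laird(effects: list[float], ses: list[float]):
    """DerSimonian-Laird random-effects pooling.
    Returns (pooled estimate, 95% CI lower bound, 95% CI upper bound)."""
    # Fixed-effect inverse-variance weights and pooled estimate.
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights incorporate tau^2.
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
```

For c-statistics, the pooling would be applied on the logit scale and the pooled value back-transformed afterwards.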
-Heterogeneity assessment
To assess heterogeneity between the meta-analyzed studies, Cochran's Q and the I² statistic will be calculated [72]. The difference between the 95% confidence region and the prediction region of the HSROC curve will be used to visually assess heterogeneity; a large difference indicates the presence of heterogeneity [71]. Potential sources of heterogeneity will be investigated by means of meta-regression or subgroup analysis if > 10 studies are included in the meta-analysis [73].
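The two statistics can be sketched as follows (fixed-effect inverse-variance weights; I² = (Q − df)/Q, truncated at zero), for illustration of what the Stata output represents:

```python
def cochran_q_i2(effects: list[float], ses: list[float]):
    """Cochran's Q (variability beyond chance) and I^2 (percentage of total
    variability attributable to between-study heterogeneity)."""
    w = [1 / se ** 2 for se in ses]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    # I^2 = (Q - df) / Q, expressed as a percentage and truncated at 0.
    i2 = max(0.0, (q - df) / q * 100) if q > 0 else 0.0
    return q, i2
```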
-Publication bias assessment
Publication bias arises when the dissemination of research findings is influenced by the nature and direction of the results. A Deeks funnel plot will be generated to visually assess publication bias if > 10 studies are included in the meta-analysis [74]. An Egger's test will be performed to assess publication bias, with a p-value > 0.10 indicating a low risk of publication bias [75]. A Deeks funnel plot asymmetry test will also be conducted to explore the risk of publication bias, again with a p-value > 0.10 indicating a low risk [76]. The trim-and-fill method will be used to estimate the number of missing studies [77].
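A minimal sketch of Egger's regression test: regress the standard normal deviate (effect/SE) on precision (1/SE) and test whether the intercept differs from zero. For simplicity this illustration uses a normal approximation for the p-value, whereas formal implementations (e.g. in Stata) use a t-test:

```python
import math
from statistics import NormalDist


def egger_test(effects: list[float], ses: list[float]):
    """Egger's regression asymmetry test (illustrative sketch).
    Returns (intercept, two-sided p-value); a small p suggests funnel-plot
    asymmetry consistent with publication bias."""
    y = [e / s for e, s in zip(effects, ses)]   # standard normal deviates
    x = [1 / s for s in ses]                    # precisions
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    # Standard error of the intercept from the residual variance.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)
    se_int = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    z = intercept / se_int
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return intercept, p
```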
-Subgroup analysis
We plan to carry out the following subgroup analyses regardless of heterogeneity: (1) the type of model validation: internal or external validation; (2) the predictors of the model: clinical characteristics, laboratory examinations, genetic factors, objective or quantitatively extracted imaging features, or their combinations; (3) the method of prognostic model building: statistical, machine learning, or deep learning methods, etc. These subgroups were selected to display the current strengths and limitations of prediction model studies for KOA. Further subgroup analyses will depend on the data extracted.
-Sensitivity analysis
Sensitivity analyses will be performed by excluding studies with a high risk of bias assessed by the PROBAST tool (at least 4 of 7 domains rated high), studies with low methodological quality assessed by the CHARMS checklist (at least 6 of 11 domains of concern), and studies with low reporting transparency assessed by the TRIPOD statement (at least half of the applicable items not reported), to explore their influence on the effect size. This analysis will be a narrative summary covering the same elements as the primary analysis, where appropriate.
Reporting and dissemination
The results of the review will be reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [38]. Confidence in the estimates will be determined according to the GRADE (Grades of Recommendation, Assessment, Development, and Evaluation) approach [39, 40]. Ethics approval and consent to participate are not required for this study owing to its nature as a systematic review and meta-analysis. Our findings will be disseminated through peer-reviewed publications and, if possible, presentations at conferences.