4.1. Translation
The present study is the first to translate CPAx from English into Chinese using the Brislin model to ensure sufficient equivalence[16–17]. In addition to a multidisciplinary committee that resolved discrepancies in content, the translation involved two Chinese nurses who were native speakers of Chinese, held English-language certifications, and were studying in the UK and Canada, respectively. We also tested the criterion validity and reliability of the completed translation.
4.2. Validity of CPAx-Chi
Validity is the degree to which a measured result reflects the content being measured. The more consistent the measured result is with that content, the higher the validity[18–19]. According to guidelines for scale development, when more than five experts are consulted, an item-level content validity index (I-CVI) above 0.78 is considered good, and the experts must be authoritative and their opinions well coordinated[19–20].
The present study involved nine multidisciplinary ICU experts with extensive theoretical knowledge and clinical experience. The expert authority coefficient ranged from 0.75 to 0.95. The Kendall coefficient of concordance was 0.061 (p = 0.842), and the I-CVI ranged from 0.889 to 1. Therefore, CPAx-Chi had good content validity[21–22].
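The I-CVI values above can be related to the panel size with a short calculation. The sketch below is illustrative only: it assumes the commonly used 4-point relevance scale on which a rating of 3 or 4 counts as "relevant" (the rating scale is not stated in this section), and the function name `item_cvi` and the ratings are hypothetical.

```python
from typing import Sequence


def item_cvi(ratings: Sequence[int], relevant_min: int = 3) -> float:
    """Return the I-CVI: the share of experts rating the item as relevant.

    Assumption: a 4-point relevance scale where ratings >= relevant_min
    (3 or 4) count as "relevant".
    """
    if not ratings:
        raise ValueError("at least one expert rating is required")
    relevant = sum(1 for r in ratings if r >= relevant_min)
    return relevant / len(ratings)


# Hypothetical ratings from a nine-expert panel (not study data).
all_relevant = [4, 4, 3, 4, 4, 3, 4, 4, 4]
one_dissent = [4, 4, 3, 4, 2, 3, 4, 4, 4]

print(round(item_cvi(all_relevant), 3))  # 1.0
print(round(item_cvi(one_dissent), 3))   # 0.889
```

Under this assumption, a panel of nine experts yields an I-CVI of 0.889 when exactly one expert rates an item as not relevant, which matches the lower bound reported above.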
Corner and colleagues demonstrated that the CVI of CPAx was 1 (p < 0.05)[11–12]. By analyzing the relationship between the CPAx score and patient outcomes, they also showed that CPAx has good predictive validity and that the CPAx score could be used as an alternative indicator of functional prognosis in critically ill patients[13]. Other researchers demonstrated the criterion validity of CPAx using the scores for the MRC, Short Form (SF)-36, Sequential Organ Failure Assessment (SOFA), and GCS as reference standards[23]. They found that the correlation coefficient between the CPAx score and MRC-Score was 0.65 (p < 0.001). The correlation coefficients between the CPAx score and the scores for the right upper limb, left upper limb, right lower limb, and left lower limb were 0.69, 0.64, 0.69, and 0.67, respectively. The correlation coefficient between the CPAx score and SOFA score was 0.68 (p < 0.001), and that between the CPAx score and GCS was 0.74 (p < 0.001). The correlation coefficient between the physical-function item of SF-36 and the CPAx score was 0.72 (p = 0.013), whereas that between the mental-function component of SF-36 and the CPAx score was 0.024 (p = 0.95). In the present study, the correlation coefficients between the CPAx-Chi score and the items of the MRC-Score ranged from 0.60 to 0.65 (p < 0.001). Therefore, CPAx-Chi had good criterion validity.
4.3. Reliability of CPAx-Chi
Cronbach’s α mainly reflects the internal consistency of a scale[18–19]. In general, Cronbach’s α should be > 0.7; a value < 0.6 indicates that the items of the scale must be revised. From a psychometric perspective, the “ideal” Cronbach’s α is > 0.8[24–26]. Inter-rater reliability reflects the consistency of results obtained by different evaluators, and hence the stability of a scale when it is used by different evaluators[27–28]. An inter-rater correlation coefficient > 0.7 indicates good inter-rater reliability, and a coefficient between 0.8 and 0.9 indicates high inter-rater reliability[14, 18–20, 28]. In the present study, Cronbach’s α for CPAx-Chi was 0.939, and the inter-rater reliability of the CPAx-Chi score was 0.902 (p < 0.001). The inter-rater correlation coefficient was > 0.8 for the items of respiratory function, transfer from bed to chair, and grip strength. The inter-rater correlation coefficients of the other items of CPAx-Chi were all > 0.7. Therefore, CPAx-Chi had good reliability.
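For readers unfamiliar with the statistic, Cronbach’s α can be computed from a patients-by-items score matrix as k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The following is a minimal sketch using only the Python standard library; the function name `cronbach_alpha` and the scores are hypothetical and are not taken from the study data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix where scores[i][j] is the score of
    patient i on item j (sample variances, n - 1 denominator)."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two patients and two items")
    k = len(scores[0])
    # Variance of each item across patients.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each patient's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: five patients rated on three items (not study data).
toy_scores = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]

alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3))  # 0.959
```

In this toy example the items rise and fall together across patients, so α is close to 1; items that vary independently of one another would drive α toward 0.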
4.4. Best cutoff point, sensitivity and specificity of CPAx-Chi
Typically, diagnostic performance is evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). A scale with an AUC of 1 would be a “perfect” diagnostic tool, but such a tool does not exist in the real world. An AUC between 0.85 and 0.95 indicates that the scale performs very well, whereas an AUC between 0.5 and 0.7 indicates unsatisfactory performance, and an AUC of 0.5 indicates that the scale discriminates no better than chance[29–31]. Our experts regarded an MRC-Score ≤ 48 as the standard for diagnosing ICU-AW, for three reasons. First, some studies have evaluated the value of the Barthel Index[32], grip strength[33], ICU Mobility Scale[34], de Morton Mobility Index[35], and the Physical Function Intensive Care Test[36] for diagnosing ICU-AW using MRC-Score ≤ 48 as the standard. Second, the best cutoff point, sensitivity, and specificity of neuromuscular ultrasound, electrophysiological recordings, electromyography, and other objective methods used to diagnose ICU-AW have been verified using MRC-Score ≤ 48 as the criterion[23, 37–39]. Third, scholars have constructed several models for early prediction of ICU-AW using MRC-Score ≤ 48 as the diagnostic criterion[40–42]. In the present study, the best cutoff point for the diagnosis of ICU-AW with CPAx-Chi was 31 points. This was verified using MRC-Score ≤ 48 as the criterion, and the sensitivity and specificity were good.
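The logic of choosing a cutoff against a reference standard can be sketched as follows. This is not necessarily the procedure used in the present study: the sketch assumes that the "best" cutoff is the one maximizing the Youden index (sensitivity + specificity − 1), which is one common convention, and that a patient tests positive when the scale score is at or below the cutoff. All function names and data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def sens_spec(scores: Sequence[float], diseased: Sequence[bool],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score <= cutoff' is test-positive."""
    tp = sum(1 for s, d in zip(scores, diseased) if d and s <= cutoff)
    fn = sum(1 for s, d in zip(scores, diseased) if d and s > cutoff)
    tn = sum(1 for s, d in zip(scores, diseased) if not d and s > cutoff)
    fp = sum(1 for s, d in zip(scores, diseased) if not d and s <= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores: Sequence[float],
                diseased: Sequence[bool]) -> Tuple[float, float, float, float]:
    """Return (cutoff, Youden J, sensitivity, specificity) maximizing J."""
    best: Optional[Tuple[float, float, float, float]] = None
    for c in sorted(set(scores)):
        se, sp = sens_spec(scores, diseased, c)
        j = se + sp - 1
        if best is None or j > best[1]:
            best = (c, j, se, sp)
    assert best is not None
    return best


def auc(scores: Sequence[float], diseased: Sequence[bool]) -> float:
    """AUC as the probability that a diseased patient scores lower than a
    non-diseased patient (ties count as one half)."""
    pos = [s for s, d in zip(scores, diseased) if d]
    neg = [s for s, d in zip(scores, diseased) if not d]
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scale scores and reference-standard labels (not study data).
toy_scores = [12, 20, 25, 27, 29, 26, 33, 36, 40, 45]
toy_diseased = [True] * 5 + [False] * 5

cutoff, youden_j, se, sp = best_cutoff(toy_scores, toy_diseased)
print(cutoff, round(se, 2), round(sp, 2))        # 29 1.0 0.8
print(round(auc(toy_scores, toy_diseased), 2))   # 0.92
```

With real data, the reference-standard labels would come from MRC-Score ≤ 48 and the scores from CPAx-Chi; the toy values here are chosen only to exercise the code.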
The kappa statistic quantifies inter-rater reliability for ordinal and nominal measures. In general, a kappa value between 0.40 and 0.60 indicates “moderate” agreement, a value between 0.61 and 0.80 denotes “substantial” agreement, and a value > 0.81 reflects “excellent” agreement; a negative kappa value represents disagreement[43–44]. The kappa values indicated high agreement when MRC-Score ≤ 48 and CPAx-Chi ≤ 31 were used as the cutoff points to diagnose ICU-AW, for both Researcher A and Researcher B.
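As an illustration of how kappa corrects raw agreement for chance, the sketch below computes unweighted Cohen’s kappa, (observed agreement − expected agreement) / (1 − expected agreement), for two sets of binary ICU-AW classifications. Whether the study used this exact variant of kappa is an assumption; the function name `cohen_kappa` and the labels are hypothetical.

```python
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two equally long label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    # Proportion of cases on which the two classifications agree.
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Agreement expected by chance from the marginal label frequencies.
    labels = set(a) | set(b)
    expected = sum((list(a).count(l) / n) * (list(b).count(l) / n)
                   for l in labels)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical classifications for ten patients (1 = ICU-AW, 0 = no ICU-AW).
by_first_criterion = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
by_second_criterion = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(by_first_criterion, by_second_criterion)
print(round(kappa, 3))  # 0.583
```

Here the two classifications agree on 8 of 10 patients (raw agreement 0.80), but because agreement of 0.52 would be expected by chance alone, kappa is only 0.583, which falls in the “moderate” band described above.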
4.5. Strengths of our study
First, two researchers assessed patients and collected data independently, which strengthened the credibility of the validation data. Second, the best cutoff point for the diagnosis of ICU-AW using CPAx-Chi was determined to be 31 points, with MRC-Score ≤ 48 as the reference standard.
4.6. Weaknesses of our study
First, our findings were limited by the use of a non-randomized pool of participants chosen primarily by their availability during the study period, which may have reduced the generalizability of our findings. Second, the specific exclusion criteria may have prevented potential “ceiling and floor” effects of CPAx-Chi from being tested. Therefore, to further confirm the clinical value of CPAx in assessing and diagnosing ICU-AW, it must be applied together with the MRC-Score, ultrasound, electrophysiology, and electromyography. In addition, multicenter, large-sample, randomized trials are needed to verify the best cutoff point for CPAx.