In this study, we compared the accuracy and reliability of visual intensity grading on STIR sequences with computer-generated ADC parameters in patients with axSpA.
DWI and ADC are relatively new MRI sequences and measurements for assessing spinal inflammation in axSpA. They have been validated in previous studies (8–9, 17). In contrast to the STIR sequence, they allow quantitative assessment of disease activity (1, 8). Measurement of ADC, however, is not without limitations. ADC values vary widely as a result of instrumental variation and error, as well as biological variation. A proposed solution is the normalized ADC, calculated as the ratio of the abnormal ADC to the normal ADC, which is intended to eliminate these variations. At present, validation data on the two methods in axSpA are still lacking. In this study, a higher ADC or normalized ADC was assumed to represent a higher degree of inflammation.
We used the SPARCC MRI index as the reference method to score the intensity of inflammatory lesions on MRI. The original definition was “The signal from cerebrospinal fluid constituted the reference for designating an inflammatory lesion as intense” (3). Our data show that the human eye can differentiate lesions with a greater degree of inflammation from those with a lesser degree, but that different readers interpret MRI in different ways. Using this method, most readers (3 out of 4) were able to differentiate the image intensity of maximally inflamed areas. Two of the readers were also able to differentiate the mean degree of inflammation within the lesions. This suggests that readers tended to use the most inflamed area as the reference. As ADCmean depends on how the ROI is drawn, ADCmax could represent a more objective measurement.
Overall, intensity grading of inflammation on STIR MRI had poor reliability. Inter-reader agreement on lesion intensity was only slight to fair. Significant differences in ADCmax and ADCmean were observed only in the intensity grading by the rheumatologist inexperienced in MRI reading (reader 3). There were also significant discrepancies in the number of “intense” lesions identified by different readers. When only the most and least inflamed lesions were included in the subgroup analyses, reliability improved significantly. In the subgroup analyses restricted to lesions in the lowest and highest 25% of maximum ADC, “intense” lesions identified by the musculoskeletal radiologist (reader 1) and the rheumatologist experienced in reading MRI (reader 2) achieved substantial reliability. The number of “intense” lesions graded correctly also increased in these subgroups. As the differences in intensity of inflammation (as reflected by the ADC parameters) between “intense” and “non-intense” lesions were small, the human eye would be expected to be inferior to computer measurement in detecting subtle differences in intensity. Our results showed that STIR MRI could be inferior to ADC in identifying lesion intensity, a finding compatible with another international study (18). Reader experience is a factor in improving the reliability of MRI interpretation.
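As an illustration of how agreement between two readers on “intense” versus “non-intense” grading can be quantified, the following minimal sketch computes Cohen's kappa and maps it to the Landis and Koch descriptors (slight, fair, moderate, substantial). The choice of Cohen's kappa, the function names and the example gradings are assumptions made for illustration only; they are not the statistic, code or data used in this study.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers grading the same lesions."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("gradings must be non-empty and of equal length")
    n = len(reader_a)
    # Observed agreement: proportion of lesions given the same grade.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement from each reader's marginal grade frequencies.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa):
    """Landis and Koch verbal descriptor for a kappa value."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical gradings of eight lesions (1 = "intense", 0 = "non-intense").
reader_1 = [1, 1, 0, 0, 1, 0, 1, 0]
reader_2 = [1, 0, 0, 0, 1, 1, 1, 0]
kappa = cohen_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f} ({landis_koch(kappa)})")
```

Because kappa corrects for chance agreement, two readers who agree on most lesions can still obtain only a moderate value when both grade roughly half of the lesions as “intense”.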
ADC can be affected by a number of factors, including how the ROI is drawn, the health of the spine and skeletal maturity (19). Age and osteoporosis have also been reported to affect ADC (20). Although we did not directly evaluate the effect of osteoporosis in our analyses, we adjusted the ADC values for age and sex, two known risk factors for osteoporosis (21–23). After adjustment, we still found positive associations between ADCmax and “intense” lesions identified by 3 of the readers. Positive associations were also found between ADCmean and “intense” lesions identified by the rheumatologist inexperienced in reading MRI (reader 3), as well as between nADCmean and “intense” lesions identified by the medicine trainee (reader 4). These results suggest that STIR MRI could differentiate the degree of inflammation despite the effects of age and sex.
STIR MRI showed poor ability to differentiate nADC values. In fact, no difference in nADCmax between “intense” and “non-intense” lesions was observed for any of the readers. nADC is defined as the ratio of lesion ADC to normal spine ADC. This value allows comparison between different machines. At present, the best way to perform the normalization remains uncertain. However, as our study involved only one MRI machine, normalization was not absolutely necessary. Our data also showed that ADC acquired from a single MRI machine outperformed nADC, as the former appeared to be less affected by variability in interpretation.
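The ratio definition of nADC can be made concrete with a short sketch. Here ADCmax and ADCmean are taken as the maximum and mean voxel values within the lesion ROI, each divided by the mean ADC of a normal-bone ROI. This choice of reference, the function name and the voxel values are assumptions for illustration, since the best way to normalize is not yet established.

```python
def normalized_adc(lesion_roi, normal_roi):
    """Return (nADCmax, nADCmean) for a lesion ROI.

    Assumption: the reference is the mean ADC of a normal-bone ROI.
    Both ROIs are sequences of voxel ADC values in the same units.
    """
    if not lesion_roi or not normal_roi:
        raise ValueError("both ROIs must contain at least one voxel")
    normal_mean = sum(normal_roi) / len(normal_roi)
    if normal_mean <= 0:
        raise ValueError("normal ADC must be positive")
    adc_max = max(lesion_roi)
    adc_mean = sum(lesion_roi) / len(lesion_roi)
    # Dividing by the reference cancels the units, so nADC is dimensionless.
    return adc_max / normal_mean, adc_mean / normal_mean


# Hypothetical voxel values (x10^-3 mm^2/s), not data from this study.
lesion = [1.2, 1.5, 0.9]
normal = [0.5, 0.5, 0.5]
nadc_max, nadc_mean = normalized_adc(lesion, normal)

# A uniform scanner-dependent scale factor cancels in the ratio.
scaled = normalized_adc([2 * v for v in lesion], [2 * v for v in normal])
print(nadc_max, nadc_mean, scaled)
```

The second call shows why normalization is attractive across machines: a multiplicative offset common to lesion and reference leaves nADC unchanged. It also shows the drawback, since any error in drawing the normal-bone ROI propagates into every normalized value.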