This study was approved by the institutional review board (IRB) of our institution, which waived the requirement for informed consent for this data analysis (IRB number 30-2020-20). A waiver permission letter was obtained from the IRB administrators before data collection. Because the enrolled patients were not directly involved in this study (the data were obtained from chart reviews), informed consent was not required. Nevertheless, the data extracted from the medical records were stored confidentially. Our results are presented in accordance with established guidelines and recommendations for AI research involving medical data [14, 15]. All MRI digital imaging and private information in the medical files were anonymized before use in the research.
Participant Selection
The MRI records and clinical data of 294 patients with DCM, recorded between March 2010 and September 2022, were retrospectively reviewed. Patients were included according to the following criteria: (a) age > 17 years; (b) diagnosed with mild DCM on spinal MRI; (c) no previous history of cervical spine surgery; (d) an exact date of diagnosis for the DCM; (e) a baseline MRI from the same date as the diagnosis; and (f) underwent posterior cervical decompression surgery for the DCM within five years after diagnosis (74 cases) or did not (220 cases). To obtain a full range of MRI data, patients were not excluded based on (a) whether they had Parkinson's disease or (b) the location of the DCM. Further demographic details are given in Table 1.
Table 1
Statistical characteristics of included patients.
| Characteristics | Data |
| --- | --- |
| Patients (n = 294) | |
| Female : Male, number (%) | 99 (33.7) : 195 (66.3) |
| Mean age at diagnosis ± SD, years | 58.3 ± 14.0 (range, 17–90) |
| Underwent posterior surgery, number | 74 |
| Primary lesion history (n = 294) | |
| Cervical spine surgery, number (%) | 0 (0) |
| Surgery on other vertebrae, number (%) | 16 (5.4) |
| Cervical trauma, number (%) | 67 (22.8) |
| Follow-up duration | |
| First outpatient visit to diagnosis, mean F/U ± SD, months | 8.3 ± 22.2 (range, 0–125) |
| First outpatient visit to operation, mean F/U ± SD, months | 2.5 ± 21.2 (range, 0–118) |
| First outpatient visit to last outpatient visit, mean F/U ± SD, months | 25.4 ± 36.7 (range, 0–150) |
| First outpatient visit to last outpatient visit (without operation), mean F/U ± SD, months | 20.9 ± 36.5 (range, 0–147) |
| MRI findings | |
| DCM location: upper, number (%) | 20 (6.8) |
| DCM location: mid, number (%) | 23 (7.8) |
| DCM location: lower, number (%) | 37 (12.6) |
| DCM location: mixed, number (%) | 45 (15.3) |
| MCC, mean ± SD, % | 57.4 ± 13.6 (range, 17–85) |
| MSCC, mean ± SD, % | 6.0 ± 15.3 (range, −5.3 to 68.8) |

Abbreviations: MRI, magnetic resonance imaging; SD, standard deviation; F/U, follow-up; MCC, maximum canal compromise; MSCC, maximum spinal cord compression.
MRI Examination
Cervical spine MRI was performed with a 3T scanner (Vida, Siemens Healthineers). We used images from a turbo spin echo T2-weighted sequence in the sagittal plane (T2-TSE Sag). The T2-TSE Sag parameters were: slices per group = 15, distance factor = 10%, position = isocenter, phase encoding direction = head to feet, phase oversampling = 50%, field of view (FOV) = 200 × 200 mm, slice thickness = 3.0 mm, repetition time (TR) = 3,500.0 ms, echo time (TE) = 82.0 ms, flip angle = 110°, averages = 3, and concatenation = 1. The number of slices occasionally differed from 15 because of variation in spine shape among the patients.
Data Preprocessing and Augmentation
We used a labeling procedure to create a bounding box for the region of interest (RoI) and binary masks for the spinal canal and spinal cord in 3–5 frames from each MRI recording. The selected frames were the key frame (the middle frame of the sequence) together with its neighboring frames toward the front and back of the sequence. Because the cord lies in the middle of the canal on a sagittal-view MRI, the binary mask of the canal was represented, for simplicity, as the union of the canal and cord areas.
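This union is a single pixelwise OR; a minimal sketch, assuming the annotated masks are NumPy arrays (the array names are hypothetical):

```python
import numpy as np

def merge_canal_mask(canal: np.ndarray, cord: np.ndarray) -> np.ndarray:
    """Return the canal ground-truth mask including the cord area:
    on sagittal views the cord lies inside the canal, so the canal
    label is the pixelwise union of the two annotated regions."""
    return np.logical_or(canal > 0, cord > 0).astype(np.uint8)
```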
Each MRI was preprocessed using a series of steps: (a) N4 bias correction to mitigate inherent intensity biases, (b) quantile clipping to remove outliers among pixel values, (c) min-max normalization, (d) RoI cropping using the bounding box labeled during the labeling phase, and (e) resizing to a uniform resolution of 256 × 256 pixels, the modal size of the cropped frames.
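A minimal sketch of this pipeline for a single frame, using SimpleITK and OpenCV; the clipping quantiles (1st/99th percentiles) and the interpolation mode are assumptions, as they are not specified above:

```python
import numpy as np
import SimpleITK as sitk
import cv2

def preprocess_frame(frame: np.ndarray, roi: tuple) -> np.ndarray:
    """Preprocess one sagittal T2 frame; roi = (y0, y1, x0, x1) from labeling."""
    # (a) N4 bias field correction, applied here to a single 2-D slice.
    img = sitk.GetImageFromArray(frame.astype(np.float32))
    img = sitk.N4BiasFieldCorrectionImageFilter().Execute(img)
    arr = sitk.GetArrayFromImage(img)
    # (b) Quantile clipping; the 1st/99th percentiles are an assumption.
    lo, hi = np.quantile(arr, [0.01, 0.99])
    arr = np.clip(arr, lo, hi)
    # (c) Min-max normalization to [0, 1].
    arr = (arr - arr.min()) / (arr.max() - arr.min() + 1e-8)
    # (d) RoI cropping with the labeled bounding box.
    y0, y1, x0, x1 = roi
    arr = arr[y0:y1, x0:x1]
    # (e) Resize to the uniform 256 x 256 resolution.
    return cv2.resize(arr, (256, 256), interpolation=cv2.INTER_LINEAR)
```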
To stabilize the pixel distribution while preserving a consistent structure across the medical images, additional augmentation methods, including rotation and random cropping, were adopted. The total dataset was partitioned at the patient level into a training subset (235 cases) and a testing subset (59 cases) in an 8:2 ratio, yielding 1,164 training frames and 280 test frames.
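The augmentation magnitudes are not given above; the sketch below, using the albumentations library, is one plausible configuration in which the rotation limit, crop size, and probabilities are assumptions. albumentations applies the same spatial transform to the frame and its masks, which keeps image-mask alignment intact:

```python
import random
import albumentations as A

# Hypothetical magnitudes; only rotation and random cropping are specified.
train_augment = A.Compose([
    A.Rotate(limit=10, p=0.5),
    A.RandomCrop(height=224, width=224, p=0.5),
    A.Resize(height=256, width=256),
])
# usage: out = train_augment(image=frame, masks=[canal_mask, cord_mask])

# Patient-level 8:2 split (235 training / 59 test cases).
random.seed(0)
case_ids = list(range(294))
random.shuffle(case_ids)
train_ids, test_ids = case_ids[:235], case_ids[235:]
```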
Development of the Models
As shown in Fig. 1, we developed two models to enable comparisons of segmentation performance: an autosegmentation model and an interactive segmentation model. Because the two ground-truth masks overlap (the canal mask contains the cord area), the classes are not mutually exclusive, so both models predict binary masks for the spinal canal and the cord separately.
For DL, an NVIDIA RTX A5000 (NVIDIA, Santa Clara, CA, USA) graphics processing unit was utilized. DL was executed using Python 3.8.10 and the PyTorch 2.0.0 framework on the Ubuntu 20.04.5 operating system. The Visual Studio Code application (Microsoft Corp., Redmond, WA, USA) was also used in the experiments.
We leveraged a U-Net architecture for the autosegmentation model, using ConvNeXt-Tiny pretrained on ImageNet as its encoder. The randomly initialized CNN-based decoder consumed the features encoded at each stage of the encoder through skip connections.
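A minimal sketch of such a model, built with the timm library; the decoder widths, single-channel input, and upsampling scheme are assumptions rather than the exact configuration used here:

```python
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNeXtUNet(nn.Module):
    """U-Net-style model: ImageNet-pretrained ConvNeXt-Tiny encoder (via
    timm) with a randomly initialized CNN decoder that consumes each
    encoder stage as a skip connection."""

    def __init__(self, num_masks: int = 2):  # canal + cord, predicted separately
        super().__init__()
        self.encoder = timm.create_model(
            "convnext_tiny", pretrained=True, features_only=True, in_chans=1)
        chs = self.encoder.feature_info.channels()  # [96, 192, 384, 768]
        self.decoders = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chs[i] + chs[i - 1], chs[i - 1], 3, padding=1),
                nn.BatchNorm2d(chs[i - 1]),
                nn.ReLU(inplace=True))
            for i in range(len(chs) - 1, 0, -1))
        self.head = nn.Conv2d(chs[0], num_masks, 1)

    def forward(self, x):
        feats = self.encoder(x)  # stage outputs at strides 4, 8, 16, 32
        y = feats[-1]
        for dec, skip in zip(self.decoders, reversed(feats[:-1])):
            y = F.interpolate(y, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False)
            y = dec(torch.cat([y, skip], dim=1))
        y = F.interpolate(y, scale_factor=4, mode="bilinear",
                          align_corners=False)  # back to input resolution
        return torch.sigmoid(self.head(y))  # two overlapping binary masks

# usage: ConvNeXtUNet()(torch.randn(1, 1, 256, 256)) -> shape (1, 2, 256, 256)
```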
Recent interactive segmentation models [14, 15, 16] are large models built on vision transformers. Despite their large number of parameters, fine-grained supervised fine-tuning enhances such vision foundation models [17]. In this context, we employed an interactive segmentation model structured on SimpleClick [15], which uses a plain vision transformer pretrained on COCO+LVIS [18] as its backbone. The input set comprised an MRI frame, the previous mask, and the accumulated clicks. One reason for choosing this model was its fine-tuning performance on an MRI dataset: an 88.98% mean intersection over union (mIoU) with 10 clicks on BraTS, whose modality is similar to that of our dataset. We evaluated both the autosegmentation and interactive segmentation models via this fine-tuning approach.
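The iterative protocol can be sketched as follows; the `model` call signature is hypothetical (SimpleClick's actual interface differs), and the click-placement rule is a simplified stand-in for the distance-map-based click simulation used by such models:

```python
import numpy as np

def next_click(pred: np.ndarray, gt: np.ndarray):
    """Place the next click inside the larger error region: a positive
    click on false negatives, else a negative click on false positives."""
    fn = np.logical_and(gt == 1, pred == 0)
    fp = np.logical_and(gt == 0, pred == 1)
    err, is_positive = (fn, True) if fn.sum() >= fp.sum() else (fp, False)
    if not err.any():
        return None  # prediction already matches the ground truth
    ys, xs = np.nonzero(err)
    i = len(ys) // 2  # crude pick; the real protocol uses distance maps
    return (int(ys[i]), int(xs[i])), is_positive

def interactive_segmentation(model, frame, gt, max_clicks=10):
    """Each round feeds the frame, the previous mask, and the accumulated
    clicks back into the model (assumed interface)."""
    prev_mask = np.zeros_like(gt)
    clicks = []
    for _ in range(max_clicks):
        click = next_click(prev_mask, gt)
        if click is None:
            break
        clicks.append(click)
        prev_mask = model(frame, prev_mask, clicks)  # hypothetical call
    return prev_mask
```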
Conventional Statistical Analysis
The output of the interactive segmentation model is modified continuously as additional manual inputs are provided; therefore, its performance could, in theory, be improved without limit. In our experiments, we limited the number of additional clicks to 10 and evaluated the model in terms of both accuracy and efficiency.
First, we computed the Dice score to evaluate segmentation accuracy. We tracked changes in the Dice score over the 10 clicks as a self-evaluation of the interactive segmentation model, and then compared the Dice scores of the autosegmentation and interactive segmentation models after the 10th click.
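For reference, the Dice score between a predicted mask P and ground truth G is 2|P ∩ G| / (|P| + |G|); a minimal NumPy implementation:

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))
```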
Next, we used the number of clicks to evaluate the models' efficiency, recording the number of clicks required to reach mIoU values of 80%, 85%, and 90% (capped at 20 clicks to avoid infeasible click counts).
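A sketch of this number-of-clicks metric, assuming a list of per-click IoU scores recorded during evaluation:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0

def clicks_to_target(ious_per_click, target: float, cap: int = 20) -> int:
    """Number of clicks needed to reach a target IoU; runs that never reach
    the target count as the cap, matching the 20-click limit above."""
    for n, score in enumerate(ious_per_click[:cap], start=1):
        if score >= target:
            return n
    return cap
```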