Many studies have shown that, although quantifying the degree of variation or uncertainty in contouring is important, determining the resulting dose difference and clinical impact is more important [10, 11, 14, 21]. In earlier work, van Rooij et al. [21] evaluated deep learning-based automatic delineation of head and neck organs at risk (OARs) using both geometric and dosimetric indices, and analyzed the correlation between the geometric index SDC (mean value of the DSC) and the dose difference. That study found a weak correlation between the SDC and ΔD across all automatically segmented OARs (r = −0.24, P = 0.002), but the correlation was not specific to any particular OAR or patient. This is similar to the results of the present study: the geometric indices obtained by geometric transformation were significantly correlated with the dosimetric indices overall, but the relationship differed with the form of geometric transformation.
In this study, the correlation between the geometric indices and dosimetric indices was strong and significant for the translation, scaling, and rotation transformations of the PTV, but weak or not significant for the sine function transformation. Likewise, the correlation was strong for the PTV down shift transformation but weak and not significant for the Core down shift transformation. This indicates that the correlation between geometric indices and dosimetric indices is statistically significant only in the aggregate and is not consistent across the different forms of geometric transformation and organ types. The correlation can therefore be affected by many factors, such as the method of geometric transformation, the relative positions of the target and the organs at risk, the shape and size of the ROI, and the constraint goals of the radiotherapy plan.
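The difference between an overall correlation and a per-transformation correlation can be illustrated with a minimal sketch. The function names and the numbers below are hypothetical and chosen only for illustration; they are not data from this study, and the Pearson coefficient is assumed here as the correlation measure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)

def correlation_by_group(records):
    """records: iterable of (transformation, geometric_index, dose_difference).

    Returns the Pearson r for each transformation plus the pooled value.
    """
    records = list(records)
    groups = {}
    for name, geom, dose in records:
        xs, ys = groups.setdefault(name, ([], []))
        xs.append(geom)
        ys.append(dose)
    result = {name: pearson_r(xs, ys) for name, (xs, ys) in groups.items()}
    result["pooled"] = pearson_r([r[1] for r in records], [r[2] for r in records])
    return result

# Made-up values: DSC versus dose difference (%) for two transformation types.
example = [
    ("translation", 0.90, 1.0), ("translation", 0.80, 2.0),
    ("translation", 0.70, 3.0), ("translation", 0.60, 4.0),
    ("sine", 0.95, 0.2), ("sine", 0.94, 0.1),
    ("sine", 0.93, 0.2), ("sine", 0.92, 0.1),
]
results = correlation_by_group(example)
```

In this toy data set the pooled coefficient is clearly negative, while the coefficient within the "sine" group is much weaker than within the "translation" group, which mirrors the kind of inconsistency described above.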
In this study, the correlation coefficient obtained with the PTV up shift transformation was lower than that of the other two translation transformations. To spare the OAR from high doses, the high-dose region of the PTV should ideally lie far from the OAR in the plan design. Therefore, when the up shift transformation occurs in this region, the minimum dose (D98%) of the target remains essentially unchanged, which results in a weaker correlation coefficient. This is in line with the study by Lim et al. [10], which found that the correlation between geometric indices and dosimetric indices was affected by the goals of the treatment plan. Feng et al. [22] considered that contour changes of the OARs in oropharyngeal carcinoma had little effect on the dose, whereas Nelms et al. [23] believed that they had a great effect on the dose. The systematic simulation in this study indicates that there is no definite correlation between geometric indices and dosimetric indices. This weak and inconsistent relationship also helps explain the contradictory findings in the literature cited above.
At present, most contouring research uses only geometric indices to evaluate the acceptability of contouring results. To provide an objective evaluation method based on geometric indices, it is generally believed that a DSC value greater than 0.7 means that the contouring result is clinically acceptable [24–27], while for distance-type geometric indices (HD, HDmean, HD95) smaller values indicate a more acceptable result, and the closer the value is to zero, the better the contour. However, this criterion is ambiguous. One study of automatic OAR segmentation reported that a liver HD of 15.770 ± 1.0 mm indicated high accuracy [27], while another reported that an HD of 37.7 ± 13.8 mm also indicated high accuracy [28]. Although the former value is smaller than the latter, the lack of a standard reference makes it impossible to state clearly which result is more accurate. As a result, HD values are neither comparable across studies nor readily interpretable.
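For reference, the geometric indices discussed here can be sketched on small voxel sets. This is a minimal illustration under stated assumptions, not the implementation used in this study: distances are computed between all voxels rather than surface points, HDmean is taken as the mean of the pooled bidirectional nearest-neighbor distances, and HD95 uses a nearest-rank percentile. Published definitions of HDmean and HD95 vary, so these choices are assumptions.

```python
import math

def dice(a, b):
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard index: |A∩B| / |A∪B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)

def directed_distances(src, dst):
    """Distance from each point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def hausdorff_metrics(a, b):
    """Return (HD, HD95, HDmean) from pooled bidirectional distances."""
    a, b = list(a), list(b)
    d = sorted(directed_distances(a, b) + directed_distances(b, a))
    rank = min(len(d), math.ceil(0.95 * len(d)))  # nearest-rank percentile
    return d[-1], d[rank - 1], sum(d) / len(d)

# A 10 x 10 voxel square and the same square shifted by 2 voxels along x.
reference = {(x, y) for x in range(10) for y in range(10)}
shifted = {(x + 2, y) for (x, y) in reference}

dsc = dice(reference, shifted)          # 0.8
jac = jaccard(reference, shifted)       # 80 / 120
hd, hd95, hd_mean = hausdorff_metrics(reference, shifted)
```

For this 2-voxel shift the DSC is 0.8 and the HD is 2 voxel units. Neither number says anything about the dose until a dose distribution and a plan objective are brought in, which is why a fixed geometric threshold is hard to interpret on its own.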
In this study, we found that the changes in geometric indices and dosimetric indices did not correspond to each other. As shown in Fig. 2, the HDmean changed rapidly during PTV translation, rotation, and scaling, and when the HDmean was equal to 5 mm the dose difference was not clinically acceptable. For the scaling transformation, the distance-type geometric indices behaved consistently, and HD, HDmean, and HD95 could not be distinguished from one another across scaling differences. For the sine function transformation, the distance-type geometric indices were very small, and the corresponding dose differences were also very small and all within the clinically acceptable range. For the up shift transformation, although the distance-type geometric indices exceeded 8 mm, the dose differences were still within the acceptable range. For the OARs, the geometric indices of the left shift, up shift, and sine function transformations were below 2 mm and the D2% dose difference was within 5%, whereas the D2% of the up shift and scaling transformations fell outside this range. This shows that evaluating contouring results by geometric indices alone is not reliable, which is consistent with the opinion of Kaderka et al. [11]. When the DSC and Jaccard values were between 0.5 and 0.7, the corresponding dose difference was very small and clinically acceptable, which further indicates that 0.7 is not a reliable clinically acceptable threshold for the DSC [29].
Beasley et al. [30] reported that, when measured with a suitable spatial metric, a higher geometric accuracy of the contour should be reflected in a smaller dose difference, and vice versa. Our research showed, however, that the dose difference can be clinically unacceptable when the geometric index is within the acceptable threshold range, and clinically acceptable when the geometric index is beyond that range. Although geometric indices reflect the difference between a specific contour and the reference contour, the inconsistent response between geometric indices and dosimetric indices makes it meaningless to evaluate the clinical acceptability of contouring results by geometric indices alone, and geometric indices cannot predict the clinical dose difference.
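The two kinds of mismatch described above can be made explicit with a small sketch that checks a geometric criterion and a dosimetric criterion side by side. The function names are hypothetical, and the default thresholds (DSC greater than 0.7, absolute dose difference within 5%) are taken only as illustrative values from the discussion, not as recommended clinical limits.

```python
def relative_dose_difference(d_test, d_ref):
    """Percent difference of a dose metric (e.g. D98% or D2%) from its reference."""
    return 100.0 * (d_test - d_ref) / d_ref

def classify(dsc, dose_diff_pct, dsc_threshold=0.7, dose_tolerance_pct=5.0):
    """Compare a geometric verdict with a dosimetric verdict for one contour."""
    geometry_ok = dsc > dsc_threshold
    dose_ok = abs(dose_diff_pct) <= dose_tolerance_pct
    if geometry_ok and dose_ok:
        return "both acceptable"
    if not geometry_ok and not dose_ok:
        return "both unacceptable"
    if geometry_ok:
        return "geometry acceptable, dose unacceptable"
    return "geometry unacceptable, dose acceptable"

# Hypothetical cases, not results from this study.
case_a = classify(0.85, relative_dose_difference(66.0, 60.0))  # high DSC, 10% dose change
case_b = classify(0.60, relative_dose_difference(60.6, 60.0))  # low DSC, about 1% dose change
```

Counting how many contours fall into the two discordant categories gives a simple measure of how often a geometric threshold alone would lead to the wrong clinical verdict.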
Our research systematically introduced the geometric transformations of translation, scaling, rotation, and a sine function. Using the reference dose distribution, we analyzed the feasibility of clinical evaluation based on geometric indices for each type of geometric transformation. We also discussed how the positional changes caused by multiple geometric errors of a structure affect the relationship between geometric and dosimetric indices, although the actual relationship between them is very complex. In addition, because this study is a simulation experiment, the size and properties of structures in actual patient data should be explored further.
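The four transformation families named above can be sketched on a 2D contour as follows. This is an illustrative sketch only: the function names are hypothetical, scaling and rotation are assumed to be applied about the contour centroid, and the sine perturbation is assumed to be a radial displacement of the form r + A·sin(kθ), which may differ from the sine function transformation actually used in this study.

```python
import math

def centroid(points):
    """Mean position of the contour vertices."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def translate(points, dx, dy):
    """Shift every vertex by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]

def scale(points, factor):
    """Scale the contour about its centroid."""
    cx, cy = centroid(points)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in points]

def rotate(points, angle_deg):
    """Rotate the contour counterclockwise about its centroid."""
    cx, cy = centroid(points)
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(cx + c * (x - cx) - s * (y - cy),
             cy + s * (x - cx) + c * (y - cy)) for x, y in points]

def sine_perturb(points, amplitude, cycles):
    """Assumed form: move each vertex radially by amplitude * sin(cycles * theta)."""
    cx, cy = centroid(points)
    out = []
    for x, y in points:
        r = math.hypot(x - cx, y - cy)
        theta = math.atan2(y - cy, x - cx)
        r_new = r + amplitude * math.sin(cycles * theta)
        out.append((cx + r_new * math.cos(theta), cy + r_new * math.sin(theta)))
    return out

# A 2 x 2 square contour with centroid (1, 1).
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
shifted = translate(square, 1.0, 0.0)
doubled = scale(square, 2.0)
turned = rotate(square, 90.0)
wavy = sine_perturb(square, 0.2, 4)
```

Each transformed contour can then be compared with the original using the geometric indices and, with the reference dose distribution, the dosimetric indices.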