Several different segmentation algorithms have been used to determine baseline MTV in DLBCL patients. Ilyas et al. investigated the SUV≥2.5 method, the 41% SUVmax method and the "PERCIST" method (threshold ≥1.5 × mean SUV + 2 standard deviations in a 3 cm³ VOI placed in the right liver lobe). The three segmentation methods yielded different optimal cut-off points for predicting PFS, ranging from 166 to 400 cm³, which is similar to our range of 123 to 345 cm³ [20]. The same tendency has been observed in MTV measurements of solid tumours: Zhuang et al. applied eight different segmentation methods in non-small cell lung cancer patients and obtained significantly different MTV values [21].
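To make concrete why the choice of threshold alone changes MTV, the following minimal sketch applies the three threshold rules named above to one toy lesion. It assumes voxel SUVs are already available as plain lists, that every voxel has the same volume, and it ignores spatial connectivity, which real segmentation tools take into account. All function names and the toy values are hypothetical illustrations, not the software used by Ilyas et al. or in this study.

```python
from statistics import mean, stdev


def threshold_suv_fixed(level=2.5):
    """Absolute threshold: voxels with SUV >= 2.5 are counted as tumour."""
    return level


def threshold_relative(lesion_suvs, fraction=0.41):
    """Relative threshold: 41% of the lesion's own SUVmax."""
    return fraction * max(lesion_suvs)


def threshold_percist(liver_suvs):
    """PERCIST-style threshold: 1.5 x mean liver SUV + 2 standard deviations,
    computed from the voxels of a liver reference VOI."""
    return 1.5 * mean(liver_suvs) + 2 * stdev(liver_suvs)


def mtv_cm3(lesion_suvs, threshold, voxel_volume_cm3):
    """MTV: number of voxels at or above the threshold times the voxel volume."""
    n_voxels = sum(1 for suv in lesion_suvs if suv >= threshold)
    return n_voxels * voxel_volume_cm3


# Hypothetical toy data: voxel SUVs inside one lesion VOI and inside a
# liver reference VOI, with 0.5 cm3 voxels.
lesion = [1.0, 2.0, 3.0, 3.8, 10.0]
liver = [1.8, 2.0, 2.2]
voxel = 0.5

for name, thr in [
    ("SUV>=2.5", threshold_suv_fixed()),
    ("41% SUVmax", threshold_relative(lesion)),
    ("PERCIST", threshold_percist(liver)),
]:
    print(f"{name}: threshold {thr:.2f}, MTV {mtv_cm3(lesion, thr, voxel):.1f} cm3")
```

With these toy numbers the same lesion yields three different volumes (1.5, 0.5 and 1.0 cm³), mirroring on a small scale why cut-offs derived with different methods are not interchangeable.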
Our data indicate that, although MTV and TLG showed only moderate prognostic performance and areas under the ROC curve, the gradient-based segmentation algorithm yielded the best values, particularly in terms of sensitivity and diagnostic accuracy. However, as this algorithm is vendor-specific, its widespread use may be limited. TLG did not show better prognostic performance than MTV obtained with the corresponding segmentation method.
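As an illustration of how TLG relates to MTV and how sensitivity and diagnostic accuracy follow from a dichotomised baseline parameter, here is a minimal sketch. It assumes TLG is MTV multiplied by the mean SUV of the segmented volume, and that "positive" means a value at or above the cut-off. This section does not state how the optimal cut-offs were chosen; maximising Youden's index over the ROC curve is a common approach and is used here only as an assumption. Function names and data are hypothetical.

```python
import math


def tlg(mtv_cm3, suv_mean):
    """Total lesion glycolysis: MTV times the mean SUV of the segmented volume."""
    return mtv_cm3 * suv_mean


def diagnostic_metrics(values, events, cutoff):
    """Dichotomise a baseline parameter at `cutoff` (value >= cutoff means
    'predicted high risk') and compare against observed PFS events."""
    tp = fp = tn = fn = 0
    for value, event in zip(values, events):
        predicted_high = value >= cutoff
        if predicted_high and event:
            tp += 1
        elif predicted_high:
            fp += 1
        elif event:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    specificity = tn / (tn + fp) if tn + fp else math.nan
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}


def youden_cutoff(values, events):
    """Observed value maximising Youden's J = sensitivity + specificity - 1."""
    best_cutoff, best_j = None, -math.inf
    for candidate in sorted(set(values)):
        m = diagnostic_metrics(values, events, candidate)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cutoff, best_j = candidate, j
    return best_cutoff


# Hypothetical cohort: baseline MTV (cm3) and whether progression occurred.
mtv = [100, 200, 300, 400]
progressed = [False, False, True, True]
cut = youden_cutoff(mtv, progressed)
print(cut, diagnostic_metrics(mtv, progressed, cut))
```

Because accuracy counts both correctly classified progressors and non-progressors, two segmentation methods with similar AUCs can still differ noticeably in sensitivity and accuracy at their respective optimal cut-offs.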
Apart from optimal cut-off points varying within the same patient cohort, MTV also shows sample dependency: markedly different values have been reported among studies performed with the same (or highly similar) segmentation methodology, as in the standalone studies referenced in the Ilyas paper and in the meta-analyses by Xie et al. and Guo et al., with optimal cut-off points ranging from 66 to 601.2 cm³ for the SUV≥2.5 method and from 16.1 to 550 cm³ for the 40-41% methods [4, 20, 22–26].
As radiomics becomes more prevalent in several imaging research fields, standardization is paramount, and the authors recommend and support collaborations similar to the Image Biomarker Standardization Initiative to make PET imaging parameters more reliable and comparable among centres [27]. Still, SUVs, which form the basis of nearly all of these calculations, are themselves highly variable among studies, and this points to a limitation of the current multicentric study, as the devices had not been cross-calibrated. At present, the reproducibility of SUVs can be supported by implementing the EARL Harmonization Programme; however, our study had been concluded before its introduction [28].
To the best of the authors' knowledge, this is the first publication of body weight-adjusted (bwa) MTV and TLG values. This normalization was introduced to enable a personalized and more accurate measurement of the impact of tumour burden. Normalization to body surface area or lean body mass would also be feasible; however, patient height was not available for all cases in our dataset, which made such calculations impossible. Although bwaMTV and bwaTLG did not yield better prognostic values than MTV and TLG, respectively, in a few selected cases body weight-adjusted MTV stratified the patient into the correct risk group whereas regular MTV did not (Figure 6). These parameters could be investigated further in larger cohorts, as they are easy to calculate, and in datasets where patient height is available, body surface area could also serve as a parameter for MTV normalization.
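The normalizations discussed above amount to simple divisions. The sketch below assumes bwaMTV and bwaTLG are the raw value divided by body weight in kg (this section does not give the exact formula), and, for the body surface area alternative, uses the Mosteller formula as one common BSA estimate; that choice is an assumption, not the authors' method. It also shows why patient height is needed for BSA but not for body weight adjustment.

```python
import math


def body_weight_adjusted(value, body_weight_kg):
    """Divide MTV (cm3) or TLG by body weight (kg), e.g. bwaMTV in cm3/kg (assumed formula)."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return value / body_weight_kg


def bsa_mosteller(height_cm, weight_kg):
    """Body surface area in m2 by the Mosteller formula; requires patient height."""
    return math.sqrt(height_cm * weight_kg / 3600.0)


def bsa_adjusted(value, height_cm, weight_kg):
    """Hypothetical BSA-normalized MTV or TLG (value per m2)."""
    return value / bsa_mosteller(height_cm, weight_kg)


# Hypothetical patient: MTV 300 cm3, 80 kg, 180 cm.
print(body_weight_adjusted(300.0, 80.0), bsa_adjusted(300.0, 180.0, 80.0))
```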
ΔSUVmax as a prognostic factor has gained a wider presence in the literature in recent years, with the majority of studies finding optimal cut-off points around 66%, close to our finding of 71.22% [12]. Interestingly, in our study, ΔSUVmax evaluation did not yield better prognostic values than the visual Deauville score method in the whole patient cohort.
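For reference, ΔSUVmax is the relative reduction of the maximum SUV between baseline and interim PET. The minimal sketch below computes it and applies the 71.22% cut-off reported above; treating a reduction strictly greater than the cut-off as a favourable response, and the function names, are assumptions for illustration.

```python
def delta_suvmax_percent(suvmax_baseline, suvmax_interim):
    """Relative SUVmax reduction from baseline to interim PET, in percent.
    Negative values mean uptake increased."""
    if suvmax_baseline <= 0:
        raise ValueError("baseline SUVmax must be positive")
    return 100.0 * (suvmax_baseline - suvmax_interim) / suvmax_baseline


def favourable_response(delta_percent, cutoff=71.22):
    """Assumed rule: a larger reduction than the cut-off indicates a good response."""
    return delta_percent > cutoff


# Hypothetical patients: (baseline SUVmax, interim SUVmax)
for baseline, interim in [(20.0, 4.0), (10.0, 5.0)]:
    d = delta_suvmax_percent(baseline, interim)
    print(f"dSUVmax {d:.1f}% -> favourable: {favourable_response(d)}")
```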
Semiquantitative "Deauville-like" parameters may be more robust than ΔSUVmax in a multicentric setting, as the variability in SUVs is at least partially mitigated by using ratios with a reference region. Neither qPET nor rPET has an extensive literature in DLBCL, especially in multicentric studies [13–18]. The optimal cut-off for mqPET in our DLBCL cohort was 1.32, which is highly similar to the established qPET cut-off in pediatric Hodgkin's lymphoma patients based on a 4-voxel SUVpeak. The optimal rPET cut-off of 1.54 was higher than the values of 1.14 and 1.4 published by Annunziata et al. and Toledano et al., respectively, and close to the value of 1.6 reported by Fan and coworkers [16–18]. In our study, both mqPET and rPET evaluation yielded moderately more accurate prognostic results than DS stratification.
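The robustness argument above can be shown in a few lines: a global calibration error multiplies lesion and liver SUVs by the same factor, so it cancels out of a lesion-to-liver ratio. The sketch below assumes a qPET-style ratio (residual lesion SUVpeak over mean liver SUV) and an rPET-style ratio (lesion SUVmax over liver SUVmax). The exact lesion and liver statistics behind mqPET in this study are not given in this section, so applying the 1.32 cut-off to a qPET-style ratio here is only illustrative, as is reading a ratio at or above the cut-off as positive.

```python
import math


def qpet_ratio(lesion_suvpeak, liver_suvmean):
    """qPET-style ratio: residual lesion SUVpeak divided by mean liver SUV."""
    return lesion_suvpeak / liver_suvmean


def rpet_ratio(lesion_suvmax, liver_suvmax):
    """rPET-style ratio: residual lesion SUVmax divided by liver SUVmax."""
    return lesion_suvmax / liver_suvmax


def positive(ratio, cutoff):
    """Assumed rule: interim scan read as positive when the ratio reaches the cut-off."""
    return ratio >= cutoff


# A scanner that reads 20% high scales lesion and liver alike,
# so the ratio (and hence the classification) is unchanged.
calibration = 1.2
raw = rpet_ratio(4.5, 3.0)
miscalibrated = rpet_ratio(4.5 * calibration, 3.0 * calibration)
print(math.isclose(raw, miscalibrated), positive(raw, 1.54))
print(positive(qpet_ratio(4.0, 2.5), 1.32))
```

Absolute SUVs, and therefore ΔSUVmax computed across scanners with different calibrations, do not enjoy this cancellation, although non-linear effects such as reconstruction differences are only partially mitigated by ratios.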
Interim parameters had higher hazard ratios than baseline volumetric parameters in univariate Cox regression analyses, while multivariate Cox regression identified rPET as the only independent predictor of PFS. Combined analyses also showed that a good early treatment response (i.e. DS 1-3) has a greater impact on PFS than baseline MTV. This finding contradicts that of Mikhaeel et al., who found that patients with MTV ≥400 cm³ had a worse prognosis irrespective of the DS on interim scans [2]. Furthermore, in the present study the combination of baseline MTV and ΔSUVmax enabled the definition of a group with a particularly poor prognosis (i.e. patients with high baseline MTV and DS 4-5 on the interim scan).
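A combined analysis of this kind crosses a dichotomised baseline parameter with the interim response. The minimal sketch below assigns patients to four groups from baseline MTV and interim Deauville score, with DS 1-3 treated as good response and DS 4-5 as inadequate response. The MTV cut-off is deliberately a free, hypothetical parameter, since the appropriate value depends on the segmentation method; the group labels are illustrative only.

```python
def combined_group(baseline_mtv_cm3, interim_deauville, mtv_cutoff_cm3):
    """Cross baseline MTV (dichotomised at a study-specific cut-off) with the
    interim Deauville score (DS 1-3 = good response, DS 4-5 = inadequate)."""
    if interim_deauville not in range(1, 6):
        raise ValueError("Deauville score must be an integer from 1 to 5")
    high_mtv = baseline_mtv_cm3 >= mtv_cutoff_cm3
    good_response = interim_deauville <= 3
    if high_mtv and not good_response:
        return "high MTV / DS 4-5"
    if high_mtv:
        return "high MTV / DS 1-3"
    if not good_response:
        return "low MTV / DS 4-5"
    return "low MTV / DS 1-3"


# Hypothetical patients: (baseline MTV in cm3, interim DS), with an illustrative cut-off.
for mtv, ds in [(500.0, 5), (500.0, 2), (100.0, 4), (100.0, 1)]:
    print(mtv, ds, combined_group(mtv, ds, mtv_cutoff_cm3=300.0))
```

Each group's PFS would then be compared, for example with Kaplan-Meier curves, which is outside the scope of this sketch.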