After selecting non-metastatic, primary oropharyngeal squamous cell carcinomas (OPSCC) from four datasets on The Cancer Imaging Archive (TCIA) [5], the training set consisted of 1223 volumetric CTs with associated primary GTV reference segmentations delineated by radiation oncologists (please see Data Availability). A holdout validation subset of 20% of each collection was created by random sampling without replacement (holdout sample sizes: HN1 = 18, Montreal = 48, OPC = 101 and MDA = 80). A single private institutional dataset, “HN3” (n = 154), was used entirely as an independent test set; HN3 was drawn from the same demographic population as HN1 (please refer to the Data Availability statement).
A deep learning architecture was built on top of a squeeze-and-excitation normalization model that has previously been investigated for HNC GTV segmentation [2,6]. In brief, the model was a 3D U-Net with ResNet elements, in which a squeeze-and-excitation normalization block followed each convolutional block, allowing the network to learn a suitable normalization of Hounsfield units during training. We added attention gates at each spatial resolution level after each skip connection, with the aim of accelerating localization of the GTV and enhancing neuron activations in relevant regions [7]. Additional up-sampling paths were added so that low-resolution features could contribute further along the network. Finally, network attention was visualized using Grad-CAM++ [8].
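For illustration, a minimal PyTorch sketch of a squeeze-and-excitation style convolutional stage of the kind described above is given below; the module names, channel counts and reduction ratio are illustrative only and do not reproduce the exact published architecture [2,6].

```python
import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    """Squeeze-and-excitation channel recalibration for 3D feature maps."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)      # "squeeze": one descriptor per channel
        self.fc = nn.Sequential(                 # "excitation": per-channel gating weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c = x.shape[:2]
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * w                             # rescale each feature channel

class ConvSEStage3D(nn.Module):
    """One encoder stage: convolution, normalization, then SE recalibration."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.se = SEBlock3D(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.se(self.conv(x))
```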
To simulate model development and weight updating via multi-institutional federated deep learning, we intentionally isolated each of the aforementioned training datasets. Identical copies of the deep learning architecture were trained entirely separately on each dataset, and only the model weights of the four partial models were combined into one global model using the synchronous federated averaging (FedAvg) algorithm [9]. The averaged weights were then used as the starting state for the next epoch of training. During each training epoch, each partial model iterated through all of its training samples. The experiments thus consisted of performing the federated averaging (i) every epoch (“FedAvg1”), (ii) every 5 epochs (“FedAvg5”) and (iii) every 10 epochs (“FedAvg10”), for a total of 100 epochs in each experiment.
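A minimal sketch of this synchronization step is given below (PyTorch; the function names and the sample-size weighting follow the original FedAvg formulation [9], while the local training routine is left as a hypothetical callback and is not the exact implementation used in this work).

```python
import copy
from typing import Dict, List
import torch

def federated_average(partial_states: List[Dict[str, torch.Tensor]],
                      n_samples: List[int]) -> Dict[str, torch.Tensor]:
    """Combine per-institution model weights into one global model (FedAvg).

    Each partial state dict is weighted by the size of its local training set,
    as in the original FedAvg formulation [9].
    """
    total = float(sum(n_samples))
    global_state = copy.deepcopy(partial_states[0])
    for key in global_state:
        global_state[key] = sum(
            state[key] * (n / total) for state, n in zip(partial_states, n_samples)
        )
    return global_state

def federated_training(models, loaders, n_samples, local_train_fn,
                       total_epochs=100, avg_every=1):
    """Outer schedule: isolated local training, synchronized every `avg_every` epochs."""
    for epoch in range(total_epochs):
        for model, loader in zip(models, loaders):
            local_train_fn(model, loader)            # one local epoch per institution
        if (epoch + 1) % avg_every == 0:             # avg_every = 1, 5 or 10 (FedAvg1/5/10)
            global_state = federated_average(
                [m.state_dict() for m in models], n_samples)
            for m in models:
                m.load_state_dict(global_state)      # averaged weights seed the next epoch
```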
The median and interquartile range of the Dice similarity coefficient (DSC) scores are summarized in Table 1. The holdout-validation performance was marginally best for FedAvg1 in HN1 and Montreal, and for FedAvg10 in OPC and MDA. In the external test dataset HN3, FedAvg1 performed best. Overall, the performance of FedAvg1 was functionally equivalent to the situation where all the training data were combined into one single set and trained on in its entirety (i.e. “centralized”). The best 95th percentile Hausdorff distance (HD95) in holdout validation was observed for FedAvg10: 7.2, 9.0, 9.9, 9.7 and 11.9 mm for HN1, OPC, MDA, Montreal and HN3 (test), respectively. In comparison, the corresponding metrics from centralized training were 7.9, 9.2, 10.4, 13.8 and 9.2 mm, respectively.
Table 1
Median Dice similarity coefficients (DSC) of predicted versus reference GTV segmentations in holdout validation subjects, as a function of synchronous federated averaging every 1, 5 or 10 epochs. The interquartile range of the DSC is given in parentheses after each median value. The asterisk (*) indicates the best median DSC in each dataset. The centralized training DSC results are given for comparison. HN1, OPC, MDA and Montreal are publicly available on TCIA. HN3 is a completely separate independent test set; it is a private dataset that may be requested from the institution.
| Model | HN1 | OPC | MDA | Montreal | HN3 (test) |
| --- | --- | --- | --- | --- | --- |
| FedAvg1 | 0.73* (0.16–0.77) | 0.65 (0.55–0.73) | 0.67 (0.45–0.78) | 0.63* (0.46–0.75) | 0.62* (0.33–0.76) |
| FedAvg5 | 0.72 (0.60–0.74) | 0.66 (0.55–0.73) | 0.65 (0.49–0.79) | 0.52 (0.41–0.72) | 0.55 (0.30–0.73) |
| FedAvg10 | 0.72 (0.59–0.79) | 0.68* (0.54–0.76) | 0.69* (0.50–0.77) | 0.62 (0.46–0.73) | 0.59 (0.30–0.74) |
| Centralized | 0.68 (0.20–0.77) | 0.69 (0.52–0.76) | 0.69 (0.42–0.72) | 0.61 (0.42–0.72) | 0.65 (0.41–0.74) |
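For reference, the two reported metrics can be computed from binary segmentation masks as in the following sketch (numpy/scipy; the surface definition and function names are illustrative and are not the exact evaluation code used in this work; both masks are assumed non-empty and the voxel spacing is given in mm).

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice_coefficient(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    return 2.0 * np.logical_and(pred, ref).sum() / denom if denom else 1.0

def hd95(pred: np.ndarray, ref: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th percentile symmetric Hausdorff distance (mm) between mask surfaces."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    # Surface voxels: foreground voxels removed by a single erosion step.
    pred_surf = pred & ~binary_erosion(pred)
    ref_surf = ref & ~binary_erosion(ref)
    # Euclidean distance of every voxel to the nearest surface voxel of each mask.
    d_to_ref = distance_transform_edt(~ref_surf, sampling=spacing)
    d_to_pred = distance_transform_edt(~pred_surf, sampling=spacing)
    distances = np.concatenate([d_to_ref[pred_surf], d_to_pred[ref_surf]])
    return float(np.percentile(distances, 95))
```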
The DSC of each subject in the holdout and test sets is presented in Fig. 1. Statistically significant differences (p < 0.05) in DSC were detected neither within datasets across the training methods nor between centralized training and any of the federated averaging methods. P-values were calculated using two-sided Mann-Whitney-Wilcoxon tests with Bonferroni correction for multiple testing.
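A minimal sketch of these pairwise comparisons within one dataset is shown below (scipy; the dictionary structure is hypothetical, and the Bonferroni correction is applied to the significance threshold rather than to the p-values, which is equivalent).

```python
from itertools import combinations
from scipy.stats import mannwhitneyu

def pairwise_dsc_tests(dsc_by_method: dict, alpha: float = 0.05) -> dict:
    """Two-sided Mann-Whitney-Wilcoxon tests between all pairs of training methods.

    `dsc_by_method` maps a method name (e.g. "FedAvg1", "Centralized") to the
    list of per-subject DSC values for one dataset.
    """
    pairs = list(combinations(dsc_by_method, 2))
    corrected_alpha = alpha / len(pairs)          # Bonferroni correction
    results = {}
    for a, b in pairs:
        _, p = mannwhitneyu(dsc_by_method[a], dsc_by_method[b],
                            alternative="two-sided")
        results[(a, b)] = (p, p < corrected_alpha)
    return results
```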
However, the differences in performance as a function of total GTV size were statistically significant. Figure 2 shows the DSC for holdout validation subjects (and test subjects in the case of HN3) in each dataset, grouped according to total GTV size; only results of the FedAvg1 model are presented in the figure. The GTV size groupings were GTV ≥ 10 cm³ versus GTV < 10 cm³. Model performance on the smaller GTVs was poorer in terms of both median DSC and interquartile spread, which had the overall effect of dragging the median DSC results downwards. The median scores for the smaller versus the larger tumors are summarized in Table 2. The effect is clearly unrelated to federated learning, since the same size dependence is observed with centralized training. In terms of DSC, the FedAvg1 model is functionally equivalent to the centrally-trained model across both the GTV ≥ 10 cm³ and GTV < 10 cm³ subgroups.
Table 2
Median Dice similarity coefficients (DSC) of the FedAvg1, FedAvg5, FedAvg10 and centrally-trained models in the holdout validation subjects, grouped according to GTV size ≥ 10 cm³ versus GTV < 10 cm³. Interquartile ranges are given in parentheses, as in Table 1. The differences in DSC between larger and smaller tumors were statistically significant, but differences due to averaging every 1, 5 or 10 epochs within a given tumor-size subgroup were not.
| Model | GTV size | HN1 | OPC | MDA | Montreal | HN3 (test) |
| --- | --- | --- | --- | --- | --- | --- |
| FedAvg1 | ≥ 10 cm³ | 0.77 (0.74–0.80) | 0.70 (0.61–0.76) | 0.72 (0.56–0.80) | 0.67 (0.53–0.76) | 0.72 (0.55–0.78) |
| FedAvg1 | < 10 cm³ | 0.01 (0.01–0.20) | 0.51 (0.40–0.60) | 0.28 (0.20–0.42) | 0.45 (0.30–0.66) | 0.36 (0.01–0.59) |
| FedAvg5 | ≥ 10 cm³ | 0.74 (0.72–0.77) | 0.69 (0.64–0.77) | 0.69 (0.59–0.79) | 0.61 (0.42–0.72) | 0.70 (0.55–0.76) |
| FedAvg5 | < 10 cm³ | 0.21 (0.10–0.66) | 0.52 (0.40–0.61) | 0.45 (0.15–0.50) | 0.45 (0.36–0.53) | 0.27 (0.12–0.44) |
| FedAvg10 | ≥ 10 cm³ | 0.77 (0.75–0.79) | 0.69 (0.64–0.78) | 0.73 (0.60–0.80) | 0.65 (0.54–0.74) | 0.69 (0.53–0.76) |
| FedAvg10 | < 10 cm³ | 0.48 (0.14–0.66) | 0.52 (0.42–0.62) | 0.36 (0.29–0.58) | 0.45 (0.40–0.62) | 0.29 (0.00–0.54) |
| Centralized | ≥ 10 cm³ | 0.76 (0.69–0.79) | 0.71 (0.62–0.77) | 0.71 (0.61–0.78) | 0.61 (0.48–0.75) | 0.72 (0.60–0.78) |
| Centralized | < 10 cm³ | 0.06 (0.05–0.35) | 0.52 (0.40–0.66) | 0.30 (0.21–0.44) | 0.45 (0.27–0.68) | 0.41 (0.24–0.63) |
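The GTV volume used for this grouping can be obtained directly from the reference segmentation mask and the CT voxel spacing; a minimal illustrative sketch (numpy; function names and the default spacing are hypothetical) is:

```python
import numpy as np

def gtv_volume_cm3(mask: np.ndarray, spacing_mm=(1.0, 1.0, 3.0)) -> float:
    """Total GTV volume in cm³ from a binary mask and its voxel spacing in mm."""
    voxel_volume_mm3 = float(np.prod(spacing_mm))
    return mask.astype(bool).sum() * voxel_volume_mm3 / 1000.0   # 1 cm³ = 1000 mm³

def size_group(mask: np.ndarray, spacing_mm) -> str:
    """Assign a subject to the two size strata used in Table 2 and Fig. 2."""
    return ">= 10 cm³" if gtv_volume_cm3(mask, spacing_mm) >= 10.0 else "< 10 cm³"
```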
Examples of automated segmentations produced by the FedAvg1 model in validation subjects are provided in Fig. 3. Overall, the model accurately located the center of the GTV, but the outermost boundaries of the tumor were often not in exact agreement with the radiation oncologists’ delineations. Grad-CAM activation maps were only partially helpful: the vast majority of subjects showed activations near the GTV, but the GTV was not always consistently contained within the strongly activating region.
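To illustrate how such activation maps can be generated, a minimal Grad-CAM-style sketch for a 3D segmentation network is given below; the published visualizations used Grad-CAM++ [8], which additionally reweights the gradients with higher-order terms, and the `model` and `target_layer` handles here are hypothetical.

```python
import torch
import torch.nn.functional as F

def gradcam_volume(model, target_layer, ct_volume):
    """Gradient-weighted activation map for a 3D segmentation model.

    `model` returns GTV logits of shape (1, 1, D, H, W), `target_layer` is the
    convolutional block to inspect, and `ct_volume` is a (1, 1, D, H, W) input.
    Returns an attention volume normalized to [0, 1].
    """
    store = {}

    def fwd_hook(module, inputs, output):
        store["acts"] = output.detach()

    def bwd_hook(module, grad_input, grad_output):
        store["grads"] = grad_output[0].detach()

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        logits = model(ct_volume)
        score = logits.sigmoid().sum()     # scalar target: total predicted GTV probability
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()

    weights = store["grads"].mean(dim=(2, 3, 4), keepdim=True)   # per-channel importance
    cam = torch.relu((weights * store["acts"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=ct_volume.shape[2:], mode="trilinear",
                        align_corners=False)
    return cam / (cam.max() + 1e-8)
```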
Lastly, Fig. 4 shows the training and validation loss curves obtained during the centrally-trained, FedAvg1, FedAvg5 and FedAvg10 experiments. A small, periodically repeating spike appears every 1, 5 or 10 epochs due to the federated averaging step itself, in which the partial models trained on the disparate datasets are combined into a single global model that is then used as the starting state for the next training iteration. In addition, a larger spike occurs periodically, roughly every 10 epochs, as can be seen in the training and validation loss curves of FedAvg1. This larger spike is independent of the frequency of the federated averaging and appears to be a feature of the training mechanics that is not controllable with the typical hyperparameters. The pattern overlaps with the intentional averaging every 10 epochs in the FedAvg10 training, but manifests differently in the FedAvg5 training, with a significant jump near epoch 60. These transients were highly reproducible, being observed over three repetitions of all the experiments, and were furthermore independent of the weight initialization. Nonetheless, the training transient did not appear to affect the performance of the final selected model.