Deep learning models predicting hormone receptor status in breast cancer trained on females do not generalize to males: further evidence of sex-based disparity in breast cancer

doi:10.21203/rs.3.rs-2996566/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 08 Nov, 2023

Read the published version in npj Breast Cancer →

Breast cancer prognosis and management for both men and women are reliant upon estrogen receptor alpha (ERα) and progesterone receptor (PR) expression to inform therapy. Previous studies have shown that there are sex-specific binding characteristics of ERα and PR in breast cancer and, counterintuitively, ERα expression is more common in male than female breast cancer. We hypothesized that these differences could have morphological manifestations that are undetectable to human observers but could be elucidated computationally. To investigate this, we trained attention-based multiple instance learning prediction models for ERα and PR using H&E-stained images of female breast cancer from the Cancer Genome Atlas (TCGA) (n = 1085), and deployed them on external female (n = 192) and male breast cancer images (n = 245). Both targets were predicted in the internal (AUROC for ERα prediction: 0.86 ± 0.02, p < 0.001; AUROC for PR prediction = 0.76 ± 0.03, p < 0.001) and external female cohorts (AUROC for ERα prediction: 0.78 ± 0.03, p < 0.001; AUROC for PR prediction = 0.80 ± 0.04, p < 0.001) but not the male cohort (AUROC for ERα prediction: 0.66 ± 0.14, p = 0.43; AUROC for PR prediction = 0.63 ± 0.04, p = 0.05). This suggests that subtle morphological differences invisible upon visual inspection may exist between the sexes, supporting previous immunohistochemical, genomic, and transcriptomic analyses.

Health sciences/Oncology/Cancer/Breast cancer

Health sciences/Medical research/Biomarkers/Prognostic markers

Male breast cancer (MBC) is a rare condition that accounts for approximately 1% of all breast cancer cases worldwide^1,2. Its clinical management generally follows established strategies evidenced from female breast cancer (FBC). However, this may not be an optimal approach, as mounting evidence shows sex-specific differences in the molecular make-up, prognostic factors, and clinical demographics in BC^3–5.

For both sexes, prognostication and treatment decision making depend upon the expression profiles of the nuclear hormone receptors ERα and PR, currently determined by immunohistochemistry. High ERα expression and high PR expression are both predictors of improved outcome in MBC, associated with improved overall and disease-free survival, older age at diagnosis, low mitotic index, and lower pathological stage^6–13. The expression of these receptors differs notably between MBC and FBC. In contrast to FBC, BC in males is almost universally ERα positive (95% in MBC vs. 75% in FBC). PR positivity is observed in 82% of MBC and 65% of FBC cases^1,5,6,14,15.

Chromatin binding characteristics of ERα and PR differ by sex. In MBC, PR binding sites often lack ERα, while in females, PR can modulate ERα binding¹⁶. Adding to this evidence, hierarchical clustering studies have shown independent PR clusters in males, whereas in females, ERα and PR activity cluster together¹⁷. Additionally, mathematical modeling of immunohistochemical staining has failed to show any continuous dependence effect of PR on ERα for MBC, in direct contradiction to FBC¹⁸.

Although sex-specific molecular differences in breast cancer have been demonstrated in multiple studies, there are no obvious morphological differences between MBC and FBC following visual inspection of hematoxylin and eosin (H&E) stained BC tissue sections. Consequently, MBC is classified and reported in the same way as FBC^1,19, despite evidence that the well-known molecular subtypes in FBC may not be reflected in males. Such a non-specific approach is discordant with the differences in the distribution of histological subtypes. It also raises the question of whether morphological disparities exist that arise from the sex-specific regulatory nature of BC and are not obvious to a human observer but may be elucidated using deep learning (DL) methods.

H&E-stained tissue sections are the primary diagnostic tool for cancer patients with solid tumours, and are commonly available and accessed with relative ease^20,21. Recent work has shown that digitally scanned whole slide images (WSIs) of H&E-stained slides contain a wealth of previously hidden information that is not obvious to a human observer but may be elucidated using computational models and could be of prognostic value^22–24. The development of artificial intelligence (AI) based algorithms means it is now possible to extract and quantify this information^23–25.

Convolutional neural networks (CNNs) have been able to predict a range of clinical characteristics in FBC, such as grade, histological subtype, PAM50 intrinsic subtype, and ERα status^26–28, directly from H&E-stained WSIs. Historically, biomarker prediction in computational pathology has trained DL networks on annotated tumor regions of the whole slide images. Only this region-of-interest (ROI) is then tiled, with each tile retaining the “tumor” annotation. Thus, “healthy” tissue or background is excluded from the analysis. However, this method may not be optimal for two reasons. First, the ROI may contain regions that are not morphologically important for the target prediction^20,29–31. Second, the tissue architectures surrounding the ROI that are rejected as background tiles may contain information essential for improved performance of the prediction model. To address these issues, our study employed a weakly supervised learning pipeline using slide-level annotations of biomarker status. This approach considers all types of tissue architecture in the WSI without information loss, and the slide-level labels can be accessed with relative ease from patient records. For tile-to-slide level aggregation, we used a multiple instance learning pipeline with an attention component (attMIL)³².

In view of the evident sex-based differences of BC, we evaluated the efficacy of attMIL pipelines in predicting ERα and PR status in both MBC and FBC patients aiming to provide evidence of possible morphological differences between the sexes. We hypothesized that sex-based molecular differences may manifest in the morphological features contained in the tissue architecture which could be predictive of the hormone receptor status of the tumor in H&E-stained slides. To our knowledge, this is the first time DL-based investigations in BC have been applied specifically to a male cohort.

attMIL models can predict ERα and PR status from H&E WSIs in FBC

We investigated whether attMIL-based DL models can predict hormone receptor status for ERα and PR in FBC WSIs. To do this, we used patient-level training and 5-fold cross-validation on the TCGA-BRCA FBC cohort (n = 1085). Predictions for both targets achieved mean areas under the receiver operating characteristic curve (AUROCs) above 0.6: 0.86 ± 0.02 (p < 0.001) for ERα and 0.76 ± 0.03 (p < 0.001) for PR.

Next, we tested the hormone receptor prediction models on FBC WSIs independent of the training set by deploying them on a validation cohort of 192 FBCs and assessing their ability to detect ERα and PR status. The AUROCs were 0.78 ± 0.03 (p < 0.001) for ERα and 0.80 ± 0.04 (p < 0.001) for PR.

Collectively, these data show that ERα and PR status in FBC can be predicted directly from H&E-stained WSIs using attMIL-based models. AUROCs for the FBC cohorts are shown in Fig. 1a-b and 1d-e, and accuracy metrics are provided in Supplementary Table S1.

Prediction models trained on FBC images do not generalize to MBC

To test whether the attMIL-based prediction models are sex-invariant, we deployed the previously trained DL models on a combined set of MBC cases (n = 183). For both ERα and PR, large performance drops were observed, with AUROCs of 0.66 ± 0.14 (p = 0.43) and 0.63 ± 0.04 (p = 0.05), respectively. This indicates that the discriminatory power of the prediction models for both ERα and PR trained on FBC images was poor when applied to males. ROCs for the MBC cohort are shown in Fig. 1c and 1f. Accuracy metrics are provided in Supplementary Table S1.

Hormone receptor prediction models in FBC are sensitive to the target they were trained to detect

We evaluated the sensitivity of DL-based prediction models to the biomarker target they were trained to detect by applying an ERα prediction model to detect PR status and vice versa on the external validation dataset of FBC. The AUROC for the ERα model detecting PR status was 0.56 ± 0.03 (p = 0.45). For the PR model detecting ERα status, it was 0.60 ± 0.03 (p = 0.06). Neither model achieved statistical significance nor exceeded the 0.6 baseline AUROC, indicating poor discriminatory power for the target they were not trained to detect. Figure 2 shows the ROCs for both experiments.

attMIL model predictions for ERα and PR are validated by immunohistochemistry in FBC

To better understand how the attMIL-based prediction models make decisions, we investigated whether the spatial distribution of prediction and attention scores aligned with immunohistochemistry, using the FBC WSIs. These scores for ERα and PR were visualized separately on matched immunohistochemistry (IHC) WSIs. The prediction score heatmaps were not focused on any specific region of the WSIs. They represented the probability of each constituent tile being classified as positive or negative, resulting in a diffuse color map (red or blue). Representative examples, in which both targets were correctly classified, are shown in Fig. 3a(i) for ERα and 3b(ii) for PR.

The heatmaps showing the distribution of attention scores were more specific to certain regions in each WSI. High attention regions were concentrated on tumor tissue for both ERα and PR, although to a lesser extent for PR. Matched IHC WSIs showed that the attention heatmaps are concordant with the staining patterns, especially for ERα. For PR, the attention score distribution was more diffuse than in the ERα map, and the corresponding PR IHC staining revealed less positivity compared to ERα. Representative attention heatmaps are shown in Fig. 3a-b adjacent to the respective prediction heatmaps and IHC WSIs. The H&E-stained WSI used for these predictions is shown in Fig. 3c.

Tissue architectures with highest attention scores are concordant with receptor expression profiles in both sexes

We hypothesized that the histological features associated with ER and PR expression profiles should be similar and investigated whether the prediction models recognised this for both targets. To do this, image tiles with the highest attention scores were identified and collated for each target’s positive and negative classes for FBC internal and external validation cohorts, and the MBC cohort. We observed that the features returning top attention scores for both targets were not only similar but were also conserved for both sexes. Both ERα and PR positive tiles displayed clearly differentiated tumor and stromal regions, while ERα and PR negative tiles showed poorly differentiated cells, high levels of immune infiltration, and necrosis. Collated tiles with top attention scores for both targets in both FBC and MBC cohorts are shown in Fig. 4.

attMIL-based prediction models are invariant to color normalization and do not exhibit domain shift

AUROC values can sometimes misrepresent the performance of a prediction model as they do not provide any information regarding domain shift³³. To investigate whether either of our prediction models contained domain shift, we visualized the distribution of the model prediction scores for each hormone receptor target in all patient cohorts. Prediction scores for both targets were similarly distributed and free of domain shift for each cohort, regardless of Macenko normalization. Prediction score distributions for each target in each cohort both with and without normalization are summarized in Supplementary Figures S1-2.

Applications of DL-based techniques in BC pathology have been studied since 2010, including diagnostic (e.g., detection of primary tumor tissue and metastatic deposits, grading, subtyping, assessment of tumor microenvironment), prognostic (e.g., assessment of tumor morphological features with respect to outcome), and predictive (e.g., assessment of therapy response in relation to morphological features) targets/biomarkers³⁴. Concerning prediction of hormone receptor status, patch-based²⁸ and tissue microarray-based²⁷ algorithms have been explored with varying degrees of success. Multiple instance learning (MIL) without an attention component on full-face WSIs has been used to determine ERα status, achieving an AUROC of 0.92, a considerable improvement over the patch-based approach²⁶. These techniques could be insightful in understanding the biological behavior of BC in males and females. However, they have not previously been explored for that purpose.

We aimed to investigate the generalizability of DL-based techniques in MBC, specifically exploring their applicability across both sexes. Our hypothesis was rooted in the notion that the distinct binding characteristics of ERα and PR could manifest as morphological variances. Consequently, we hypothesized that if there were no substantial variations in morphological features between FBC and MBC, an attMIL model trained on an FBC dataset should perform equally well and exhibit similar accuracy in predicting hormone receptor status in an MBC dataset. Conversely, if there were discernible sex-specific differences in morphological features, predictive models trained on FBC images would likely demonstrate suboptimal performance in an MBC dataset.

We used the attMIL approach with a self-supervised learning-based RetCCL-trained feature extractor to predict ERα and PR in both MBC and FBC. Prediction models were trained on FBC images from the TCGA-BRCA dataset, and their performances were investigated on external FBC and MBC cohorts. When applied to the male cohort, the performance of both models dropped by a large margin, indicating that ERα and PR status in MBC cannot be predicted with confidence using attMIL models trained on FBC images. This disparity supports the growing recognition that male and female BC differ at many levels, including genetic, transcriptomic, and epigenetic^3,16–18,35, and that these differences may have subtle histopathological manifestations.

In FBC, we showed that our ERα model achieved an AUROC of 0.86 during internal validation and was generalizable to the external FBC cohort. Previous research has suggested that AUROCs approaching 0.9 and exhibiting strong generalizability are highly discriminative^20,36−38. This standard of performance was achieved by the ERα prediction model in FBC. The prediction model for PR status did not perform to this standard, although PR was predictable during both internal and external validation with statistical significance. This could indicate that either the PR prediction model failed to learn to specifically focus on tumor tissue, or that the tissue architecture surrounding tumor regions could influence making a prediction of PR status. It is worth noting here that for both targets, our attMIL models were free of domain shift in all cohorts, and invariant to Macenko color normalization, as shown in Supplementary Figures S1-2.

To ensure the quality and sensitivity of the models towards their respective biomarkers, we conducted a quality control exercise by applying prediction models trained to detect ERα on PR-positive cases, and vice versa, in the external FBC validation cohort. Our approach was grounded on the hypothesis that a reliable biomarker prediction model should exhibit specificity by solely identifying the intended target and not detecting other biomarkers, irrespective of their subcellular localization of expression. In this regard, our results showed exquisite sensitivity; the ERα prediction model had poor power of discrimination in detecting PR status and the reverse was also true. Given that both ERα and PR are classified as nuclear receptors, it is plausible that a predictive model developed for one receptor could potentially identify the other receptor as well. However, this was not the case, providing further evidence for the ability of DL-based techniques to detect subtle morphological changes which are invisible to the human eye.

In both FBC and MBC, ERα and PR positivity is associated with favorable outcomes. ERα and PR negativity, on the other hand, tends to be associated with features of aggressive disease, e.g., poor differentiation, high degree of immune infiltration, and necrosis. We showed that the morphological features that returned the highest attention scores for positive or negative expression of ERα and PR were congruent with the existing pathology. This was true for both sexes. Our algorithm was robust against artefacts (e.g., folding, tearing, pathologists’ ink) in the WSIs, returning low attention scores for both ERα and PR prediction. However, we sporadically observed high attention scores being returned for morphological features external to the breast tissue, such as the skin edge.

We acknowledge that our study was limited by the lack of an MBC validation cohort. A further limitation of our study was not evaluating HER2 (human epidermal growth factor receptor 2), which is part of the clinical management workflow in BC. HER2 expression is quantified primarily by IHC with scores of 0/1+ (negative), 2+ (equivocal) and 3+ (positive). Cases with equivocal expression need to undergo fluorescent/bright-field in-situ hybridization assays (ISH) to confirm gene amplification, which then ultimately classifies these cases as positive or negative. While an important biomarker in BC, HER2 poses a challenge for DL-based predictions directly from H&E-based images. Its expression is seen in around 15% of women³⁹, and is especially rare in males (0–9%)¹. Furthermore, most FBC clinical datasets with HER2 data include equivocal cases that lack confirmatory ISH testing. Therefore, they introduce a degree of ambiguity in the ground truth. This is exacerbated in MBC due to the small number of cases that express HER2. Taking these challenges into account, testing the predictability of HER2 status in BC of either sex using DL-based techniques would require improved curation of datasets, large multi-centric cohorts, and multimodal approaches that take both proteomic and genetic data into account.

To conclude, we showed that attMIL workflows have the potential to predict ERα status in FBC with accuracy levels that are clinically relevant, and notably, for the first time, that spatial resolution of attention scores is concordant with IHC staining patterns of both ERα and PR. However, attMIL-based prediction models trained on FBC images were ineffective when applied to MBC datasets. These results align with the growing recognition that sex can differentially influence the behavior of cancers in general, and breast cancer in particular^40,41. Our findings support previous evidence that male and female BC are different on many levels, and suggest that subtleties in BC tissue architecture that are invisible to the human eye but detectable by DL may also be sex specific.

Patient cohorts

This study is a retrospective analysis of digital images of anonymized archival tissue samples and was performed in accordance with the Declaration of Helsinki. Two cohorts of FBC patients were used: a training set from The Cancer Genome Atlas - Breast Cancer (TCGA-BRCA) dataset (n = 1085), followed by a combined validation set of FBC cases (n = 192) compiled from the Breast Cancer Now Tissue Bank (n = 58) and the Clinical Proteomic Tumor Analysis Consortium - Breast Cancer (CPTAC-BRCA) dataset (n = 134). For MBC, seven cohorts were used, totaling n = 245 cases from: the Male Breast Cancer Consortium (n = 126), NHS Greater Glasgow and Clyde Biorepository (n = 40), NHS Grampian Biorepository (n = 21), Northern Ireland Biobank (n = 25)⁴², Wales Cancer Biobank (n = 10)⁴³, Breast Cancer Now Tissue Bank (n = 11), and TCGA-BRCA dataset (n = 12). After checking for data completeness, a total of n = 183 cases of MBC were suitable for inclusion.

Image preprocessing

All H&E-stained WSIs used in our analyses were preprocessed following the “Aachen protocol for deep learning histopathology”⁴⁴. All WSIs underwent tessellation into tiles with an edge length of 256 µm and a size of 224 × 224 px, corresponding to an effective resolution of 1.14 µm/px. Blurry tiles and tiles containing background were removed automatically using Canny edge detection as implemented in the OpenCV package for Python⁴⁵. The remaining tiles were then color-normalized following the Macenko method to remove any bias arising from differences in staining between cohorts⁴⁶. We did not apply any manual annotations and our analysis was not restricted to the tumor region alone. All models were trained solely on the basis of slide-level target labels.
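
As a quick check of the tile geometry described above, the following sketch computes the effective resolution implied by a 256 µm tile stored at 224 px. It is an illustration only and not part of the original pipeline: the function names are hypothetical, and the 0.25 µm/px scanner resolution in the usage lines is an assumed example value that does not come from the text.

```python
def effective_mpp(tile_edge_um: float = 256.0, tile_edge_px: int = 224) -> float:
    """Microns per pixel of a tile once it is stored at the network input size."""
    return tile_edge_um / tile_edge_px


def source_edge_px(tile_edge_um: float, slide_mpp: float) -> int:
    """Number of native slide pixels spanned by one tile edge."""
    return round(tile_edge_um / slide_mpp)


mpp = effective_mpp()
print(f"{mpp:.2f} um/px")           # 1.14 um/px

# Assumed example: a slide scanned at 0.25 um/px (not a value from the text).
print(source_edge_px(256.0, 0.25))  # 1024
```

Under that assumed scanner resolution, each tile would cover 1024 × 1024 native pixels before being reduced to 224 × 224 px.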

Experimental setup

Attention-based multiple instance learning (attMIL)^47,48 models were used to predict ERα and PR status in both female and male breast cancer patient samples. Models were trained on FBC H&E-stained WSIs from the TCGA-BRCA cohort (n = 1085) using biomarker-stratified five-fold cross-validation. A quarter of the patients in each training fold were reserved as a validation dataset to monitor overfitting during the training process. Trained models were externally validated on two cohorts: the external female breast cancer validation cohort (n = 192) and the male breast cancer cohort (n = 183).
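
The splitting scheme described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the patient identifiers and labels are synthetic, and the validation quarter is drawn by simple random sampling because the text does not say whether it was stratified.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each patient to one of n_folds so class proportions stay similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for patient, label in labels.items():
        by_class[label].append(patient)
    folds = [[] for _ in range(n_folds)]
    for patients in by_class.values():
        rng.shuffle(patients)
        # Deal patients of one class round-robin across the folds.
        for i, patient in enumerate(patients):
            folds[i % n_folds].append(patient)
    return folds


def split_for_fold(folds, test_idx, val_fraction=0.25, seed=0):
    """Return (train, val, test) patient lists for one cross-validation round."""
    rng = random.Random(seed + test_idx)
    test = list(folds[test_idx])
    rest = [p for i, fold in enumerate(folds) if i != test_idx for p in fold]
    rng.shuffle(rest)
    n_val = int(len(rest) * val_fraction)  # a quarter of the training patients
    return rest[n_val:], rest[:n_val], test


# Synthetic cohort: 100 patients, 75 "positive" and 25 "negative".
labels = {f"patient_{i:03d}": ("positive" if i % 4 else "negative") for i in range(100)}
folds = stratified_folds(labels)
train, val, test = split_for_fold(folds, test_idx=0)
print(len(train), len(val), len(test))  # 60 20 20
```

Splitting by patient rather than by tile keeps all tiles of one patient on the same side of the train/validation/test boundary.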

Feature extraction and implementation of attMIL

Feature vectors for images within the attMIL procedure were extracted using a ResNet50 trained via the RetCCL self-supervised learning (SSL) algorithm (https://github.com/Xiyue-Wang/RetCCL)^48,49. During training, model parameters were updated using the Adam optimizer⁵⁰ with 1% weight decay. Momenta and learning rates were scheduled using the “fit one cycle” procedure over a total of 32 epochs as made available in fastai (https://docs.fast.ai/callback.schedule.html)^51,52. The maximal learning rate was 1e-4. Over the first eight epochs, the learning rate sinusoidally increased from 1/25 of the maximum to the maximum and sinusoidally decreased to 1e-6 of the maximum over the remaining epochs. With the same modulation, the optimizer’s momentum was increased from 0.85 to 0.95 and returned to 0.85. The batch size used for updating model weights incrementally was 64 patients.
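
The learning-rate schedule described above can be illustrated with a small, framework-free sketch. It is not the fastai implementation: the function names and the half-cosine interpolation are assumptions chosen to match the wording of the text (warm-up from 1/25 of the maximum over 8 of 32 epochs, then decay to 1e-6 of the maximum). Momentum scheduling is omitted.

```python
import math


def cosine_interp(start: float, end: float, pos: float) -> float:
    """Move from start (pos=0) to end (pos=1) along half a cosine wave."""
    return start + (1 - math.cos(math.pi * pos)) * (end - start) / 2


def one_cycle_lr(epoch_pos: float, lr_max: float = 1e-4, total_epochs: int = 32,
                 warmup_epochs: int = 8, start_div: float = 25.0,
                 final_factor: float = 1e-6) -> float:
    """Learning rate at a (possibly fractional) epoch position."""
    if epoch_pos < warmup_epochs:
        # Warm-up phase: lr_max / start_div -> lr_max
        return cosine_interp(lr_max / start_div, lr_max, epoch_pos / warmup_epochs)
    # Decay phase: lr_max -> lr_max * final_factor
    pos = (epoch_pos - warmup_epochs) / (total_epochs - warmup_epochs)
    return cosine_interp(lr_max, lr_max * final_factor, pos)


for epoch in (0, 8, 32):
    print(epoch, f"{one_cycle_lr(epoch):.2e}")
# 0 4.00e-06
# 8 1.00e-04
# 32 1.00e-10
```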

To implement attMIL, a fully connected layer followed by a Rectified Linear Unit (ReLU) were used to embed feature vectors in a 256-dimensional space. Then, these embedded vectors were passed through a linear layer to output a further 256-dimensional feature vector (h_k), where k is the index of each tile. The attention score (a_k) for the k-th tile was calculated as:

$$a_{k}=\frac{\exp\left\{w^{T}\tanh\left(V h_{k}\right)\right\}}{\sum_{j=1}^{K}\exp\left\{w^{T}\tanh\left(V h_{j}\right)\right\}}$$

(Eq. 1)

where h_k ∈ R²⁵⁶, V ∈ R^(128×256), w ∈ R¹²⁸, and K is the maximum number of tiles resampled per epoch per patient. We used K = 512 tiles per patient. Then, the MIL pooling operation was applied as follows:

$${h}_{sum }= {\sum }_{i = 1}^{K}{a}_{i}{h}_{i}$$

(Eq. 2)

where h_i is the i-th tile’s embedding. The final prediction score for each patient was obtained by passing each batch of h_sum values through a BatchNorm1D layer first, and then a Dropout layer with p = 50%. Then, h_sum values were passed through a fully connected layer with 2-dimensional output, followed by a softmax layer to obtain the final prediction scores.
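
Equations 1 and 2 can be made concrete with a dependency-free sketch. This is an illustration rather than the study's implementation: the helper names are hypothetical, the inputs are random, and the toy sizes (6 tiles, 4-dimensional embeddings, 3 attention units) stand in for the K = 512, 256 and 128 used in the text. The BatchNorm1D, Dropout and classification layers are not included.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def attention_weights(H, V, w):
    """Eq. 1: softmax over tiles of w^T tanh(V h_k)."""
    logits = [sum(wi * math.tanh(z) for wi, z in zip(w, matvec(V, h))) for h in H]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mil_pool(H, a):
    """Eq. 2: attention-weighted sum of the tile embeddings."""
    dim = len(H[0])
    return [sum(a_k * h[d] for a_k, h in zip(a, H)) for d in range(dim)]


rng = random.Random(0)
K, D, A = 6, 4, 3  # toy sizes, not those of the study
H = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(K)]
V = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(A)]
w = [rng.gauss(0, 1) for _ in range(A)]

a = attention_weights(H, V, w)
h_sum = mil_pool(H, a)
print(round(sum(a), 6), len(h_sum))  # 1.0 4
```

Because the attention scores sum to one, h_sum is a convex combination of the tile embeddings; with w set to zero every tile receives weight 1/K and the pooling reduces to a plain average.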

The full experimental strategy is outlined in Fig. 5.

Explainability and biological validation with immunohistochemistry

For easy visualization of our prediction models, we generated spatially resolved heatmaps showing the distribution of attention and classification scores for each tile within each WSI, for each target. Feature vectors for 32 x 32-pixel fields were extracted from the WSI using the RetCCL algorithm⁴⁹. Attention and classification scores were calculated for each image region, and normalized within each patient cohort. Based on the resulting scores, heatmaps for each patient were generated with red indicating high attention score or positive classification and blue indicating low attention score or negative classification. Each heatmap was overlaid on its corresponding H&E WSI, allowing visual interpretation of underlying morphological features, correlating with classification types and high attention scores. We also matched classification heatmaps to immunohistochemically stained sections for ERα and PR from these cases.

Code availability

All code used in this article is open source and available at https://github.com/KatherLab/marugoto.

Statistics

The primary statistical endpoint for our analyses was the AUROC determined at patient-level. Since we only performed binary classification, AUROCs were identical for both “positive” and “negative” classes for each target. Therefore, we only reported AUROCs for “positive” classes within each target. Distribution of patient level prediction scores for each target was further visualized using density plots, which were also used to quantify domain shift between models trained and tested on normalized vs. unnormalized tiles. All statistical tasks were performed using Python 3.11 and R 4.3.0.
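
The statement that the AUROC is the same for the "positive" and "negative" classes in a binary task can be checked with a short sketch. The function below uses the pairwise (Mann-Whitney) formulation of the AUROC; it is an illustrative stand-in, not the statistical code used in the study, and the scores and labels are made up.

```python
def auroc(scores, labels):
    """Probability that a random positive case outscores a random negative one
    (ties count one half); this equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up patient-level scores for the "positive" class and their true labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.7]
labels = [1, 1, 1, 0, 0, 0]

# View from the "negative" class: complementary scores, flipped labels.
neg_scores = [1 - s for s in scores]
neg_labels = [1 - y for y in labels]

print(f"{auroc(scores, labels):.3f}")          # 0.778
print(f"{auroc(neg_scores, neg_labels):.3f}")  # 0.778
```

Flipping both the scores and the labels preserves the ordering of every positive/negative pair, which is why reporting the AUROC for one class is sufficient.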

Ethical approval and consent to participate

The experiments in this study were carried out according to the Declaration of Helsinki and the International Ethical Guidelines for Biomedical Research Involving Human Subjects by the Council for International Organizations of Medical Sciences (CIOMS). The Ethics Board at the Medical Faculty of the Technical University of Dresden approved of the overall analysis in this study. The patient sample collection in each cohort was separately approved by the respective institutional ethics boards as follows: the Leeds (West) Research Ethics Committee (06/Q125/156), NHS Grampian Tissue Bank Committee (TR000292), Greater Glasgow Health Board (TR000269), Northern Ireland Biobank (NIB22-0007), Wales Cancer Biobank (22-005), and Breast Cancer Now Tissue Bank Access Committee (TR0249).

Data availability

All images included in the training set (n = 1085) are available at https://portal.gdc.cancer.gov/ and information about their hormone receptor status is available at https://www.cbioportal.org/. Part of the female breast cancer external validation set (n = 134) images and their associated clinical information are available at https://www.cancerimagingarchive.net/collections/. All other data are available from the principal investigators upon reasonable request.

Acknowledgements

We thank Dr Ehab Husain of the Department of Pathology, Aberdeen Royal Infirmary for critically reviewing the manuscript. The results are in part based on data generated by the TCGA Research Network (http://cancergenome.nih.gov/) and the Clinical Proteomic Tumor Analysis Consortium (https://proteomics.cancer.gov/data-portal). We received samples used in this research from the Northern Ireland Biobank which has received funds from the HSC Research and Development Division of the Public Health Agency in Northern Ireland. Biosamples were also obtained from the Wales Cancer Biobank which is funded by Health and Care Research Wales. Other investigators may have received specimens from the same subjects. This study was supported by the University of Aberdeen Development Trust through an Elphinstone Scholarship, and the Scottish Funding Council through a Saltire Emerging Researcher European Exchange Award (SC), Breast Cancer Now and NHS Grampian Endowments (VS), German Federal Ministry of Health (DEEP LIVER, ZMVI1-2520DAT111), the Max-Eder-Programme of the German Cancer Aid (grant #70113864), and Gemeinsamer Bundesausschuss (Transplant.KI, 01VSF21048) (JNK). We are grateful to the NHS Grampian Biorepository, NHS Greater Glasgow and Clyde Biorepository, Northern Ireland Biobank, Wales Cancer Biobank, and Breast Cancer Now Tissue Bank for kindly providing the cases for this work. Finally, we are grateful to the patients and their families for allowing us to use their tissues in this work.

Competing interests

JNK declares consulting services for Owkin, France; DoMore Diagnostics, Norway and Panakeia, UK and has received honoraria for lectures by Eisai, Roche, MSD, and Fresenius. VS is one of the founders of the Breast Cancer Now Tissue Bank. No other potential conflicts of interest are reported by any of the authors.

Contributions

SC, VS, and JNK designed the study; SC, JMN, MvT, OLS, GPV, and DC performed the experiments; SC and JMN performed the statistical analysis; SC, JMN, CMLL, ZIC, RA-E, VS, and JNK wrote the original manuscript; RA-E, VS, and JNK supervised the study.

1. Fox, S., Speirs, V. & Shaaban, A. M. Male breast cancer: an update. Virchows Arch 480, 85–93 (2022). https://doi.org/10.1007/s00428-021-03190-7
2. Zheng, G. & Leone, J. P. Male Breast Cancer: An Updated Review of Epidemiology, Clinicopathology, and Treatment. J Oncol 2022, 1734049 (2022). https://doi.org/10.1155/2022/1734049
3. Chatterji, S. et al. Defining genomic, transcriptomic, proteomic, epigenetic, and phenotypic biomarkers with prognostic capability in male breast cancer: a systematic review. Lancet Oncol 24, e74-e85 (2023). https://doi.org/10.1016/S1470-2045(22)00633-7
4. Ferzoco, R. M. & Ruddy, K. J. The Epidemiology of Male Breast Cancer. Curr Oncol Rep 18, 1 (2016). https://doi.org/10.1007/s11912-015-0487-4
5. Gucalp, A. et al. Male breast cancer: a disease distinct from female breast cancer. Breast Cancer Res Treat 173, 37–48 (2019). https://doi.org/10.1007/s10549-018-4921-9
6. Cardoso, F. et al. Characterization of male breast cancer: results of the EORTC 10085/TBCRC/BIG/NABCG International Male Breast Cancer Program. Ann Oncol 29, 405–417 (2018). https://doi.org/10.1093/annonc/mdx651
7. Qiu, S. Q. et al. High hepatocyte growth factor expression in primary tumor predicts better overall survival in male breast cancer. Breast Cancer Res 22, 30 (2020). https://doi.org/10.1186/s13058-020-01266-x
8. Andre, S. et al. Male breast cancer: Specific biological characteristics and survival in a Portuguese cohort. Mol Clin Oncol 10, 644–654 (2019). https://doi.org/10.3892/mco.2019.1841
9. Yadav, S. et al. Male breast cancer in the United States: Treatment patterns and prognostic factors in the 21st century. Cancer 126, 26–36 (2020). https://doi.org/10.1002/cncr.32472
10. Sas-Korczynska, B. et al. The biological markers and results of treatment in male breast cancer patients. The Cracow experience. Neoplasma 61, 331–339 (2014). https://doi.org/10.4149/neo_2014_043
11. Leone, J. et al. Tumor subtypes and survival in male breast cancer. Breast Cancer Res Treat 188, 695–702 (2021). https://doi.org/10.1007/s10549-021-06182-y
12. Fonseca, R. R., Tomas, A. R., Andre, S. & Soares, J. Evaluation of ERBB2 gene status and chromosome 17 anomalies in male breast cancer. Am J Surg Pathol 30, 1292–1298 (2006). https://doi.org/10.1097/01.pas.0000213354.72638.bd
13. Vermeulen, J. F., Kornegoor, R., van der Wall, E., van der Groep, P. & van Diest, P. J. Differential expression of growth factor receptors and membrane-bound tumor markers for imaging in male and female breast cancer. PLoS One 8, e53353 (2013). https://doi.org/10.1371/journal.pone.0053353
14. Humphries, M. P. et al. Characterisation of male breast cancer: a descriptive biomarker study from a large patient series. Sci Rep 7, 45293 (2017). https://doi.org/10.1038/srep45293
15. Lukasiewicz, S. et al. Breast Cancer-Epidemiology, Risk Factors, Classification, Prognostic Markers, and Current Treatment Strategies-An Updated Review. Cancers (Basel) 13 (2021). https://doi.org/10.3390/cancers13174287
16. Severson, T. M. et al. Characterizing steroid hormone receptor chromatin binding landscapes in male and female breast cancer. Nat Commun 9, 482 (2018). https://doi.org/10.1038/s41467-018-02856-2
17. Shaaban, A. M. et al. A comparative biomarker study of 514 matched cases of male and female breast cancer reveals gender-specific biological differences. Breast Cancer Res Treat 133, 949–958 (2012). https://doi.org/10.1007/s10549-011-1856-9
18. Kornegoor, R., van Diest, P. J., Buerger, H. & Korsching, E. Tracing differences between male and female breast cancer: both diseases own a different biology. Histopathology 67, 888–897 (2015). https://doi.org/10.1111/his.12727
19. Brierley, J. D., Gospodarowicz, M. K. & Wittekind, C. TNM Classification of Malignant Tumours. (John Wiley and Sons, 2017).
20. Shmatko, A., Ghaffari Laleh, N., Gerstung, M. & Kather, J. N. Artificial intelligence in histopathology: enhancing cancer research and clinical oncology. Nat Cancer 3, 1026–1038 (2022). https://doi.org/10.1038/s43018-022-00436-4
21. Bera, K., Schalper, K. A., Rimm, D. L., Velcheti, V. & Madabhushi, A. Artificial intelligence in digital pathology - new tools for diagnosis and precision oncology. Nat Rev Clin Oncol 16, 703–715 (2019). https://doi.org/10.1038/s41571-019-0252-y
22. Heinz, C. N., Echle, A., Foersch, S., Bychkov, A. & Kather, J. N. The future of artificial intelligence in digital pathology - results of a survey across stakeholder groups. Histopathology 80, 1121–1127 (2022). https://doi.org/10.1111/his.14659
23. Echle, A. et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br J Cancer 124, 686–696 (2021). https://doi.org/10.1038/s41416-020-01122-x
Cifci, D., Foersch, S. & Kather, J. N. Artificial intelligence to identify genetic alterations in conventional histopathology. J Pathol 257, 430–444 (2022). https://doi.org:10.1002/path.5898
Ghaffari Laleh, N., Ligero, M., Perez-Lopez, R. & Kather, J. N. Facts and Hopes on the Use of Artificial Intelligence for Predictive Immunotherapy Biomarkers in Cancer. Clin Cancer Res 29, 316–323 (2023). https://doi.org:10.1158/1078-0432.CCR-22-0390
Naik, N. et al. Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains. Nat Commun 11, 5727 (2020). https://doi.org:10.1038/s41467-020-19334-3
Couture, H. D. et al. Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype. NPJ Breast Cancer 4, 30 (2018). https://doi.org:10.1038/s41523-018-0079-1
Gamble, P. et al. Determining breast cancer biomarker status and associated morphological features using deep learning. Commun Med (Lond) 1, 14 (2021). https://doi.org:10.1038/s43856-021-00013-3
Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat Med 25, 1054–1056 (2019). https://doi.org:10.1038/s41591-019-0462-y
Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med 24, 1559–1567 (2018). https://doi.org:10.1038/s41591-018-0177-5
Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat Cancer 1, 800–810 (2020). https://doi.org:10.1038/s43018-020-0085-8
Ilse, M., Tomczak, J. & Welling, M. Attention-based Deep Multiple Instance Learning. in Proceedings of the 35th International Conference on Machine Learning Vol. 80 (eds Dy, J. & Krause, A.) 2127–2136 (PMLR, Proceedings of Machine Learning Research, 2018).
Kleppe, A. Area under the curve may hide poor generalisation to external datasets. ESMO Open 7, 100429 (2022). https://doi.org:10.1016/j.esmoop.2022.100429
Ibrahim, A. et al. Artificial intelligence in digital breast pathology: Techniques and applications. Breast 49, 267–273 (2020). https://doi.org:10.1016/j.breast.2019.12.007
Humphries, M. P. et al. A Case-Matched Gender Comparison Transcriptomic Screen Identifies eIF4E and eIF5 as Potential Prognostic Markers in Male Breast Cancer. Clin Cancer Res 23, 2575–2583 (2017). https://doi.org:10.1158/1078-0432.CCR-16-1952
Echle, A. et al. Clinical-Grade Detection of Microsatellite Instability in Colorectal Tumors by Deep Learning. Gastroenterology 159, 1406–1416.e11 (2020). https://doi.org:10.1053/j.gastro.2020.06.021
Mandrekar, J. N. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol 5, 1315–1316 (2010). https://doi.org:10.1097/JTO.0b013e3181ec173d
Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med 25, 1301–1309 (2019). https://doi.org:10.1038/s41591-019-0508-1
Wolff, A. C. et al. Human Epidermal Growth Factor Receptor 2 Testing in Breast Cancer: American Society of Clinical Oncology/College of American Pathologists Clinical Practice Guideline Focused Update. Arch Pathol Lab Med 142, 1364–1382 (2018). https://doi.org:10.5858/arpa.2018-0902-SA
Rubin, J. B. The spectrum of sex differences in cancer. Trends Cancer 8, 303–315 (2022). https://doi.org:10.1016/j.trecan.2022.01.013
Dong, M. et al. Sex Differences in Cancer Incidence and Survival: A Pan-Cancer Analysis. Cancer Epidemiol Biomarkers Prev 29, 1389–1397 (2020). https://doi.org:10.1158/1055-9965.EPI-20-0036
Lewis, C. et al. The Northern Ireland biobank: A cancer focused repository of science. Open J Bioresour 5 (2020). https://doi.org:10.5334/OJB.47
Parry-Jones, A. & Spary, L. K. The Wales Cancer Bank (WCB). Open J Bioresour 5 (2018). https://doi.org:10.5334/ojb.46
Muti, H. S. The Aachen Protocol for Deep Learning Histopathology: A hands-on guide for data preprocessing. (2020).
Ghaffari Laleh, N. et al. Benchmarking weakly-supervised deep learning pipelines for whole slide classification in computational pathology. Med Image Anal 79, 102474 (2022). https://doi.org:10.1016/j.media.2022.102474
Macenko, M. et al. A method for normalizing histology slides for quantitative analysis. 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 1107–1110 (2009).
Seraphin, T. P. et al. Prediction of heart transplant rejection from routine pathology slides with self-supervised Deep Learning. medRxiv, 2022.09.29.22279995 (2022). https://doi.org:10.1101/2022.09.29.22279995
Saldanha, O. L. et al. Self-supervised attention-based deep learning for pan-cancer mutation prediction from histopathology. NPJ Precis Oncol 7, 35 (2023). https://doi.org:10.1038/s41698-023-00365-0
Wang, X. et al. RetCCL: Clustering-guided contrastive learning for whole-slide image retrieval. Med Image Anal 83, 102645 (2023). https://doi.org:10.1016/j.media.2022.102645
Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. arXiv [cs.LG] (2014). https://doi.org:10.48550/arXiv.1412.6980
Smith, L. N. & Topin, N. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. arXiv [cs.LG] (2018). https://doi.org:10.48550/arXiv.1708.07120
Smith, L. N. Cyclical Learning Rates for Training Neural Networks. in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). 464–472 (2017).

There is a conflict of interest. JNK declares consulting services for Owkin, France; DoMore Diagnostics, Norway; and Panakeia, UK, and has received honoraria for lectures from Eisai, Roche, MSD, and Fresenius. VS is one of the founders of the Breast Cancer Now Tissue Bank. No other potential conflicts of interest are reported by any of the authors.

SupplementaryFiguresandtable.docx


Editorial decision: revise (30 Jun, 2023)
Review #2 received at journal (28 Jun, 2023)
Review #1 received at journal (21 Jun, 2023)
Reviewer #2 agreed at journal (16 Jun, 2023)
Reviewer #1 agreed at journal (05 Jun, 2023)
Reviewers invited by journal (03 Jun, 2023)
Submission checks completed at journal (02 Jun, 2023)
First submitted to journal (31 May, 2023)
Editor assigned by journal (29 May, 2023)
