1. Patients
We collected the clinicopathological data and pathological images of preoperative core needle biopsies from 4038 female patients with invasive breast cancer at the Fourth Hospital of Hebei Medical University between January 2015 and December 2018. In addition, the clinicopathological data and whole slide images (WSIs) of 190 female patients with invasive breast cancer from four medical centers in Hebei Province were collected for external validation of the proposed method. The inclusion criteria were as follows: 1) all breast biopsy specimens were confirmed as invasive breast cancer by three experienced pathologists; 2) no preoperative neoadjuvant therapy (NAT) was administered; 3) lymph node metastasis was assessed postoperatively by histopathology and immunohistochemistry; and 4) complete clinicopathological data were available. The exclusion criteria were as follows: 1) microinvasive carcinoma (invasive lesions < 1 mm); 2) special types of invasive carcinoma; 3) poor-quality or blurred scanned pathological images; 4) any preoperative treatment (NAT, chemotherapy, chemoradiotherapy, ablation, etc.); and 5) incomplete clinicopathological data. Finally, 3701 patients were included in this study.
The following clinicopathological variables were collected and evaluated for each patient: age, menopausal status, tumor size, histological grade, nuclear atypia, mitotic count, tumor-infiltrating lymphocytes (TILs), estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER2) status, and postoperative lymph node metastasis.
2. Pathological evaluation
Histological grading was based on the World Health Organization classification of breast tumors (5th edition)[28] and the Nottingham grading system, and all cases were classified as grade I, grade II, or grade III. TILs were evaluated as the area occupied by mononuclear inflammatory cells over the total stromal area[29–30]. ER and PR were considered positive (hormone receptor-positive) when more than 1% of tumor cell nuclei stained positive. HER2 positivity was defined as an immunohistochemistry (IHC) score of 3+ or gene amplification on fluorescence in situ hybridization (FISH). All cases were divided into three subtypes: luminal (hormone receptor-positive, including luminal A and luminal B), HER2-overexpressing (hormone receptor-negative, HER2-positive), and triple-negative breast cancer (TNBC; hormone receptor-negative and HER2-negative).
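The receptor and subtype rules above can be expressed as a small decision function. This is a minimal sketch, not the authors' code: the `Biomarkers` record and its field names are hypothetical, and it assumes that "hormone receptor-positive" means ER-positive or PR-positive and that a FISH test that was not performed counts as not amplified.

```python
from dataclasses import dataclass


@dataclass
class Biomarkers:
    """Hypothetical per-case biomarker record."""
    er_pct: float          # % of tumor cell nuclei staining positive for ER
    pr_pct: float          # % of tumor cell nuclei staining positive for PR
    her2_ihc: int          # HER2 IHC score: 0, 1, 2 or 3
    fish_amplified: bool   # HER2 amplification on FISH (False if not tested)


def hr_positive(b: Biomarkers) -> bool:
    # ER or PR counts as positive when more than 1% of tumor nuclei are positive.
    return b.er_pct > 1.0 or b.pr_pct > 1.0


def her2_positive(b: Biomarkers) -> bool:
    # IHC 3+ or FISH amplification.
    return b.her2_ihc == 3 or b.fish_amplified


def subtype(b: Biomarkers) -> str:
    if hr_positive(b):
        return "luminal"               # luminal A and B are not separated here
    if her2_positive(b):
        return "HER2 over-expression"  # HR-negative, HER2-positive
    return "TNBC"                      # HR-negative, HER2-negative


print(subtype(Biomarkers(er_pct=80, pr_pct=0, her2_ihc=3, fish_amplified=False)))  # luminal
```

Note that hormone receptor status is checked first, so an HR-positive, HER2-positive case falls into the luminal group, matching the three-way split described above.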
3. Structure and standardization of the data.
Clinicopathological parameters were extracted from the pathology reports using a text pattern-matching algorithm. Categorical variables were encoded as numerical variables with the LabelEncoder class of the scikit-learn package, yielding structured data for each patient. Missing data were imputed by multivariate imputation by chained equations (MICE)[31]. Color normalization was performed on the histopathological images at all magnification scales using an enhanced cycle-consistent generative adversarial network[32].
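The extraction, encoding, and imputation steps can be sketched with scikit-learn. This is an illustrative sketch under stated assumptions: the report text format, the field names, and the regex patterns are hypothetical, and scikit-learn's `IterativeImputer` (a MICE-style chained-equations imputer that returns a single imputation) stands in for whichever MICE implementation was actually used[31]. Because `LabelEncoder` cannot encode missing values, only observed categories are encoded and missing entries are left as NaN for the imputer.

```python
import re

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (exposes IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import LabelEncoder

# Hypothetical report fields and patterns; real report templates will differ.
PATTERNS = {
    "age": r"Age:\s*(\d+)",
    "grade": r"Grade:\s*(I{1,3})\b",
    "er": r"ER:\s*(positive|negative)",
}


def parse_report(text):
    """Extract each field with a regex; a field that is not found becomes None."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        record[field] = match.group(1) if match else None
    return record


def encode_column(values):
    """Label-encode the observed categories and leave missing entries as NaN."""
    observed_idx = [i for i, v in enumerate(values) if v is not None]
    observed = [values[i] for i in observed_idx]
    encoder = LabelEncoder().fit(observed)
    codes = np.full(len(values), np.nan)
    codes[observed_idx] = encoder.transform(observed)
    return codes, encoder


def build_table(reports):
    records = [parse_report(r) for r in reports]
    age = np.array([float(r["age"]) if r["age"] is not None else np.nan for r in records])
    grade, grade_enc = encode_column([r["grade"] for r in records])
    er, er_enc = encode_column([r["er"] for r in records])
    X = np.column_stack([age, grade, er])
    # Chained equations: each column with missing values is regressed on the others in turn.
    X_imputed = IterativeImputer(max_iter=10, random_state=0).fit_transform(X)
    # Imputed categorical codes are continuous; snap them back to valid label codes.
    for col, enc in ((1, grade_enc), (2, er_enc)):
        X_imputed[:, col] = np.clip(np.round(X_imputed[:, col]), 0, len(enc.classes_) - 1)
    return X, X_imputed, {"grade": grade_enc, "er": er_enc}


reports = [
    "Age: 52\nGrade: II\nER: positive",
    "Age: 61\nGrade: III\nER: negative",
    "Age: 45\nGrade: I",  # ER missing from this report
    "Age: 70\nGrade: II\nER: positive",
]
X, X_imputed, encoders = build_table(reports)
print(X_imputed)
```

Rounding imputed codes to the nearest valid label is one simple choice; a classifier-based imputer for categorical columns would avoid treating label codes as continuous.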
4. Data partitioning, image preprocessing, and data augmentation.
The dataset was partitioned at the patient level by stratified random sampling into training (60%), validation (20%), and test (20%) sets. Because a WSI is very large (typically 130,000 × 50,000 pixels), each WSI was tiled on a grid into 512 × 512 patches for subsequent processing. Tiling was performed at three magnification scales (5×, 10×, and 20×)[33], and the overlap threshold differed among magnifications. Data augmentation was applied to the patches during training to improve generalization.
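Two details of this step are easy to get wrong: all slides (and therefore all patches) of one patient must fall into the same split, and the number of grid patches depends on the magnification. The plain-Python sketch below shows both. The function names, the 20× scan magnification in the usage line, and stratifying on a single label per patient are assumptions; edge regions narrower than one patch are simply dropped.

```python
import random
from collections import defaultdict


def patient_level_split(patient_labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split patient IDs into train/val/test, stratified by label.

    patient_labels: dict mapping patient ID -> label (e.g. LNM category).
    Splitting IDs (not slides or patches) keeps every patch of a patient in one set.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for pid, label in sorted(patient_labels.items()):
        by_label[label].append(pid)
    train, val, test = [], [], []
    for label in sorted(by_label):
        pids = by_label[label]
        rng.shuffle(pids)
        n_train = round(fractions[0] * len(pids))
        n_val = round(fractions[1] * len(pids))
        train += pids[:n_train]
        val += pids[n_train:n_train + n_val]
        test += pids[n_train + n_val:]
    return train, val, test


def level_size(width0, height0, base_mag, target_mag):
    """Slide size in pixels at target_mag, given its size at the scan (base) magnification."""
    scale = target_mag / base_mag
    return int(width0 * scale), int(height0 * scale)


def grid_tiles(width, height, tile=512, overlap=0):
    """Top-left corners of tile x tile patches on a regular grid; edge remainders are dropped."""
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    return [(x, y)
            for y in range(0, height - tile + 1, stride)
            for x in range(0, width - tile + 1, stride)]


# Usage: a 130,000 x 50,000 slide, assumed to be scanned at 20x, tiled at 5x without overlap.
w5, h5 = level_size(130_000, 50_000, base_mag=20, target_mag=5)
print(w5, h5, len(grid_tiles(w5, h5)))
```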
5. Development, validation and interpretation of the model.
MIL-based representation of WSI.
Each WSI was tiled into patches, and lymph node metastasis (LNM) was predicted from the entire region of interest (ROI) of the WSI rather than from individual patches[34]. EfficientNet[35] pre-trained on the ImageNet dataset[36] was used to extract patch-level features, and instance-level and feature-level attention layers served as the backbone of the WSI-modality network.
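A minimal NumPy sketch of attention-based MIL pooling. It assumes the tanh attention pooling of Ilse et al. (2018) at the instance level and a per-dimension sigmoid gate at the feature level; the exact layers are not specified here, so the parameter names and shapes are hypothetical. The key property is that one set of parameters pools a bag of any size N into a single slide-level vector.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def attention_mil_pool(H, G, V, w):
    """Pool one slide's bag of patch features into a single slide-level vector.

    H: (N, D) patch features (e.g. D = 1280 for EfficientNet-B0); N varies per slide.
    G: (D, D) feature-level attention weights (hypothetical form).
    V: (D, L), w: (L,) instance-level attention weights.
    Returns the (D,) slide embedding and the (N,) patch attention weights.
    """
    # Feature-level attention: a sigmoid gate re-weights every feature dimension of every patch.
    gate = 1.0 / (1.0 + np.exp(-(H @ G)))
    H_gated = H * gate
    # Instance-level attention: one score per patch, normalized over the bag.
    a = softmax(np.tanh(H_gated @ V) @ w)
    # Slide embedding: attention-weighted sum of the gated patch features.
    return a @ H_gated, a


rng = np.random.default_rng(0)
D, L = 16, 8
G, V, w = rng.normal(0, 0.1, (D, D)), rng.normal(0, 0.1, (D, L)), rng.normal(0, 0.1, L)
for n_patches in (5, 300):  # slides with very different patch counts
    z, a = attention_mil_pool(rng.normal(size=(n_patches, D)), G, V, w)
    print(z.shape, a.shape, round(float(a.sum()), 6))
```

The attention vector `a` is also what makes the WSI branch inspectable: large entries mark the patches that dominated the slide-level prediction.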
Tabular learning-based representation of the clinicopathological parameters.
We adopted TabNet[25], an attentive interpretable tabular learning network, to generate a representation of the clinicopathological parameters. At each decision step, the network applies sequential attention to select the features used for inference, thereby learning the salient features among the structured clinicopathological parameters.
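TabNet selects features at each decision step with sparsemax, which can assign exactly zero weight to irrelevant features, combined with a prior that discourages reusing the same features in later steps. The sketch below illustrates only that mechanism: it replaces TabNet's attentive transformer with a single linear map of the input (a hypothetical simplification), and `gamma` is an illustrative relaxation value.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (Martins & Astudillo, 2016)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum   # True for the first k(z) sorted entries
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z
    return np.maximum(z - tau, 0.0)


def sequential_masks(x, step_weights, gamma=1.3):
    """Feature-selection masks over decision steps, TabNet-style.

    Simplification (hypothetical): each step scores features with one linear map of x,
    whereas TabNet feeds the previous step's processed features to an attentive transformer.
    """
    prior = np.ones_like(x, dtype=float)  # every feature is fully available at step 1
    masks = []
    for W in step_weights:
        mask = sparsemax(prior * (W @ x))
        masks.append(mask)
        prior = prior * (gamma - mask)    # heavily used features are down-weighted later
    return masks


rng = np.random.default_rng(0)
x = rng.normal(size=6)                    # 6 hypothetical encoded clinicopathological features
steps = [rng.normal(size=(6, 6)) for _ in range(3)]
for i, m in enumerate(sequential_masks(x, steps), start=1):
    print(f"step {i}:", np.round(m, 3))
```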
Integrating the representation of WSI and clinicopathological parameters.
Deep learning, as a form of representation learning, transforms raw data into representations suited to pattern recognition in specific tasks[37]. We developed a new multi-modal multi-instance (MMMI) fusion module comprising multi-modal joint instance aggregate learning and global-aware instance aggregation. The representations of the WSIs and clinicopathological parameters were input into the module and embedded as a global multi-modal feature, which in turn guided the learning process of each modality.
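The module's internals are described only at a high level, so the following is one possible reading of "global-aware" aggregation, written as a hypothetical NumPy sketch rather than the authors' architecture. A joint summary of both modalities produces a global feature `g`, which then weights the patches of the WSI and gates the tabular embedding, so that it guides both modalities. All weight matrices and the mean-pooled initial slide summary are assumptions.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def global_aware_fusion(H, t, W_g, W_q, W_t):
    """Hypothetical fusion of one slide's patch features with its tabular embedding.

    H: (N, D) patch features; t: (T,) tabular embedding.
    W_g: (D, D + T) builds the global multi-modal feature from both modalities.
    W_q: (D, D) turns the global feature into a query over the patches.
    W_t: (T, D) turns the global feature into a gate over the tabular embedding.
    """
    # Global multi-modal feature from a mean-pooled slide summary plus the tabular embedding.
    g = np.tanh(W_g @ np.concatenate([H.mean(axis=0), t]))
    # Global-aware instance aggregation: g decides how much each patch contributes.
    a = softmax(H @ (W_q @ g))
    z = a @ H
    # g also re-weights the tabular embedding, so it guides both modalities.
    t_guided = t * sigmoid(W_t @ g)
    return np.concatenate([z, t_guided]), a


rng = np.random.default_rng(0)
D, T = 16, 8
W_g = rng.normal(0, 0.1, (D, D + T))
W_q = rng.normal(0, 0.1, (D, D))
W_t = rng.normal(0, 0.1, (T, D))
fused, attn = global_aware_fusion(rng.normal(size=(40, D)), rng.normal(size=T), W_g, W_q, W_t)
print(fused.shape, attn.shape)
```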
Model training and testing.
Because each WSI yields a different number of patches under the MIL framework, the model was designed to accept a variable number of instances as input. Label smoothing was used to prevent the model from learning label-related bias. A weighted sampling method was applied during distributed training to counteract the imbalanced distribution of samples across the four categories. The final loss was computed as follows:
$$\mathcal{L}=\sum_{i=1}^{n}\left\{(1-\epsilon)\left[-\sum_{y=1}^{K}p(y\mid x_i)\log q_{\theta}(y\mid x_i)\right]+\epsilon\left[-\sum_{y=1}^{K}u(y\mid x_i)\log q_{\theta}(y\mid x_i)\right]\right\}$$
where \(p(y\mid x_i)\) is the ground-truth label distribution of sample \(x_i\) (one-hot over the \(K\) labels), \(q_{\theta}(y\mid x_i)\) denotes the likelihood predicted by the model for sample \(x_i\), \(n\) is the number of samples, \(K\) is the number of candidate labels, and \(\epsilon\in[0,1]\) is a weight factor. Because \(u(y\mid x_i)\) does not depend on the data, we set it to the uniform distribution \(u(y\mid x)=\frac{1}{K}\).
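The loss and the sampling scheme can be checked numerically with a short NumPy sketch. `label_smoothing_loss` sums, over samples, (1 − ε) times the cross-entropy against the one-hot label plus ε times the cross-entropy against u(y|x) = 1/K. Inverse class-frequency weights are an assumed choice for the weighted sampler, since the text does not state how the weights were set, and the distributed-sampling machinery is omitted.

```python
import numpy as np


def label_smoothing_loss(logits, labels, eps=0.1):
    """Label-smoothed cross-entropy summed over samples.

    logits: (n, K) raw model outputs; labels: (n,) integer class indices.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    n, K = logits.shape
    # log q_theta(y | x_i) via a numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_q = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    p = np.eye(K)[labels]            # one-hot ground truth p(y | x_i)
    u = np.full((n, K), 1.0 / K)     # uniform u(y | x) = 1 / K
    ce_true = -(p * log_q).sum(axis=1)
    ce_uniform = -(u * log_q).sum(axis=1)
    return float(((1 - eps) * ce_true + eps * ce_uniform).sum())


def inverse_frequency_weights(labels):
    """Per-sample weight 1 / (size of the sample's class): every class gets equal total weight."""
    labels = np.asarray(labels)
    return 1.0 / np.bincount(labels)[labels]


def weighted_sample(labels, size, seed=0):
    """Draw sample indices with replacement so that classes appear roughly equally often."""
    w = inverse_frequency_weights(labels)
    rng = np.random.default_rng(seed)
    return rng.choice(len(w), size=size, replace=True, p=w / w.sum())


labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 3])       # imbalanced toy labels, 4 classes
drawn = labels[weighted_sample(labels, size=10_000)]
print(np.bincount(drawn, minlength=4) / drawn.size)      # each close to 0.25
print(label_smoothing_loss(np.zeros((3, 4)), [0, 1, 2]))  # 3 * log(4)
```

With ε = 0 the function reduces to ordinary summed cross-entropy, which is a quick sanity check when porting the loss to a training framework.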
Feature importance.
Both the MIL and the tabular networks are based on the attention mechanism. We therefore investigated feature importance using the learned attention weights of the instances in the MIL network and of the clinicopathological features in the tabular network after the joint learning process.
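Turning learned attention into importance rankings amounts to sorting and normalizing weights. The sketch assumes per-patch attention weights from the MIL network and per-step feature masks from TabNet are already available; the optional `step_weights` mirror TabNet's weighting of each decision step by its contribution to the output and default to uniform here. Function names and the feature list are hypothetical.

```python
import numpy as np


def top_k_patches(attention, k):
    """Indices of the k patches with the largest attention weights, highest first."""
    return np.argsort(np.asarray(attention))[::-1][:k]


def aggregate_feature_importance(masks, step_weights=None):
    """Combine per-step feature masks (steps, F) into importances (F,) that sum to 1.

    step_weights stands in for TabNet's per-step contribution weights; uniform if None.
    """
    masks = np.asarray(masks, dtype=float)
    if step_weights is None:
        step_weights = np.ones(len(masks))
    agg = (np.asarray(step_weights, dtype=float)[:, None] * masks).sum(axis=0)
    return agg / agg.sum()


features = ["age", "tumor size", "histological grade"]   # hypothetical subset of parameters
importance = aggregate_feature_importance([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]])
for name, value in sorted(zip(features, importance), key=lambda pair: -pair[1]):
    print(f"{name}: {value:.2f}")
print(top_k_patches([0.1, 0.5, 0.15, 0.25], k=2))        # patch indices [1 3]
```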
6. Statistical analysis
The area under the receiver operating characteristic (ROC) curve (AUC) was calculated using the pROC package in R (version 3.6.1), and the DeLong test was applied to compare ROC curves. The cutpointr package was used to estimate the optimal cutoff points of the ROC curves. The Wilcoxon rank-sum test was used to compare the signatures, and Pearson correlation coefficients were used for the correlation analysis.
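The analyses were run in R; for readers working in Python, the sketch below reimplements two of them for illustration only. `delong_test` follows DeLong's placement-value method for two correlated AUCs computed on the same cases, and `youden_cutoff` picks the threshold that maximizes sensitivity + specificity − 1, which assumes the Youden criterion was the optimization metric used with cutpointr. Neither function is part of pROC or cutpointr.

```python
import math

import numpy as np


def _placements(pos, neg):
    """DeLong placement values from one model's scores on positive and negative cases."""
    # psi = 1 if positive score > negative score, 0.5 if tied, 0 otherwise (all pairs).
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)   # V10 per positive, V01 per negative


def delong_test(y, s1, s2):
    """Two-sided DeLong test for two correlated AUCs computed on the same cases."""
    y = np.asarray(y).astype(bool)
    v10, v01 = [], []
    for s in (np.asarray(s1, dtype=float), np.asarray(s2, dtype=float)):
        a, b = _placements(s[y], s[~y])
        v10.append(a)
        v01.append(b)
    v10, v01 = np.array(v10), np.array(v01)     # shapes (2, m) and (2, n)
    auc = v10.mean(axis=1)
    S = np.cov(v10) / v10.shape[1] + np.cov(v01) / v01.shape[1]
    var = S[0, 0] + S[1, 1] - 2 * S[0, 1]
    if var <= 0:
        raise ValueError("the AUC difference has zero estimated variance")
    z = float((auc[0] - auc[1]) / math.sqrt(var))
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal p-value
    return float(auc[0]), float(auc[1]), z, p


def youden_cutoff(y, scores):
    """Threshold maximizing sensitivity + specificity - 1 (predict positive if score >= t)."""
    y = np.asarray(y).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        j = pred[y].mean() + (~pred[~y]).mean() - 1
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j


y = [1, 1, 1, 0, 0, 0]
s1 = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
s2 = [0.9, 0.2, 0.7, 0.3, 0.8, 0.1]
print(delong_test(y, s1, s2))
print(youden_cutoff(y, s1))   # (0.7, 1.0)
```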