Patients
This study was approved by the Ethics Committee of the Affiliated Changzhou Second People’s Hospital of Nanjing Medical University (authorization number: [2020]KY154-01). Patients who completed NAC between January 2015 and December 2021 were selected for retrospective analysis. All eligible patients met the following inclusion criteria: (1) unilateral primary breast cancer confirmed by biopsy, with no distant metastasis; (2) pCR status determined by postoperative pathological examination; and (3) no previous treatment other than NAC and no history of other malignant tumors. The exclusion criteria were as follows: (1) poor ultrasound image quality, or a tumor too large for its boundary to be fully displayed; (2) incomplete clinical or imaging data; and (3) multiple lesions in the ipsilateral breast or lesions in both breasts. A total of 211 patients aged 28–85 years (mean age 55.59 ± 11.77 years) were enrolled. All included patients received 4–8 cycles of NAC before surgery. The NAC regimen was based on taxane, anthracycline, or taxane combined with anthracycline, while HER2-positive patients received targeted therapy (an initial dose of 8 mg/kg, followed by a maintenance dose of 6 mg/kg).
Ultrasound examinations and image interpretation
All imaging was performed by physicians with more than 5 years of experience in breast ultrasound diagnosis using a Philips EPIQ 5 or iU22 system (Philips, the Netherlands) or an Esaote MyLab Twice system (Esaote, Italy), each with a 7–12 MHz high-frequency linear array probe. Before data collection, the breast ultrasound mode was selected and the machine settings were standardized. During the examination, the patient remained in the supine position with her arms raised to fully expose the breast. All quadrants of both breasts were scanned, with a focus on the lesion area. Static images of the tumor along its longest axis were selected and stored in DICOM format for subsequent evaluation and analysis. Ultrasound examinations were completed in all patients within 2 weeks before NAC treatment.
Images were analyzed by two physicians with 3 and 10 years of experience in breast ultrasound according to the US BI-RADS [24]. In cases of disagreement between the two sonographers, the final decision was made by a physician with 20 years of experience in breast ultrasound diagnosis. All physicians were blinded to the clinical data during image analysis. The following nine ultrasound features were evaluated for each mass: maximum diameter, location (outer upper quadrant, outer lower quadrant, inner upper quadrant, inner lower quadrant, or retroareolar region), shape (round, oval, or irregular), growth direction (parallel or vertical), echo pattern (hypoechoic or heterogeneous), margin (clear, blurred, angular, lobulated, or spiculated), calcification (positive or negative), posterior echo (no attenuation or attenuation), and color Doppler flow (positive or negative).
Pathological evaluation and clinical data collection
Slides were prepared from the preoperative tumor biopsy tissues and the postoperative specimens. Two pathologists with more than 10 years of experience, blinded to the clinical data, evaluated the pathological tissues; samples on which their evaluations differed were discussed until a consensus was reached. The efficacy of NAC was evaluated according to the American Joint Committee on Cancer eighth edition cancer staging manual. pCR was defined as no residual invasive cancer tissue, or only residual carcinoma in situ, on pathological examination of the breast and lymph nodes, while non-pCR was defined as residual invasive cancer tissue on pathological examination of the breast and lymph nodes [25].
Clinical data included age, clinical stage, estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), tumor proliferation antigen (Ki67) index, ultrasound BI-RADS category, and two-dimensional ultrasound characteristics. The ER, PR, and HER2 status and the Ki67 index were assessed by immunohistochemistry (IHC). ER/PR positivity was defined as staining in at least 1% of invasive tumor cells on IHC. HER2 status was defined as positive when the IHC score was 3+, or when the IHC score was 2+ and gene amplification was confirmed by fluorescence in situ hybridization (FISH). All other cases were defined as negative. The Ki67 index was classified as high expression (≥ 20%) or low expression (< 20%).
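The receptor-status rules above can be written down as a few small predicates. The following is a minimal sketch; the function and parameter names are hypothetical and chosen only for illustration, and the thresholds are those stated in the text.

```python
from typing import Optional


def er_pr_positive(percent_stained: float) -> bool:
    """ER/PR is positive when at least 1% of invasive tumor cells stain on IHC."""
    return percent_stained >= 1.0


def her2_positive(ihc_score: int, fish_amplified: Optional[bool] = None) -> bool:
    """HER2 is positive for IHC 3+, or for IHC 2+ with FISH-confirmed amplification.

    Every other combination (IHC 0, 1+, or 2+ without amplification) is negative.
    """
    if ihc_score == 3:
        return True
    if ihc_score == 2:
        return bool(fish_amplified)
    return False


def ki67_high(index_percent: float) -> bool:
    """Ki67 is high expression at >= 20% and low expression below 20%."""
    return index_percent >= 20.0
```

For example, `her2_positive(2, fish_amplified=False)` evaluates to `False`, because an equivocal IHC 2+ score counts as positive only with FISH amplification.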
Image segmentation and feature extraction
DICOM images were imported into ITK-SNAP software, and regions of interest (ROIs) were manually drawn along the tumor contour on the ultrasound images by two experienced ultrasound diagnostic physicians (1, 2) who were blinded to the pathological results. In cases of disagreement about the segmented ROIs, a senior physician experienced in superficial ultrasound diagnosis joined the discussion until a consensus was reached. The segmentation result is shown in Fig. 1. Intra-class correlation coefficients (ICC) were used to assess intra-observer and inter-observer consistency to ensure the repeatability of radiomic feature extraction. First, 20 ultrasound images were randomly selected from the training set to assess inter-observer repeatability. Second, 1 week later, the same images were re-segmented by the ultrasound diagnostic physicians to evaluate intra-observer repeatability. Only features with an ICC > 0.8 were retained for further analysis.
Handcrafted radiomics features were extracted from each segmented image using the Pyradiomics (V3.0.1) package in a Python 3.6 environment (https://pyradiomics.readthedocs.io/en/latest/)[27]. The radiomics features comprised first-order statistical features, two-dimensional shape features, texture features, and wavelet features; the wavelet features were obtained by two-dimensional decomposition into four frequency bands (HH, HL, LH, LL).
ResNet50[28], pre-trained on the large-scale, well-annotated ImageNet dataset, was used as the base model for deep learning feature extraction, so that discriminative image features were learned automatically. A global max pooling layer was used to take the maximum value of each feature map, converting the feature maps into a feature vector. All deep learning programs were implemented in the TensorFlow framework on an Intel Core i7-11700F processor and an NVIDIA GeForce RTX 3060 GPU. The overall architecture is shown in Fig. 2.
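The ICC-based repeatability filter can be illustrated as follows. This is a sketch under stated assumptions: the text does not say which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is assumed here, and the function names are hypothetical.

```python
import numpy as np


def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1) for an (n_subjects, n_raters) array of one feature's values."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)  # per-image mean across segmentations
    col_means = ratings.mean(axis=0)  # per-rater mean across images
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    residual = ratings - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(residual ** 2) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def select_stable_features(seg_a: np.ndarray, seg_b: np.ndarray,
                           threshold: float = 0.8) -> list:
    """Return indices of features whose ICC between two segmentations exceeds threshold.

    seg_a and seg_b have shape (n_images, n_features): the same features
    extracted from two segmentations of the same images.
    """
    keep = []
    for j in range(seg_a.shape[1]):
        pair = np.column_stack([seg_a[:, j], seg_b[:, j]])
        if icc_2_1(pair) > threshold:
            keep.append(j)
    return keep


# Toy data: feature 0 is perfectly reproducible, feature 1 is reversed.
first = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
second = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])
stable = select_stable_features(first, second)
```

The same function serves for inter-observer comparison (two physicians) and intra-observer comparison (one physician, two sessions); only the pair of feature matrices passed in differs.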
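Global max pooling is easiest to see on a small array. The sketch below uses NumPy on random stand-in feature maps rather than a real network; the 7 × 7 × 2048 shape is that of the last convolutional block of ResNet50 for a 224 × 224 input. In Keras, the equivalent is commonly obtained by passing `pooling="max"` with `include_top=False` to `tf.keras.applications.ResNet50`, although the text does not state how the authors configured it.

```python
import numpy as np


def global_max_pool(feature_maps: np.ndarray) -> np.ndarray:
    """Reduce (height, width, channels) feature maps to one value per channel.

    Each channel's feature map is replaced by its maximum activation,
    so the output is a fixed-length feature vector of size `channels`.
    """
    return feature_maps.max(axis=(0, 1))


# Stand-in for the final convolutional output of ResNet50 (illustrative values).
rng = np.random.default_rng(0)
maps = rng.normal(size=(7, 7, 2048))
deep_features = global_max_pool(maps)  # shape (2048,)
```

Each image therefore yields a 2048-dimensional deep learning feature vector, which can be screened by the same selection steps as the handcrafted features.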
Data pre-processing
The original data were randomly split in a ratio of 7:3 using the model_selection module of the sklearn library. Of the 211 patients, 147 (including 54 with pCR) were assigned to the training set for training and tuning the model, and 64 (including 24 with pCR) were assigned to the validation set for verifying the stability of the model. Z-score normalization was applied to the overall data to bring features of different orders of magnitude onto the same scale, which ensures comparability between features and facilitates the subsequent application of the feature screening algorithms.
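A 7:3 split of this kind can be sketched with `train_test_split` from `sklearn.model_selection`. The feature matrix below is a placeholder, and `random_state` and `stratify` are assumptions: the text does not state a seed or whether the split was stratified by pCR status.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_patients = 211
n_pcr = 54 + 24  # pCR patients in the training and validation sets combined

# Placeholder features (one column of patient indices) and pCR labels.
X = np.arange(n_patients).reshape(-1, 1)
y = np.zeros(n_patients, dtype=int)
y[:n_pcr] = 1

X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.3,    # 7:3 ratio
    stratify=y,       # assumption: keep the pCR proportion similar in both sets
    random_state=42,  # assumption: arbitrary seed for reproducibility
)
```

With 211 patients, `test_size=0.3` rounds the validation set up to 64 patients and leaves 147 for training, which matches the set sizes reported above.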
$$Z=\frac{x-\bar{x}}{s} \tag{1}$$
where $x$ represents the original feature value, $\bar{x}$ represents the mean, and $s$ represents the standard deviation.
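Equation (1) applied column by column to a feature matrix looks like the following minimal sketch. The use of the sample standard deviation (`ddof=1`) is an assumption; scikit-learn's `StandardScaler`, for instance, divides by the population standard deviation instead.

```python
import numpy as np


def z_score(features: np.ndarray) -> np.ndarray:
    """Standardize each column of an (n_samples, n_features) matrix."""
    mean = features.mean(axis=0)
    std = features.std(axis=0, ddof=1)  # assumption: sample standard deviation
    return (features - mean) / std


# Two illustrative features on very different scales.
rng = np.random.default_rng(0)
raw = np.column_stack([
    rng.normal(loc=1e4, scale=2e3, size=211),
    rng.normal(loc=0.05, scale=0.01, size=211),
])
scaled = z_score(raw)
```

After scaling, every column has mean 0 and standard deviation 1, so coefficient-penalizing methods such as LASSO treat the features on an equal footing.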
Radiomics feature selection and radiomics score construction
Random forest-based recursive feature elimination and the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation were used for dimensionality reduction in the training set; the feature subset with the minimum cross-validated binomial deviance was selected as the optimal feature set. The radiomics score (Rad-Score) and deep learning score (DL-Score) were then constructed from the selected features.
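A first-stage reduction by random forest-based recursive feature elimination can be sketched with scikit-learn's `RFE`. The data are synthetic, and the number of trees, the number of features kept, and the elimination step are all illustrative assumptions, since the text does not report them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the training set: 147 patients, 40 candidate features.
X, y = make_classification(
    n_samples=147, n_features=40, n_informative=5, random_state=0
)

forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Repeatedly fit the forest and drop the 5 least important features
# until 10 remain (both numbers are assumptions).
selector = RFE(estimator=forest, n_features_to_select=10, step=5).fit(X, y)
X_reduced = selector.transform(X)
```

The boolean mask `selector.support_` marks the surviving features, which would then be passed to the LASSO step.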
$$\text{Rad-Score (DL-Score)}=\text{intercept}+\sum_{n=1}^{\text{features}}\left(\text{feature}_{n}\times \text{Coef}_{n}\right) \tag{2}$$
where intercept is the intercept obtained by fitting the training set data with the LASSOCV model, $\text{feature}_{n}$ is the value of the $n$-th feature retained by LASSO, $\text{Coef}_{n}$ is its regression coefficient, and features denotes the number of retained features.
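Equation (2) can be reproduced from a fitted LASSO model as in the sketch below, which uses synthetic data and scikit-learn's `LassoCV` with 10-fold cross-validation. Note that `LassoCV` selects the penalty by mean squared error rather than binomial deviance, so this is an approximation of the procedure described, not a reconstruction of it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LassoCV

# Synthetic stand-in for the standardized training features and pCR labels.
X, y = make_classification(
    n_samples=147, n_features=30, n_informative=5, random_state=0
)

lasso = LassoCV(cv=10, random_state=0).fit(X, y)

# Features with a non-zero coefficient are the ones retained by LASSO.
selected = np.flatnonzero(lasso.coef_)

# Equation (2): intercept plus the coefficient-weighted sum of retained features.
rad_score = lasso.intercept_ + X[:, selected] @ lasso.coef_[selected]
```

Because the dropped features have zero coefficients, `rad_score` is identical to `lasso.predict(X)`; writing it out explicitly simply makes the linear form of the score visible.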
The machine learning classifier used in this study was logistic regression (LR) with L2 regularization to reduce overfitting. As the pCR and non-pCR groups were imbalanced, we used the class_weight = “balanced” setting to balance the weights of the two classes. The receiver operating characteristic (ROC) curve, area under the curve (AUC), accuracy, specificity, sensitivity, positive predictive value, and negative predictive value were calculated to evaluate the performance of the model. The 95% confidence interval (CI) of the AUC was obtained through 1000 resampling iterations, and the DeLong test was used to compare the AUCs of different models. The calibration of the model was assessed with a calibration curve. Finally, decision curve analysis (DCA) was used to calculate the standardized net benefit across threshold probabilities from 0 to 1 to evaluate the clinical value of the model.
Statistical analysis
Statistical analysis was performed using SPSS 23.0 (IBM SPSS) software. Quantitative data conforming to a normal distribution are expressed as the mean ± SD, and the independent-samples t-test was used for comparison between groups. Data that did not conform to a normal distribution are expressed as the median (Q1, Q3), and the Mann–Whitney U test was used for comparison between groups. Qualitative data are expressed as numbers of cases, and the Chi-square test was used for comparison between groups. P-values < 0.05 were considered to indicate a significant difference.
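The threshold-based performance measures and the net benefit used in decision curve analysis can be computed directly from the confusion matrix. The sketch below uses hypothetical function names and toy probabilities; net benefit is taken as TP/N − FP/N × pt/(1 − pt), and the standardized form divides by the prevalence.

```python
import numpy as np


def confusion_counts(y_true, y_prob, threshold):
    """Return (TP, FP, FN, TN) when predicting positive for y_prob >= threshold."""
    truth = np.asarray(y_true).astype(bool)
    pred = np.asarray(y_prob) >= threshold
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fp, fn, tn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity, PPV and NPV at one threshold."""
    tp, fp, fn, tn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at threshold probability pt, with 0 < pt < 1."""
    tp, fp, fn, tn = confusion_counts(y_true, y_prob, threshold)
    n = tp + fp + fn + tn
    return tp / n - fp / n * threshold / (1.0 - threshold)


def standardized_net_benefit(y_true, y_prob, threshold):
    """Net benefit divided by the prevalence of the positive class."""
    return net_benefit(y_true, y_prob, threshold) / float(np.mean(y_true))


# Toy example: 3 pCR and 5 non-pCR patients with predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
metrics = classification_metrics(labels, probs, threshold=0.5)
snb = standardized_net_benefit(labels, probs, threshold=0.5)
```

A decision curve is obtained by evaluating `standardized_net_benefit` over a grid of thresholds strictly between 0 and 1. The metric functions divide by zero when a threshold leaves no predicted positives or negatives, so such edge cases would need guarding in real use.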
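For readers without SPSS, the same three group comparisons can be approximated with `scipy.stats`. This is an illustrative substitute, not the software used in the study; all values below are synthetic, and only the group sizes (78 pCR, 133 non-pCR) follow from the patient counts reported above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Normally distributed variable (e.g. age): independent-samples t-test.
age_pcr = rng.normal(54, 11, size=78)
age_non_pcr = rng.normal(56, 12, size=133)
t_result = stats.ttest_ind(age_pcr, age_non_pcr)

# Skewed variable (e.g. maximum diameter): Mann-Whitney U test.
diam_pcr = rng.lognormal(3.2, 0.4, size=78)
diam_non_pcr = rng.lognormal(3.3, 0.4, size=133)
u_result = stats.mannwhitneyu(diam_pcr, diam_non_pcr, alternative="two-sided")

# Categorical variable (e.g. a positive/negative feature): Chi-square test
# on a 2 x 2 table with rows = pCR / non-pCR and columns = positive / negative.
table = np.array([[30, 48], [70, 63]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

significant = {
    "t_test": t_result.pvalue < 0.05,
    "mann_whitney": u_result.pvalue < 0.05,
    "chi_square": p_chi2 < 0.05,
}
```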