Breast cancer is the second most common cancer overall and the leading malignancy among women. Breast cancers are those that begin in the breast itself, typically in the lobules that produce the milk that flows through the ducts. It is the second most frequent non-skin carcinoma among women worldwide (after lung cancer) and the fifth leading cause of cancer death, accounting for 10.4% of all cancer diagnoses among women[11]. In India in 2020, breast cancer accounted for 13.5% (178,361) of all cancer cases and 10.6% (90,408) of cancer fatalities[41]. The DNA and RNA of cancer cells are similar to, but not identical to, those of the organism from which they arose. As a result, the immune system, especially if compromised, does not typically recognise them.
Cancer forms either when the immune system is not functioning properly or when there are too many abnormal cells for the immune system to destroy. An unfavourable environment (radiation, toxins, etc.), poor diet (an unhealthy cell environment), inherited genetic predispositions, and old age (over 80) can all contribute to an abnormally high mutation rate in DNA and RNA[41], [42]. Several distinct malignancies can arise in the various breast tissues, although benign (non-malignant) breast alterations cause the vast majority of tumours. Fibrocystic change is a noncancerous condition that can affect women and is characterised by cysts (collections of fluid), fibrosis (the production of scar-like connective tissue), lumpiness, areas of thickening, discomfort, or breast pain[43].
3.2 Techniques of screening for Breast Cancer
Methodological, clinical, and ethical difficulties arise when attempting to evaluate screening techniques, especially in a community setting.
The best way to evaluate a novel screening test is through randomised clinical trials: women randomly assigned to receive a novel breast cancer screening test are compared with women given the standard of care.
However, such trials are challenging to carry out. They must track tens of thousands of women for a minimum of 15 years. Establishing the added efficacy of new tests is likely to be even harder, given that mammography has already been found to be successful in some settings. Finally, it may be harder to determine screening's impact on breast cancer death rates because breast cancer treatment has improved over time[47]–[49].
Because of these obstacles, researchers generally focus on characterising new screening assays before determining their impact on patient outcomes such as breast cancer mortality. Sensitivity, specificity, safety, affordability, simplicity, and patient and clinician acceptability are all crucial test properties. We summarise the research done on the various new testing modalities and the results, and detail the research methodology and outcomes evaluated for each screening tool. Although cost-effectiveness analyses should be considered when contemplating a population-based diagnostic exam, they are beyond the scope of this review.
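The test characteristics mentioned above can be made concrete with a short sketch. The counts below are invented for illustration only, not taken from any cited study.

```python
def screening_metrics(tp, fp, fn, tn):
    """Basic test characteristics of a screening modality.

    tp/fp/fn/tn: true/false positives and negatives from a screening study.
    """
    sensitivity = tp / (tp + fn)  # share of cancers the test flags
    specificity = tn / (tn + fp)  # share of healthy women it clears
    ppv = tp / (tp + fp)          # chance a positive result is a true cancer
    return sensitivity, specificity, ppv

# Hypothetical counts for 1000 screened women:
sens, spec, ppv = screening_metrics(tp=80, fp=50, fn=20, tn=850)
```

Note how even a modest false-positive count pulls the positive predictive value well below the sensitivity, which is one reason specificity matters so much in population screening.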
If a screening test is going to be utilized in a community environment, it needs to be evaluated there first.
Women with a greater-than-average chance of breast cancer, or those who present to a diagnostic setting with breast complaints or detected anomalies in the breasts, are typically used to evaluate the test characteristics of new modalities. The sensitivity and specificity a test demonstrates in these high-risk patients may differ from those reported when the same test is used in a general screening sample[50]. It is therefore important to note whether a test has been studied for diagnosis or for screening, and in the latter case, whether the study was conducted on women deemed to be at elevated risk or not[51]. Various breast cancer screening techniques are shown in Fig. 20.
3.2.1 Mammography
Common methods of breast cancer screening include self-examination, examination by a medical professional, and mammography[46], [52]. Digital mammograms have become the standard for diagnosing breast cancer. Rather than recording X-rays on film, digital mammography employs solid-state detectors that transform X-rays into electrical signals, the same kinds of detectors used in digital cameras. This technique is also known as full-field digital mammography (FFDM). Images of the breasts, created from these electrical signals, can then be displayed[53].
To assist in interpreting mammograms, experts have developed computer-aided detection (CAD) systems. CAD systems typically pre-read mammograms and flag suspicious areas for further examination by a radiologist[54].
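The flagging step a CAD system performs can be sketched as thresholding a per-pixel suspicion map. The map, threshold, and function below are hypothetical stand-ins, not any cited system's interface.

```python
import numpy as np

def flag_suspicious(prob_map, threshold=0.8):
    """Return (row, col) coordinates whose suspicion score meets the
    threshold, for a radiologist to review."""
    ys, xs = np.where(prob_map >= threshold)
    return list(zip(ys.tolist(), xs.tolist()))

# Toy 5x5 suspicion map with a single hot spot:
pm = np.zeros((5, 5))
pm[2, 3] = 0.95
flags = flag_suspicious(pm)  # [(2, 3)]
```

A real system would group adjacent pixels into regions rather than report raw coordinates, but the review workflow (model scores, threshold, radiologist confirms) is the same.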
Ribli et al.[55] proposed a CAD program based on Faster R-CNN that can automatically determine whether a tumour on a mammogram is malignant or benign. Wang et al.[56] proposed an end-to-end strategy for mammographic diagnostics that avoids the need for manual preparation. In one scenario, a new method combining a Multi-Instance (MI) and a Multi-Scale (MS) module was introduced for processing mammograms.
The MS module selects the most essential characteristics of the mammogram, while the MI module considers the image as a whole. Combining the outputs of the two components yields the improved results.
Heidari et al.[57] established a novel CADx technique based on analysing global mammographic image characteristics, suggesting that an efficient, globally applied image-processing approach to CADx mammography is feasible and a significant improvement over its predecessors. Ekici et al.[56] accomplished thermographic breast cancer screening using a CNN (Convolutional Neural Network).
Five stages were used: data collection, image processing, feature extraction, feature segmentation, and feature classification. In terms of prediction accuracy, the CNN was superior to the other methods compared[58].
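The five-stage pipeline can be sketched as a chain of functions. Every stage below is a toy stand-in, not the implementation from[56]; data collection is assumed to supply the `raw` array.

```python
import numpy as np

# Toy stand-ins for the pipeline stages:
preprocess = lambda x: (x - x.min()) / (x.max() - x.min() + 1e-8)  # image processing
extract    = lambda x: np.array([x.mean(), x.std()])               # feature extraction
segment    = lambda f: f                                           # feature segmentation
classify   = lambda f: int(f[0] > 0.5)                             # feature classification

def thermography_pipeline(raw):
    """Chain the stages: raw thermogram in, class label out."""
    return classify(segment(extract(preprocess(raw))))

label = thermography_pipeline(np.array([[0.1, 0.9], [0.6, 0.8]]))
```

In the cited work each stage is learned or hand-engineered; the point here is only the dataflow from raw image to final label.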
Table 2
Review of studies based on mammography datasets
Dataset | Author | DL Technique | Performance Measure | Year |
INbreast CBIS-DDSM | Shu et al.[59] | CNN | INbreast: Accuracy = 92.2% CBIS: Accuracy = 76.7% | 2020 |
DDSM | Li et al.[60] | CNN-RNN (Recurrent Neural Network) | AUC = 0.968 Accuracy = 94.7%, Recall = 94.1% | 2021 |
MIAS | Agnes et al.[61] | Multiscale All CNN | Accuracy = 96.47% | 2020 |
DDSM | Boumaraf et al.[62] | DBN (Deep Belief Network) | Accuracy = 84.5% | 2020 |
MIAS | Zhang et al.[63] | GNN (Graph Neural Network) + CNN | Accuracy = 96.1% | 2021 |
INbreast DDSM-BCRP | Zhu et al.[64] | FCN + CRF | DDSM-BCRP: Dice = 91.3% INbreast: Dice = 90.97% | 2018 |
MIAS CBIS-DDSM | Ahmed et al.[65] | DeepLab/mask RCNN | Mask RCNN: C: Accuracy = 98% S: MAP = 80% DeepLab: C: Accuracy = 95% S: MAP = 72% | 2020 |
MIAS | Saber et al.[66] | Transfer learning/CNN | F-score = 99.3% Accuracy = 98.87% | 2021 |
MIAS CBIS-DDSM INbreast | Soleimani et al.[67] | CNN | INbreast: Dice = 96.39% CBIS: Dice = 97.69% MIAS: Dice = 97.59% | 2020 |
INbreast CBIS-DDSM | Chen et al.[68] | Modified U-Net | CBIS: Dice = 82.16% INbreast: Dice = 81.64% | 2020 |
INbreast DDSM | Al-antari et al.[69] | YOLO | S: INbreast: F1-score = 98.02% DDSM: F1-score = 99.28% C: DDSM: Accuracy = 97.5% INbreast: Accuracy = 95.32% | 2020 |
Figure 21 presents some mammography images of breast cancer, taken from a Kaggle subset of the Curated Breast Imaging DDSM dataset.
3.2.2 Ultrasound
Ultrasound is highly effective at finding malignancies and helps cut down on unnecessary biopsies[71]. It is therefore no surprise that researchers employ such images in DL models for cancer detection.
For instance, a GoogleNet[23]-based CNN has been trained on potentially malignant ROIs of ultrasound (US) images. The AUC for the method suggested in[78] is 96%, which is 6% better than the CAD-based method using manually engineered features[72]–[74].
Datasets of US images are scarcer and often comprise fewer images than mammography datasets. As a result, most proposed DL models employ some type of data augmentation, such as rotation, to expand the training data and enhance model performance. However, care must be taken when augmenting US images, since doing so incorrectly can reduce the model's accuracy. It has been demonstrated, for instance, that shifting or flipping the image along the horizontal axis can have detrimental effects on model performance[74]. Synthetic US images, with or without tumours, can also be generated using generative adversarial networks (GANs); including these images in the training set can enhance model accuracy.
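A rotation-only augmentation policy of the kind described above might look like the following sketch; horizontal flips and shifts are deliberately left out, since[74] reports they can hurt performance on US images.

```python
import numpy as np

def augment_ultrasound(patch, rng):
    """Rotate a square US patch by a random multiple of 90 degrees.

    Flips/shifts along the horizontal axis are intentionally omitted.
    """
    k = int(rng.integers(0, 4))  # 0, 90, 180 or 270 degrees
    return np.rot90(patch, k)

rng = np.random.default_rng(seed=0)
patch = np.arange(16, dtype=np.float32).reshape(4, 4)
augmented = [augment_ultrasound(patch, rng) for _ in range(8)]
```

Each augmented copy contains exactly the original pixel values, so no anatomy-distorting interpolation is introduced.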
Some techniques have integrated lesion detection and classification in ultrasound images[75]. A variety of DL architectures for US image detection and classification are compared in depth in[76]. Accuracy levels of 85% for whole-image classification and 87.5% for pre-defined ROIs demonstrate that DenseNet is a promising option for US image classification. The authors in[77] trained VGG16, ResNet34, and GoogleNet on a dataset of 1000 unlabelled US images to create a weakly supervised DL system, finding an average AUC of 88%.
Some studies verify DL algorithm performance[78]–[80] against expert interpretation, demonstrating DL algorithms' usefulness to radiologists. Most commonly, an expert first identifies a lesion and a DL model then categorises it. In contrast to mammography research, however, the majority of these studies neither demonstrate the generalizability of their approach across multiple datasets nor undergo independent physician validation.
Table 3
Review of studies based on ultrasound datasets
Dataset | Year | Author | DL Technique | Performance measure |
OASBUD | 2019 | Byra et al.[81] | Transfer learning on InceptionV3 and VGG-19 | VGG19: AUC = 0.822 InceptionV3: AUC = 0.857 |
SNUH BUSI | 2020 | Moon et al.[82] | ResNet+ VGGNet + DenseNet (Ensemble loss) | SNUH: Accuracy = 91.1% AUC = 0.9697 BUSI: Accuracy = 94.62% AUC = 0.9711 |
BUSI | 2020 | Vakanski et al.[83] | CNN | Accuracy = 98% Dice score = 90.5% |
Mendeley UDIAT | 2020 | Singh et al.[84] | CNN | UDIAT: Dice = 86.82% Mendeley: Dice = 93.76% |
1-Ultrasoundcases.info and BUSI 2- UDIAT 3- Radiopaedia | 2021 | Wang et al.[85] | Residual Feedback Network | 1-Dice = 86.91% 2-Dice = 81.79% 3-Dice = 87% |
Private | 2019 | Byra et al.[74] | VGG 19 by Transfer learning | AUC = 0.936 |
Ultrasoundcases.info BUSI STUHospital | 2021 | Wang et al.[86] | CNN | BUSI: Dice = 83.76% Ultrasoundcases: Dice = 84.71% STUHospital: Dice = 86.52% |
Private | 2020 | Fujioka et al.[87] | CNN | AUC = 0.87 |
Private | 2020 | Wu et al.[88] | Random Forest | Accuracy = 86.97% |
Private | 2020 | Gong et al.[89] | Multi-view Deep Neural Network Support Vector Machine (MDNNSVM) | AUC = 0.908 Accuracy = 86.36% |
Private | 2022 | Byra et al.[90] | Y-Net | S: Dice = 64.0% C: AUC = 0.87 |
Figure 22 shows some ultrasound images used for breast cancer classification, taken from a Kaggle dataset.
Figure 22 Ultrasound images: benign, malignant, and normal examples[91].
3.2.3 Magnetic Resonance Imaging
In MRI, DL is most often utilised to perform or assist with the classification, detection, and segmentation of breast lesions, much as in DM, DBT, and US. However, the dimensionality of an MRI scan sets it apart from these other modalities: MRI generates 3D scans, while DM, DBT, and US generate only 2D images. Dynamic contrast-enhanced (DCE) MRI sequences, which track temporal changes such as the uptake and washout of contrast agents, add a fourth dimension. Most DL models built outside the medical field are geared toward 2D images, which can cause problems when they are applied to these 3D or 4D MRI scans.
Several potential answers to this problem have been offered. The most common approach is to convert the 3D images to 2D so that standard 2D DL models can be used, either through a maximum intensity projection (MIP) or by slicing the 3D MR image into 2D slices. Many industry-standard DL models, however, were originally built for colour images with three channels (red, green, and blue), and so expect a three-dimensional input in which the extra dimension holds the colour channels. Because MR scans are monochrome, three slices or MIPs can be joined to fill this input.
This opens the door to using several post-contrast slices or MIPs in a single input image, or using three consecutive slices to form a semi-3D MRI input.
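The two conversions just described (MIP and three-slice stacking) reduce to simple array operations. The sketch below assumes a volume laid out as (slices, height, width); the function names are illustrative, not from any cited work.

```python
import numpy as np

def mip_2d(volume):
    """Maximum intensity projection: collapse (D, H, W) to (H, W) by
    taking the brightest voxel along the slice axis."""
    return volume.max(axis=0)

def three_slice_input(volume, i):
    """Stack slices i-1, i, i+1 into a channel axis, mimicking the RGB
    input that standard 2D CNNs expect."""
    return np.stack([volume[i - 1], volume[i], volume[i + 1]], axis=-1)

vol = np.random.default_rng(1).random((8, 64, 64))  # toy monochrome MR volume
mip = mip_2d(vol)                # shape (64, 64)
rgb = three_slice_input(vol, 4)  # shape (64, 64, 3)
```

The same stacking idea extends to filling the three channels with post-contrast time points instead of adjacent slices.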
Other approaches include using genuine 3D MRI scans directly, extending 2D DL models to 3D data, or turning to dedicated 3D DL models such as a 3D DenseNet. AI algorithms that use the full 3D or 4D breast MRI data are hypothesized to outperform baseline methods that resort to dimension reduction.
Each of the aforementioned methods has been used at some point in lesion classification research using MRI. Several research teams fed in 2D slices of the ROIs[92]–[94] and obtained AUCs between 0.908 and 0.991. Using MIPs, other research[95] has found AUCs of 0.88 and 0.895. Researchers found AUCs between 0.84 and 0.92 when filling the three RGB channels with multiple post-contrast slices[96]–[98]. Finally, research using genuine 3D MRI images has yielded AUC values of 0.852 and 0.859[98], [99]. All of these studies used different datasets, so although the AUCs are comparable and may even appear to decrease as one moves from conventional 2D methods to full 3D techniques, it is important to note that these values cannot be directly compared. Studies that compared their results with radiologists' interpretations, however, did use a variety of methods[97]–[99].
On average, the specificity of the AI models was greater than that of the radiologists, although their sensitivity was about the same or even lower.
Table 4
Review of studies based on MRI datasets
Dataset | Year | Author | DL Technique | Performance measure |
TCIA | 2020 | Zheng et al.[100] | CNN | Accuracy = 97.2% |
Private | 2022 | Liu et al.[101] | Weakly ResNet-101 | Accuracy = 94% AUC = 0.92 |
Private | 2022 | Wu et al.[102] | CNN | Accuracy = 87.7% AUC = 0.912 |
QIN Breast DCE-MRI | 2021 | Carvalho et al.[103] | SegNet and UNet | IOU = 95.3% Dice = 97.6% |
Private | 2021 | Wang et al.[104] | CNN | Dice = 76.4% |
TCGA-BRCA | 2022 | Khaled et al.[105] | 3D U-Net | Dice = 68% |
Private | 2022 | Zhu et al.[106] | V-Net | C: Avg. AUC = 0.84 S: Dice = 86% |
Private | 2022 | Rahimpour et al.[107] | 3D U-Net | Dice = 78% |
Private | 2022 | Yue et al.[108] | Res_U-Net | Dice = 89% |
Private | 2021 | Dutta et al.[109] | Multi-contrast D-R2UNet | F1 score = 95% |
Private | 2022 | Verburg et al.[110] | CNN | AUC = 0.83 |
3.2.4 Digital Breast Tomosynthesis
DBT has become a standard breast imaging technique because of its excellent cancer detection rates. The cancer detection rate (CDR) is higher with DBT than with FFDM, while the recall rate (RR) is lower[111]–[113]. Several DL algorithms for cancer detection on DBT images have been proposed along the same lines[61], [114]–[117]. For instance, to determine whether an image is normal, benign, high-risk, or cancerous, the researchers in[118] developed a ResNet-based DL model. The model was initially trained on an FFDM dataset and subsequently fine-tuned on 2D reconstructions of DBT images gathered using the Highest Density Emission in two dimensions technique. On the DBT dataset, their method had an AUC of 84.7%. A deep CNN operating on DBT volumes was designed in[114] to classify large datasets; the AUC of the proposed procedure was 84.7%, around 2% better than a standard CAD technique using manually engineered features.
Medical image analysis is one area where DL models excel, but they face a significant limitation: a lack of suitable training datasets. Data collection and labelling is a costly endeavour in the medical industry. Some research has attempted to address this issue with transfer learning. The authors of[119] created a two-step transfer learning strategy to classify DBT images as mass or normal. First, a pre-trained AlexNet model was fine-tuned with FFDM data before being trained with DBT images. In the second phase, features of the DBT images were extracted with the CNN model, and a random forest classifier then decided whether each image showed a mass or was normal. Their AUC on the test dataset was 90%. To classify FFDM and DBT images as cancerous or benign, the authors of another study[120] employed a VGG19 network trained on the ImageNet dataset to extract features.
An SVM was then used to evaluate the probability of malignancy based on the retrieved features. On the DBT images, their approach achieved an AUC of 98% in the CC view and 97% in the MLO view. These techniques demonstrate that DL models can achieve satisfactory results even with a modest training dataset when transfer learning strategies are incorporated. The majority of these studies contrast their DL algorithms with more conventional CAD techniques; however, direct comparison with radiologists is the gold standard for assessing a DL method's efficacy. The effectiveness of DL models on DBT and FFDM has, for instance, been studied. This research demonstrates that a DL system can reduce recall rates for FFDM images while maintaining or improving sensitivity relative to radiologists, and that an AI system can match radiologists' behaviour on DBT images.
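The two-stage pattern used in[119] and[120] (a pretrained backbone for features, then a classical classifier) can be illustrated schematically. Everything below is a stand-in: crude global statistics replace the CNN features, a nearest-centroid rule replaces the random forest/SVM, and the data are synthetic.

```python
import numpy as np

def extract_features(images):
    """Stand-in for a pretrained CNN backbone: three global statistics
    per image instead of learned deep features."""
    flat = images.reshape(len(images), -1)
    return np.stack([flat.mean(1), flat.std(1), flat.max(1)], axis=1)

class NearestCentroid:
    """Stand-in for the second-stage classifier (random forest in[119],
    SVM in[120])."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None] - self.centroids_[None], axis=2)
        return self.classes_[d.argmin(1)]

rng = np.random.default_rng(0)
normal = rng.normal(0.2, 0.05, (20, 16, 16))  # synthetic "normal" patches
mass = rng.normal(0.6, 0.05, (20, 16, 16))    # synthetic "mass" patches
X = extract_features(np.concatenate([normal, mass]))
y = np.array([0] * 20 + [1] * 20)
clf = NearestCentroid().fit(X, y)
train_acc = (clf.predict(X) == y).mean()
```

The design point is the split itself: the feature extractor needs large generic datasets (ImageNet, FFDM) and is reused, while only the small second-stage classifier must be fitted to the scarce DBT data.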
Table 5
Review of studies based on DBT datasets
Dataset | Year | Author | DL Technique | Performance measure |
DBTex challenge | 2022 | Hossain et al.[121] | CNN | Avg. sensitivity = 0.84 |
DBTex challenge | 2022 | Hossain et al.[122] | CNN | Avg. sensitivity = 0.815 |
Private | 2022 | Buda et al. | CNN | Sensitivity = 65% |
BCS-DBT Private | 2022 | Bai et al.[123] | GCN (Graph Convolutional Network) | Accuracy = 84% AUC = 0.87 |
VICTRE | 2022 | Mota et al.[124] | CNN | AUC = 0.941 |
Private | 2021 | Matthews et al.[125] | Transfer learning based on ResNet | AUC = 0.9 |
Private | 2020 | Singh et al.[118] | CNN | AUC = 0.85 |
DBTex challenge | 2021 | Shoshan et al.[126] | CNN | Avg. sensitivity = 0.91 |