Owing to its capacity to handle enormous amounts of data, the deep neural network (DNN), an artificial neural network (ANN) with several layers, has been demonstrated to be one of the most powerful tools of recent years [6]. In several fields, deep neural networks have proven more effective than conventional techniques, most significantly in pattern recognition. The most popular deep learning model is the convolutional neural network (CNN), whose significant layers include the convolution, max pooling, flatten, and dense layers.
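To make the roles of these layers concrete, the sketch below applies a single convolution, max-pooling, flatten, and dense step to a toy input using plain NumPy. The input size, filter values, and layer widths are illustrative assumptions chosen for the example, not taken from any cited work.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (as cross-correlation) of one channel."""
    h, w = kernel.shape
    out_h = image.shape[0] - h + 1
    out_w = image.shape[1] - w + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with the given window size."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))    # toy 8x8 single-channel input (assumed size)
kernel = rng.standard_normal((3, 3))   # one 3x3 convolution filter

features = np.maximum(conv2d(image, kernel), 0.0)  # convolution + ReLU -> 6x6
pooled = max_pool(features)                        # max pooling        -> 3x3
flat = pooled.reshape(-1)                          # flatten            -> 9-vector
dense_w = rng.standard_normal((9, 2))              # dense layer with 2 outputs
logits = flat @ dense_w
print(features.shape, pooled.shape, flat.shape, logits.shape)
```

Each layer progressively reduces spatial extent while the flatten and dense steps convert the remaining feature map into class scores, which is the pattern the stacked layers above describe.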
2.1 Convolution Neural Network and its Performance
In recent years, the machine learning (ML) discipline has greatly benefited from deep learning (DL)-based methods, which are steadily overtaking other computational techniques in the field: DL offers the best accuracy on a variety of complex models and, in many circumstances, surpasses highly developed human performance. The key advantage of DL models is their ability to handle enormous amounts of data, as shown in [7]. The DL sector has experienced an explosion in popularity recently and is successfully used in a wide variety of applications. DL has outperformed well-known ML approaches in a variety of disciplines, including cyber security, bioinformatics, natural language processing, robotics, and medical information processing.
The authors of [8] created an automatic detection method as a quicker diagnosis option. Their study uses a CNN and long short-term memory (LSTM)-based DL approach to diagnose corona from x-rays: deep features are extracted using a CNN, and the extracted features are classified using an LSTM. The dataset for the system contains 4575 x-ray images, of which 1526 are corona photos. Based on the testing results, their method achieves an accuracy of 99.4%, an AUC of 99.9%, and a specificity of 99.9%.
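The hybrid in [8] hands CNN feature maps to an LSTM for classification. The NumPy sketch below illustrates only the data flow of such a pipeline (feature maps reshaped into a sequence that a single LSTM cell consumes step by step); the dimensions, weights, and single-cell arithmetic are illustrative assumptions, not the architecture of [8].

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked in z as [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    n = h.size
    i = 1 / (1 + np.exp(-z[:n]))          # input gate
    f = 1 / (1 + np.exp(-z[n:2 * n]))     # forget gate
    g = np.tanh(z[2 * n:3 * n])           # candidate cell state
    o = 1 / (1 + np.exp(-z[3 * n:]))      # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(1)
feature_maps = rng.standard_normal((4, 5, 5))  # stand-in for CNN output: 4 maps of 5x5
seq = feature_maps.reshape(4, -1)              # each map becomes one 25-dim timestep

hidden = 8                                     # assumed hidden size
W = rng.standard_normal((4 * hidden, 25)) * 0.1
U = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in seq:                                  # LSTM reads the CNN features in order
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)   # final hidden state would feed a dense classifier
```

The point of the sketch is the hand-off: convolutional features are serialized into a sequence, and the LSTM's final hidden state summarizes that sequence for the classification head.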
Nowadays, optimizing CNNs for accuracy, training duration, and scalability is becoming more and more important, and doing so requires an understanding of the CNN structure. In [1], the structure of CNNs is investigated on various CPU, TPU, and GPU devices for benchmark applications, establishing a relationship between performance and CNN structure. In [2], Li Wang and Jason Kuen examined the evolution of CNNs in areas such as layer construction, activation functions, loss functions, regularization, efficiency, and fast computation; they also examined CNN applications in speech, NLP, and computer vision. In [3], the authors discussed the emergence and advancement of deep learning and CNNs, and the integration of the transformation factor with the convolutional neural network, concluding by predicting a novel approach to future development.
The authors of [4] presented a new approach based on wavelet networks, employing the ambiguous-code method to enhance image representation and coding. The method suggested in [5] for measuring biorepositories in a low-exposure setting is based on statistical analysis and a neural network isolation method; the findings demonstrate the proportion of various biorepositories in the sample.
2.2 Accelerating CNN
In recent years, the use of CNNs has increased manifold across many domains. One major issue with CNNs is training time, especially when the dataset is very large, and much research has been done to reduce CNN training time without compromising accuracy. GPUs, GPU clusters, TPUs, and FPGAs have all been used in the past to accelerate CNN training. In [6], a survey of ASIC-based hardware architectures for improving CNN performance is described. FPGA-based acceleration for CNNs is proposed in [7], and GPU-based acceleration for real-time CNN processing in [8]. In [9], improving CNN efficiency and accelerating training time across various hardware architectures and networks is analyzed. The performance of CNNs on GPUs and TPUs is benchmarked in [10], and [11] presents a detailed survey of methods to accelerate CNNs. Although there are many hardware-centric approaches to accelerating CNN performance, distributing the CNN is one way to scale it. Distributed CNNs have recently gained popularity for the following reasons.
- Applications are inherently distributed in nature
- Availability of distributed resources in cloud environments for accelerated performance
- Big data and IoT have made distributed CNN vital
In the next section, the current state of distributed CNN implementations, especially in cloud environments, is discussed.
2.3 Distributed CNN in Cloud Environment
DistrEdge, a distributed CNN for inference distribution, is proposed in [12], with results showing a 1.1x to 3x speedup. A distributed deep CNN-LSTM model for intrusion detection is proposed in [13]. A distributed approach to decentralized learning in micro clouds is proposed in [14], and [15] proposes a method to accelerate distributed deep learning. A hybrid LSTM-CNN model is proposed in [16] to forecast short-term load. Despite the many techniques proposed to distribute CNNs, the main challenge of balancing training time against accuracy remains. In addition, as CNNs have grown in the number of features, tuning hyperparameters is a further challenging task, especially in a distributed environment.
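As a concrete illustration of the data-parallel flavor of distributed training discussed above, the NumPy sketch below splits one batch across simulated workers, computes each worker's gradient of a simple least-squares loss, and averages them, as a synchronous all-reduce step would. The worker count and the linear model are illustrative assumptions; the property shown (with equal shards, the averaged gradient equals the full-batch gradient) is what makes synchronous data parallelism equivalent to single-machine training per step.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of the mean squared error 0.5*mean((Xw - y)^2) w.r.t. w."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(2)
X = rng.standard_normal((16, 3))   # one global batch of 16 samples (assumed sizes)
y = rng.standard_normal(16)
w = np.zeros(3)

# Data parallelism: each of 4 simulated workers holds an equal shard of the batch.
shards = list(zip(np.split(X, 4), np.split(y, 4)))
worker_grads = [grad_mse(w, Xs, ys) for Xs, ys in shards]
avg_grad = np.mean(worker_grads, axis=0)   # the all-reduce (averaging) step

full_grad = grad_mse(w, X, y)              # what a single machine would compute
print(np.allclose(avg_grad, full_grad))    # equal shards -> identical update
```

In a real distributed CNN the per-worker gradients come from backpropagation on each shard and the average is computed over the network, which is where the training-time versus accuracy trade-offs mentioned above arise.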
2.4 Scope for Improvement
Even though there has been significant research in the area of distributed deep learning, the following remain open challenges.
- Autotuning the hyperparameters of a CNN in a distributed neural network to achieve an accelerated learning curve remains a challenge.
- Improving training time without compromising accuracy is also an open challenge in a distributed environment.
- The impact of scaling a distributed neural network has not yet been analyzed; this study will introduce techniques to address that problem.
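To ground the first challenge, the sketch below shows the skeleton of random-search hyperparameter tuning: sample candidate settings, evaluate each, and keep the best. The objective function here is a synthetic stand-in (an assumption for the example); in the distributed setting it would be replaced by launching a distributed training run, which is exactly what makes autotuning expensive.

```python
import numpy as np

rng = np.random.default_rng(3)

def validation_error(lr, batch_size):
    """Synthetic stand-in for training a CNN and measuring validation error.
    A real tuner would launch a (distributed) training run here instead."""
    return (np.log10(lr) + 2) ** 2 + 0.01 * abs(batch_size - 64)

# Random search over two hyperparameters: sample, evaluate, keep the best.
best = None
for _ in range(50):
    lr = 10 ** rng.uniform(-5, -1)            # learning rate, log-uniform
    bs = int(rng.choice([16, 32, 64, 128]))   # batch size from a discrete set
    err = validation_error(lr, bs)
    if best is None or err < best[0]:
        best = (err, lr, bs)
print(best)   # (lowest error found, its learning rate, its batch size)
```

Because every evaluation is a full training run, each trial in a distributed environment multiplies the cluster cost, which is why accelerating this loop is listed above as an open problem.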