Owing to its capacity to handle enormous amounts of data, the deep neural network (DNN), an artificial neural network (ANN) with several layers, has been demonstrated to be one of the most powerful tools of recent years [6]. In several fields, deep neural networks have proven more effective than conventional techniques, most significantly in pattern recognition. The most popular deep learning model is the convolutional neural network (CNN), whose significant layers include the convolution, max pooling, flatten, and dense layers.
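To make the roles of these layers concrete, the sketch below applies a single convolution, max-pooling, flatten, and dense step to a toy input using plain NumPy. The input size, filter values, and layer widths are illustrative assumptions chosen for the example, not taken from any cited work.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (as cross-correlation) of one channel."""
    h, w = kernel.shape
    out_h = image.shape[0] - h + 1
    out_w = image.shape[1] - w + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with the given window size."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))    # toy 8x8 single-channel input (assumed size)
kernel = rng.standard_normal((3, 3))   # one 3x3 convolution filter

features = np.maximum(conv2d(image, kernel), 0.0)  # convolution + ReLU -> 6x6
pooled = max_pool(features)                        # max pooling        -> 3x3
flat = pooled.reshape(-1)                          # flatten            -> 9-vector
dense_w = rng.standard_normal((9, 2))              # dense layer with 2 outputs
logits = flat @ dense_w
print(features.shape, pooled.shape, flat.shape, logits.shape)
```

Each layer progressively reduces spatial extent while the flatten and dense steps convert the remaining feature map into class scores, which is the pattern the stacked layers above describe.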
2.1 Convolution Neural Network and its Performance
In recent years, the machine learning (ML) discipline has greatly benefited from deep learning (DL)-based methods, which are steadily overtaking other computational techniques in the field: DL offers the best accuracy on a variety of complex models and, in many circumstances, surpasses highly developed human performance. The key advantage of DL models is their ability to handle enormous amounts of data, as shown in [7]. The DL sector has experienced an explosion in popularity recently and is successfully used in a wide variety of applications. DL has outperformed well-known ML approaches in a variety of disciplines, including cyber security, bioinformatics, natural language processing, robotics, and medical information processing.
The authors of [8] created an automatic detection method as a quicker diagnosis option. Their study uses a CNN and long short-term memory (LSTM)-based DL approach to diagnose corona from x-rays: deep features are extracted using a CNN, and the extracted features are classified using an LSTM. The dataset for the system contains 4575 x-ray images, of which 1526 are corona photos. Based on the testing results, their method achieves an accuracy of 99.4%, an AUC of 99.9%, and a specificity of 99.9%.
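The hybrid in [8] hands CNN feature maps to an LSTM for classification. The NumPy sketch below illustrates only the data flow of such a pipeline (feature maps reshaped into a sequence that a single LSTM cell consumes step by step); the dimensions, weights, and single-cell arithmetic are illustrative assumptions, not the architecture of [8].

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked in z as [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    n = h.size
    i = 1 / (1 + np.exp(-z[:n]))          # input gate
    f = 1 / (1 + np.exp(-z[n:2 * n]))     # forget gate
    g = np.tanh(z[2 * n:3 * n])           # candidate cell state
    o = 1 / (1 + np.exp(-z[3 * n:]))      # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(1)
feature_maps = rng.standard_normal((4, 5, 5))  # stand-in for CNN output: 4 maps of 5x5
seq = feature_maps.reshape(4, -1)              # each map becomes one 25-dim timestep

hidden = 8                                     # assumed hidden size
W = rng.standard_normal((4 * hidden, 25)) * 0.1
U = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in seq:                                  # LSTM reads the CNN features in order
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)   # final hidden state would feed a dense classifier
```

The point of the sketch is the hand-off: convolutional features are serialized into a sequence, and the LSTM's final hidden state summarizes that sequence for the classification head.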
Nowadays, optimizing CNNs for accuracy, training duration, and scalability is becoming more and more important, and doing so requires an understanding of the CNN structure. In [1], the structure of CNNs is investigated on various CPU, TPU, and GPU devices for benchmark applications, establishing a relationship between performance and CNN structure. In [2], Li Wang and Jason Kuen examined the evolution of CNNs in areas such as layer construction, activation functions, loss functions, regularization, efficiency, and fast computation; they also examined CNN applications in speech, NLP, and computer vision. In [3], the authors discussed the emergence and advancement of deep learning and CNNs, and the integration of the transformation factor with the convolutional neural network, concluding by predicting a novel approach to future development.
The authors of [4] presented a new approach based on wavelet networks, employing the ambiguous-code method to enhance image representation and coding. The method suggested in [5] for measuring biorepositories in a low-exposure setting is based on statistical analysis and a neural network isolation method; the findings demonstrate the proportion of various biorepositories in the sample.
2.2 Accelerating CNN
In recent years, the use of CNNs has increased manifold across many domains. One major issue with CNNs is training time, especially when the dataset is very large, and much research has been done to reduce CNN training time without compromising accuracy. GPUs, GPU clusters, TPUs, and FPGAs have all been used in the past to accelerate CNN training. In [6], a survey of ASIC-based hardware architectures for improving CNN performance is described. FPGA-based acceleration for CNNs is proposed in [7], and GPU-based acceleration for real-time CNN processing in [8]. In [9], improving CNN efficiency and accelerating training time across various hardware architectures and networks is analyzed. The performance of CNNs on GPUs and TPUs is benchmarked in [10], and [11] presents a detailed survey of methods to accelerate CNNs. Although there are many hardware-centric approaches to accelerating CNN performance, distributing the CNN is one way to scale it. Distributed CNNs have recently gained popularity for the following reasons.
- Applications are inherently distributed in nature
- Availability of distributed resources in cloud environments for accelerated performance
- Big data and IoT have made distributed CNN vital
In the next section, the current state of distributed CNN implementations, especially in cloud environments, is discussed.
2.3 Distributed CNN in Cloud Environment
DistrEdge, a distributed CNN for inference distribution, is proposed in [12], with results showing a 1.1x to 3x speedup. A distributed deep CNN-LSTM model for intrusion detection is proposed in [13]. A distributed approach to decentralized learning in micro clouds is proposed in [14], and [15] proposes a method to accelerate distributed deep learning. A hybrid LSTM-CNN model is proposed in [16] to forecast short-term load. Despite the many techniques proposed to distribute CNNs, the main challenge of balancing training time against accuracy remains. In addition, as CNNs have grown in the number of features, tuning hyperparameters is a further challenging task, especially in a distributed environment.
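As a concrete illustration of the data-parallel flavor of distributed training discussed above, the NumPy sketch below splits one batch across simulated workers, computes each worker's gradient of a simple least-squares loss, and averages them, as a synchronous all-reduce step would. The worker count and the linear model are illustrative assumptions; the property shown (with equal shards, the averaged gradient equals the full-batch gradient) is what makes synchronous data parallelism equivalent to single-machine training per step.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of the mean squared error 0.5*mean((Xw - y)^2) w.r.t. w."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(2)
X = rng.standard_normal((16, 3))   # one global batch of 16 samples (assumed sizes)
y = rng.standard_normal(16)
w = np.zeros(3)

# Data parallelism: each of 4 simulated workers holds an equal shard of the batch.
shards = list(zip(np.split(X, 4), np.split(y, 4)))
worker_grads = [grad_mse(w, Xs, ys) for Xs, ys in shards]
avg_grad = np.mean(worker_grads, axis=0)   # the all-reduce (averaging) step

full_grad = grad_mse(w, X, y)              # what a single machine would compute
print(np.allclose(avg_grad, full_grad))    # equal shards -> identical update
```

In a real distributed CNN the per-worker gradients come from backpropagation on each shard and the average is computed over the network, which is where the training-time versus accuracy trade-offs mentioned above arise.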
2.4 Scope for Improvement
Even though there has been significant research in the area of distributed deep learning, the following remain open challenges.
- Autotuning the hyperparameters of a CNN in a distributed neural network to achieve an accelerated learning curve remains a challenge.
- Improving training time without compromising accuracy is also an open challenge in a distributed environment.
- The impact of scaling a distributed neural network has not yet been analyzed; this study will introduce techniques to address that problem.
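To ground the first challenge, the sketch below shows the skeleton of random-search hyperparameter tuning: sample candidate settings, evaluate each, and keep the best. The objective function here is a synthetic stand-in (an assumption for the example); in the distributed setting it would be replaced by launching a distributed training run, which is exactly what makes autotuning expensive.

```python
import numpy as np

rng = np.random.default_rng(3)

def validation_error(lr, batch_size):
    """Synthetic stand-in for training a CNN and measuring validation error.
    A real tuner would launch a (distributed) training run here instead."""
    return (np.log10(lr) + 2) ** 2 + 0.01 * abs(batch_size - 64)

# Random search over two hyperparameters: sample, evaluate, keep the best.
best = None
for _ in range(50):
    lr = 10 ** rng.uniform(-5, -1)            # learning rate, log-uniform
    bs = int(rng.choice([16, 32, 64, 128]))   # batch size from a discrete set
    err = validation_error(lr, bs)
    if best is None or err < best[0]:
        best = (err, lr, bs)
print(best)   # (lowest error found, its learning rate, its batch size)
```

Because every evaluation is a full training run, each trial in a distributed environment multiplies the cluster cost, which is why accelerating this loop is listed above as an open problem.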