Drowsy driving is a severe threat to road safety, causing numerous accidents and fatalities worldwide. Detecting and mitigating driver drowsiness is therefore critical for preventing accidents [1]. In recent years, significant research effort has been devoted to building drowsiness detection systems using diverse approaches. Among these, deep convolutional neural networks (CNNs) have emerged as a powerful tool for processing visual data and extracting relevant features. Nonetheless, despite progress in this field, several limitations remain to be addressed [2].
Existing drowsiness detection research has mostly relied on classical machine learning and computer vision techniques. Some methods assess a driver's drowsiness from handcrafted cues such as eye closure duration, head pose, and blink rate. However, because they depend on explicit feature engineering, these techniques often suffer from limited robustness and generalization and may fail to capture subtle variations in drowsiness patterns [3].
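As an illustration of such explicit feature engineering, eye closure is often measured with the eye aspect ratio (EAR) computed from six eye landmarks; the landmark ordering and the 0.2 threshold below are illustrative assumptions, not values taken from the cited works:

```python
import numpy as np

def eye_aspect_ratio(landmarks):
    """Eye aspect ratio from six eye landmarks.

    landmarks: (6, 2) array ordered p1..p6, where p1/p4 are the
    horizontal eye corners, p2/p3 the upper lid, and p5/p6 the lower lid.
    """
    p1, p2, p3, p4, p5, p6 = landmarks
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def is_eye_closed(landmarks, threshold=0.2):
    # A below-threshold EAR is treated as a closed eye; sustained closure
    # over consecutive frames is what signals drowsiness. The threshold
    # value here is an assumed example, typically tuned per dataset.
    return eye_aspect_ratio(landmarks) < threshold
```

Pipelines built on such features typically count the fraction of frames with closed eyes over a time window, which is exactly the kind of hand-tuned rule that limits generalization across individuals.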
In recent years, deep CNNs have demonstrated strong performance in a variety of computer vision tasks, including object detection, image classification, and facial expression analysis [4]. Several studies have investigated CNNs for drowsiness detection, exploiting their ability to learn and extract relevant features directly from raw visual input. By training CNN models on large-scale datasets, these approaches have produced promising results in recognizing drowsy states from facial cues and expressions [5].
Deep CNNs can automatically learn hierarchical representations, capturing both low-level and high-level features. This allows the models to detect complex patterns and subtle variations that indicate drowsiness. CNNs can also handle complex and heterogeneous datasets, enabling them to adapt and generalize effectively across different individuals, lighting conditions, and camera angles [6–10].
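The hierarchy described above comes from stacking convolution, nonlinearity, and pooling layers. The toy NumPy sketch below (not a trained model; the image and filter are placeholder assumptions) shows how each stage shrinks the spatial resolution while aggregating patterns over a wider region of the input:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    h, w = x.shape
    h, w = h - h % size, w - w % size  # trim to a multiple of the pool size
    x = x[:h, :w].reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

# Stacking conv -> ReLU -> pool is how a CNN builds hierarchical features:
# early layers respond to edges, deeper layers to larger facial patterns.
image = np.random.rand(32, 32)           # stand-in for a grayscale face crop
edge_kernel = np.array([[1., 0., -1.],
                        [2., 0., -2.],
                        [1., 0., -1.]])  # Sobel-like vertical-edge filter
layer1 = max_pool(relu(conv2d(image, edge_kernel)))   # shape (15, 15)
layer2 = max_pool(relu(conv2d(layer1, edge_kernel)))  # shape (6, 6)
```

In a real detector the filters are learned from labeled data rather than fixed, which is precisely what lets the network discover drowsiness cues that hand-designed features miss.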
Despite this progress, existing research on drowsiness detection with deep CNNs still has limitations. One significant problem is the scarcity of labeled datasets created specifically for drowsiness detection. Collecting large-scale, diverse, and well-annotated datasets that cover many drowsy states and conditions remains difficult. Inadequate and unbalanced data can hinder the training process and degrade the model's effectiveness and generalizability [7].
Another limitation of deep CNN models is their lack of interpretability. While CNNs excel at learning complex representations, their internal workings are effectively "black boxes," making it difficult to understand the decision-making process or to interpret the learned features. This lack of interpretability may undermine the credibility and acceptance of these models in real-world settings where transparency and explainability are critical [8].
Furthermore, previous research has frequently focused on facial cues alone, ignoring other potential sources of information such as physiological signals (e.g., heart rate, electroencephalogram) or driving behavior (e.g., steering wheel movements, lane departures). Integrating multimodal data and developing fusion algorithms that combine several modalities could improve the robustness and accuracy of drowsiness detection systems [9].
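One simple way to combine such modalities is late fusion: each modality produces its own drowsiness score, and the scores are merged with reliability weights. The sketch below is a minimal, assumed example; the modality names, scores, and weights are hypothetical placeholders, not results from any cited system:

```python
def late_fusion(scores, weights):
    """Weighted late fusion of per-modality drowsiness probabilities.

    scores:  dict mapping modality name -> probability in [0, 1]
    weights: dict mapping modality name -> non-negative reliability weight
    """
    total = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total

# Hypothetical per-modality outputs (illustrative values, not measured):
scores = {"face": 0.80, "heart_rate": 0.55, "steering": 0.60}
weights = {"face": 0.5, "heart_rate": 0.2, "steering": 0.3}
fused = late_fusion(scores, weights)  # 0.69
```

More sophisticated fusion schemes learn the combination jointly (e.g., feature-level fusion inside the network), but even this weighted average illustrates how a weak facial signal can be corroborated by physiological or behavioral evidence.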
In this study, we aim to address some of these limitations and further investigate the potential of deep CNNs for drowsiness detection. By leveraging recent advances in deep learning, we seek to improve the accuracy, generalization, and interpretability of drowsiness detection systems, thereby contributing to road safety and reducing the hazards associated with drowsy driving (Fig. 1).