Drowsy driving is a severe threat to road safety, causing numerous accidents and fatalities worldwide. Detecting and mitigating driver drowsiness is therefore critical for preventing accidents [1]. In recent years, significant research effort has been devoted to building drowsiness detection systems using diverse approaches. Among these, deep convolutional neural networks (CNNs) have emerged as a powerful tool for processing visual data and extracting relevant features. Nonetheless, despite progress in this field, several limitations remain to be addressed [2].
Existing drowsiness detection research has mostly relied on classical machine learning and computer vision techniques. Some methods assess a driver's drowsiness from handcrafted cues such as eye closure duration, head pose, and blink rate. However, because they depend on explicit feature engineering, these techniques often suffer from limited robustness and generalization and may fail to capture subtle variations in drowsiness patterns [3].
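As an illustration of such explicit feature engineering, eye closure is often measured with the eye aspect ratio (EAR) computed from six eye landmarks; the landmark ordering and the 0.2 threshold below are illustrative assumptions, not values taken from the cited works:

```python
import numpy as np

def eye_aspect_ratio(landmarks):
    """Eye aspect ratio from six eye landmarks.

    landmarks: (6, 2) array ordered p1..p6, where p1/p4 are the
    horizontal eye corners, p2/p3 the upper lid, and p5/p6 the lower lid.
    """
    p1, p2, p3, p4, p5, p6 = landmarks
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def is_eye_closed(landmarks, threshold=0.2):
    # A below-threshold EAR is treated as a closed eye; sustained closure
    # over consecutive frames is what signals drowsiness. The threshold
    # value here is an assumed example, typically tuned per dataset.
    return eye_aspect_ratio(landmarks) < threshold
```

Pipelines built on such features typically count the fraction of frames with closed eyes over a time window, which is exactly the kind of hand-tuned rule that limits generalization across individuals.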
In recent years, deep CNNs have demonstrated strong performance in a variety of computer vision tasks, including object detection, image classification, and facial expression analysis [4]. Several studies have investigated CNNs for drowsiness detection, exploiting their ability to learn and extract relevant features directly from raw visual input. By training CNN models on large-scale datasets, these approaches have produced promising results in recognizing drowsy states from facial cues and expressions [5].
Deep CNNs can automatically learn hierarchical representations, capturing both low-level and high-level features. This allows the models to detect complex patterns and subtle variations that indicate drowsiness. CNNs can also handle complex and heterogeneous datasets, enabling them to adapt and generalize effectively across different individuals, lighting conditions, and camera angles [6–10].
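The hierarchy described above comes from stacking convolution, nonlinearity, and pooling layers. The toy NumPy sketch below (not a trained model; the image and filter are placeholder assumptions) shows how each stage shrinks the spatial resolution while aggregating patterns over a wider region of the input:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    h, w = x.shape
    h, w = h - h % size, w - w % size  # trim to a multiple of the pool size
    x = x[:h, :w].reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

# Stacking conv -> ReLU -> pool is how a CNN builds hierarchical features:
# early layers respond to edges, deeper layers to larger facial patterns.
image = np.random.rand(32, 32)           # stand-in for a grayscale face crop
edge_kernel = np.array([[1., 0., -1.],
                        [2., 0., -2.],
                        [1., 0., -1.]])  # Sobel-like vertical-edge filter
layer1 = max_pool(relu(conv2d(image, edge_kernel)))   # shape (15, 15)
layer2 = max_pool(relu(conv2d(layer1, edge_kernel)))  # shape (6, 6)
```

In a real detector the filters are learned from labeled data rather than fixed, which is precisely what lets the network discover drowsiness cues that hand-designed features miss.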
Despite this progress, existing research on drowsiness detection with deep CNNs still has limitations. One significant problem is the scarcity of labeled datasets created specifically for drowsiness detection. Collecting large-scale, diverse, and well-annotated datasets that cover many drowsy states and conditions remains difficult. Inadequate and unbalanced data can hinder the training process and degrade the model's effectiveness and generalizability [7].
Another limitation of deep CNN models is their lack of interpretability. While CNNs excel at learning complex representations, their internal workings are effectively "black boxes," making it difficult to understand the decision-making process or to interpret the learned features. This lack of interpretability may undermine the credibility and acceptance of these models in real-world settings where transparency and explainability are critical [8].
Furthermore, previous research has frequently focused on facial cues alone, ignoring other potential sources of information such as physiological signals (e.g., heart rate, electroencephalogram) or driving behavior (e.g., steering wheel movements, lane departures). Integrating multimodal data and developing fusion algorithms that combine several modalities could improve the robustness and accuracy of drowsiness detection systems [9].
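One simple way to combine such modalities is late fusion: each modality produces its own drowsiness score, and the scores are merged with reliability weights. The sketch below is a minimal, assumed example; the modality names, scores, and weights are hypothetical placeholders, not results from any cited system:

```python
def late_fusion(scores, weights):
    """Weighted late fusion of per-modality drowsiness probabilities.

    scores:  dict mapping modality name -> probability in [0, 1]
    weights: dict mapping modality name -> non-negative reliability weight
    """
    total = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total

# Hypothetical per-modality outputs (illustrative values, not measured):
scores = {"face": 0.80, "heart_rate": 0.55, "steering": 0.60}
weights = {"face": 0.5, "heart_rate": 0.2, "steering": 0.3}
fused = late_fusion(scores, weights)  # 0.69
```

More sophisticated fusion schemes learn the combination jointly (e.g., feature-level fusion inside the network), but even this weighted average illustrates how a weak facial signal can be corroborated by physiological or behavioral evidence.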
In this study, we aim to address some of these limitations and further investigate the potential of deep CNNs for drowsiness detection. By leveraging recent advances in deep learning, we seek to improve the accuracy, generalization, and interpretability of drowsiness detection systems, thereby contributing to road safety and reducing the hazards associated with drowsy driving (Fig. 1).