3.1 Traffic Conflict Characteristics Recognition System in Diversion and Merging Zones
The data collection site, shown in Fig. 2, is located in Rabat City, Morocco. The Simulation of Urban MObility (SUMO) simulator was employed to extract extensive traffic data for Rabat, focusing on a variety of vehicle types over specified periods. This simulation environment provided a rich dataset that captures essential details about urban mobility, offering a detailed view of traffic dynamics within the city. The dataset included several key columns, such as dateandtime, which records the precise timestamp of each observation, vehid for vehicle identification, coord and gpscoord for spatial positioning, spd for vehicle speed, edge and lane for road network and lane-specific data, displacement indicating the distance traveled by each vehicle, and turnAngle, which reflects the vehicle's change in direction. Additionally, the nextTLS column documented the state of the next traffic light encountered by the vehicle, providing critical context for analyzing traffic flow.
From this detailed dataset, relevant features were systematically extracted to support the development of predictive models focused on traffic behavior. The extraction process included deriving temporal features from the dateandtime field, such as the hour of the day and the day of the week, which are crucial for identifying patterns in traffic congestion and flow over time. Vehicle speed, captured in the spd field, was another vital feature, offering insights into the movement dynamics of different vehicles throughout the urban landscape. Lane-specific data were extracted from the lane field, enabling an analysis of traffic behavior within individual lanes, which is essential for understanding lane utilization and congestion. The displacement field provided a measure of the distance each vehicle traveled, offering a direct indicator of mobility efficiency and potential bottlenecks within the city.
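The temporal and lane-level feature extraction described above can be sketched with a few pandas operations. The column names (dateandtime, spd, lane, displacement) follow the dataset schema listed earlier; the sample rows and values below are invented purely for illustration, and the lane parsing assumes SUMO's usual "<edge>_<index>" lane-naming convention.

```python
import pandas as pd

# Toy rows standing in for the SUMO export; the values are illustrative only.
df = pd.DataFrame({
    "dateandtime": ["2024-03-01 08:15:00", "2024-03-01 17:40:30"],
    "vehid": ["veh_1", "veh_2"],
    "spd": [12.4, 7.9],            # vehicle speed (m/s)
    "lane": ["edge12_0", "edge7_1"],
    "displacement": [310.5, 95.2]  # distance travelled (m)
})

# Temporal features derived from the dateandtime field.
df["dateandtime"] = pd.to_datetime(df["dateandtime"])
df["hour"] = df["dateandtime"].dt.hour
df["day_of_week"] = df["dateandtime"].dt.dayofweek  # 0 = Monday

# Lane index parsed from SUMO's "<edge>_<index>" lane identifier.
df["lane_index"] = df["lane"].str.rsplit("_", n=1).str[1].astype(int)
```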
Moreover, the state of traffic lights, captured in the nextTLS field, was analyzed to understand its impact on vehicle movement and overall traffic flow. Beyond these basic features, additional data engineering techniques were applied to create new features that could further enhance the model's predictive accuracy. By leveraging both the original and engineered features, the model developed a more comprehensive understanding of traffic patterns in Rabat, ultimately contributing to more effective urban traffic management and planning. This systematic approach to data extraction and feature engineering from SUMO's rich dataset ensures a robust foundation for predictive modeling, aiding in the development of intelligent traffic management solutions tailored to the unique dynamics of Rabat's urban environment.
Table 1

| Type of vehicle | Number of instances | Mean instances per frame |
|-----------------|---------------------|--------------------------|
| Bus             | 1234                | 1.26                     |
| Truck           | 2415                | 2.46                     |
| Car             | 53083               | 4.06                     |
| Trolleybus      | 611                 | 0.62                     |
| TRAM            | 1298                | 1.28                     |
| Moto            | 2783                | 2.83                     |
3.2 Traffic data preprocessing method based on YOLOv8-DeepSORT
This study focuses on improving the prediction of traffic conflicts in complex areas like diversion and merging zones, where accurate analysis is often challenging. These zones are prone to accidents due to the high level of vehicle interaction, making precise traffic detection crucial. To address this, the research combines the SUMO simulator with advanced video analysis techniques, specifically YOLOv8 for vehicle recognition and DeepSORT for tracking.
The process begins with data collection (Table 1 and Fig. 4), where roadside video footage is gathered to create a comprehensive dataset. These data include images of various vehicle types under different traffic conditions, which are essential for training and testing the detection models.
Next, the YOLOv8 model is trained to accurately recognize different vehicles within the video footage. This model is refined through multiple training iterations to ensure high precision in identifying cars, buses, trucks, and other vehicles. The trained model is then evaluated based on its precision, recall, and overall accuracy.
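As a concrete illustration of this training setup, an Ultralytics-style dataset configuration might look like the sketch below. The directory paths and file name are placeholder assumptions; only the class list (taken from Table 1) and the 70-epoch setting come from the text.

```yaml
# Hypothetical data.yaml for YOLOv8 training; paths are placeholders,
# class names follow the vehicle types in Table 1.
path: datasets/rabat_traffic
train: images/train
val: images/val

names:
  0: car
  1: bus
  2: truck
  3: trolleybus
  4: tram
  5: moto

# Training could then be launched with the Ultralytics CLI, e.g.:
#   yolo detect train model=yolov8s.pt data=data.yaml epochs=70
```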
Finally, DeepSORT is used to track the recognized vehicles across video frames. This algorithm predicts vehicle movements and updates tracking in real-time, ensuring that each vehicle's path is accurately monitored.
By integrating these techniques, the study enhances the detection of traffic conflicts at a micro-level, providing valuable insights into how vehicles interact in merging and diversion zones. This improved accuracy in traffic situation detection supports better traffic management and helps reduce the risk of accidents in these critical areas.
Figure 4 presents an integrated framework that combines advanced traffic simulation and machine learning techniques to enhance vehicle detection, tracking, and prediction. By utilizing SUMO (Simulation of Urban MObility) for generating synthetic traffic data, the framework employs YOLOv8 for accurate object detection and DeepSORT for robust tracking across frames. To address class imbalance in the dataset, SMOTE (Synthetic Minority Over-sampling Technique) is applied, ensuring balanced data for effective model training. The CatBoost algorithm is then used for classification and prediction, leveraging its ability to handle categorical data and optimize model performance. The framework is evaluated using a comprehensive set of metrics, including precision, recall, F1-score, and AUC-ROC, demonstrating its effectiveness in real-time traffic analysis and predictive modeling.
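To make the class-balancing step concrete, the core interpolation idea behind SMOTE can be sketched in a few lines of NumPy. This is a simplified stand-in for a full SMOTE implementation (real SMOTE interpolates towards one of the k nearest minority neighbours; here the partner sample is drawn at random), intended only to show how synthetic minority samples are generated before training the CatBoost classifier.

```python
import numpy as np

def smote_sketch(X_minority, n_synthetic, rng=None):
    """Generate synthetic minority samples by linear interpolation.

    Simplified SMOTE: each synthetic point lies on the segment between
    two randomly chosen minority samples (real SMOTE restricts the
    partner to one of the k nearest neighbours).
    """
    rng = np.random.default_rng(rng)
    n = len(X_minority)
    base = X_minority[rng.integers(0, n, n_synthetic)]
    partner = X_minority[rng.integers(0, n, n_synthetic)]
    lam = rng.random((n_synthetic, 1))  # interpolation factor in [0, 1)
    return base + lam * (partner - base)

# Example: a tiny minority class of conflict events (feature values invented).
X_min = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
X_new = smote_sketch(X_min, n_synthetic=5, rng=0)
```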
Figure 6 illustrates the optimized framework, which combines traffic simulation data from SUMO with YOLOv8-DeepSORT for precise vehicle tracking, CatBoost for conflict detection, and SMOTE for addressing sample imbalance. By optimizing performance with Bayesian methods and interpreting results with SHAP, the algorithm offers a comprehensive solution for enhancing urban traffic safety, particularly in complex zones where traditional methods are less effective. This approach ensures accurate detection and management of traffic conflicts, leading to safer and more efficient urban traffic systems.
This paper primarily aims to enhance the accuracy of traffic situation detection from video images by leveraging roadside video data and applying computer vision and deep learning technologies. It constructs a traffic situation detection system to analyze traffic conflicts in diversion and merging zones at a micro-level, providing data support for detailed traffic conflict analysis. The study follows these specific steps:
S1. Data Collection: Collecting data is crucial for video vehicle recognition, traffic condition detection, traffic flow analysis, and vehicle speed measurement. To build the training dataset, vehicle images of various models are gathered. Additionally, roadside video data covering different congestion scenarios are collected to validate the algorithm’s accuracy.
S2. Vehicle Recognition: The YOLOv8 model is employed to train on the vehicle image dataset, and the training weights are used for model validation. The YOLOv8s pre-trained network is adapted through transfer learning with 70 epochs of iterative training. The model's performance is evaluated using recall and precision metrics. Precision measures the proportion of predicted positive samples that are correct, while recall reflects the model's ability to identify all true positive examples. The mean Average Precision (mAP) is used to gauge overall accuracy across different vehicle categories. For instance, the accuracy for cars is 0.90, for buses 0.98, for motos 0.97, for trams 0.98, and for trucks 0.96, with an overall mAP of 0.95. The training and testing results, shown in Figs. 7 and 8, indicate model convergence. The final vehicle recognition model achieves 90.9% precision, 95.8% recall, and 91.7% average precision (mAP@0.5), meeting the requirements for subsequent detection tasks.
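For reference, the metrics quoted in S2 can be computed from raw detection counts as follows. The true/false-positive counts in the usage example are invented, not the study's actual confusion figures; only the per-class AP values come from the text.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def mean_ap(per_class_ap):
    """mAP is the unweighted mean of the per-class average precisions."""
    return sum(per_class_ap.values()) / len(per_class_ap)

# Per-class AP values reported in the text.
ap = {"car": 0.90, "bus": 0.98, "moto": 0.97, "tram": 0.98, "truck": 0.96}

# Illustrative counts: 90 correct detections, 10 false alarms, 5 misses.
p, r = precision_recall(tp=90, fp=10, fn=5)
```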
S3. Vehicle Target Tracking: The DeepSORT algorithm processes each video frame to generate detection frames, assign target attributes, and perform Kalman filtering. This method updates the tracking predictions and successfully tracks targets, as illustrated in Fig. 3.
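DeepSORT's motion model is, at its core, a Kalman filter over bounding-box states. The predict/update cycle mentioned in S3 can be sketched for a single 1-D position (a drastic simplification of DeepSORT's 8-dimensional bounding-box state, with invented noise parameters), just to show the mechanics:

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """One predict/update cycle of a constant-velocity Kalman filter.

    x : state [position, velocity]; P : 2x2 covariance; z : measured position.
    Simplified stand-in for DeepSORT's 8-D bounding-box state filter.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
    H = np.array([[1.0, 0.0]])              # only position is observed
    Q = q * np.eye(2)                       # process noise
    R = np.array([[r]])                     # measurement noise

    # Predict the next state from the motion model.
    x = F @ x
    P = F @ P @ F.T + Q

    # Update with the detection's measured position.
    y = z - H @ x                           # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Track a target moving ~1 unit/frame using noiseless measurements.
x, P = np.array([0.0, 0.0]), np.eye(2)
for z in [1.0, 2.0, 3.0, 4.0]:
    x, P = kalman_step(x, P, np.array([z]))
```

After a few frames the estimated position and velocity settle towards the target's true motion, which is what allows DeepSORT to re-associate detections with existing tracks across frames.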