This chapter presents results from data collected exclusively in intersection areas, which were identified using the DBSCAN algorithm to capture vehicle approaches, traversals, and exits. Based on these data, detailed CO₂ emission and energy consumption models for passenger vehicles were developed and validated.
3.1. Data selection for intersection areas
To select data for intersection areas, the DBSCAN algorithm was first applied to cluster data points by geographical location. Before clustering, the geographic coordinates (latitude and longitude) were standardized with the StandardScaler method, which rescales each coordinate to zero mean and unit variance and so removes the effect of scale differences between them. Experiments were then run with different values of the two DBSCAN parameters: eps (the neighborhood radius) and min_samples (the minimum number of points required to form a cluster). The aim was to find the settings that best extract the clusters corresponding to intersections in the study area. Because eps is applied to the scaled coordinates, it is expressed in standardized units rather than degrees. Combining the scaled coordinates with different parameter values made it possible to visualize and compare the results, which was essential for the subsequent selection and analysis of intersection data. The results for different values of the neighborhood radius and minimum number of points along the studied route are presented in Fig. 6.
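The clustering step can be sketched as follows. This is a minimal, dependency-free re-implementation written for illustration, not the study's code; in practice scikit-learn's `StandardScaler` and `DBSCAN(eps=..., min_samples=...)` would normally be used. The coordinates are hypothetical: two dense patches standing in for intersections, one small dense patch of only nine fixes, and three isolated fixes. Note that `eps` is compared against distances in standardized units, not degrees.

```python
import math
from statistics import mean, pstdev


def standardize(points):
    """Z-score each coordinate (zero mean, unit population std), as StandardScaler does."""
    columns = list(zip(*points))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against a constant column
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]


def neighbors(pts, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(pts) if math.dist(pts[i], q) <= eps]


def dbscan(pts, eps, min_samples):
    """Minimal DBSCAN: one label per point, -1 for noise, clusters numbered from 0."""
    labels = [None] * len(pts)  # None = not yet visited
    cluster = -1
    for i in range(len(pts)):
        if labels[i] is not None:
            continue
        seeds = neighbors(pts, i, eps)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point: border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(pts, j, eps)
            if len(j_seeds) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_seeds)
    return labels


def make_demo_points():
    """Hypothetical GPS fixes (lat, lon) for illustration only."""
    pts = []
    # Two dense 5x5 patches standing in for two intersections.
    for lat0, lon0 in [(50.000, 19.000), (50.010, 19.020)]:
        pts += [(lat0 + i * 1e-4, lon0 + j * 1e-4) for i in range(5) for j in range(5)]
    # A small 3x3 patch: dense, but only 9 fixes.
    pts += [(50.005 + i * 1e-4, 19.030 + j * 1e-4) for i in range(3) for j in range(3)]
    # Three isolated fixes.
    pts += [(50.005, 19.010), (50.002, 19.015), (50.008, 19.004)]
    return pts


points = make_demo_points()
scaled = standardize(points)

# Parameter sweep over eps (standardized units) and min_samples.
for eps in (0.05, 0.1, 0.3):
    for min_samples in (5, 20):
        labels = dbscan(scaled, eps, min_samples)
        n_clusters = len(set(labels) - {-1})
        print(f"eps={eps}, min_samples={min_samples}: "
              f"{n_clusters} clusters, {labels.count(-1)} noise points")
```

With these hypothetical points, raising `min_samples` from 5 to 20 turns the nine-point patch into noise, which illustrates how the two parameters jointly decide which dense regions count as clusters.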
Figure 7 shows the clusters obtained with the best-performing parameters, which correspond to the intersections along the studied road segment. These parameters, eps = 0.1 and min_samples = 20, correctly identified 8 intersections, in agreement with the actual road layout. The cluster centroids were computed as the mean geographical coordinates (latitude and longitude) of the points assigned to each cluster by DBSCAN: all points in a given cluster were collected, and the averages of their coordinates were calculated. These centroids mark the central locations of the intersections in the analyzed area. Shown on the graph as red crosses, they serve as reference points for the subsequent analyses of vehicle emissions and energy consumption in these areas.
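The centroid calculation amounts to a per-label mean of the original, unscaled coordinates. A minimal sketch, assuming `labels` holds one DBSCAN label per GPS fix; the coordinates and labels below are hypothetical. Points labeled -1 (noise) are skipped so that stray fixes do not shift the intersection centers.

```python
from collections import defaultdict


def cluster_centroids(coords, labels):
    """Mean (lat, lon) per DBSCAN cluster, computed on the original coordinates.

    Points labelled -1 (noise) are ignored.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_lat, sum_lon, count]
    for (lat, lon), label in zip(coords, labels):
        if label == -1:
            continue
        acc = sums[label]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {label: (s_lat / n, s_lon / n) for label, (s_lat, s_lon, n) in sorted(sums.items())}


# Hypothetical coordinates and labels, as a DBSCAN run might return them.
coords = [(50.0000, 19.0000), (50.0002, 19.0002), (50.0100, 19.0200),
          (50.0104, 19.0204), (50.0050, 19.0100)]
labels = [0, 0, 1, 1, -1]
centroids = cluster_centroids(coords, labels)
for label, (lat, lon) in centroids.items():
    print(f"intersection {label}: lat={lat:.5f}, lon={lon:.5f}")
```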
On the chart, the labels 0, -1, and other integers denote the group assignments made by DBSCAN. The algorithm numbers clusters sequentially from 0, so points labeled 0 belong to the first identified cluster, that is, the first dense region of points that meets the density criteria. Points labeled -1 are classified as noise or outliers: they were not assigned to any cluster because they have fewer than min_samples neighbors within the distance eps and do not lie within eps of any core point of a cluster. The -1 label therefore marks points that are isolated or do not belong to any of the identified dense regions. This labeling shows how DBSCAN distinguishes clustered points from those treated as noise.
The centers of the intersection areas analyzed are the points of maximum accumulation of start-stop operations for passing vehicles. At these locations, internal combustion vehicles generate the highest levels of exhaust emissions, contributing significantly to the environmental pollution around these road arteries. Pedestrians in these areas, especially near crosswalks, are particularly exposed to exhaust emissions.
3.2. Creation and validation of CO2 emission and energy consumption models for motor vehicles
The developed method for identifying intersection locations enables rapid separation of the data, which can then be used, for example, to build more accurate models that predict CO₂ emissions from internal combustion engine vehicles and energy consumption of electric vehicles.
To predict CO₂ emissions and energy consumption from the variables V (vehicle speed) and a (acceleration), a comprehensive model development and validation process was carried out. The data uploaded to Google Colab were first processed and saved as a separate CSV file containing the emissions or energy consumption (depending on the vehicle powertrain) together with the vehicle speed and acceleration. A sample view of the data and the columns used for model training is presented in Fig. 8.
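Loading such a file and discarding unusable rows might look like the sketch below. The column names `V`, `a`, and `CO2` and the sample values are hypothetical (the actual columns are shown in Fig. 8). Rows with missing, non-numeric, or non-finite entries are dropped, which is one possible reading of the cleaning step.

```python
import csv
import io
import math

# Hypothetical CSV content; real column names and units may differ.
SAMPLE_CSV = """V,a,CO2
32.5,0.4,2.1
,0.1,1.8
28.0,-0.6,0.9
abc,0.2,1.5
15.2,1.1,nan
0.0,0.0,0.6
"""


def load_clean_rows(text, columns=("V", "a", "CO2")):
    """Parse CSV text and keep only rows whose selected columns are finite numbers."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            values = tuple(float(record[c]) for c in columns)
        except (TypeError, ValueError, KeyError):
            continue  # empty cell, non-numeric text, short row, or missing column
        if all(math.isfinite(v) for v in values):  # drops NaN and infinities
            rows.append(values)
    return rows


print(load_clean_rows(SAMPLE_CSV))
```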
The data were first cleaned to remove invalid and missing values and then converted to a numerical format. The data set was then split into training and test sets in a 70:30 ratio, and the input variables were standardized so that they were on a comparable scale. Five regression models were trained: linear regression, LASSO, Ridge, Random Forest, and XGBoost. Each model was evaluated using the mean squared error (MSE) and the coefficient of determination (R²). The coefficient of determination measures how well a regression model fits the data. It is defined as (1):
$$R^{2}=1-\frac{\sum_{t=1}^{n}\left(y_{t}-\hat{y}_{t}\right)^{2}}{\sum_{t=1}^{n}\left(y_{t}-\bar{y}\right)^{2}} \tag{1}$$
where:
\(y_t\) – the observed (actual) values,
\(\hat{y}_t\) – the values predicted by the model,
\(\bar{y}\) – the mean of the actual values,
n – the number of observations.
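Equation (1) translates directly into code. In the hypothetical example below, the mean of the observations is 2.5, the denominator is 5.0 and the numerator is 0.10, so R² = 1 − 0.10/5.0 = 0.98.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, Equation (1)."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))  # numerator
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)  # denominator
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and predictions.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y_obs, y_fit), 4))  # prints 0.98
```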
The Mean Squared Error (MSE) measures the average of the squared differences between the actual and predicted values. It is defined as (2):
$$\mathrm{MSE}=\frac{1}{n}\sum_{t=1}^{n}\left(y_{t}-\hat{y}_{t}\right)^{2} \tag{2}$$
where:
\(y_t\) – the observed (actual) values,
\(\hat{y}_t\) – the values predicted by the model,
n – the number of observations.
MSE measures how large the prediction errors are. Smaller MSE values indicate a better fit of the model as they represent smaller differences between actual and predicted values.
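The workflow of a 70:30 split, standardization, fitting, and evaluation with Equation (2) can be illustrated with the simplest of the five models, linear regression on V and a. This is a sketch on synthetic data, not the study's pipeline. Fitting the scaler on the training portion only and reusing its statistics for the test set is an assumption about how standardization was applied; the least-squares solver is written out by hand only to keep the example dependency-free.

```python
import random
from statistics import mean, pstdev


def split_70_30(rows, seed=0):
    """Shuffle and split rows into 70 % training and 30 % test data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(X):
    """Column means and population standard deviations from training data only."""
    cols = list(zip(*X))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def scale(X, means, stds):
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]


def fit_ols_two_features(Z, y):
    """Least squares for y = b0 + b1*z1 + b2*z2 on standardized (zero-mean) features.

    With centred training features the intercept is mean(y); the slopes solve
    the 2x2 normal equations, here via Cramer's rule.
    """
    b0 = mean(y)
    s11 = sum(z1 * z1 for z1, _ in Z)
    s22 = sum(z2 * z2 for _, z2 in Z)
    s12 = sum(z1 * z2 for z1, z2 in Z)
    t1 = sum(z1 * (yy - b0) for (z1, _), yy in zip(Z, y))
    t2 = sum(z2 * (yy - b0) for (_, z2), yy in zip(Z, y))
    det = s11 * s22 - s12 * s12
    return b0, (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def mse(y_true, y_pred):
    """Equation (2): mean of the squared prediction errors."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)


# Synthetic data (hypothetical): V in km/h, a in m/s^2, target in g/s.
rng = random.Random(42)
rows = []
for _ in range(100):
    v, a = rng.uniform(0, 50), rng.uniform(-2, 2)
    rows.append((v, a, 0.4 + 0.03 * v + 0.5 * a + rng.gauss(0, 0.05)))

train, test = split_70_30(rows)
means, stds = fit_scaler([r[:2] for r in train])
Z_train = scale([r[:2] for r in train], means, stds)
Z_test = scale([r[:2] for r in test], means, stds)  # reuse training statistics
b0, b1, b2 = fit_ols_two_features(Z_train, [r[2] for r in train])
pred = [b0 + b1 * z1 + b2 * z2 for z1, z2 in Z_test]
print(len(train), len(test), round(mse([r[2] for r in test], pred), 4))
```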
The vehicle data were aggregated into larger groups according to the EURO emission standard and further divided by fuel type. This yielded 10 groups, for each of which emission or energy consumption models for intersection areas were developed and then validated. The validation results are presented in Table 3.
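Forming the groups amounts to bucketing records by a composite key of emission standard and fuel type, after which one model is trained per bucket. A minimal sketch; the field names (`standard`, `fuel`, `V`, `a`) and the records are hypothetical.

```python
from collections import defaultdict


def group_records(records):
    """Bucket records by (emission standard, fuel type)."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["standard"], rec["fuel"])].append(rec)
    return dict(groups)


# Hypothetical records; field names are illustrative, not taken from the study's files.
records = [
    {"standard": "EURO3", "fuel": "gasoline", "V": 30.0, "a": 0.2},
    {"standard": "EURO3", "fuel": "LPG", "V": 12.0, "a": -0.5},
    {"standard": "EURO3", "fuel": "gasoline", "V": 45.0, "a": 0.0},
    {"standard": "EURO6", "fuel": "Diesel", "V": 8.0, "a": 1.1},
    {"standard": "EV", "fuel": "electric", "V": 22.0, "a": 0.4},
]
groups = group_records(records)
for key, recs in sorted(groups.items()):
    print(key, len(recs))  # one model would be fitted per key
```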
Table 3
Validation results of the obtained models by vehicle emission class
| Vehicle type | Validation metric | Linear regression | LASSO | Ridge | Random Forest | XGBoost |
|---|---|---|---|---|---|---|
| EURO2 (gasoline) | MSE | 0.29 | 0.22 | 0.20 | 0.16 | 0.18 |
| | R² | 0.62 | 0.55 | 0.62 | 0.71 | 0.67 |
| EURO3 (gasoline) | MSE | 0.22 | 0.13 | 0.12 | 0.09 | 0.08 |
| | R² | 0.75 | 0.72 | 0.74 | 0.82 | 0.85 |
| EURO3 (LPG) | MSE | 0.21 | 0.12 | 0.11 | 0.08 | 0.07 |
| | R² | 0.78 | 0.76 | 0.77 | 0.82 | 0.84 |
| EURO3 (CNG) | MSE | 0.21 | 0.12 | 0.11 | 0.08 | 0.07 |
| | R² | 0.80 | 0.77 | 0.78 | 0.82 | 0.85 |
| EURO4 (gasoline) | MSE | 0.11 | 0.11 | 0.11 | 0.07 | 0.06 |
| | R² | 0.82 | 0.80 | 0.81 | 0.85 | 0.88 |
| EURO5 (gasoline) | MSE | 0.11 | 0.11 | 0.10 | 0.07 | 0.05 |
| | R² | 0.83 | 0.81 | 0.82 | 0.86 | 0.90 |
| EURO6 (gasoline) | MSE | 0.22 | 0.13 | 0.13 | 0.08 | 0.06 |
| | R² | 0.80 | 0.78 | 0.79 | 0.85 | 0.90 |
| EURO6 (Diesel) | MSE | 0.24 | 0.15 | 0.14 | 0.09 | 0.07 |
| | R² | 0.78 | 0.76 | 0.77 | 0.80 | 0.83 |
| EURO6 (Hybrid) | MSE | 0.13 | 0.14 | 0.13 | 0.09 | 0.06 |
| | R² | 0.79 | 0.77 | 0.78 | 0.82 | 0.88 |
| EV (energy) | MSE | 7.33 | 7.43 | 7.34 | 0.94 | 0.40 |
| | R² | 0.74 | 0.73 | 0.74 | 0.97 | 0.99 |
Table 3 presents the validation results of the regression models for the different vehicle groups, including electric vehicles (EV), and reveals clear differences in how well the models predict CO₂ emissions and energy consumption. The models analyzed are linear regression, LASSO, Ridge, Random Forest, and XGBoost. XGBoost achieved the best values of both quality metrics, the mean squared error (MSE) and the coefficient of determination (R²), in nine of the ten groups; the exception is EURO2 (gasoline), for which Random Forest performed best (MSE 0.16, R² 0.71). For electric vehicles, XGBoost achieved the lowest MSE (0.40) and the highest R² (0.99), indicating very high precision and explanatory power. The linear models (linear regression, LASSO, and Ridge) performed considerably worse on both MSE and R², most markedly for electric vehicles, where their MSE exceeded 7. XGBoost was therefore selected for predicting CO₂ emissions and EV energy consumption in the further analysis. Random Forest, which also yielded promising results, may nevertheless merit further optimization.
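The per-group ranking in Table 3 can be checked mechanically. The sketch below transcribes the table and picks, for each group, the model with the lowest MSE and the model with the highest R². On these values XGBoost ranks first on both metrics in nine groups, while Random Forest ranks first on both for EURO2 (gasoline).

```python
MODELS = ("Linear Regression", "LASSO", "Ridge", "Random Forest", "XGBoost")

# (MSE per model, R² per model), transcribed from Table 3.
TABLE_3 = {
    "EURO2 (gasoline)": ((0.29, 0.22, 0.20, 0.16, 0.18), (0.62, 0.55, 0.62, 0.71, 0.67)),
    "EURO3 (gasoline)": ((0.22, 0.13, 0.12, 0.09, 0.08), (0.75, 0.72, 0.74, 0.82, 0.85)),
    "EURO3 (LPG)": ((0.21, 0.12, 0.11, 0.08, 0.07), (0.78, 0.76, 0.77, 0.82, 0.84)),
    "EURO3 (CNG)": ((0.21, 0.12, 0.11, 0.08, 0.07), (0.80, 0.77, 0.78, 0.82, 0.85)),
    "EURO4 (gasoline)": ((0.11, 0.11, 0.11, 0.07, 0.06), (0.82, 0.80, 0.81, 0.85, 0.88)),
    "EURO5 (gasoline)": ((0.11, 0.11, 0.10, 0.07, 0.05), (0.83, 0.81, 0.82, 0.86, 0.90)),
    "EURO6 (gasoline)": ((0.22, 0.13, 0.13, 0.08, 0.06), (0.80, 0.78, 0.79, 0.85, 0.90)),
    "EURO6 (Diesel)": ((0.24, 0.15, 0.14, 0.09, 0.07), (0.78, 0.76, 0.77, 0.80, 0.83)),
    "EURO6 (Hybrid)": ((0.13, 0.14, 0.13, 0.09, 0.06), (0.79, 0.77, 0.78, 0.82, 0.88)),
    "EV (energy)": ((7.33, 7.43, 7.34, 0.94, 0.40), (0.74, 0.73, 0.74, 0.97, 0.99)),
}


def best_models(table):
    """Return {group: (model with lowest MSE, model with highest R²)}."""
    best = {}
    for group, (mse, r2) in table.items():
        by_mse = MODELS[min(range(len(MODELS)), key=lambda k: mse[k])]
        by_r2 = MODELS[max(range(len(MODELS)), key=lambda k: r2[k])]
        best[group] = (by_mse, by_r2)
    return best


for group, (by_mse, by_r2) in best_models(TABLE_3).items():
    print(f"{group:18s} lowest MSE: {by_mse:13s} highest R²: {by_r2}")
```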
3.3. Use of developed models for prediction of CO2 and energy consumption
This section describes how the developed models were used to predict CO₂ emissions and, for electric vehicles (EVs), energy consumption. The models, grouped by emission class, were applied to verify their accuracy for infrastructure assessments, and comparative CO₂ emission maps and cumulative values were generated. In addition, a general microscale CO₂ emission model was created from data covering the entire road segment, to serve as a reference for validating the intersection-specific model.
This approach enables comparison between real-world data and both the general and intersection models, helping to identify potential deviations and assess the applicability of general models to the unique traffic conditions at intersections.
3.3.1. Generation of CO2 emission maps and comparison of predictive capabilities of models
The developed CO₂ emission and energy consumption models allow not only CO₂ emissions and vehicle energy consumption to be predicted, but also maps to be generated that show where they originate. This is particularly important for traffic management strategies, for example at signalized intersections. In this way, the areas with the highest accumulation of emissions can be identified; these directly affect pedestrians near these road arteries. An example visualization of emissions for a group of vehicles meeting the EURO 5 standard is shown in Fig. 9.
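One simple way to turn point-wise predictions into a hotspot map is to sum them over a regular grid of cells and rank the cells. This is an assumed approach for illustration (Fig. 9 may have been produced differently); the cell size and the samples are hypothetical.

```python
import math
from collections import defaultdict


def emission_grid(samples, cell_deg=0.0005):
    """Sum predicted CO₂ (g) per lat/lon cell; samples are (lat, lon, co2_g) tuples.

    Cells are square in degrees, not metres: a simplification for short road segments.
    """
    grid = defaultdict(float)
    for lat, lon, co2 in samples:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        grid[cell] += co2
    return dict(grid)


def hotspots(grid, top=3):
    """Cells with the largest accumulated emissions, largest first."""
    return sorted(grid.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical per-second predictions: three fixes close together, two elsewhere.
samples = [
    (50.0001, 19.0001, 2.0),
    (50.0002, 19.0002, 3.0),
    (50.0003, 19.0001, 2.5),
    (50.0021, 19.0031, 0.4),
    (50.0041, 19.0061, 0.3),
]
for cell, total in hotspots(emission_grid(samples)):
    print(cell, round(total, 2))
```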
To compare the methods of creating emission models and further validate the obtained results, an additional model was created based on all aggregated data from the road tests, referred to as the "city micro model." The model based solely on data from intersections was named the "intersection micro model."
Figure 9 shows where CO₂ emissions are highest: at Intersection 1 and on the approach to Intersection 2, owing to frequent start-stop operations. Both microscale models generally reproduce the actual emission hotspots, but they underestimate emissions in the areas where emissions are highest. Over the road section, the cumulative modeled emissions differ from the actual values by about 20%, a discrepancy that would grow in absolute terms in larger-scale studies. Because an emission factor in g/km is the cumulative emission divided by the distance traveled, expressing the results in g/km would give the same relative discrepancy.
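The cumulative comparison and the conversion to an emission factor can be sketched as below, assuming per-second emission rates in g/s sampled at 1 Hz. The traces and the section length are hypothetical. Because both totals are divided by the same distance, the relative difference of the g/km factors equals the relative difference of the totals.

```python
from itertools import accumulate


def cumulative_emissions(rates_g_per_s, dt_s=1.0):
    """Running total of CO₂ (g) from per-step emission rates sampled every dt_s seconds."""
    return list(accumulate(rate * dt_s for rate in rates_g_per_s))


def relative_difference(modelled, measured):
    return (modelled - measured) / measured


# Hypothetical 1 Hz traces for one pass through a section (not study data).
measured_rates = [1.2, 2.5, 3.1, 0.8, 0.6]
modelled_rates = [1.0, 2.0, 2.4, 0.7, 0.6]
distance_km = 0.05  # hypothetical section length

total_measured = cumulative_emissions(measured_rates)[-1]
total_modelled = cumulative_emissions(modelled_rates)[-1]
ef_measured = total_measured / distance_km  # emission factor, g/km
ef_modelled = total_modelled / distance_km

print(f"totals: {total_measured:.1f} g vs {total_modelled:.1f} g, "
      f"difference {100 * relative_difference(total_modelled, total_measured):.1f} %")
print(f"factors: {ef_measured:.0f} g/km vs {ef_modelled:.0f} g/km, "
      f"difference {100 * relative_difference(ef_modelled, ef_measured):.1f} %")
```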
Table 4 compares actual emissions with the results of the intersection micro model, the city micro model (built from the full city route), and the COPERT program. COPERT, widely used for environmental analyses in Europe, estimates vehicle emissions from factors such as vehicle type, fuel, and engine technology [37, 38]. In this study, COPERT was run with the average speed over the 1.05 km road section.
Table 4
Comparison of total real-world CO2 emissions and energy consumption with the values obtained from the intersection micro model, the city micro model, and the COPERT program
| Vehicle type | Real world | Intersection micro model | City micro model | COPERT |
|---|---|---|---|---|
| EURO2 (gasoline) | 220 g | 218 g | 170 g | 201 g |
| EURO3 (gasoline) | 200 g | 198 g | 160 g | 185 g |
| EURO3 (LPG) | 185 g | 183 g | 150 g | 170 g |
| EURO3 (CNG) | 180 g | 178 g | 145 g | 165 g |
| EURO4 (gasoline) | 190 g | 188 g | 155 g | 175 g |
| EURO5 (gasoline) | 183 g | 182 g | 138 g | 168 g |
| EURO6 (gasoline) | 160 g | 159 g | 120 g | 150 g |
| EURO6 (Diesel) | 150 g | 149 g | 121 g | 142 g |
| EURO6 (Hybrid) | 110 g | 109 g | 85 g | 100 g |
| EV (energy) | 210 Wh | 195 Wh | 210 Wh | - |
Table 4 compares the carbon dioxide (CO₂) emissions and energy consumption of the vehicle groups measured under real-world driving conditions with the values obtained from the microscale intersection model, the microscale city model, and COPERT.
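For the combustion-engine vehicle groups, the microscale intersection model yields emission totals almost identical to the actual data, within about 1%; the road test data used for this comparison were not used to train the intersection micro model. The city micro model, developed for comparative purposes, underestimates CO₂ emissions by approximately 18–25% relative to the real-world data. The COPERT values also underestimate CO₂ emissions, by roughly 5–9%. For electric vehicles the pattern is reversed: the city micro model matches the measured energy consumption of 210 Wh, whereas the intersection micro model underestimates it by about 7%.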
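The deviations in Table 4 can be computed directly from the tabulated values. The sketch below reports, for each group, the signed deviation of each model from the real-world value in percent.

```python
# (real, intersection micro, city micro, COPERT), transcribed from Table 4;
# CO₂ in g, EV energy in Wh; None where COPERT gives no value.
TABLE_4 = {
    "EURO2 (gasoline)": (220, 218, 170, 201),
    "EURO3 (gasoline)": (200, 198, 160, 185),
    "EURO3 (LPG)": (185, 183, 150, 170),
    "EURO3 (CNG)": (180, 178, 145, 165),
    "EURO4 (gasoline)": (190, 188, 155, 175),
    "EURO5 (gasoline)": (183, 182, 138, 168),
    "EURO6 (gasoline)": (160, 159, 120, 150),
    "EURO6 (Diesel)": (150, 149, 121, 142),
    "EURO6 (Hybrid)": (110, 109, 85, 100),
    "EV (energy)": (210, 195, 210, None),
}


def deviation_pct(modelled, real):
    """Signed relative deviation from the measured value, in percent."""
    return 100.0 * (modelled - real) / real


def deviations(table):
    """{group: (intersection %, city %, COPERT % or None)}, rounded to 0.1."""
    out = {}
    for group, (real, inter, city, copert) in table.items():
        out[group] = tuple(
            None if value is None else round(deviation_pct(value, real), 1)
            for value in (inter, city, copert)
        )
    return out


for group, (inter, city, copert) in deviations(TABLE_4).items():
    copert_txt = "n/a" if copert is None else f"{copert:+.1f} %"
    print(f"{group:18s} intersection {inter:+.1f} %  city {city:+.1f} %  COPERT {copert_txt}")
```

For the combustion-engine groups this gives deviations of about −0.5% to −1.1% for the intersection micro model, −18.4% to −25.0% for the city micro model, and −5.3% to −9.1% for COPERT; for the EV group, the intersection micro model deviates by about −7.1% and the city micro model by 0%.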
3.3.2. Use in Vissim simulation: predictions for the electrified vehicle fleet of the future
Because speed and acceleration were chosen as the explanatory variables, the developed models are highly versatile. They can be applied to any new real-world data, and also to simulation scenarios. One software package that supports such traffic modeling is Vissim, an advanced tool for microscopic traffic modeling that allows driver behavior and traffic flow to be simulated and analyzed under various road conditions [39, 40]. It is used to model traffic in cities, on highways, at intersections and roundabouts, and to analyze transportation infrastructure, including public transport systems, pedestrians, and cyclists.
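When a simulation or logger provides only speed, acceleration can be derived by finite differences before the (V, a) pairs are passed to a model. A minimal sketch, assuming speed in km/h, acceleration in m/s², and a fixed 1 s time step; the trace is hypothetical, and this is not a description of Vissim's own output format.

```python
def accelerations(speeds_kmh, dt_s=1.0):
    """Acceleration (m/s²) from a speed trace in km/h sampled every dt_s seconds.

    Backward differences; the first sample is assigned 0.
    """
    v = [s / 3.6 for s in speeds_kmh]  # km/h -> m/s
    return [0.0] + [(v1 - v0) / dt_s for v0, v1 in zip(v, v[1:])]


# Hypothetical 1 Hz speed trace: pulling away from a stop line, cruising, then braking.
speeds = [0.0, 7.2, 14.4, 14.4, 3.6]
pairs = list(zip(speeds, accelerations(speeds)))  # (V, a) inputs for a fitted model
for v, a in pairs:
    print(f"V = {v:5.1f} km/h, a = {a:+.2f} m/s²")
```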
Vissim enables a highly accurate representation of real traffic by modeling individual vehicles that behave similarly to actual drivers. The software is particularly useful for evaluating the effectiveness of different engineering solutions and forecasting the impact of new infrastructure investments on traffic flow and pollutant emissions. Figure 10 illustrates an example of using the energy consumption model developed for electric vehicles in a simulation context.
Applying the intersection models in Vissim, with a fleet that includes various vehicle types such as electric vehicles, supports better planning of future road infrastructure and the analysis of traffic control strategies at such intersections. The developed CO₂ emission models and the energy consumption model for electric vehicles can be used effectively for this purpose. This approach allows vehicle emissions and energy consumption, as well as the energy recovered through regenerative braking at and around intersections, to be estimated accurately.
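Separating consumed from recovered energy along a simulated trajectory can be sketched as follows, assuming the model returns energy per time step in Wh and that negative outputs represent regenerative braking. `toy_model` is a hypothetical placeholder with invented coefficients, not the fitted model; in practice a fitted regressor's prediction function would be wrapped in a callable of the same shape.

```python
def energy_balance(trajectory, model):
    """Split per-step model outputs (Wh) into consumed and regenerated energy.

    trajectory: iterable of (v, a) pairs; model: callable returning Wh for one step,
    where negative values are read as energy recovered by regenerative braking.
    """
    consumed = recovered = 0.0
    for v, a in trajectory:
        e = model(v, a)
        if e >= 0:
            consumed += e
        else:
            recovered += -e
    return consumed, recovered, consumed - recovered


def toy_model(v_ms, a_ms2):
    """Hypothetical stand-in for a fitted EV energy model; coefficients are invented."""
    return 0.01 * v_ms + 0.8 * a_ms2 * v_ms


trajectory = [(2.0, 2.0), (4.0, 2.0), (4.0, 0.0), (1.0, -3.0)]  # hypothetical (m/s, m/s²)
consumed, recovered, net = energy_balance(trajectory, toy_model)
print(f"consumed {consumed:.2f} Wh, recovered {recovered:.2f} Wh, net {net:.2f} Wh")
```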