Smart cities harness data and technology to enhance the sustainability and efficiency of urban areas and communities. Walking, as a form of active travel, is essential to promoting sustainable transport. It is therefore crucial to predict pedestrian crossing intention accurately and avoid collisions, especially with the advent of autonomous vehicles and advanced driver-assistance systems. Current research leverages advances in computer vision and machine learning to predict near-misses; however, such approaches often require substantial computational power to yield reliable results. In contrast, this work proposes a low-complexity ensemble-learning approach that employs contextual data to predict a pedestrian's crossing intent. Given a single scene, the pedestrian is first detected, their image is compressed using skeletonisation, and the resulting skeleton is combined with contextual information as input to a stacked ensemble-learning model. Our experiments on different datasets achieve up to a 15.8% gain in predictive accuracy over state-of-the-art approaches, with a 19.1% reduction in model complexity.
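To make the stacking idea concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that pedestrian detection and skeletonisation have already produced a small pose feature vector, and it uses synthetic data throughout. The feature meanings, the choice of logistic regression for every learner, the hold-out split (a simple stacking variant sometimes called blending), and all names such as `make_sample`, `train_stack`, and `predict_crossing` are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class LogisticRegression:
    """Tiny logistic regression trained with batch gradient descent."""

    def __init__(self, n_features, lr=0.5, epochs=500):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi
                for j, xij in enumerate(xi):
                    grad_w[j] += err * xij
                grad_b += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self


def make_sample(rng):
    """Synthetic (pose, context, label) triple; the features are hypothetical."""
    crossing = rng.random() < 0.5
    sign = 1.0 if crossing else -1.0
    pose = [rng.gauss(sign * 1.0, 1.0), rng.gauss(sign * 0.5, 1.0)]  # e.g. lean, stride
    context = [rng.gauss(sign * 1.0, 1.0)]  # e.g. closeness to the kerb
    return pose, context, int(crossing)


def train_stack(samples):
    """Fit base learners on one half and the meta-learner on the other half."""
    half = len(samples) // 2
    base_set, meta_set = samples[:half], samples[half:]
    base_y = [s[2] for s in base_set]
    pose_model = LogisticRegression(2).fit([s[0] for s in base_set], base_y)
    ctx_model = LogisticRegression(1).fit([s[1] for s in base_set], base_y)
    # The meta-learner sees only base-learner outputs on data they were not fit on.
    meta_X = [[pose_model.predict_proba(p), ctx_model.predict_proba(c)]
              for p, c, _ in meta_set]
    meta_model = LogisticRegression(2).fit(meta_X, [s[2] for s in meta_set])
    return pose_model, ctx_model, meta_model


def predict_crossing(models, pose, context):
    """Return the stacked estimate of the probability that the pedestrian crosses."""
    pose_model, ctx_model, meta_model = models
    z = [pose_model.predict_proba(pose), ctx_model.predict_proba(context)]
    return meta_model.predict_proba(z)


if __name__ == "__main__":
    rng = random.Random(0)
    models = train_stack([make_sample(rng) for _ in range(400)])
    print(predict_crossing(models, [1.5, 0.8], [1.2]))
```

Training the meta-learner on predictions for samples the base learners never saw keeps it from simply trusting whichever base learner overfits most; a k-fold scheme would serve the same purpose while using all of the data.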