Global Positioning System (GPS) data plays a crucial role in understanding an individual's life because it provides geographic positions and timestamps. However, the large volume of spatio-temporal data generated and the distinct spatial characteristics exhibited by different modes pose challenges for learning transportation modes from GPS trajectories. This paper introduces a novel approach to transportation mode identification that transforms GPS trajectory data into image representations and uses these images to train a neural network based on the Vision Transformer architecture. The proposed method avoids segmenting or altering trajectories and instead extracts a set of features directly from the GPS trajectories. By mapping the trajectory features to pixel locations generated with a dimensionality reduction technique, images are created for training a deep learning model to predict five transport modes. Experimental results demonstrate the effectiveness of the approach, which achieves a state-of-the-art accuracy of 92.96% on the Microsoft GeoLife dataset. The paper also highlights the differences among the experimental setups used in related studies. Additionally, a comparative analysis contrasts the proposal with a machine learning approach and other neural network architectures. The proposed method offers accurate and reliable transport mode detection applicable to real-world scenarios, facilitating a comprehensive understanding of individuals' mobility.
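
As a rough illustration of the feature-to-image mapping described above, the following sketch places each trajectory feature at a pixel location derived from a dimensionality reduction of the feature space. It is a minimal sketch under stated assumptions: PCA as the reduction technique, a 32x32 canvas, and the function and feature names are illustrative choices, not the paper's exact configuration.

    import numpy as np
    from sklearn.decomposition import PCA

    def features_to_image(X, size=32):
        # X: (n_samples, n_features) trajectory feature matrix, e.g. per-trajectory
        # speed, acceleration, and bearing statistics (illustrative feature names).
        # Each feature (column) is assigned a fixed pixel; a sample's image stores
        # that sample's feature values at those pixel locations.
        coords = PCA(n_components=2).fit_transform(X.T)  # one 2-D point per feature
        coords -= coords.min(axis=0)
        coords /= coords.max(axis=0) + 1e-9              # normalize to [0, 1)
        px = np.clip((coords * (size - 1)).round().astype(int), 0, size - 1)
        images = np.zeros((X.shape[0], size, size), dtype=np.float32)
        for j, (r, c) in enumerate(px):
            # Features that collide on the same pixel overwrite one another here;
            # a real pipeline would need a collision-resolution strategy.
            images[:, r, c] = X[:, j]
        return images

The resulting single-channel images could then be fed to a Vision Transformer classifier over the five transport modes, consistent with the pipeline the abstract describes.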