Brain-computer interfaces (BCIs) use electroencephalographic (EEG) signals for direct neural control of devices, offering significant benefits for individuals with motor impairments. Traditional machine learning methods for EEG-based motor imagery (MI) classification face challenges such as reliance on manual feature extraction and susceptibility to noise. This paper introduces EEGEncoder, a deep learning framework that combines modified transformers with Temporal Convolutional Networks (TCNs) to overcome these limitations. We propose a novel fusion architecture, the Dual-Stream Temporal-Spatial Block (DSTS), which captures both temporal and spatial features and improves the accuracy of MI classification. Additionally, we use multiple parallel structures to enhance the model's performance. When tested on the BCI Competition IV-2a dataset, our proposed model outperforms current state-of-the-art techniques.