The primary objective of this research was to enhance video surveillance through anomalous event recognition that incorporated transfer learning for human activity recognition. Individual security was a pressing concern for every community because of the growing variety of activities that could cause injury, from malicious acts to accidents. Standard CCTV proved insufficient because constant monitoring was costly and the attention of human operators declined over time. Automated security systems with real-time anomalous event recognition were therefore essential. In this paper, four models, ResNet50, VGG19, EfficientNetB7, and ViT_b16, were adapted through transfer learning to recognize anomalous events in surveillance videos. To streamline video processing, a semantic key frame extraction algorithm based on action recognition was used to reduce the number of frames. The algorithm leveraged enhanced features to analyze real-time anomalous events such as arrests and assaults. Because surveillance videos generate a large volume of frames, the proposed method also incorporated techniques for managing and analyzing extensive video datasets. The models were evaluated on a large number of videos from the UCF-Crime dataset, with both abnormal and normal videos included in the training and testing phases. EfficientNetB7 achieved 86.34% accuracy, VGG19 reached 87.90%, ResNet50 attained 90.46%, and the transformer model ViT_b16 achieved the best result with 95.87%. These findings illustrated the effectiveness of the proposed method in addressing the complexities of anomalous event recognition in video surveillance applications, particularly in handling the large number of frames generated by surveillance videos.
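
The frame-reduction step can be illustrated with a minimal sketch. The abstract does not specify how the semantic key frame extraction algorithm works, so the example below substitutes plain frame differencing: a frame is kept only when it differs enough from the last kept frame. The names `Frame`, `mean_abs_diff`, `select_key_frames`, and the `threshold` value are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# Assumption: each frame is a grayscale image flattened to a sequence of pixel intensities.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames: Sequence[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    This is a simple frame-differencing stand-in, not the semantic,
    action-recognition-based algorithm described in the paper.
    """
    if not frames:
        return []
    kept = [0]  # always keep the first frame as the initial reference
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy "video": five 4-pixel frames with two abrupt changes.
    video = [
        [0, 0, 0, 0],
        [1, 0, 1, 0],
        [50, 50, 50, 50],
        [52, 49, 51, 50],
        [200, 200, 200, 200],
    ]
    print(select_key_frames(video, threshold=10.0))  # prints [0, 2, 4]
```

Only the retained frames would then be passed to a classifier such as ViT_b16, which is how key frame selection reduces the processing load. A semantic method would replace the pixel-difference test with a comparison of learned action features.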