The surface quality of steel directly affects product performance, which makes efficient defect detection essential. However, existing methods often generalise poorly, struggle to detect small, blurred defects, and are computationally expensive. To address these issues, this study proposes an optimised YOLOv7 model. The approach combines four elements: unified processing of multi-source datasets, integration of the Swin Transformer module, application of the ACmix attention mechanism to enhance small-defect detection, and use of GSConv convolution to reduce model complexity. Experimental results show that the optimised YOLOv7 model achieves a 5.9% increase in average accuracy, with detection accuracy for Ss and Cr defects improving by 13.9% and 16.3%, respectively. These enhancements significantly improve the model’s generalisation ability and detection accuracy, offering a more reliable solution for steel surface defect detection.
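As an illustration of what unified processing of multi-source datasets can involve, the following minimal sketch converts annotations from different sources into a single YOLO-style label line (class id, then box centre and size normalised by the image dimensions). It is not taken from the study: the label names in `UNIFIED_CLASSES`, the pixel-coordinate input convention, and both function names are assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical mapping from source-specific label names to shared class ids.
# The names below are illustrative only and are not taken from the study.
UNIFIED_CLASSES: Dict[str, int] = {
    "scratch": 0, "scratches": 0, "Sc": 0,
    "inclusion": 1, "In": 1,
}

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def to_yolo(box: Box, img_w: int, img_h: int) -> Box:
    """Convert a pixel-coordinate corner box to normalised (xc, yc, w, h)."""
    xmin, ymin, xmax, ymax = box
    if not (0 <= xmin < xmax <= img_w and 0 <= ymin < ymax <= img_h):
        raise ValueError(f"box {box} lies outside a {img_w}x{img_h} image")
    xc = (xmin + xmax) / 2 / img_w   # box centre, as a fraction of width
    yc = (ymin + ymax) / 2 / img_h   # box centre, as a fraction of height
    w = (xmax - xmin) / img_w        # box width, as a fraction of width
    h = (ymax - ymin) / img_h        # box height, as a fraction of height
    return xc, yc, w, h


def unify_annotation(label: str, box: Box, img_w: int, img_h: int) -> str:
    """Return one YOLO-format label line using the shared class ids."""
    class_id = UNIFIED_CLASSES[label]  # raises KeyError for unmapped labels
    xc, yc, w, h = to_yolo(box, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # The same defect type labelled differently in two sources maps to one id.
    print(unify_annotation("Sc", (50, 100, 150, 200), 200, 400))
    print(unify_annotation("scratches", (50, 100, 150, 200), 200, 400))
```

Normalising the coordinates keeps the labels valid if images from different sources are later resized to a common input resolution, and a single class map prevents the same defect type from being split across several ids.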