Recent years have witnessed the remarkable progress of deep learning within the realm of scientific disciplines, yielding a wealth of promising outcomes. A prominent challenge within this domain has been the task of predicting enzyme function, a complex problem that has seen the development of numerous computational methods, particularly those rooted in deep learning techniques. However, the majority of these methods have primarily focused on either amino acid sequence data or protein structure data, neglecting the potential synergy of combining of both modalities. To address this gap, we propose a novel Contrastive Learning framework for Enzyme functional ANnotation prediction combined with protein amino acid sequences and Contact maps (CLEAN-Contact). We rigorously evaluated the performance of our CLEAN-Contact framework against the state-of-the-art enzyme function prediction model using multiple benchmark datasets. Our findings convincingly demonstrate the substantial superiority of our CLEAN-Contact framework, marking a significant step forward in enzyme function prediction accuracy.