Cardiovascular diseases remain the leading cause of mortality worldwide, making timely and accurate diagnosis essential. Auscultation, which relies on a physician's expertise and a stethoscope, is the primary diagnostic tool for cardiovascular disorders; however, its inherent subjectivity motivates the development of a clinical support system that can turn this subjective process into a computerized and reliable method. In real-world clinical settings, auscultation recordings are frequently contaminated by ambient noise, demanding an effective denoising technique followed by a robust classification model to ensure accurate categorization. In this paper, we present a preprocessing technique that uses Variational Mode Decomposition (VMD) to denoise heart sounds. The denoised signals are then processed with a Gammatone filter bank and the Short-Time Fourier Transform (STFT) to generate time-frequency representations in the form of Gammatonegram and Spectrogram images. To address the challenges of imbalanced datasets, we apply data augmentation during the image-processing stage. The images are then classified with several deep convolutional neural network architectures based on transfer learning, namely AlexNet, SqueezeNet, GoogLeNet, and VGG19, to mitigate model overfitting. Our experimental results are validated on the publicly available PhysioNet 2016 dataset. Notably, our proposed methodology achieves highly promising results, particularly with Gammatonegram images. These outcomes underscore the considerable clinical potential of our approach for detecting imbalanced and noisy heart sound signals, ultimately contributing to improved cardiovascular disease diagnosis.
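The time-frequency step of the pipeline above can be sketched as follows. This is a minimal illustration of Spectrogram-image generation with `scipy` only: the VMD denoising stage is assumed to have already been applied, and the sample rate, window parameters, and synthetic "lub-dub" test signal are illustrative assumptions, not values from the paper (a Gammatonegram would replace the fixed STFT windows with a Gammatone filter bank).

```python
import numpy as np
from scipy.signal import stft

def heart_sound_spectrogram(signal, fs, nperseg=256, noverlap=192):
    """Log-magnitude STFT spectrogram of a (pre-denoised) heart sound.

    Returns frequency bins, frame times, and the spectrogram in dB,
    ready to be rendered as an image for a CNN classifier.
    """
    f, t, Z = stft(signal, fs=fs, nperseg=nperseg, noverlap=noverlap)
    # Log compression makes low-amplitude murmur content visible.
    return f, t, 20 * np.log10(np.abs(Z) + 1e-10)

# Synthetic stand-in for a denoised phonocardiogram: low-frequency
# bursts gated at roughly heart-beat rate, plus residual noise.
fs = 2000                                  # assumed sample rate (Hz)
tt = np.arange(0, 2.0, 1 / fs)
x = np.sin(2 * np.pi * 60 * tt) * (np.sin(2 * np.pi * 1.2 * tt) > 0.9)
x += 0.01 * np.random.default_rng(0).standard_normal(tt.size)

f, t, S = heart_sound_spectrogram(x, fs)   # S: (freq bins, time frames)
```

In practice `S` would be normalized and saved as an RGB image so that ImageNet-pretrained models such as AlexNet or VGG19 can be fine-tuned on it directly.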