Ensuring global food security requires accurate crop yield prediction (CYP) to support informed agricultural planning and resource allocation. We investigated the impact of temperature, rainfall, and pesticide application on crop yield using a multi-year, multi-region dataset covering a range of agricultural attributes, including geographical coordinates, crop varieties, climatic parameters, and farming practices. We compared, for the first time, fifteen algorithms spanning established machine learning methods and deep learning architectures, particularly the Recurrent Neural Network (RNN), for constructing robust CYP models. Before training, we preprocessed the data by handling categorical variables, standardizing numerical features, and dividing the data into separate training and testing sets. Through systematic experimentation and hyperparameter tuning, we sought the best-performing model for accurate yield prediction.
The experimental evaluation showed that Random Forest achieved the highest accuracy (R² = 0.99). However, XGBoost offered a compelling trade-off: slightly lower accuracy (R² = 0.98) but significantly faster training and inference times (0.36 s and 0.02 s, respectively), making it suitable for real-world scenarios with limited computational resources. While XGBoost offered the best balance of efficiency and accuracy in this investigation, we also explored the potential of deep learning approaches, including RNNs, for crop yield prediction, which points the way for future research into further accuracy gains.
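As an illustration of the evaluation steps summarized above (categorical encoding, standardization, a train/test split, R² scoring, and timing of inference), the following is a minimal sketch using only the Python standard library. It is not the pipeline used in this study: the records are synthetic, the function names (`encode`, `fit_scaler`, `predict_1nn`) are hypothetical, and a 1-nearest-neighbour regressor stands in for Random Forest and XGBoost purely to keep the example self-contained.

```python
import math
import random
import statistics
import time


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(column):
    """Return (mean, std) of a numeric column; std falls back to 1.0 if zero."""
    return statistics.fmean(column), statistics.pstdev(column) or 1.0


def encode(row, categories, scalers):
    """One-hot encode the crop label and standardize the numeric features."""
    crop, temp, rain, _ = row
    one_hot = [1.0 if crop == c else 0.0 for c in categories]
    (t_mean, t_std), (r_mean, r_std) = scalers
    return one_hot + [(temp - t_mean) / t_std, (rain - r_mean) / r_std]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def predict_1nn(x, train_x, train_y):
    """Stand-in regressor: return the target of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    return train_y[nearest]


# Synthetic toy records (assumption): (crop, temperature_C, rainfall_mm, yield).
rng = random.Random(42)
base_yield = {"maize": 5.0, "rice": 4.0, "wheat": 3.0}
rows = []
for _ in range(200):
    crop = rng.choice(sorted(base_yield))
    temp = rng.uniform(10.0, 35.0)
    rain = rng.uniform(200.0, 2000.0)
    target = base_yield[crop] + 0.001 * rain + rng.gauss(0.0, 0.1)
    rows.append((crop, temp, rain, target))

train, test = train_test_split(rows)

# Fit encoding and scaling statistics on the training set only,
# then apply them unchanged to the test set.
categories = sorted({r[0] for r in train})
scalers = (fit_scaler([r[1] for r in train]), fit_scaler([r[2] for r in train]))
train_x = [encode(r, categories, scalers) for r in train]
train_y = [r[3] for r in train]
test_x = [encode(r, categories, scalers) for r in test]
test_y = [r[3] for r in test]

# Time inference separately from scoring.
start = time.perf_counter()
pred = [predict_1nn(x, train_x, train_y) for x in test_x]
inference_s = time.perf_counter() - start

r2 = r2_score(test_y, pred)
print(f"R^2 = {r2:.3f}, inference time = {inference_s:.4f} s")
```

Fitting the scaler on the training split alone avoids leaking test-set statistics into the model. The same pattern of timing and scoring on held-out data would apply to any of the compared algorithms.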