
Open Access | Accepted manuscript on April 15, 2026

Optimized machine learning framework for crop yield prediction using climate and emission data

Abstract

Global food security is at significant risk from climate change and rapidly growing greenhouse gas (GHG) emissions. In this study, an ensemble learning framework with an objective performance score function was developed for crop yield prediction using key climate variables and two major GHG emissions: carbon dioxide (CO₂) and nitrous oxide (N₂O). The main contribution of this study is an ensemble learning approach with a feature selection method, the Weighted Mutual Information with Standard Deviation Method (WMI_SDM), which selects the relevant predictive factors from a high-dimensional dataset. Several machine learning models, including Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were trained on the features selected by WMI_SDM. The experimental results showed that the proposed approach outperformed the other models, achieving a coefficient of determination (R²) of 0.9673, a mean absolute error (MAE) of 226.71 kg ha⁻¹, and a mean absolute percentage error (MAPE) of 11.33%. Furthermore, the proposed method was computationally efficient, consuming only 281.99 MiB of memory and completing in 0.55 seconds.
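For readers less familiar with the three reported metrics, the sketch below computes R², MAE, and MAPE in plain Python. It assumes the standard textbook definitions, which the abstract does not spell out, and the function names and yield values are hypothetical illustrations, not the study's code or data.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the target (e.g. kg/ha)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical observed and predicted yields in kg/ha (not from the study).
observed = [2000.0, 2500.0, 3000.0, 3500.0]
predicted = [2100.0, 2400.0, 3150.0, 3400.0]

print(f"R2   = {r2_score(observed, predicted):.4f}")
print(f"MAE  = {mae(observed, predicted):.2f} kg/ha")
print(f"MAPE = {mape(observed, predicted):.2f} %")
```

MAE is expressed in the unit of the target, while R² and MAPE are unitless, which is why the abstract reports only MAE in kg ha⁻¹.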

Keywords
climate data, Greenhouse Gas Emission, Weighted Mutual Information, ensemble learning, Standard Deviation Threshold, Crop Yield