Hybrid time series models with exogenous variable for improved yield forecasting of major Rabi crops in India

Sci Rep. 2023 Dec 14;13(1):22240. doi: 10.1038/s41598-023-49544-w.


Accurate and timely prediction of crop yield plays a crucial role in planning, management, and decision-making within the agricultural sector. In this investigation, using area under irrigation (%) as an exogenous variable, we assess the suitability of several hybrid models, namely ARIMAX (Autoregressive Integrated Moving Average with eXogenous Regressor)-TDNN (Time-Delay Neural Network), ARIMAX-NLSVR (Non-Linear Support Vector Regression), ARIMAX-WNN (Wavelet Neural Network), ARIMAX-CNN (Convolutional Neural Network), ARIMAX-RNN (Recurrent Neural Network) and ARIMAX-LSTM (Long Short-Term Memory), against their individual counterparts for yield forecasting of major Rabi crops in India. The accuracy of the ARIMA model is used as a benchmark. Empirical results show that the ARIMAX-LSTM hybrid outperforms all other time series models in terms of root mean square error (RMSE) and mean absolute percentage error (MAPE). For the ARIMAX-LSTM models, the average improvement in RMSE and MAPE is 10.41% and 12.28%, respectively, over all other competing models and 15.83% and 18.42%, respectively, over the benchmark ARIMA model. Incorporating the area under irrigation (%) as an exogenous variable in the ARIMAX framework, together with the inbuilt capability of the LSTM model to process complex non-linear patterns, significantly enhances forecasting accuracy. The other hybrid models also outperform their individual counterparts. The results further caution against generalizing the performance of individual models to their hybrid structures.
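To make the reported comparison concrete, the following minimal Python sketch shows how RMSE, MAPE, and a percentage improvement over a benchmark can be computed. The function names and the toy numbers are hypothetical and are not taken from the study. The improvement formula (relative reduction in error with respect to the baseline) is an assumption, since the abstract does not state how the averages were formed.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def pct_improvement(baseline_error, model_error):
    """Relative error reduction over a baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Made-up illustrative values, NOT data from the paper.
actual = [100.0, 200.0, 400.0]      # observed yields
benchmark = [110.0, 180.0, 440.0]   # forecasts from a benchmark model
hybrid = [105.0, 190.0, 420.0]      # forecasts from a hybrid model

rmse_gain = pct_improvement(rmse(actual, benchmark), rmse(actual, hybrid))
mape_gain = pct_improvement(mape(actual, benchmark), mape(actual, hybrid))
print(f"RMSE improvement: {rmse_gain:.2f}%")
print(f"MAPE improvement: {mape_gain:.2f}%")
```

Under this definition, a positive value means the hybrid model has lower error than the benchmark. Averaging such values across crops or competing models would give figures of the kind quoted in the abstract.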