Front Plant Sci. 2020 Jul 31;11:1120.
doi: 10.3389/fpls.2020.01120. eCollection 2020.

Forecasting Corn Yield With Machine Learning Ensembles

Free PMC article

Mohsen Shahhosseini et al. Front Plant Sci.

Abstract

The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to predict crop yields more accurately. Recent research has shown that machine learning (ML) can provide reasonable predictions faster and with greater flexibility than simulation crop modeling. However, a single machine learning model can be outperformed by a "committee" of models (machine learning ensembles) that can reduce prediction bias, variance, or both, and can better capture the underlying distribution of the data. Yet many aspects remain to be investigated with regard to prediction accuracy, time of the prediction, and scale. The earlier in the growing season a prediction is made, the better, but this has not been thoroughly investigated because previous studies used all available data to predict yields. This paper provides a machine learning based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa) considering complete and partial in-season weather knowledge. Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. The forecasts are made at the county level and aggregated to the agricultural district and state levels. Results show that the proposed optimized weighted ensemble and the average ensemble are the most precise models, with an RRMSE of 9.5%. Stacked LASSO makes the least biased predictions (MBE of 53 kg/ha), while the other ensemble models also outperform the base learners in terms of bias. However, although random k-fold cross-validation is replaced by the blocked sequential procedure, stacked ensembles do not perform as well as weighted ensemble models on time series data sets, because they require the data to be IID to perform favorably. A comparison with the literature shows that the forecasts made by the proposed ensemble model perform acceptably.
Results from the scenario with partial in-season weather knowledge reveal that decent yield forecasts, with an RRMSE of 9.2%, can be made as early as June 1st. Moreover, the proposed model performs better than individual models and benchmark ensembles at the agricultural district and state levels as well as at the county level. To find the marginal effect of each input feature on the forecasts made by the proposed ensemble model, a methodology is suggested that serves as the basis for computing feature importance for the ensemble model. The findings suggest that weather features for weeks 18-24 (May 1st to June 1st) are the most important input features.
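
The blocked sequential procedure, the weighted ensembles, and the two error metrics named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the function names are hypothetical, the blocked sequential procedure is read as forward-chaining splits by year, RRMSE is taken as RMSE divided by the mean observed yield (in percent), MBE as the mean of prediction minus observation, and a coarse grid search stands in for whichever constrained optimizer the paper actually uses.

```python
from itertools import product
from math import sqrt


def blocked_sequential_splits(years, min_train_years=3):
    """Yield (train_idx, valid_idx): train on all earlier years, validate on the next one."""
    ordered = sorted(set(years))
    for pos in range(min_train_years, len(ordered)):
        train_years = set(ordered[:pos])
        train_idx = [i for i, y in enumerate(years) if y in train_years]
        valid_idx = [i for i, y in enumerate(years) if y == ordered[pos]]
        yield train_idx, valid_idx


def rrmse(y_true, y_pred):
    """Relative RMSE in percent (assumed definition: RMSE / mean observed value)."""
    n = len(y_true)
    rmse = sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return 100.0 * rmse / (sum(y_true) / n)


def mbe(y_true, y_pred):
    """Mean bias error (assumed definition: mean of prediction minus observation)."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


def combine(preds, weights):
    """Weighted average of per-model prediction lists."""
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(len(preds[0]))]


def optimize_weights(y_true, oob_preds, steps=20):
    """Grid search over non-negative weights summing to 1, minimizing MSE
    on out-of-bag predictions (a stand-in for a proper constrained optimizer)."""
    k = len(oob_preds)
    best_w, best_mse = None, float("inf")
    for combo in product(range(steps + 1), repeat=k - 1):
        if sum(combo) > steps:
            continue  # the last weight would be negative
        w = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        blended = combine(oob_preds, w)
        mse = sum((b - t) ** 2 for b, t in zip(blended, y_true)) / len(y_true)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w


# Toy data: one base learner over-predicts, the other under-predicts.
observed = [10.0, 12.0, 14.0, 16.0]
too_high = [12.0, 14.0, 16.0, 18.0]
too_low = [8.0, 10.0, 12.0, 14.0]
weights = optimize_weights(observed, [too_high, too_low])
blended = combine([too_high, too_low], weights)
```

In this sketch the average ensemble is the special case `combine(preds, [1 / k] * k)` for `k` base learners, while the optimized weighted ensemble picks the weights from out-of-bag predictions produced on the validation blocks.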

Keywords: US Corn Belt; corn yields; ensemble; forecasting; machine learning.

Figures

Figure 1
Trends of USDA yields in 2000-2016. (A) Corn yields per year for all counties. (B) Corn yields per year for Iowa counties.
Figure 2
Three-stage feature selection performed to select the independent variables with the most useful information. The number of features was decreased from 597 to 72.
Figure 3
Generating out-of-bag predictions with blocked sequential procedure.
Figure 4
X–Y plots of some of the designed models. The optimized weighted ensemble and the average ensemble made predictions closer to the diagonal line; the color intensity shows the accumulation of the data points.
Figure 5
Performance of ML models in predicting test observations from different years.
Figure 6
Evaluating machine learning ensembles when having partial in-season weather knowledge. The X-axis shows the in-season weather information from planting until June, July, August, September, or October.
Figure 7
Partial dependence plots (PDPs) of the proposed optimized weighted average ensemble for some of the influential management and environment input features.
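
As a rough illustration of how a partial dependence curve like those in Figure 7 could be computed for a weighted ensemble, the sketch below forces one input feature to each value on a grid, predicts with the blended model, and averages the predictions. The helper names, the toy base learners, and the weights are hypothetical and chosen only for illustration; the paper's own procedure for ensemble feature importance may differ.

```python
def ensemble_predict(models, weights):
    """Return a predict function that blends base-model predictions by weight."""
    def predict(X):
        per_model = [m(X) for m in models]
        return [sum(w * p[i] for w, p in zip(weights, per_model)) for i in range(len(X))]
    return predict


def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is set to each grid value in turn."""
    curve = []
    for value in grid:
        # Replace the chosen feature in every row, leaving the others untouched.
        X_mod = [row[:feature] + [value] + row[feature + 1:] for row in X]
        preds = predict(X_mod)
        curve.append(sum(preds) / len(preds))
    return curve


# Two toy base learners over rows of [feature_0, feature_1] (hypothetical).
def model_a(X):
    return [2 * r[0] + r[1] for r in X]


def model_b(X):
    return [r[0] for r in X]


X = [[1.0, 0.0], [3.0, 2.0]]
predict = ensemble_predict([model_a, model_b], [0.5, 0.5])
curve = partial_dependence(predict, X, feature=0, grid=[0.0, 2.0])
```

Plotting `curve` against the grid values gives the marginal effect of that feature on the ensemble forecast, averaged over the observed values of the remaining features.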
