PLoS One. 2016 Jun 3;11(6):e0156571.
doi: 10.1371/journal.pone.0156571. eCollection 2016.

Random Forests for Global and Regional Crop Yield Predictions

Free PMC article

Jig Han Jeong et al. PLoS One.

Abstract

Accurate predictions of crop yield are critical for developing effective agricultural and food policies at regional and global scales. We evaluated a machine-learning method, Random Forests (RF), for its ability to predict crop yield responses to climate and biophysical variables at global and regional scales in wheat, maize, and potato, with multiple linear regression (MLR) serving as a benchmark. We used crop yield data from various sources and regions for model training and testing: 1) gridded global wheat grain yield, 2) maize grain yield from US counties over thirty years, and 3) potato tuber and maize silage yield from the northeastern seaboard region. RF was highly capable of predicting crop yields and outperformed the MLR benchmarks in all performance statistics that were compared. For example, the root mean square error (RMSE) ranged from 6% to 14% of the average observed yield for RF models in all test cases, whereas it ranged from 14% to 49% for MLR models. Our results show that RF is an effective and versatile machine-learning method for crop yield predictions at regional and global scales, owing to its high accuracy and precision, ease of use, and utility in data analysis. However, RF may lose accuracy when predicting extreme values or responses beyond the boundaries of the training data.
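
The abstract reports RMSE as a percentage of the average observed yield, and the Fig 2 caption also lists EF, d, and Pearson's r. A minimal Python sketch of how such statistics could be computed from paired observations and predictions is given here. It assumes that EF denotes the Nash-Sutcliffe model efficiency and d denotes Willmott's index of agreement, which are the usual meanings of these symbols; this record does not define them. The function names and the sample yields are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error of paired observations and predictions."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / len(observed))


def relative_rmse_percent(observed, predicted):
    """RMSE expressed as a percentage of the mean observed value."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_obs


def model_efficiency(observed, predicted):
    """EF (assumed Nash-Sutcliffe): 1 - SSE / sum of squares about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def index_of_agreement(observed, predicted):
    """d (assumed Willmott): 1 - SSE / sum of squared potential errors."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - sse / potential


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / math.sqrt(var_obs * var_pred)


# Hypothetical yields (for example in t/ha), for illustration only.
observed_yield = [4.0, 5.0, 6.0, 7.0, 8.0]
predicted_yield = [4.5, 4.5, 6.5, 6.5, 8.0]

print("RMSE:", rmse(observed_yield, predicted_yield))
print("RMSE (% of mean):", relative_rmse_percent(observed_yield, predicted_yield))
print("EF:", model_efficiency(observed_yield, predicted_yield))
print("d:", index_of_agreement(observed_yield, predicted_yield))
print("r:", pearson_r(observed_yield, predicted_yield))
```

Dividing RMSE by the mean observed yield makes the error comparable across crops and regions whose yields differ in magnitude, which is presumably why the abstract quotes it in percent.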

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Study regions: global wheat mega-environments (A), US maize-producing counties (B), and northeastern seaboard region (NESR) (C). All 12 wheat mega-environments are shown in different colors (A). Maize grain yield by US county in 2013, as surveyed by USDA-NASS, is shown in shades, with darker shades representing higher yields (B). The NESR includes 433 counties in Connecticut, Delaware, Maine, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont, Virginia, and West Virginia. The red dots indicate the locations of the data points, where weather stations exist. Point-type data were used for this region (C).
Fig 2. Random Forests model performance for test datasets.
Observed vs. predicted plots are shown for four case studies: (A) global wheat grain yield, (B) US maize grain yield over 30 years, (C) potato wet tuber yield in the northeastern seaboard region (NESR), and (D) maize silage yield in NESR. The dashed lines indicate the 1:1 relation, and the solid lines represent the linear regression between the observations and the predictions made for the test datasets. The linear regression equation for each solid line is provided along with RMSE, EF, d, and Pearson’s r.
Fig 3. Partial dependence plots for the top-ranked predictor variable from variable importance measures of Random Forests models.
(A) N fertilization rate (NFERT) in global wheat grain yield predictions, (B) year (YR) in the 30-year US maize grain yields, (C) latitude (lat) for potato wet tuber yields in the northeastern seaboard region (NESR), and (D) lat for maize silage yield in NESR. The Y-axis of each plot indicates the average of all of the possible model predictions for the X predictor value. The X-axis hash marks indicate deciles.
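
The Fig 3 caption describes a partial dependence curve as the average of the model predictions at each value of one predictor, with decile marks on the X-axis. The following Python sketch illustrates that averaging under stated assumptions: `predict` stands for any fitted model's prediction function, and `toy_predict`, the two-column rows, and the grid are hypothetical placeholders, not the study's model or data. It is not the authors' implementation.

```python
from statistics import quantiles


def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over all rows with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)              # copy so the input data stay unchanged
            modified[feature_index] = value   # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def decile_marks(rows, feature_index):
    """Nine cut points that split the observed feature values into ten groups."""
    values = [row[feature_index] for row in rows]
    return quantiles(values, n=10)


def toy_predict(row):
    """Hypothetical stand-in for a fitted model with two predictors."""
    n_rate, rainfall = row
    # Assumed response: yield rises with N rate up to a plateau at 100.
    return 2.0 + 0.03 * min(n_rate, 100.0) + 0.001 * rainfall


# Hypothetical rows of [N rate, rainfall], for illustration only.
sample_rows = [[0.0, 400.0], [50.0, 600.0], [150.0, 800.0]]
n_rate_grid = [0.0, 100.0, 200.0]

print(partial_dependence(toy_predict, sample_rows, 0, n_rate_grid))
print(decile_marks(sample_rows, 0))
```

Because every other predictor keeps its observed values while the chosen one is varied, the curve shows the model's average response to that predictor. The decile marks indicate where the training data are dense, and parts of the curve outside them rest on few observations.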

Grants and funding

This study was supported by a Cooperative Research Program for Agricultural Science and Technology Development (Project No. PJ01000707), Rural Development Administration, Republic of Korea (SHK; KMS). Additional support was provided in part by a Specific Cooperative Agreement: 58-1265-1-074 between University of Washington and USDA-ARS (SHK; VRR), the USDA-ARS Headquarters Postdoctoral Research Associate Program (DHF), the USDA-NIFA-AFRI Grant no. 2011-68004-30057: Enhancing Food Security of Underserved Populations in the Northeast through Sustainable Regional Food Systems (DHF), the USDA AFRI fellowship 2016-67012-25208 (NDM), the NSF Hydrological Sciences grant 1521210 (NDM), and the Packard Foundation (EEB).