Prediction of monthly dry days with machine learning algorithms: a case study in Northern Bangladesh

Sci Rep. 2022 Nov 16;12(1):19717. doi: 10.1038/s41598-022-23436-x.

Abstract

Dry days at various scales are an important topic in climate discussions. Prolonged runs of dry days define a dry period, and dry days counted against a specific rainfall threshold can characterize the climate of a locality. The variation of monthly dry days from station to station may be correlated with several climatic factors. This study proposes a novel approach for predicting monthly dry days (MDD) at six target stations in Bangladesh using different machine learning (ML) algorithms. Several rainfall thresholds were used to prepare the datasets of MDD and monthly wet days (MWD). A group of ML algorithms, including Bagged Trees (BT), Exponential Gaussian Process Regression (EGPR), Matern Gaussian Process Regression (MGPR), Linear Support Vector Machine (LSVM), Fine Trees (FT) and Linear Regression (LR), were evaluated for building a competitive prediction model of MDD. In validation, EGPR-based models captured MDD over Bangladesh better than MGPR-, LSVM-, BT-, LR- and FT-based models. When MDD were the predictors for all six target stations, EGPR produced the highest mean R² of 0.91 (min. 0.89 and max. 0.92) with the lowest mean RMSE of 2.14 (min. 1.78 and max. 2.69) compared to the other models. An explicit evaluation of the ML algorithms using a one-year lead-time approach showed that BT and EGPR performed best (R² = 0.78 for both models). However, because it had the lowest RMSE, EGPR was chosen as the best model at one-year lead time. The dataset of monthly dry-wet days was the best predictor in the lead-time approach. In addition, sensitivity analysis quantified the influence of each station on the prediction of MDD at the target stations. Monte Carlo simulation was introduced to assess the robustness of the developed models. The EGPR model remained robust up to a certain level of randomness in the testing data. The output of this study can inform the agricultural sector in mitigating the impacts of dry spells on agriculture.
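
The quantities named in the abstract (MDD for a given rainfall threshold, RMSE and R²) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the 1.0 mm default threshold, and the rule that a day is "dry" when its rainfall is strictly below the threshold are all assumptions made for illustration; the abstract only states that several thresholds were used. R² is computed here as the coefficient of determination, which may differ from the definition used in the paper.

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def monthly_dry_days(daily_rain_mm, threshold_mm=1.0):
    """Count dry days per (year, month).

    daily_rain_mm maps datetime.date -> rainfall in mm.
    Assumption: a day is dry when rainfall < threshold_mm.
    """
    counts = defaultdict(int)
    for day, rain in daily_rain_mm.items():
        key = (day.year, day.month)
        counts[key] += 1 if rain < threshold_mm else 0
    return dict(counts)


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic example (made-up data): January 2020 with three rainy days.
start = date(2020, 1, 1)
rain = {
    start + timedelta(days=i): (5.0 if i in (3, 10, 20) else 0.0)
    for i in range(31)
}
mdd = monthly_dry_days(rain)  # one entry for (2020, 1)
```

Raising threshold_mm reclassifies light-rain days as dry, which is how different thresholds yield different MDD and MWD datasets from the same daily record.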

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Bangladesh
  • Linear Models
  • Machine Learning*
  • Support Vector Machine