Machine learning approach for the estimation of missing precipitation data: a case study of South Korea

Water Sci Technol. 2023 Aug;88(3):556-571. doi: 10.2166/wst.2023.237.


Precipitation is one of the driving forces in water cycles, and it is vital for understanding the water cycle, such as surface runoff, soil moisture, and evapotranspiration. However, missing precipitation data at the observatory becomes an obstacle to improving the accuracy and efficiency of hydrological analysis. To address this issue, we developed a machine learning algorithm-based precipitation data recovery tool to detect and predict missing precipitation data at observatories. This study investigated 30 weather stations in South Korea, evaluating the applicability of machine learning algorithms (artificial neural network and random forest) for precipitation data recovery using environmental variables, such as air pressure, temperature, humidity, and wind speed. The proposed model showed a high performance in detecting the missing precipitation occurrence with an accuracy of 80%. In addition, the prediction results from the models showed predictive ability with a correlation coefficient ranging from 0.5 to 0.7 and R2 values of 0.53. Although both algorithms performed similarly in estimating precipitation, ANN performed slightly better. Based on the results of this study, we expect that the machine learning algorithms can contribute to improving hydrological modeling performance by recovering missing precipitation data at observation stations.

MeSH terms

  • Algorithms
  • Machine Learning*
  • Neural Networks, Computer*
  • Temperature
  • Weather