Photovoltaic (PV) system energy production is non-linear because it is driven by the random nature of weather conditions. Machine learning techniques are recommended for modelling PV system energy production, since no other known approach deals well with such non-linear data. To detect PV system faults, the machine learning models must provide accurate outputs. The aim of this work is to accurately predict the DC energy of six PV strings of a utility-scale PV system, and to accurately detect PV string faults, by benchmarking the results of four methodologies known to improve the accuracy of machine learning models: the data mining methodology, the machine learning technique benchmarking methodology, the hybrid methodology, and the ensemble methodology. This work proposes a new hybrid methodology that combines a fuzzy system with a machine learning system containing five trained machine learning models: a regression tree, an artificial neural network, multi-gene genetic programming, a Gaussian process, and a support vector machine for regression. The results showed that the hybrid methodology provided the most accurate predictions of the PV string DC energy and, consequently, successful PV string fault detection.
Keywords: PV fault; PV string; ensemble methodology; hybrid methodology; machine learning prediction models.