Mining is known to be one of the most hazardous occupations in the world. Many serious accidents have occurred worldwide over the years in mining. Although there have been efforts to create a safer work environment for miners, the number of accidents occurring at the mining sites is still significant. Machine learning techniques and predictive analytics are becoming one of the leading resources to create safer work environments in the manufacturing and construction industries. These techniques are leveraged to generate actionable insights to improve decision-making. A large amount of mining safety-related data are available, and machine learning algorithms can be used to analyze the data. The use of machine learning techniques can significantly benefit the mining industry. Decision tree, random forest, and artificial neural networks were implemented to analyze the outcomes of mining accidents. These machine learning models were also used to predict days away from work. An accidents dataset provided by the Mine Safety and Health Administration was used to train the models. The models were trained separately on tabular data and narratives. The use of a synthetic data augmentation technique using word embedding was also investigated to tackle the data imbalance problem. Performance of all the models was compared with the performance of the traditional logistic regression model. The results show that models trained on narratives performed better than the models trained on structured/tabular data in predicting the outcome of the accident. The higher predictive power of the models trained on narratives led to the conclusion that the narratives have additional information relevant to the outcome of injury compared to the tabular entries. The models trained on tabular data had a lower mean squared error compared to the models trained on narratives while predicting the days away from work. The results highlight the importance of predictors, like shift start time, accident time, and mining experience in predicting the days away from work. It was found that the F1 score of all the underrepresented classes except one improved after the use of the data augmentation technique. This approach gave greater insight into the factors influencing the outcome of the accident and days away from work.
Keywords: machine learning; neural networks; word embedding.