The development of a machine learning algorithm to identify occupational injuries in agriculture using pre-hospital care reports

Health Inf Sci Syst. 2021 Jul 29;9(1):31. doi: 10.1007/s13755-021-00161-9. eCollection 2021 Dec.


Purpose: Injury surveillance in agriculture is considerably hampered by the scarcity of occupation and industry data in current health records. This has impeded efforts to develop more accurate injury burden estimates and has lowered the priority given to workplace health and safety in state and federal public health efforts. This paper describes the development of a Naïve Bayes machine learning algorithm to identify occupational injuries in agriculture using existing administrative data, specifically pre-hospital care reports (PCRs).

Methods: A Naïve Bayes machine learning algorithm was trained on 2008-2010 PCR datasets from Maine and New Hampshire and tested on newer data from those states, collected between 2011 and 2016. Further analyses assessed the generalizability of the model across states and years. Dual visual inspection was used to verify the records selected by the algorithm.
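As a rough illustration of the train-on-older, test-on-newer design described in the Methods, the sketch below implements a small multinomial Naïve Bayes text classifier with Laplace smoothing. It is not the authors' implementation: the abstract does not specify the features, preprocessing, or software used, and the narratives, labels, and names (`NaiveBayesText`, `tokenize`, `train_texts`) are invented purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a narrative and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior for each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            denom = sum(self.word_counts[c].values()) + self.alpha * len(self.vocab)
            score = math.log(self.class_counts[c] / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical "older" training narratives (1 = agricultural injury, 0 = other).
train_texts = [
    "farmer pinned under tractor rollover in hay field",
    "worker kicked by cow while milking in barn",
    "hand caught in hay baler on dairy farm",
    "patient fell on icy sidewalk outside store",
    "driver in two car collision on highway",
    "chest pain at home while watching television",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayesText().fit(train_texts, train_labels)

# Hypothetical "newer" narratives: only records predicted as class 1
# would be passed on for visual inspection by coders.
new_texts = ["man kicked by cow in barn", "car collision on highway"]
flagged = [t for t in new_texts if model.predict(t) == 1]
print(flagged)
```

In this sketch the classifier acts as a filter ahead of human review, which mirrors the role the abstract describes: the model narrows the pool of records, and coders make the final determination.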

Results: The Naïve Bayes machine learning algorithm reduced the volume of cases requiring visual inspection by 69.5 percent compared with a keyword search strategy alone. Coders identified 341 true agricultural injury records (Case class = 1) (Maine 2011-2016, New Hampshire 2011-2015). In addition, 581 records (Case class = 2 or 3) were suspected to be agricultural acute/traumatic events but lacked the detail needed to classify them with certainty.

Conclusions: Applying the trained algorithm to newer data reduced the volume of records requiring visual inspection by roughly two thirds compared with the previous keyword search strategy, making it a sustainable and cost-effective way to understand injury trends in agriculture.

Keywords: Agriculture; Injury surveillance; Machine learning; Occupational epidemiology.