A comparative study on machine learning based algorithms for prediction of motorcycle crash severity

PLoS One. 2019 Apr 4;14(4):e0214966. doi: 10.1371/journal.pone.0214966. eCollection 2019.


Motorcycle crash severity is under-researched in Ghana. Thus, the probable risk factors and the association between these factors and motorcycle crash severity outcomes are not known. Traditional statistical models carry intrinsic assumptions and pre-defined correlations that, if violated, can generate inaccurate results. In this study, machine learning based algorithms were employed to predict and classify motorcycle crash severity. Machine learning based techniques are non-parametric models that do not presume a relationship between endogenous and exogenous variables. The main aim of this research is to evaluate and compare different approaches to modeling motorcycle crash severity and to investigate the effect of risk factors on the injury outcomes of motorcycle crashes. A motorcycle crash dataset covering 2011 to 2015 was extracted from the National Road Traffic Crash Database at the Building and Road Research Institute (BRRI) in Ghana. The dataset was classified into four injury severity categories: fatal, hospitalized, injured, and damage-only. Three machine learning based models were developed to model the severity of injury in a motorcycle crash: the J48 Decision Tree Classifier, Random Forest (RF), and Instance-Based learning with parameter k (IBk). These algorithms were validated using 10-fold cross-validation. The three machine learning based algorithms were compared with one another and with a statistical model, the multinomial logit model (MNLM). In addition, a relative importance analysis of the attributes was conducted to determine their impact on injury severity outcomes. The results reveal that the predictions of the machine learning algorithms are superior to those of the MNLM in accuracy and effectiveness, and that, of the three machine learning algorithms, the RF-based algorithm shows the best overall agreement with the experimental data, owing to its global optimization and extrapolation ability. Location type, time of the crash, settlement type, collision partner, collision type, road separation, road surface type, day of the week, and road shoulder condition were identified as the critical determinants of motorcycle crash injury severity.
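
The validation procedure described in the abstract, 10-fold cross-validation of an instance-based (k-nearest-neighbour) classifier over four severity classes, can be illustrated with a minimal sketch. This is not the authors' pipeline: it does not use the BRRI data or the study's software, the attribute values, the labelling rule, the noise level, and every function name below are hypothetical, and the records are synthetic. It only shows the mechanics of holding out each fold in turn and comparing the pooled accuracy with a majority-class baseline.

```python
import random
from collections import Counter

# The four severity categories named in the abstract.
SEVERITIES = ["fatal", "hospitalized", "injured", "damage-only"]

# Hypothetical attribute values, for illustration only (not the BRRI coding).
LOCATION = ["junction", "non-junction", "roundabout", "bridge"]
COLLISION = ["head-on", "rear-end", "side-swipe"]
TIME_OF_DAY = ["day", "dusk", "night"]


def make_synthetic_crashes(n, noise=0.1, seed=0):
    """Generate synthetic records; the labelling rule is arbitrary, not a finding."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        loc = rng.randrange(len(LOCATION))
        col = rng.randrange(len(COLLISION))
        tod = rng.randrange(len(TIME_OF_DAY))
        label = SEVERITIES[(loc + col + tod) % len(SEVERITIES)]
        if rng.random() < noise:  # inject label noise
            label = rng.choice(SEVERITIES)
        X.append((LOCATION[loc], COLLISION[col], TIME_OF_DAY[tod]))
        y.append(label)
    return X, y


def hamming(a, b):
    """Number of categorical attributes on which two records differ."""
    return sum(u != v for u, v in zip(a, b))


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training records closest to the query."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: hamming(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def k_fold_indices(n, folds=10, seed=0):
    """Shuffle the indices 0..n-1 and split them into disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::folds] for i in range(folds)]


def cross_validate(X, y, folds=10, k=3, seed=0):
    """Pooled accuracy: each record is predicted once, while held out."""
    correct = 0
    for held_out in k_fold_indices(len(X), folds, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                       for i in held_out)
    return correct / len(X)


def majority_baseline(y):
    """Accuracy obtained by always predicting the most frequent class."""
    return Counter(y).most_common(1)[0][1] / len(y)


if __name__ == "__main__":
    X, y = make_synthetic_crashes(400)
    print("10-fold k-NN accuracy:", round(cross_validate(X, y), 3))
    print("majority-class baseline:", round(majority_baseline(y), 3))
```

Hamming distance is used here because the illustrative attributes are categorical; any other distance, classifier, or fold-assignment scheme could be substituted without changing the cross-validation loop.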

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Accidents, Traffic*
  • Databases, Factual*
  • Ghana
  • Humans
  • Machine Learning
  • Models, Biological*
  • Motorcycles*
  • Predictive Value of Tests
  • Trauma Severity Indices*
  • Wounds and Injuries*

Associated data

  • figshare/10.6084/m9.figshare.7700954

Grant support

This work was supported by the Ministry of Transportation, People’s Republic of China (CN), Grant Number: 2013-364-836-900, to Habin Jiang. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.