A comparison of supervised classification methods for the prediction of substrate type using multibeam acoustic and legacy grain-size data

David Stephens; Markus Diesing

doi:10.1371/journal.pone.0093950

PLoS One. 2014 Apr 3;9(4):e93950. doi: 10.1371/journal.pone.0093950. eCollection 2014.

Authors

David Stephens¹, Markus Diesing¹

Affiliation

¹ Centre for Environment, Fisheries and Aquaculture Science, Lowestoft, Suffolk, United Kingdom.

Abstract

Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multibeam echosounder data in the form of bathymetry and acoustic backscatter in conjunction with ground-truth sampling data to inform the mapping of seabed substrates. Whilst, until recently, such data sets have typically been classified by expert interpretation, it is now obvious that more objective, faster and repeatable methods of seabed classification are required. This study compares the performances of a range of supervised classification techniques for predicting substrate type from multibeam echosounder data. The study area is located in the North Sea, off the north-east coast of England. A total of 258 ground-truth samples were classified into four substrate classes. Multibeam bathymetry and backscatter data, and a range of secondary features derived from these datasets, were used in this study. Six supervised classification techniques were tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forest and Naive Bayes. Each classifier was trained multiple times using different input features, including i) the two primary features of bathymetry and backscatter, ii) a subset of the features chosen by a feature selection process and iii) all of the input features. The predictive performances of the models were validated using a separate test set of ground-truth samples. The statistical significance of model performances relative to a simple baseline model (Nearest Neighbour predictions on bathymetry and backscatter) was tested to assess the benefits of using more sophisticated approaches. The best performing models were tree-based methods and Naive Bayes, which achieved accuracies of around 0.8 and kappa coefficients of up to 0.5 on the test set. The models that used all input features generally did not perform well, highlighting the need for some means of feature selection.
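
The abstract reports both accuracy and kappa, and the two can differ markedly when classes are unevenly represented. The following is a minimal sketch, not the authors' code, of how overall accuracy and Cohen's kappa can be computed from observed and predicted class labels. The function names, the class labels "A" and "B", and the ten-sample label lists are illustrative assumptions and are not data from the study.

```python
from collections import Counter


def accuracy(observed, predicted):
    """Fraction of samples whose predicted class equals the observed class."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def cohen_kappa(observed, predicted):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(observed)
    p_o = accuracy(observed, predicted)  # observed agreement
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    # Chance agreement: sum over classes of the product of marginal proportions.
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case (a single class everywhere); returning 0.0 is a
        # convention assumed for this sketch.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical, imbalanced test labels (NOT from the study): 7 "A", 3 "B".
observed = ["A"] * 7 + ["B"] * 3
predicted = ["A"] * 6 + ["B"] + ["B"] * 2 + ["A"]

acc = accuracy(observed, predicted)
kappa = cohen_kappa(observed, predicted)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

In this made-up example, 8 of 10 labels agree, yet chance agreement from the class proportions alone is 0.58, so kappa is much lower than accuracy. This illustrates why a kappa near 0.5 can accompany an accuracy near 0.8.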

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Artificial Intelligence
Bayes Theorem
Models, Theoretical*
North Sea
Pattern Recognition, Automated / methods*
Support Vector Machine

Grants and funding

The work was supported by Cefas Research and Development funding (research project DP312). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.