Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness
- PMID: 26890307
- PMCID: PMC4758710
- DOI: 10.1371/journal.pone.0149089
Abstract
Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge-informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. The effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p and large n' problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency, and caution should be taken when applying filter FS methods in selecting predictive models.
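For readers unfamiliar with the pre-selection step that finding 3 calls into question, the following is a minimal sketch of a typical correlation filter. It is not the authors' code. It assumes Pearson correlation, a hypothetical threshold of 0.9, and made-up predictor names and values. The function names `pearson` and `drop_highly_correlated` are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_highly_correlated(predictors, threshold=0.9):
    """Greedy filter: keep a predictor only if its absolute correlation
    with every predictor already kept is below the threshold.

    predictors: dict mapping name -> list of values. Dict order sets
    priority, so earlier predictors are kept in preference to later ones.
    The threshold of 0.9 is an assumption, not a value from the study.
    """
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values for illustration only; not data from the study.
predictors = {
    "bathymetry": [10.0, 20.0, 30.0, 40.0, 50.0],
    "backscatter": [5.0, 3.0, 6.0, 2.0, 7.0],
    "bathymetry_smoothed": [11.0, 19.0, 31.0, 41.0, 49.0],
}
kept = drop_highly_correlated(predictors)
print(kept)  # bathymetry_smoothed is dropped: it nearly duplicates bathymetry
```

A filter of this kind discards predictors before any model is fitted, so it never checks whether the dropped variable would have improved predictive accuracy. That is the kind of assumption the abstract suggests should be re-examined.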
