Biodegradability describes the capacity of substances to be mineralized by free-living bacteria. It is a crucial property in estimating a compound's long-term impact on the environment. The ability to reliably predict biodegradability would reduce the need for laborious experimental testing. However, this endpoint is difficult to model due to unavailability or inconsistency of experimental data. Our approach makes use of the Online Chemical Modeling Environment (OCHEM) and its rich supply of machine learning methods and descriptor sets to build classification models for ready biodegradability. These models were analyzed to determine the relationship between characteristic structural properties and biodegradation activity. The distinguishing feature of the developed models is their ability to estimate the accuracy of prediction for each individual compound. The models developed using seven individual descriptor sets were combined in a consensus model, which provided the highest accuracy. The identified overrepresented structural fragments can be used by chemists to improve the biodegradability of new chemical compounds. The consensus model, the datasets used, and the calculated structural fragments are publicly available at http://ochem.eu/article/31660.
Keywords: Outlier detection; Ready biodegradability; Structural and functional interpretation.
© 2014 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.