Machine Learning Methods for X-Ray Scattering Data Analysis from Biomacromolecular Solutions

Biophys J. 2018 Jun 5;114(11):2485-2492. doi: 10.1016/j.bpj.2018.04.018.

Abstract

Small-angle x-ray scattering (SAXS) of biological macromolecules in solution is a widely employed method in structural biology. SAXS patterns contain information about the overall shape and low-resolution structure of the dissolved particles. Here, we describe how to transform experimental SAXS patterns into feature vectors and how a simple k-nearest neighbor approach can retrieve the overall particle shape, the maximal diameter (Dmax), and the molecular mass directly from experimental scattering data. Based on this transformation, we develop a rapid multiclass shape classification with categories ranging from compact, extended, and flat to hollow and random-chain-like objects. This classification may be employed, e.g., as a decision block in automated data analysis pipelines. Further, we map protein structures from the Protein Data Bank into the classification space and, in a second step, use this mapping as a data source to obtain accurate estimates of the structural parameters (Dmax, molecular mass) of the macromolecule under study from the experimental scattering pattern alone, without an inverse Fourier transform for Dmax. All methods presented are implemented in a Fortran binary, DATCLASS, which is part of the ATSAS data analysis suite, available on Linux, Mac, and Windows and free for academic use.
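
As an illustration only, the k-nearest neighbor idea described above might be sketched as follows. The function name knn_predict, the two-dimensional feature vectors, the shape labels, and the Dmax values are all hypothetical toy placeholders chosen for this sketch. They do not reproduce DATCLASS, the feature transformation used by the authors, or any measured data.

```python
import math
from collections import Counter


def knn_predict(query, features, labels, values, k=3):
    """Return (majority label, mean value) over the k nearest reference points.

    Hypothetical helper for illustration; not the DATCLASS implementation.
    """
    # Euclidean distance from the query to every reference feature vector
    dists = [math.dist(query, f) for f in features]
    # Indices of the k reference points closest to the query
    nearest = sorted(range(len(features)), key=dists.__getitem__)[:k]
    # Classification: majority vote over the neighbors' shape labels
    shape = Counter(labels[i] for i in nearest).most_common(1)[0][0]
    # Regression: plain average of the neighbors' values (e.g. Dmax)
    estimate = sum(values[i] for i in nearest) / k
    return shape, estimate


# Toy reference set (assumed values, not real SAXS-derived features)
ref_features = [(0.10, 0.20), (0.15, 0.25), (0.12, 0.18),
                (0.80, 0.90), (0.85, 0.95), (0.78, 0.88)]
ref_labels = ["compact", "compact", "compact",
              "extended", "extended", "extended"]
ref_dmax = [50.0, 55.0, 60.0, 150.0, 160.0, 170.0]  # arbitrary units

shape, dmax_est = knn_predict((0.11, 0.21), ref_features, ref_labels, ref_dmax)
print(shape, dmax_est)
```

In this sketch the same neighbor set serves both tasks: a majority vote gives the shape class, and an unweighted mean gives the parameter estimate. A distance-weighted average would be an equally plausible choice; the abstract does not say which variant is used.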

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Analysis*
  • Machine Learning
  • Macromolecular Substances / chemistry*
  • Solutions
  • X-Ray Diffraction*

Substances

  • Macromolecular Substances
  • Solutions