Principal component analysis for automated classification of 2D spectra and interferograms of protein therapeutics: influence of noise, reconstruction details, and data preparation

J Biomol NMR. 2020 Nov;74(10-11):643-656. doi: 10.1007/s10858-020-00332-y. Epub 2020 Jul 22.


Protein therapeutics have numerous critical quality attributes (CQAs) that must be evaluated to ensure safety and efficacy, including the requirement to adopt and retain the correct three-dimensional fold without forming unintended aggregates. Therefore, the ability to monitor protein higher order structure (HOS) can be valuable throughout the lifecycle of a protein therapeutic, from development to manufacture. 2D NMR has been introduced as a robust and precise tool to assess the HOS of a protein biotherapeutic. A common use case is to decide whether two groups of spectra are substantially different, as an indicator of a difference in HOS. We demonstrate a quantitative use of principal component analysis (PCA) scores for this decision-making, and examine how acquisition and processing details affect class separation, using samples of NISTmAb monoclonal antibody Reference Material subjected to two different oxidative stress protocols. The work introduces an approach to computing similarity from PCA scores based upon the technique of histogram intersection, a method originally developed for retrieval of images from large databases. Results show that class separation can be robust with respect to random noise, reconstruction method, and analysis region selection. By contrast, details such as baseline distortion can have a pronounced effect and so must be controlled carefully. Since the classification approach can be performed without the need to identify peaks, the results suggest that it is possible to use even more efficient measurement strategies that do not produce visually interpretable spectra but nevertheless allow useful decision-making that is objective and automated.
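The idea of scoring class similarity by histogram intersection of PCA scores can be illustrated with a short sketch. It assumes one flattened spectrum per row, PCA computed by SVD of the mean-centred pooled data, and Swain and Ballard's histogram intersection, sum over bins k of min(hA(k), hB(k)), applied to normalised PC1 score histograms on shared bins. The function names, the bin count, the use of PC1 alone, and the synthetic two-peak "spectra" are all hypothetical choices for illustration; the abstract does not specify the authors' exact procedure, and this is not their data.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Hypothetical helper: PCA scores of the rows of X (one spectrum per row)."""
    Xc = X - X.mean(axis=0)  # mean-centre each spectral point
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # scores = U * singular values


def histogram_intersection(a, b, bins=20):
    """Swain-Ballard intersection of two 1D score samples on shared bins.

    Returns a value in [0, 1]: 1 for identical normalised histograms,
    0 when the two score distributions fall in disjoint bins.
    """
    edges = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), bins + 1)
    ha, _ = np.histogram(a, bins=edges)
    hb, _ = np.histogram(b, bins=edges)
    ha = ha / ha.sum()  # normalise so each histogram sums to 1
    hb = hb / hb.sum()
    return float(np.minimum(ha, hb).sum())


def gaussian_peak(x, centre, height, width=0.02):
    """Synthetic 1D peak standing in for a spectral feature (illustration only)."""
    return height * np.exp(-(((x - centre) / width) ** 2))


# Synthetic demo: "stressed" spectra lose intensity at one peak.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
reference = gaussian_peak(x, 0.3, 1.0) + gaussian_peak(x, 0.7, 0.8)
stressed = gaussian_peak(x, 0.3, 1.0) + gaussian_peak(x, 0.7, 0.5)
A = reference + 0.05 * rng.standard_normal((10, x.size))
B = stressed + 0.05 * rng.standard_normal((10, x.size))

scores = pca_scores(np.vstack([A, B]))
sim = histogram_intersection(scores[:10, 0], scores[10:, 0])
print(f"PC1 histogram intersection, class A vs class B: {sim:.2f}")
```

A value near 1 would indicate overlapping score distributions (no detectable difference), while a value near 0, as expected for these well-separated synthetic classes, would indicate distinct groups. In practice, the decision threshold, the number of components, and the binning would need to be chosen and validated for the measurement at hand.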

Keywords: Automation; Biopharmaceuticals; Chemometrics; Higher order structure (HOS); Monoclonal antibody (mAb); NISTmAb; Nuclear magnetic resonance (NMR); Principal component analysis.

MeSH terms

  • Antibodies, Monoclonal / chemistry*
  • Automation / methods*
  • Biological Products
  • Fourier Analysis
  • Magnetic Resonance Spectroscopy / methods
  • Nuclear Magnetic Resonance, Biomolecular / methods*
  • Principal Component Analysis / methods*

Substances

  • Antibodies, Monoclonal
  • Biological Products