Antibody Clustering Using a Machine Learning Pipeline that Fuses Genetic, Structural, and Physicochemical Properties

Adv Exp Med Biol. 2020;1194:41-58. doi: 10.1007/978-3-030-32622-7_4.


Clustering of antibody variable (V) domains is important across a wide range of immunology-related fields. Although several approaches have been proposed for antibody clustering, no consensus has yet been reached. Numerous attempts use information from genes, protein sequences, 3D structures, and 3D surfaces in an effort to elucidate unknown mechanisms of action directly related to antibody function, and either to link antibodies directly to diseases or to drive the discovery of new medicines, such as antibody-drug conjugates (ADCs). Herein, we describe a new V domain clustering method based on comparing the interaction sites between each antibody and its antigen. Deep learning and data mining techniques were used to provide a more specific clustering analysis of the antibody V domain. Multidimensional information was extracted from structurally resolved antibodies captured in complex with other proteins. The available 3D structures of protein antigen-antibody (Ag-Ab) interfaces contain information about how antibody V domains recognize antigens and about which amino acids are involved in the recognition. As such, the antibody surface holds information about the folding of the antigen at the Ag-Ab interface and about how the interface residues interact. To gain insight into the nature of such interactions, we propose a new, simple approach that transforms the conserved framework of the antibody V domain (framework regions, complementarity-determining regions) into a binary form using structural features of antibody-antigen interactions, with the aim of identifying new antibody signatures in V domain binding activity. Finally, an advanced three-level hybrid classification scheme was established for clustering antibodies into subgroups; it combines information from protein sequences, three-dimensional structures, and specific "key patterns" of recognized interactions. The resulting clusters provide multilevel information about antibodies and antibody-antigen complexes.
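
The abstract does not specify how the binary form is built or how it is compared, so the following is only a minimal sketch of the general idea and not the authors' pipeline. It assumes that each antibody is described by the set of V domain positions in contact with the antigen, encodes those positions as a 0/1 vector, and groups antibodies by Jaccard distance with a simple single-linkage rule. The antibody names, contact positions, domain length, and threshold are all hypothetical values chosen for illustration; the deep learning and three-level classification steps mentioned above are not represented.

```python
from itertools import combinations

# Hypothetical toy data (invented for illustration): each antibody maps to the
# set of V-domain positions assumed to be in contact with its antigen.
CONTACTS = {
    "Ab1": {31, 32, 52, 99, 100},
    "Ab2": {31, 32, 52, 99, 101},
    "Ab3": {27, 50, 53, 91, 92},
    "Ab4": {27, 50, 53, 91, 94},
}
DOMAIN_LENGTH = 120  # assumed V-domain length, not taken from the source


def to_binary(contact_positions, length=DOMAIN_LENGTH):
    """Return a 0/1 list with 1 at every (1-based) position that contacts antigen."""
    return [1 if pos in contact_positions else 0 for pos in range(1, length + 1)]


def jaccard_distance(a, b):
    """1 - |intersection| / |union| over the positions set to 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def cluster(vectors, threshold=0.5):
    """Single-linkage grouping: join any pair whose distance is <= threshold."""
    parent = {name: name for name in vectors}

    def find(name):
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(vectors, 2):
        if jaccard_distance(vectors[a], vectors[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in vectors:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


vectors = {name: to_binary(positions) for name, positions in CONTACTS.items()}
clusters = cluster(vectors)
print(clusters)
```

Jaccard distance is used here because it ignores positions that are 0 in both vectors, so the large non-contacting part of the domain does not make unrelated antibodies look similar; any other binary similarity measure could be substituted.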

Keywords: Antibodies; Antibody drug conjugates; Antibody-antigen complexes; Classification scheme; Clustering; Immunology.

MeSH terms

  • Amino Acid Sequence
  • Antigen-Antibody Complex* / chemistry
  • Antigen-Antibody Complex* / genetics
  • Cluster Analysis*
  • Complementarity Determining Regions / chemistry
  • Machine Learning*
  • Molecular Conformation

Substances

  • Antigen-Antibody Complex
  • Complementarity Determining Regions