IMGT, the international ImMunoGeneTics information system® (http://imgt.cines.fr), is a high-quality integrated information system specializing in immunoglobulins (IG), T cell receptors (TR) and major histocompatibility complex (MHC) of human and other vertebrates. IMGT comprises IMGT/LIGM-DB, the comprehensive database of IG and TR sequences from human and other vertebrates (76 846 sequences in September 2003). To define the IMGT criteria needed for standardized statistical analyses, the sequences of the IG variable regions (V-REGIONs) from productively rearranged human IG heavy (IGH), IG light kappa (IGK) and IG light lambda (IGL) chains were extracted from IMGT/LIGM-DB. The framework amino acid positions of 2474 V-REGIONs (1360 IGHV, 585 IGKV, 529 IGLV) were numbered according to the IMGT unique numbering. Two statistical methods, correspondence analysis and hierarchic classification, were used to analyze the 237 framework positions (80 for IGHV, 79 for IGKV, 78 for IGLV) with respect to three properties (hydropathy, volume and chemical characteristics) of the 20 common amino acids. The results are shown as standardized two-dimensional representations designated IMGT Colliers de Perles statistical profiles. These profiles characterize the amino acid properties at each framework position of the expressed IG V-REGIONs and visualize the resemblances and differences between heavy and light chain sequences, and between kappa and lambda sequences. The standardized criteria defined in this paper (amino acid positions and property classes) will be useful for studying mutations and allele polymorphisms, for establishing correlations between amino acids in IG and TR three-dimensional protein structures, and for extracting new knowledge from the V-like domains of chains other than IG and TR that belong to the immunoglobulin superfamily.
Copyright 2004 John Wiley & Sons, Ltd.
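To make the idea of a per-position property profile followed by hierarchic classification more concrete, here is a minimal, hypothetical sketch. It is not the authors' method: the abstract does not state the distance measure or linkage used, and correspondence analysis is not shown. Assumptions: the three-class hydropathy grouping below is illustrative (check the official IMGT class definitions before relying on it), the toy sequences and position labels are invented, and the function names (`class_profile`, `profiles_by_position`, `average_linkage`) are hypothetical. The sketch computes, for each numbered position, the relative frequency of each hydropathy class, then groups positions by naive average-linkage agglomerative clustering on the Euclidean distance between those profiles.

```python
from itertools import combinations

# Assumed three-class hydropathy grouping (illustrative only; verify against IMGT).
HYDROPATHY = {
    **dict.fromkeys("AVILMFWC", "hydrophobic"),
    **dict.fromkeys("GSTYPH", "neutral"),
    **dict.fromkeys("RNDQEK", "hydrophilic"),
}
CLASSES = ("hydrophobic", "neutral", "hydrophilic")


def class_profile(residues):
    """Relative frequency of each hydropathy class at one position; gaps and unknowns are skipped."""
    counts = {c: 0 for c in CLASSES}
    for aa in residues:
        if aa in HYDROPATHY:
            counts[HYDROPATHY[aa]] += 1
    total = sum(counts.values())
    return tuple(counts[c] / total if total else 0.0 for c in CLASSES)


def profiles_by_position(aligned, positions):
    """aligned: equal-length strings where column i corresponds to position label positions[i]."""
    return {pos: class_profile([seq[i] for seq in aligned]) for i, pos in enumerate(positions)}


def distance(p, q):
    """Euclidean distance between two class-frequency profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the pair of clusters
    with the smallest mean pairwise distance until n_clusters remain."""
    clusters = [[pos] for pos in profiles]
    while len(clusters) > n_clusters:
        best = None
        for (i, a), (j, b) in combinations(enumerate(clusters), 2):
            d = sum(distance(profiles[x], profiles[y]) for x in a for y in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters


# Hypothetical toy data: four aligned fragments, five numbered positions.
aligned = ["EVQLV", "QVQLQ", "EVKLL", "DIQMT"]
positions = [1, 2, 3, 4, 5]
profiles = profiles_by_position(aligned, positions)
clusters = average_linkage(profiles, n_clusters=2)
print(clusters)  # [[1, 3], [2, 4, 5]]
```

With this toy input, positions 1 and 3 are entirely hydrophilic and positions 2 and 4 entirely hydrophobic, so they pair up first. The mixed position 5 then joins the hydrophobic group. A real analysis would use the full property classes and many more sequences, and would typically call a tested library routine rather than this quadratic loop.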