The current excitement about copy-number variation: how it relates to gene duplications and protein families

Curr Opin Struct Biol. 2008 Jun;18(3):366-74. doi: 10.1016/ Epub 2008 May 27.


Following recent technological advances, there has been increasing interest in genome structural variants (SVs), in particular copy-number variants (CNVs): large-scale duplications and deletions. Although the link is not immediately evident, CNV surveys make a conceptual connection between the fields of population genetics and protein families, in particular with regard to the stability and expandability of families. The mechanisms giving rise to CNVs can be considered fundamental processes underlying gene duplication and loss: duplicated genes are the results of 'successful' copies, fixed and maintained in the population. Conversely, many 'unsuccessful' duplicates remain in the genome as pseudogenes. Here, we survey studies on CNVs, highlighting issues related to protein families. In particular, CNVs tend to affect specific gene functional categories, such as those associated with environmental response, and are depleted in genes related to basic cellular processes. Furthermore, CNVs occur more often at the periphery of the protein interaction network. In comparison, protein families arising from successful and unsuccessful duplicates fall into similar functional categories but are differentially placed in the interaction network. These trends likely reflect CNV formation biases and natural selection, both of which differentially influence distinct protein families.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Gene Duplication*
  • Proteins / chemistry
  • Proteins / genetics*


Substances

  • Proteins