Analysis of site frequency spectra from Arabidopsis with context-dependent corrections for ancestral misinference

Plant Physiol. 2009 Feb;149(2):616-24. doi: 10.1104/pp.108.127787. Epub 2008 Nov 19.

Abstract

Previous studies have shown that the pattern of single nucleotide polymorphism (SNP) in Arabidopsis (Arabidopsis thaliana) deviates from the distribution expected under a neutral model. Here, we test whether ancestral misinference could explain this deviation. We first show that sequence context has significant and complex influences on mutation dynamics in Arabidopsis, as inferred from SNP frequency, and compare these results with the context dependency observed in a previous analysis of a maize (Zea mays) SNP dataset. We then use these data on heterogeneity across sites to correct for ancestral misinference in a context-dependent manner. Using Arabidopsis lyrata to infer the ancestral state of SNPs, we show that the resulting unfolded site frequency spectrum (SFS) in Arabidopsis is skewed toward sites with high-frequency derived nucleotides. Sites are also partitioned into two general functional classes: second codon positions and 4-fold degenerate sites. The two classes show different SFS: both show an overrepresentation of high-frequency derived sites, but low-frequency derived sites are vastly overrepresented at second codon positions and significantly underrepresented at 4-fold degenerate sites. These results are robust to corrections for ancestral misinference, even when context-dependent variation in mutation properties is taken into consideration. The data suggest that, in addition to purifying selection, complex demographic events and/or linked positive selection need to be invoked to explain the SFS, and they highlight the importance of sequence context in analyses of genome-wide variation.
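
To make the quantities in the abstract concrete, the following is a minimal Python sketch of how an unfolded SFS can be tabulated from derived-allele counts, compared with the standard neutral expectation (class i proportional to 1/i), and adjusted for ancestral misinference. It is not the authors' procedure. The function names, the input layout, and the single uniform misinference rate p are assumptions made for illustration; the context-dependent correction described above would presumably apply a separate rate to each sequence-context class.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Tabulate the unfolded SFS for a sample of n chromosomes.

    derived_counts holds, for each site, the number of chromosomes carrying
    the derived allele (polarized with an outgroup). Only segregating sites
    (0 < count < n) are kept. Entry i-1 of the result is the number of sites
    with i derived copies, for i = 1 .. n-1.
    """
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally.get(i, 0) for i in range(1, n)]


def neutral_expectation(num_sites, n):
    """Expected unfolded SFS under the standard neutral model.

    The expected number of sites in class i is proportional to 1/i; the
    result is scaled so that it sums to num_sites.
    """
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [num_sites * w / total for w in weights]


def correct_misinference(sfs, p):
    """Undo a uniform ancestral misinference rate p (assumed, 0 <= p < 0.5).

    If a fraction p of sites has its ancestral state inferred wrongly, a site
    with i derived copies is recorded in class n-i, so
        observed[i] = (1 - p) * true[i] + p * true[n - i].
    Solving this pair of equations for true[i] gives the expression below.
    """
    if not 0 <= p < 0.5:
        raise ValueError("p must satisfy 0 <= p < 0.5")
    n = len(sfs) + 1
    corrected = []
    for i in range(1, n):
        observed = sfs[i - 1]
        mirrored = sfs[n - i - 1]  # observed count in class n - i
        corrected.append(((1 - p) * observed - p * mirrored) / (1 - 2 * p))
    return corrected
```

In this sketch, partitioning by functional class (for example, second codon positions versus 4-fold degenerate sites) would amount to calling unfolded_sfs separately on the sites of each class and comparing each spectrum with neutral_expectation.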

MeSH terms

  • Arabidopsis / genetics*
  • Arabidopsis Proteins / genetics
  • Base Sequence
  • Codon / genetics
  • Dinucleoside Phosphates / genetics
  • Gene Frequency
  • Genes, Plant
  • Genetic Variation
  • Genome, Plant*
  • Kinetics
  • Mutation
  • Polymorphism, Single Nucleotide*
  • Zea mays / genetics

Substances

  • Arabidopsis Proteins
  • Codon
  • Dinucleoside Phosphates
  • cytidylyl-3'-5'-guanosine