Simulation of genomes: a review

Curr Genomics. 2008 May;9(3):155-9. doi: 10.2174/138920208784340759.


Population genetics plays an increasing role in human genetic research, linking empirical observations with hypotheses about sequence variation due to historical and evolutionary causes. In addition, data sets are increasing in size, with genome-wide data becoming commonplace in many empirical studies. As more information becomes available, it is becoming clear that the simplest hypotheses are not consistent with the data. Simulations provide a key tool for testing complex hypotheses against real data, by generating simulated data under the hypothesized historical and evolutionary conditions. Tools that can simulate large sequences while also modeling natural selection, recombination and complex demographic patterns will therefore be of great interest for better understanding the traces left on DNA by different interacting evolutionary forces. Simulation tools will also be essential for evaluating the sampling properties of statistics used in genome-wide association studies and for comparing the performance of methods applied at genome-wide scales. Several simulation tools have been developed recently. Here, we review some of the existing simulators that allow efficient simulation of large sequences under complex evolutionary scenarios. We also point out future directions in this field, which is already a key part of current research in evolutionary biology and seems likely to become a primary tool in future research in genomic and post-genomic biology.