Background: There are several situations in population biology research where simulating DNA sequences is useful. Simulation of biological populations under different evolutionary genetic models can be undertaken using backward or forward strategies. Backward simulations, also called coalescent-based simulations, are computationally efficient. The reason is that they are based on the history of lineages with surviving offspring in the current population. On the contrary, forward simulations are less efficient because the entire population is simulated from past to present. However, the coalescent framework imposes some limitations that forward simulation does not. Hence, there is an increasing interest in forward population genetic simulation and efficient new tools have been developed recently. Software tools that allow efficient simulation of large DNA fragments under complex evolutionary models will be very helpful when trying to better understand the trace left on the DNA by the different interacting evolutionary forces. Here I will introduce GenomePop, a forward simulation program that fulfills the above requirements. The use of the program is demonstrated by studying the impact of intracodon recombination on global and site-specific dN/dS estimation.
Results: I have developed algorithms and written software to efficiently simulate, forward in time, different Markovian nucleotide or codon models of DNA mutation. Such models can be combined with recombination, at inter and intra codon levels, fitness-based selection and complex demographic scenarios.
Conclusion: GenomePop has many interesting characteristics for simulating SNPs or DNA sequences under complex evolutionary and demographic models. These features make it unique with respect to other simulation tools. Namely, the possibility of forward simulation under General Time Reversible (GTR) mutation or GTRxMG94 codon models with intra-codon recombination, arbitrary, user-defined, migration patterns, diploid or haploid models, constant or variable population sizes, etc. It also allows simulation of fitness-based selection under different distributions of mutational effects. Under the 2-allele model it allows the simulation of recombination hot-spots, the definition of different frequencies in different populations, etc. GenomePop can also manage large DNA fragments. In addition, it has a scaling option to save computation time when simulating large sequences and population sizes under complex demographic and evolutionary situations. These and many other features are detailed in its web page .