Here we present computational machinery to efficiently and accurately identify transposable element (TE) insertions in 146 next-generation sequenced inbred strains of Drosophila melanogaster. The panel of lines we use in our study is composed of strains from a pair of genetic mapping resources: the Drosophila Genetic Reference Panel (DGRP) and the Drosophila Synthetic Population Resource (DSPR). We identified 23,087 TE insertions in these lines, of which 83.3% are found in only one line. There are marked differences in the distribution of elements over the genome, with TEs found at higher densities on the X chromosome and in regions of low recombination. We also identified many more TEs per base pair of intronic sequence, and fewer TEs per base pair of exonic sequence, than expected if TEs were located at random in the euchromatic genome. There was substantial variation in TE load across genes. For example, the paralogs derailed and derailed-2 show a significant difference in the number of TE insertions, potentially reflecting differences in the selection acting on these loci. When considering TE families, we find a very weak positive effect of gene family size on TE insertions per gene: as gene family size increases, the number of TE insertions in a given gene within that family also increases. TEs are known to be associated with certain phenotypes, and our data will allow investigators using the DGRP and DSPR to assess the functional role of TE insertions in complex trait variation more generally. Notably, because most TEs are very rare and often private to a single line, causative TEs underlying phenotypic differences among individuals may typically fail to replicate across mapping panels, since individual elements are unlikely to segregate in both panels. Our data suggest that "burden tests" that assess the effect of TEs as a class may be more fruitful.
Keywords: DGRP; DSPR; genomics; population genetics; transposable element.