Comprehensive Profiling of Retroviral Integration Sites Using Target Enrichment Methods From Historical Koala Samples Without an Assembled Reference Genome

PeerJ. 2016 Mar 28;4:e1847. doi: 10.7717/peerj.1847. eCollection 2016.

Abstract

Background. Retroviral integration into the host germline results in permanent viral colonization of vertebrate genomes. The koala retrovirus (KoRV) is currently invading the germline of the koala (Phascolarctos cinereus) and provides a unique opportunity for studying retroviral endogenization. Previous analyses of KoRV integration patterns in modern koalas demonstrate that individuals share integration sites primarily if they are related, indicating that the process is currently driven by vertical transmission rather than infection. However, due to methodological challenges, KoRV integrations have not been comprehensively characterized. Results. To overcome these challenges, we applied and compared three target enrichment techniques coupled with next generation sequencing (NGS) and a newly customized sequence-clustering-based computational pipeline to determine the integration sites for 10 museum koala samples from Queensland and New South Wales (NSW) collected between the 1870s and late 1980s. A secondary aim of this study was to identify common integration sites across modern and historical specimens by comparing our dataset to previously published studies. Several million sequences were processed, and the KoRV integration sites in each koala were characterized. Conclusions. Although the three enrichment methods each exhibited bias in integration site retrieval, a combination of two methods, Primer Extension Capture and hybridization capture, is recommended for future studies on historical samples. Moreover, identification of integration sites shows that the proportion of integration sites shared between any two koalas is quite small.

Keywords: Clustering; Integration sites; KoRV; Retroviral endogenization; Target enrichment.

Grant support

YI, ALR, KMH and ADG were supported by Grant Number R01GM092706 from the National Institute of General Medical Sciences (NIGMS). ADG was additionally supported by a grant from the Morris Animal Foundation, grant number D14ZO-94. PC was supported by a fellowship from the China Scholarship Council. UL was supported by the interdisciplinary training initiative “Evolution across Scales”, funded by the Volkswagen Foundation (Grant Number 83459). DEAP was supported by a scholarship from the Deutscher Akademischer Austauschdienst (DAAD). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.