BMC Bioinformatics. 2011 Sep 14;12:367.
doi: 10.1186/1471-2105-12-367.

Methodology and Software to Detect Viral Integration Site Hot-Spots

Free PMC article

Angela P Presson et al. BMC Bioinformatics. 2011.

Abstract

Background: Modern gene therapy methods have limited control over where a therapeutic viral vector inserts into the host genome. Vector integration can activate local gene expression, which can cause cancer if the vector inserts near an oncogene. Viral integration hot-spots or 'common insertion sites' (CIS) are scrutinized to evaluate and predict patient safety. CIS are typically defined by a minimum density of insertions (such as 2-4 within a 30-100 kb region), which unfortunately depends on the total number of observed viral integration sites (VIS). This is problematic for comparing hot-spot distributions across data sets and patients, where the number of VIS may vary.
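
To make the density-based definition concrete, the following is a minimal sketch of a CIS call on one chromosome. It assumes a CIS is any group of at least `min_sites` insertions spanning no more than `window_bp`, and that overlapping groups are merged. The function name, parameter names, defaults, and the merging rule are illustrative assumptions, not the published definition.

```python
def find_cis(positions, min_sites=3, window_bp=50_000):
    """Return (start, end, n_sites) for density-based clusters on one chromosome.

    Assumed rule: at least `min_sites` insertions within `window_bp` bases.
    Overlapping qualifying windows are merged (an assumption of this sketch),
    so a merged cluster may be wider than `window_bp`.
    """
    pos = sorted(positions)
    clusters = []  # list of (start, end) pairs
    left = 0
    for right in range(len(pos)):
        # Advance the left edge until the window spans at most window_bp.
        while pos[right] - pos[left] > window_bp:
            left += 1
        if right - left + 1 >= min_sites:
            start, end = pos[left], pos[right]
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous cluster: extend it.
                clusters[-1] = (clusters[-1][0], end)
            else:
                clusters.append((start, end))
    # Attach the number of insertions that fall inside each cluster.
    return [(s, e, sum(s <= p <= e for p in pos)) for s, e in clusters]
```

Because the rule is an absolute count per window, adding more insertions to the same genome can only create or enlarge clusters, which is the size dependence the Background describes.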

Results: We develop two new methods for defining hot-spots that are relatively independent of data set size. Both methods operate on distributions of VIS across consecutive 1 Mb 'bins' of the genome. The first method 'z-threshold' tallies the number of VIS per bin, converts these counts to z-scores, and applies a threshold to define high density bins. The second method 'BCP' applies a Bayesian change-point model to the z-scores to define hot-spots. The novel hot-spot methods are compared with a conventional CIS method using simulated data sets and data sets from five published human studies, including the X-linked ALD (adrenoleukodystrophy), CGD (chronic granulomatous disease) and SCID-X1 (X-linked severe combined immunodeficiency) trials. The BCP analysis of the human X-linked ALD data for two patients separately (774 and 1627 VIS) and combined (2401 VIS) resulted in 5-6 hot-spots covering 0.17-0.251% of the genome and containing 5.56-7.74% of the total VIS. In comparison, the CIS analysis resulted in 12-110 hot-spots covering 0.018-0.246% of the genome and containing 5.81-22.7% of the VIS, corresponding to a greater number of hot-spots as the data set size increased. Our hot-spot methods enable one to evaluate the extent of VIS clustering, and formally compare data sets in terms of hot-spot overlap. Finally, we show that the BCP hot-spots from the repopulating samples coincide with greater gene and CpG island density than the median genome density.

Conclusions: The z-threshold and BCP methods are useful for comparing hot-spot patterns across data sets of disparate sizes. The methodology and software provided here should enable one to study hot-spot conservation across a variety of VIS data sets and evaluate vector safety for gene therapy trials.

Figures

Figure 1
Definition of CIS, BCP and z-threshold hot-spot methods. (A) The CIS hot-spot definition implemented here is based on a commonly used density metric [19,21]. (B) Our 'z-threshold' and 'BCP' hot-spot definitions operate on a partition of the genome into 1 Mb bins. The number of VIS per megabase bin is tallied and then converted to a z-score by subtracting the mean and dividing by the standard error, calculated across all bins. Bins with high z-scores are called 'hot-bins'. Hot-spots are defined by grouping consecutive hot-bins and setting each external boundary to the closest VIS.
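
The z-threshold steps in the Figure 1 caption can be sketched as follows. This is a hedged illustration for a single chromosome, whereas the caption computes z-scores across all bins of the genome. It reads the caption's "standard error" as the sample standard deviation of bin counts and "closest VIS" as the outermost VIS inside each run of hot-bins. The function names and the cutoff `z_cut=2.0` are hypothetical.

```python
from statistics import mean, stdev

BIN_SIZE = 1_000_000  # 1 Mb bins, as in the caption


def bin_counts(positions, chrom_length, bin_size=BIN_SIZE):
    """Tally 0-based VIS positions into consecutive fixed-size bins."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for p in positions:
        counts[p // bin_size] += 1
    return counts


def z_scores(counts):
    """Subtract the mean and divide by the sample SD (assumed reading)."""
    mu, sd = mean(counts), stdev(counts)
    if sd == 0:
        return [0.0] * len(counts)  # all bins equal: nothing stands out
    return [(c - mu) / sd for c in counts]


def hot_spots(positions, chrom_length, z_cut=2.0, bin_size=BIN_SIZE):
    """Group consecutive hot-bins and trim each group to its outermost VIS.

    Assumes z_cut >= 0, so every hot-bin contains at least one VIS.
    """
    z = z_scores(bin_counts(positions, chrom_length, bin_size))
    spots, i = [], 0
    while i < len(z):
        if z[i] <= z_cut:
            i += 1
            continue
        j = i
        while j + 1 < len(z) and z[j + 1] > z_cut:
            j += 1  # extend the run of consecutive hot-bins
        inside = [p for p in positions if i * bin_size <= p < (j + 1) * bin_size]
        spots.append((min(inside), max(inside)))
        i = j + 1
    return spots
```

Because each bin is scored relative to the mean and spread of all bins, the cutoff is expressed in units that do not scale directly with the total number of VIS.
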
Figure 2
Comparison of % VIS in hot-spots for CIS, rate-threshold, z-threshold and BCP analyses for data sets with 200-2000 VIS. Boxplots indicate distributions of % VIS in hot-spots for 12 simulated data sets per data set size for the CIS, rate-threshold, z-threshold and BCP methods. Data set sizes are in increments of 100 for smaller data sets (200-600 VIS), to show performance when fewer data are available, and in increments of 200 otherwise. The CIS method (A) shows an increasing percentage of VIS in hot-spots with increasing data set size across all data set sizes. The z-threshold (C) and BCP (D) methods give the most consistent % VIS in hot-spots across data set sizes, and both are most consistent for data sets with ≥300 VIS.
Figure 3
Comparison of VIS clustering among data sets. We developed two methods to describe the extent of VIS clustering. The first method, 'maximum %', is simply the maximum bin's z-score divided by the total number of VIS in the data set, $100 \cdot \max(X) / \sum_{i=1}^{n} C_i$. Data sets with a maximum % > 8 indicate a high degree of clustering. The second method, 'BCP posterior probability', is calculated after running the Bayesian change-point analysis, and is simply one minus the average of the posterior probabilities of a change point occurring at each bin, $1 - \bar{P}$. BCP posterior probabilities > 0.98 indicate a high degree of clustering. Both methods indicate that the CGD data exhibits a high degree of clustering with a maximum % and BCP posterior probabilities of 11.98 and 0.999, respectively, in comparison to the other data sets which ranged from 0.5-1.48 and 0.9356-0.9361, respectively.
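
The two clustering summaries in the Figure 3 caption are simple enough to sketch directly. The caption does not define X or C_i, so the first function takes both vectors as separate arguments rather than guessing how they relate. The function names are hypothetical, and the posterior probabilities are assumed to come from a separately run change-point analysis.

```python
MAX_PERCENT_CUTOFF = 8.0     # "maximum % > 8" in the caption
BCP_POSTERIOR_CUTOFF = 0.98  # "> 0.98" in the caption


def maximum_percent(x, counts):
    """100 * max(X) / sum of C_i, with X and C supplied by the caller."""
    return 100.0 * max(x) / sum(counts)


def bcp_posterior_summary(posterior_probs):
    """One minus the mean per-bin posterior probability of a change point."""
    return 1.0 - sum(posterior_probs) / len(posterior_probs)


def is_highly_clustered(max_pct, bcp_summary):
    """Apply both cutoffs from the caption; returns one flag per metric."""
    return max_pct > MAX_PERCENT_CUTOFF, bcp_summary > BCP_POSTERIOR_CUTOFF
```

Applied to the values reported in the caption, the CGD pair (11.98, 0.999) passes both cutoffs, while the upper end of the other data sets (1.48, 0.9361) passes neither.
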
Figure 4
Graphical displays of BCP hot-spot results from the SCIDX1, CGD, and X-linked ALD trials. Our hot-spot software produces three types of plots for viewing hot-spot results: (A) a stripchart that displays the full genome for all data sets analyzed; (B) a stripchart that displays results for all data sets, one chromosome at a time; and (C) a barplot that displays results for one data set and chromosome at a time. In all plot types the grey color corresponds to VIS (A, B) or VIS bins (C) that were not defined as hot-spots. In all three plot types the x-axis corresponds to location in megabase units. In plot type C, the y-axis corresponds to bin rate (# VIS per bin/total # VIS) rather than z-score for visual clarity, since z-scores can be negative. Color definitions were assigned to each data set independently based on quantiles of its non-zero z-score distribution (i.e., the distribution of bin z-scores among bins with non-negative scores). VIS located in hot-spot regions are colored by the percentile of the corresponding bin's z-score: ≤ 85th percentile are light blue, > 85 and ≤ 95 are dark blue, > 95 and ≤ 97.5 are purple, > 97.5 and ≤ 99 are pink, and > 99 are red. The plots illustrate hot-spots on chromosomes 6, 11, 12 and 17 in the X-linked ALD data set, and the presence of the chromosome 6 hot-spot in both patients analyzed separately. The MLV data sets exhibit unique VIS patterns that differ from each other as well as from the LV data.
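
The color scheme in the Figure 4 caption amounts to a lookup on percentile break points. A minimal sketch follows, assuming the percentile of each hot-spot bin's z-score has already been computed against its own data set. The function name and color strings are illustrative, not taken from the software.

```python
from bisect import bisect_left

# Upper bounds (inclusive) of the first four color classes, in percentiles.
BREAKS = [85.0, 95.0, 97.5, 99.0]
COLORS = ["light blue", "dark blue", "purple", "pink", "red"]


def hot_spot_color(percentile):
    """Map a z-score percentile (0-100) to the caption's color class."""
    # bisect_left counts the break points strictly below `percentile`,
    # so a value equal to a break point stays in the lower class.
    return COLORS[bisect_left(BREAKS, percentile)]
```

VIS outside any hot-spot are drawn in grey and never reach this lookup.
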
Figure 5
Genome features of BCP hot-spots in the SCIDX1, CGD, and X-linked ALD trials. Plots (A), (C), and (D) show the median feature density per Mb and the interquartile range of BCP hot-spot regions in comparison to the genome median. In plot B the enrichment of cancer genes was calculated relative to the RefSeq gene numbers in order to control for gene density differences. The LV data sets showed enrichment of (A) RefSeq genes and (C) CpG islands relative to the genome median (indicated by an asterisk *). No other comparisons to the genome median reached significance at the Bonferroni-corrected level. Overlap of interquartile ranges among the LV data sets shows that their BCP hot-spots have similar genomic features.

References

    1. An DS, Donahue RE, Kamata M, Poon B, Metzger M, Mao SH, Bonifacino A, Krouse AE, Darlix JL, Baltimore D, Qin FXF, Chen ISY. Stable reduction of CCR5 by RNAi through hematopoietic stem cell transplant in non-human primates. Proceedings of the National Academy of Sciences. 2007;104(32):13110–13115. doi: 10.1073/pnas.0705474104.
    2. Johnson LA, Morgan RA, Dudley ME, Cassard L, Yang JC, Hughes MS, Kammula US, Royal RE, Sherry RM, Wunderlich JR, Lee CC, Restifo NP, Schwarz SL, Cogdill AP, Bishop RJ, Kim H, Brewer CC, Rudy SF, VanWaes C, Davis JL, Mathur A, Ripley RT, Nathan DA, Laurencot CM, Rosenberg SA. Gene therapy with human and mouse T-cell receptors mediates cancer regression and targets normal tissues expressing cognate antigen. Blood. 2009;114(3):535–46. doi: 10.1182/blood-2009-03-211714.
    3. Arumugam P, Malik P. Genetic therapy for beta-thalassemia: from the bench to the bedside. Hematology Am Soc Hematol Educ Program. 2010;2010:445–50.
    4. Cavazzana-Calvo M, Hacein-Bey S, de Saint Basile G, Gross F, Yvon E, Nusbaum P, Selz F, Hue C, Certain S, Casanova J, Bousso P, Deist F, Fischer A. Gene therapy of human severe combined immunodeficiency (SCID)-X1 disease. Science. 2000;288(5466):669–672. doi: 10.1126/science.288.5466.669.
    5. Gaspar HB, Parsley KL, Howe S, King D, Gilmour KC, Sinclair J, Brouns G, Schmidt M, Von Kalle C, Barington T, Jakobsen MA, Christensen HO, Al Ghonaium A, White HN, Smith JL, Levinsky RJ, Ali RR, Kinnon C, Thrasher AJ. Gene therapy of X-linked severe combined immunodeficiency by use of a pseudotyped gammaretroviral vector. Lancet. 2004;364(9452):2181–7. doi: 10.1016/S0140-6736(04)17590-9.
