ADaCGH2: parallelized analysis of (big) CNA data

Bioinformatics. 2014 Jun 15;30(12):1759-61. doi: 10.1093/bioinformatics/btu099. Epub 2014 Feb 14.

Abstract

Motivation: Studies of genomic DNA copy number alteration can involve datasets with several million probes and thousands of subjects. Analyzing these data with currently available software (e.g. as available from BioConductor) can be extremely slow and may not be feasible because of memory requirements.

Results: We have developed a BioConductor package, ADaCGH2, that parallelizes the main segmentation algorithms (using forking on multicore computers or parallelization via message passing interface, etc., in clusters of computers) and uses ff objects for reading and data storage. We show examples of data with 6 million probes per array; we can analyze data that would otherwise not fit in memory, and compared with the non-parallelized versions we can achieve speedups of 25 to 40 times on a 64-core machine.
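
The forking strategy described above can be illustrated with a minimal sketch in Python. This is not ADaCGH2 code (the package itself is written in R), and nothing here reflects its actual API. The names segment_one and segment_all are hypothetical, and the thresholded running-mean segmenter is a toy stand-in for the real segmentation algorithms. The sketch assumes a Unix-like system where the "fork" start method is available, and it shows only the parallel part: each array is segmented independently in a forked worker process.

```python
import multiprocessing as mp


def segment_one(log_ratios, threshold=0.5):
    """Toy segmenter (hypothetical, not a published algorithm).

    Opens a new segment when a probe deviates from the running mean of the
    current segment by more than `threshold`. Returns a list of
    (start, end, mean) tuples, with `end` exclusive.
    """
    segments = []
    start, total = 0, 0.0
    for i, value in enumerate(log_ratios):
        # Compare the probe against the mean of the segment built so far.
        if i > start and abs(value - total / (i - start)) > threshold:
            segments.append((start, i, total / (i - start)))
            start, total = i, 0.0
        total += value
    # Close the last open segment, if the array was not empty.
    if len(log_ratios) > start:
        n = len(log_ratios) - start
        segments.append((start, len(log_ratios), total / n))
    return segments


def segment_all(arrays, workers=2, threshold=0.5):
    """Segment each array in a separate forked worker process.

    Arrays are independent of one another, so the work splits cleanly
    across cores. Results come back in the same order as `arrays`.
    """
    ctx = mp.get_context("fork")  # assumption: fork is available (Unix)
    with ctx.Pool(processes=workers) as pool:
        return pool.starmap(segment_one, [(a, threshold) for a in arrays])


if __name__ == "__main__":
    # Two toy arrays: one with a single jump in level, one flat.
    toy = [[0.0, 0.1, 0.0, 1.0, 1.1], [0.0, 0.0, 0.0, 0.0]]
    for result in segment_all(toy, workers=2):
        print(result)
```

Because each array is processed independently, the achievable speedup in a scheme like this is bounded by the number of cores and by the cost of moving data to and from the workers. Keeping the data in disk-backed objects rather than in memory, as the package does with ff objects, is one way to limit that cost.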

Availability and implementation: ADaCGH2 is an R package available from BioConductor. Version 2.3.11 or higher is available from the development branch: http://www.bioconductor.org/packages/devel/bioc/html/ADaCGH2.html.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • DNA Copy Number Variations*
  • Genomics / methods
  • Software*