Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues

Jason Ernst; Manolis Kellis

doi:10.1038/nbt.3157

Nat Biotechnol. 2015 Apr;33(4):364-76. doi: 10.1038/nbt.3157. Epub 2015 Feb 18.

Authors

Jason Ernst¹, Manolis Kellis²

Affiliations

¹ [1] Department of Biological Chemistry, University of California, Los Angeles, California, USA. [2] Computer Science Department, University of California, Los Angeles, California, USA. [3] Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at UCLA, Los Angeles, California, USA. [4] Jonsson Comprehensive Cancer Center, University of California, Los Angeles, California, USA. [5] Molecular Biology Institute, University of California, Los Angeles, California, USA.
² [1] MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA. [2] Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

Abstract

With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.
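
The general idea of predicting one signal track from correlated tracks with an ensemble of regression trees can be illustrated with a small toy example. The sketch below is not the ChromImpute implementation and makes several assumptions of its own: the two features (signal of a different mark in the same sample, and signal of the target mark in a different sample), the toy values, the use of depth-1 trees, bootstrap resampling, and every function name are hypothetical choices made only for illustration.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a depth-1 regression tree by exhaustive search over splits."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:
        # No usable split (all rows identical): predict the mean everywhere.
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr


def fit_ensemble(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap resample of the training positions."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_ensemble(stumps, row):
    """Average the predictions of all trees in the ensemble."""
    return mean(predict_stump(s, row) for s in stumps)


# Toy training data (made up): one row per genomic position.
# Column 0: signal of another mark in the same sample (assumed feature).
# Column 1: signal of the target mark in another sample (assumed feature).
X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [2.0, 1.8], [2.5, 2.2], [1.9, 2.4]]
# Observed signal of the target mark at the same positions.
y = [0.2, 0.1, 0.3, 2.1, 2.4, 2.0]

model = fit_ensemble(X, y)
low = predict_ensemble(model, [0.2, 0.2])   # position with weak correlated signals
high = predict_ensemble(model, [2.2, 2.0])  # position with strong correlated signals
print(low, high)
```

Averaging many trees trained on resampled data smooths out the idiosyncrasies of any single tree, which is one reason such ensembles suit noisy signal tracks.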

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Chromosome Mapping / methods*
Data Curation / methods*
Database Management Systems*
Databases, Genetic*
Datasets as Topic
Epigenesis, Genetic / physiology*
Genetic Variation / genetics
Genome, Human / genetics*
Humans
Information Storage and Retrieval / methods
Organ Specificity / genetics
Software
User-Computer Interface
