Human Genomes as Email Attachments

Scott Christley et al. Bioinformatics 25 (2), 274-5.


The amount of genomic sequence data generated and made available through public databases continues to grow at an accelerating rate. Downloading, copying, sharing and manipulating these large datasets is becoming difficult and time-consuming for researchers. We need to consider using advanced compression techniques as part of a standard data format for genomic data. The inherent structure of genome data allows more efficient lossless compression than generic compression programs can achieve. We apply a series of techniques to James Watson's genome that, in combination, reduce it to a mere 4 MB, small enough to be sent as an email attachment.
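The abstract does not name the individual techniques, so the following is only an illustrative sketch of one way structure in genome data can be exploited, not the authors' method. It assumes a reference sequence shared by sender and receiver, handles substitutions only, and stores each difference as a gap from the previous difference plus the substituted base. The function names and the toy sequences are hypothetical.

```python
from typing import List, Tuple

# (gap since the previous variant position, substituted base)
Variant = Tuple[int, str]


def encode_against_reference(reference: str, sample: str) -> List[Variant]:
    """Record only the positions where sample differs from reference."""
    if len(reference) != len(sample):
        # Assumption: insertions and deletions are out of scope for this sketch.
        raise ValueError("this sketch handles substitutions only")
    variants: List[Variant] = []
    previous = 0
    for position, (ref_base, sample_base) in enumerate(zip(reference, sample)):
        if ref_base != sample_base:
            # Store the distance from the last variant, which stays small
            # even when absolute positions are large.
            variants.append((position - previous, sample_base))
            previous = position
    return variants


def decode_against_reference(reference: str, variants: List[Variant]) -> str:
    """Rebuild the sample by applying the stored substitutions to reference."""
    bases = list(reference)
    position = 0
    for gap, base in variants:
        position += gap
        bases[position] = base
    return "".join(bases)


# Toy data, invented for illustration.
reference = "ACGTACGTACGTACGT"
sample = "ACGTACCTACGTACGA"

variants = encode_against_reference(reference, sample)
restored = decode_against_reference(reference, variants)
```

Because the round trip restores the sample exactly, the encoding is lossless, and its size depends on the number of differences rather than on the length of the genome.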

Similar articles

  • The Human Genome Contracts Again
    DS Pavlichin et al. Bioinformatics 29 (17), 2199-202. PMID 23793748.
    Code is available at
  • ERGC: An Efficient Referential Genome Compression Algorithm
    S Saha et al. Bioinformatics 31 (21), 3468-75. PMID 26139636.
    We have done extensive experiments using five real sequencing datasets. The results on real genomes show that our proposed algorithm is indeed competitive and performs be …
  • HUGO: Hierarchical mUlti-reference Genome cOmpression for Aligned Reads
    P Li et al. J Am Med Inform Assoc 21 (2), 363-73. PMID 24368726.
    The proposed multi-reference-based compression algorithm for aligned reads outperforms existing single-reference based algorithms.
  • Genome Mapping Statistics and Bioinformatics
    JC Mychaleckyj. Methods Mol Biol 404, 461-88. PMID 18450063. - Review
    The unprecedented availability of genome sequences, coupled with user-friendly, web-enabled search and analysis tools allows practitioners to locate interesting genome fe …
  • Large-scale Open Bioinformatics Data Resources
    E Stupka. Curr Opin Mol Ther 4 (3), 265-74. PMID 12139313. - Review
    The data explosion in bioinformatics is relentless. More and more genomes are being sequenced and many new types of datasets are being generated in large-scale projects. …

Cited by 39 PubMed Central articles
