Bioinformatics. 2018 Jul 15;34(14):2513-2514.
doi: 10.1093/bioinformatics/bty046.

pymzML v2.0: Introducing a Highly Compressed and Seekable Gzip Format

M Kösters et al. Bioinformatics. 2018.

Abstract

Motivation: In the new release of pymzML (v2.0), we optimized the speed of this established mass spectrometry data analysis tool to keep pace with the growing volume of mass spectrometry data. To this end, we integrated faster libraries for numerical calculations, improved the data retrieval algorithms and optimized the source code. Importantly, to adapt to rapidly growing file sizes, we developed a generalizable compression scheme for very fast random access and applied this concept to mzML files to retrieve spectral data.

Results: pymzML performs on par with established C programs in terms of processing time, while offering the versatility of a scripting language and adding unprecedentedly fast random access to compressed files. Additionally, we designed our compression scheme to be general enough to apply to any field where fast random access to large data blocks in compressed files is desired.
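
The general idea of random access into compressed data can be illustrated with a minimal sketch. This is not pymzML's actual file format or API; it only assumes one common approach, in which each data block is compressed as an independent gzip member and a separate index records where each member starts. The function names `write_indexed_gzip` and `read_block` and the spectrum keys are hypothetical and chosen purely for illustration.

```python
import gzip
import io


def write_indexed_gzip(blocks):
    """Compress each block as its own gzip member.

    Returns the concatenated bytes and an index mapping each key to the
    (offset, length) of its compressed member. Illustrative only; this is
    not the on-disk layout used by pymzML.
    """
    buf = io.BytesIO()
    index = {}
    for key, payload in blocks.items():
        start = buf.tell()
        buf.write(gzip.compress(payload))
        index[key] = (start, buf.tell() - start)
    return buf.getvalue(), index


def read_block(stream, index, key):
    """Seek to one member and decompress only that member."""
    offset, length = index[key]
    stream.seek(offset)
    return gzip.decompress(stream.read(length))


if __name__ == "__main__":
    # Hypothetical payloads standing in for individual spectra.
    spectra = {
        "spectrum_1": b"<spectrum id='1'>...</spectrum>",
        "spectrum_2": b"<spectrum id='2'>...</spectrum>",
    }
    data, index = write_indexed_gzip(spectra)
    print(read_block(io.BytesIO(data), index, "spectrum_2"))
```

Because concatenated gzip members still form a valid gzip stream, a file written this way remains readable by ordinary gzip tools, while the index lets a reader jump straight to a single block instead of decompressing everything before it.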

Availability and implementation: pymzML is freely available at https://github.com/pymzML/pymzML under the GPL license. pymzML requires Python 3.4+ and, optionally, numpy. Documentation is available at http://pymzml.readthedocs.io.
