A multi-threading implement of Python gzip module.
copied from cf-staging / pgzippgzip
is a multi-threaded gzip
implementation for python
that increases the compression and decompression performance.
Compression and decompression performance gains are made by parallelizing the usage of block indexing within a gzip
file. Block indexing utilizes gzip's FEXTRA
feature which records the index of compressed members. FEXTRA
is defined in the official gzip
specification starting at version 4.3. Because FEXTRA
is part of the gzip
specification, pgzip
is compatible with regular gzip
files.
pgzip
is ~25X faster for compression and ~7X faster for decompression when benchmarked on a 24 core machine. Performance is limited only by I/O and the python
interpreter.
Theoretically, the compression and decompression speed should be linear with the number of cores available. However, I/O and a language's general performance limits the compression and decompression speed in practice.