Introduction

The following benchmark covers the most common compression methods. It is run against the low-level C libraries, with their available flags. The large number of compressors and the similarity between them can cause confusion. The comparison is easier to understand once you realize that a compressor is just the combination of the following three things:

  • An algorithm, with adjustable settings. (lzma, deflate, lzo, lz4, etc...)
  • An archive format. (.gz, tar.xz, tar.gz, .7z, .zip, etc...)
  • A tool or a library, also known as the implementation. (gzip, tar, 7-zip, zlib, liblzma, libdeflate, etc...)

The algorithm family is the most defining characteristic by far, then comes the implementation. A C library that has been well optimized for a decade should do a bit better than a random Java library from GitHub.

For example, gzip designates both the tool and its archive format (specific to that tool), but it is based on deflate. It gives results similar to everything else that is based on deflate (particularly the zlib library).
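To make the distinction concrete, here is a minimal sketch (not from the benchmark; the input string and buffer size are placeholders) showing that the zlib implementation of the deflate algorithm can produce a gzip-framed stream, a zlib-framed stream, or a raw deflate stream simply by changing its windowBits parameter:

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    /* Compress `input` into a gzip-framed stream with zlib.
     * windowBits = 15 + 16 selects the gzip wrapper; 15 alone would give
     * a zlib-wrapped stream and -15 a raw deflate stream, all produced by
     * the same deflate algorithm underneath. */
    int main(void) {
        const char *input = "hello hello hello hello";   /* placeholder data */
        unsigned char out[256];

        z_stream strm;
        memset(&strm, 0, sizeof(strm));
        if (deflateInit2(&strm, 6 /* level */, Z_DEFLATED,
                         15 + 16 /* gzip framing */, 8, Z_DEFAULT_STRATEGY) != Z_OK)
            return 1;

        strm.next_in   = (unsigned char *)input;
        strm.avail_in  = (uInt)strlen(input);
        strm.next_out  = out;
        strm.avail_out = sizeof(out);

        if (deflate(&strm, Z_FINISH) != Z_STREAM_END) {
            deflateEnd(&strm);
            return 1;
        }
        printf("gzip stream: %lu -> %lu bytes\n",
               (unsigned long)strlen(input), (unsigned long)strm.total_out);
        deflateEnd(&strm);
        return 0;
    }

The compressed bytes come from the same algorithm in all three cases; only the container around them (and therefore the file extension) changes.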

There are bugs and edge cases to account for, so you should always test your implementation against your use case. For instance, Kafka has offered snappy compression for a few years (off by default), but the buffers are misconfigured and it cannot achieve any meaningful compression.

Use Cases

Let's split the compressors into three categories: the slow, the medium, and the fast:

  • The slow ones are in the 0-10 MB/s range at compression. They are mostly LZMA derivatives (LZMA, LZMA2, XZ, 7-zip default), bzip2, and brotli (from Google).
  • The medium ones are in the 10-500 MB/s range at compression. They are mostly deflate (used by gzip) and zstd (from Facebook). Note that deflate sits at the lower end of that range while zstd sits at the higher end.
  • The fast ones run at around 1 GB/s and above (a whole gigabyte per second, that is correct) at both compression and decompression. They are mostly lzo, lz4 (from Facebook), and snappy (from Google).

The strongest and slowest algorithms are ideal when you compress once and decompress many times. For example, Linux packages have been distributed as .tar.xz archives (lzma) for the last few years. They used to be .tar.gz historically; the switch to stronger compression must have saved a lot of bandwidth on the Linux mirrors.

The medium algorithms are ideal to save storage space and/or network transfer at the expense of CPU time. For example, backups or logs are often gzip'ed on archival. Static web assets can be compressed on the fly by some web servers to save bandwidth (html, css, js).

The fastest algorithms are ideal to reduce storage/disk/network usage and make applications more efficient. Compression and decompression at these speeds are actually faster than most I/O, so compressing data reduces I/O and makes the application faster whenever I/O is the bottleneck. For example, Elasticsearch (Lucene) compresses its indexes with lz4 by default.
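As an illustration, here is a minimal sketch of a round trip through lz4's one-shot API (the payload and the fixed buffer sizes are placeholders; a real application would size its buffers from the actual data):

    #include <stdio.h>
    #include <string.h>
    #include <lz4.h>

    int main(void) {
        const char *src = "a very repetitive payload payload payload payload"; /* placeholder */
        const int src_size = (int)strlen(src) + 1;

        /* LZ4_compressBound() gives the worst-case compressed size;
         * the 256-byte buffers are comfortably larger for this tiny input. */
        const int max_dst = LZ4_compressBound(src_size);
        char compressed[256];
        char restored[256];

        const int c_size = LZ4_compress_default(src, compressed, src_size, max_dst);
        if (c_size <= 0) return 1;

        const int d_size = LZ4_decompress_safe(compressed, restored, c_size, (int)sizeof(restored));
        if (d_size != src_size || memcmp(src, restored, (size_t)src_size) != 0) return 1;

        printf("%d -> %d bytes, round trip ok\n", src_size, c_size);
        return 0;
    }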

Balance

The slower the compression, the better the ratio; however, it is not necessarily a good idea to waste entire minutes just to save a few megabytes. It's all about balance.

If you ever try to "7-zip ultra" 4 GB of DVD content, or "gzip --best" a 100 GB database dump, you might realize that it takes 20 hours to run. That is 20 hours of wasted CPU, electricity, and heat, not to mention 20 hours too long to run daily, as the daily backup it was supposed to be. That's when zstd and lz4 come in handy and save the day.

Not shown here, but worth keeping in mind, is memory usage. Stronger compression usually comes at the cost of higher memory usage for both compression and decompression. The worst case is probably LZMA, which requires about a gigabyte of memory per core at the strongest levels, and a bit less for decompression. That prevents its use on low-end machines, mobile phones, and embedded devices.

Evolution

We're in the third millennium, yet there has been surprisingly little progress in general-purpose compression over the past decades. deflate, lzma, and lzo are from the 90s, and the origins of LZ compression trace back to at least the 70s.

Actually, it's not true that nothing happened. Google and Facebook both have people working on compression; they have a lot of data and a lot to gain by shaving off a few percent here and there.

Facebook in particular has hired one of the top compression research scientists and rolled out two compressors based on a novel compression approach that is doing wonders. That could very well be the biggest advance in computing in the last decade.

See zstd (medium) and lz4 (fast):

  • zstd blows deflate out of the water, achieving a better compression ratio than gzip while being multiple times faster to compress.
  • lz4 beats lzo and Google's snappy on all metrics, by a fair margin.

Better yet, they come with a wide range of compression levels that adjust the speed/ratio tradeoff almost linearly. The slower end pushes against the other slow algorithms, while the fast end pushes against the other fast algorithms. It's incredibly friendly for a developer or a user: all it takes is a single algorithm to support (zstd) with a single tunable setting (1 to 20), and it's possible to accurately trade off speed for compression. It's unprecedented.
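For instance, with the zstd library the only knob to turn is that single level argument. A minimal sketch (the buffer and the two levels shown are arbitrary choices, not benchmark settings):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zstd.h>

    /* Compress the same buffer at a fast level and at a strong level;
     * the only thing that changes is the single integer `level`. */
    static size_t compress_at_level(const void *src, size_t src_size, int level) {
        size_t bound = ZSTD_compressBound(src_size);
        void *dst = malloc(bound);
        if (!dst) return 0;
        size_t written = ZSTD_compress(dst, bound, src, src_size, level);
        if (ZSTD_isError(written)) written = 0;
        free(dst);
        return written;
    }

    int main(void) {
        const char *data = "an example buffer, an example buffer, an example buffer"; /* placeholder */
        size_t n = strlen(data) + 1;

        printf("level 1:  %zu bytes\n", compress_at_level(data, n, 1));   /* fast end   */
        printf("level 19: %zu bytes\n", compress_at_level(data, n, 19));  /* strong end */
        return 0;
    }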

Of course, one could say that gzip already offered tunable compression levels (1-9), but it doesn't cover a remotely comparable range of speed/ratio. Not to mention that the upper half is hardly useful: it's already slow, and the higher levels only make it slower for little benefit.

Note that Google has released 3 options, none of which are noteworthy in my opinion: brotli (comparable to xz and zstd, but a resource hog), gipfeli (medium-fast, never had a release on GitHub), and snappy (fast, strictly inferior to lz4 on all metrics).

To conclude: don't take my word for it. Check out the benchmark results below. You can also download the lzbench project, run make, then ./lzbench -eall file.txt to see for yourself.

Configuration

Choose a dataset

Different codecs can behave very differently with different data. Some are great at compressing text but horrible with binary data, some excel with more repetitive data like logs. Many have long initialization times but are fast once they get started, while others can compress/decompress small buffers almost instantly.

This benchmark is run against many standard datasets. Hopefully one of them is interesting for you. If not, you can use lzbench to easily run the benchmark on your own data.

Note

The default dataset is selected randomly.

[Interactive table of datasets: Name, Source, Description, Size]

Choose a machine

Note

The default machine is selected randomly.

[Interactive table of machines: Name, Status, CPU/SoC, Architecture, Cores, Clock Speed, Memory, Platform, Distro, Kernel, Compiler, CSV]

Results

  1. Compression Ratio vs. Compression Speed
  2. Compression Ratio vs. Decompression Speed
  3. Compression Speed vs. Decompression Speed
  4. Round Trip Speed vs. Compression Ratio
  5. Transfer + Processing
  6. Optimal Codecs
  7. Results Table

Note that we do provide access to the raw data if you would prefer to generate your own charts.

Compression Ratio vs. Compression Speed

Compression Ratio vs. Decompression Speed

Compression Speed vs. Decompression Speed

Round Trip Speed vs. Compression Ratio

Transfer + Processing

Sometimes all you care about is how long something takes to load or save, and how much disk space or bandwidth is used doesn't really matter. For example, if you have a file that would take 1 second to load if uncompressed and you could cut the file size in half by compressing it, as long as decompressing takes less than half a second the content is available sooner than it would have been without compression.
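Here is that reasoning as a small worked example (the numbers are made up for illustration): a 100 MiB file over a 100 MiB/s link, compressed 2:1 and decompressed at 500 MiB/s.

    #include <stdio.h>

    /* Hypothetical numbers: 100 MiB of data, a 100 MiB/s link,
     * a 2:1 compression ratio, and 500 MiB/s decompression. */
    int main(void) {
        const double uncompressed_mib = 100.0;
        const double link_mib_per_s   = 100.0;
        const double ratio            = 2.0;
        const double decomp_mib_per_s = 500.0;

        double plain_time = uncompressed_mib / link_mib_per_s;        /* 1.00 s */
        double compressed_time =
            (uncompressed_mib / ratio) / link_mib_per_s               /* transfer   */
            + uncompressed_mib / decomp_mib_per_s;                    /* decompress */

        printf("uncompressed: %.2f s\n", plain_time);      /* 1.00 s */
        printf("compressed:   %.2f s\n", compressed_time); /* 0.70 s */
        return 0;
    }

With those assumptions the compressed path wins by 0.3 seconds; if decompression took longer than the half second saved on transfer, it would lose.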

Note

For the presets I have tried to provide typical real-world speeds, not theoretical peaks. This can be significantly less than the advertised speed.

When entering custom values please keep in mind that this uses bytes per second, not bits per second. Also, it uses binary prefixes (1 MiB is 1024 KiB, not 1000).

Optimal Codecs

Results

[Interactive results table: Plugin, Codec, Level, Version, Compression Ratio, Compression Speed, Decompression Speed]

FAQ

Will you add «insert compression codec»?

In order to be included in the benchmark, the software must be supported by lzbench.

If the codec is reliable, works on Linux, is accessible from C or C++, and is open source, the odds are good that it can be added.

Will you include proprietary codecs?

If it is available on GitHub and can be integrated into lzbench, as stated above, it could be added. Either way, you can clone the repository and test any codec on your own, without redistributing it or the results. The only widespread proprietary compression library I know of is Oodle from RAD Game Tools. Proprietary compression tools are often delivered as executables with a graphical user interface, whereas the current testing methodology only applies to libraries.

How are the values calculated?

The benchmark collects the compressed size, compression time, and decompression time. Those are then used to calculate the values used in the benchmark:

Ratio

uncompressed size / compressed size

Compression Speed

uncompressed size / compression time

Decompression Speed

uncompressed size / decompression time

Round Trip Speed

(2 × uncompressed size) / (compression time + decompression time)

Sizes are presented using binary prefixes—1 KiB is 1024 bytes, 1 MiB is 1024 KiB, and so on.
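As a sketch of how those values are derived from one measured run (the sizes and timings below are placeholders, not actual benchmark results):

    #include <stdio.h>

    /* Derive the reported metrics from one measured run.
     * Sizes are in bytes, times in seconds; speeds are printed in MiB/s
     * using binary prefixes (1 MiB = 1024 * 1024 bytes). */
    int main(void) {
        const double uncompressed = 10.0 * 1024 * 1024;  /* placeholder measurements */
        const double compressed   =  4.0 * 1024 * 1024;
        const double c_time       =  0.050;
        const double d_time       =  0.020;

        const double MiB = 1024.0 * 1024.0;
        printf("ratio:               %.2f\n",       uncompressed / compressed);
        printf("compression speed:   %.1f MiB/s\n", uncompressed / c_time / MiB);
        printf("decompression speed: %.1f MiB/s\n", uncompressed / d_time / MiB);
        printf("round trip speed:    %.1f MiB/s\n",
               (2.0 * uncompressed) / (c_time + d_time) / MiB);
        return 0;
    }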

What about memory usage?

I would really like to include that data. If you have a good way to capture it on the C side, please get it merged into lzbench.

Is time CPU time or wall-clock?

lzbench captures wall clock time and runs with realtime priority.

What about multi-threading?

That's a tricky question. For one thing it would explode the number of data points, up to 32 times for a server-class CPU with 32 cores. The benchmark would take even longer to run and the visualizations would be meaningless.

Can you switch the graphs to use a logarithmic scale?

You can. Just click on the label for either axis and it will toggle between linear and logarithmic. It's not intuitive and I would be happy to merge a PR to improve that.

I don't want to switch the default because I think that linear is probably better for most people. Logarithmic tends to be better if all you care about is compression ratio, not speed.

Can you make it easier to compare machines or datasets (instead of codecs)?

I would love to. Feel free to submit a pull request with more dynamic charts. In the meantime, the raw data can be downloaded and loaded into an Excel Pivot Chart.

My library isn't performing as well as I think it should!

Refer to lzbench to find out how the library is used. It may or may not be optimal.

Can you add «insert machine, CPU, architecture, OS, etc.»?

Only if I have, or at least have access to, a machine which fits that description.

I included what I had available. If you would like to donate other hardware I'm willing to add it to the benchmark.

What compiler flags were used?

Refer to lzbench. As far as performance is concerned, most plugins are compiled with -O3 except a few that only work with -O2.

How long does it take to run the benchmark?

For each level of each codec, the benchmark compresses the data repeatedly until 5 seconds have elapsed, then does the same for decompression. Therefore, each level executes for a minimum of 10 seconds. There are {{datasets.length|number}} datasets, for a total of {{(data_points_per_machine*datasets.length*10)|formatDuration}} minimum, per machine.

That said, for larger datasets not all codecs will be able to complete even one iteration in 5 seconds. In practice, it takes more or less a whole day per machine on a desktop.

Will you add different sets of compiler flags?

No. There are a huge number of different possible options, which can be combined in any way, leading to a combinatorial explosion of the number of times the benchmark would have to be run. Given the time it takes to run the benchmark, this is simply not feasible.

If you are curious about specific flags you should run the benchmark yourself. Or, even better, create your own benchmark with the codec from your software tested against your data.

This doesn't work in my browser.

This should work in any modern browser, including Internet Explorer, Firefox, Chrome, and their mobile equivalents. Make sure JavaScript is enabled.

Can I have the raw data?

Of course! The table in the "choose a machine" section includes a link to a CSV which can be imported into your favorite spreadsheet application.

It's also available from the data folder of the git repository.

If you do something interesting with it please let us know! Or, even better, submit a pull request so everyone can benefit from your brilliance!

Can I link to a specific configuration?

Some things can be configured by passing parameters in the query string:

dataset
Dataset to show; the default is selected randomly
machine
Machine to show; the default is selected randomly
speed
Transfer speed (in KiB/s) for the Transfer + Processing chart
speed-scale
The default scale for the speed axis of charts (linear or logarithmic)
visible-plugins
A comma-separated list of plugins to show in the scatter plots. All other plugins will be disabled, though they can be re-enabled by clicking on their entry in the legend.
hidden-plugins
A comma-separated list of plugins to hide in the scatter plots. Note that, if used, this parameter overrides visible-plugins

For example, the current configuration is: {{ location }}?dataset={{ dataset }}&machine={{ machine }}&speed={{ calculatedTransferSpeed / 1024 }}&speed-scale={{ speedScale }}.