So @rl_dane introduced #bzip3 to me to use instead of #bzip2. Let's turn some bz2 files into bz3 to see the difference.
First example: 90k opus files
The Hey Snips wake word dataset. It has ~90k opus files and makes a tar file of 3.1GB. bzip2 produces the same 3.1GB, which is as expected since opus audio is already compressed. bzip3 created 3.0GB but used tons of computation power. Not worth the 100MB.
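If you want to repeat this kind of comparison yourself, here is a minimal sketch, not the exact commands I ran. The helper name compare_sizes is made up for illustration, and I am assuming your bzip3 build accepts -e (encode) and -c (write to stdout); check bzip3 --help if it complains. It reports wall-clock seconds, which is only a rough stand-in for CPU time.

```shell
#!/bin/sh
# compare_sizes FILE: print the compressed size and elapsed wall-clock
# seconds for each compressor that is installed.
# Hypothetical helper, not part of bzip2 or bzip3.
compare_sizes() {
    file=$1
    printf 'original %s bytes\n' "$(wc -c < "$file" | tr -d ' ')"
    for tool in bzip2 bzip3; do
        # Skip tools that are not installed instead of failing.
        command -v "$tool" >/dev/null 2>&1 || continue
        start=$(date +%s)
        case $tool in
            bzip2) bytes=$(bzip2 -9 -c < "$file" | wc -c | tr -d ' ') ;;
            # Assumed flags: -e encode, -c write to stdout.
            bzip3) bytes=$(bzip3 -e -c < "$file" | wc -c | tr -d ' ') ;;
        esac
        end=$(date +%s)
        printf '%s %s bytes %ss\n' "$tool" "$bytes" "$((end - start))"
    done
}
```

Call it as compare_sizes some.tar. Everything is piped to wc, so no compressed copies are left on disk.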
Second example: Windows 7 VirtualBox VM image
Windows7.vdi: it's a Windows 7 VM image for the "special" days. I think I have to get rid of it. But while it is still there, let's see how each performs. It is 16GB uncompressed. bzip2 -9 is 7.0GB. bzip3 is 6.3GB, but at the expense of like 3x the CPU time. Deleting all of them anyway. Down with Windows.
Third example: Pure XML text file
A pure XML file with Persian and English characters. Uncompressed it is 1.7GB. bzip2 -9 is 276MB, while bzip3 is 260MB.
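To put those sizes in perspective, a tiny sketch that turns them into a percentage of the original. percent_of is a hypothetical helper name, and I am treating 1.7GB as roughly 1700MB, so the results are approximate.

```shell
# percent_of COMPRESSED ORIGINAL: compressed size as a percentage of
# the original size (both in the same unit). Hypothetical helper.
percent_of() {
    awk -v c="$1" -v o="$2" 'BEGIN { printf "%.1f\n", 100 * c / o }'
}

percent_of 276 1700   # bzip2 -9 on the XML file
percent_of 260 1700   # bzip3 on the XML file
```

That prints 16.2 and 15.3, so both shrink the XML to about a sixth of its size, with bzip3 roughly one percentage point ahead.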
Final example: Creating a simple bomb
So I did this:
dd if=/dev/zero of=./justzero bs=2G count=6
So now I have a 16GB file with only zero bytes. bzip2 -9 is 672KB. bzip3 is 46KB.
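If you don't have that much disk to spare, a scaled-down sketch of the same experiment can stream the zeros straight into the compressors. The helper names zeros_mib and zero_test are made up for illustration, and the bzip3 flags (-e encode, -c stdout) are again an assumption about your build.

```shell
# zeros_mib N: write N MiB of zero bytes to stdout (hypothetical helper).
zeros_mib() {
    dd if=/dev/zero bs=1048576 count="$1" 2>/dev/null
}

# zero_test N: compressed size of N MiB of zeros for each installed tool.
zero_test() {
    if command -v bzip2 >/dev/null 2>&1; then
        printf 'bzip2 %s bytes\n' \
            "$(zeros_mib "$1" | bzip2 -9 -c | wc -c | tr -d ' ')"
    fi
    if command -v bzip3 >/dev/null 2>&1; then
        # Assumed flags: -e encode, -c write to stdout.
        printf 'bzip3 %s bytes\n' \
            "$(zeros_mib "$1" | bzip3 -e -c | wc -c | tr -d ' ')"
    fi
}
```

Something like zero_test 64 runs in seconds. Nothing touches the disk, so you can bump the number up as far as your patience allows.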
Conclusion
Thank you @rl_dane. Real nice thing!
#compression #gzip #zip #filecompression #textcompression #datacompression #linux #unix #tech