#bzip3

R.L. Dane :Debian: :OpenBSD: :FreeBSD: 🍵 :MiraLovesYou: rl_dane@polymaths.social
2025-11-26

@taylor

Yeah! Of course, this is still a block-sorting compression algorithm*, so you won't get much of an advantage over zstd or xz when dealing with datasets with more inherent entropy, like binary files or whatnot, but it does miracles for text.

* Of course I know what that means. Tell you what, you tell me what you think it means, and I'll tell you if you're right. 🤣

Here's an example with non-text data, where you see that #bzip3 isn't as strong:

Pictures$ for x in cat "gzip -9" "bzip2 -9" "bzip3" "zstd --ultra -22" "xz -9e"; do $x < Hobbes.jpg |wc -c |tr "\n" "\t"; echo "$x"; done |sort -rn
3445659 cat
3444164 xz -9e
3441839 zstd --ultra -22
3439158 gzip -9
3384450 bzip2 -9
3274433 bzip3

WAIT.
WHAT.

Let's try something else...

Videos$ f="Federated Timeline.webm"; for x in cat "gzip -9" "bzip2 -9" "bzip3" "zstd --ultra -22" "xz -9e"; do $x < "$f" |wc -c |tr "\n" "\t"; echo "$x"; done |sort -rn
1231940 bzip2 -9
1231269 bzip3
1227060 xz -9e
1226931 cat
1226421 zstd --ultra -22
1226241 gzip -9

WHAT?!? THE WORLD IS BROKEN!!!

TrYiNg AgAiNnNn...

Documents$ f="Thinkpad x200 hardware maintenance manual.pdf"; for x in cat "gzip -9" "bzip2 -9" "bzip3" "zstd --ultra -22" "xz -9e"; do $x < "$f" |wc -c |tr "\n" "\t"; echo "$x"; done |sort -rn
8942833 cat
8657277 bzip2 -9
8617801 gzip -9
8592319 bzip3
8568484 xz -9e
8535244 zstd --ultra -22

Ok, that makes sense. That's what I was expecting.

YOU SAW NOTHING ELSE. DON'T ASK ME ANY MORE QUESTIONS. 🤣

P.S., here's another interesting one:

138240138  cat (large BMP file)
  3768642  gzip -9
  3143455  PNG format
  1987020  zstd --ultra -22
  1592854  bzip2 -9
  1512291  bzip3
  1501540  xz -9e
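
The loops above differ only in the file name, so they fold into one function. A minimal sketch, assuming every compressor reads stdin and writes stdout as in the commands above; `compare_sizes` is a made-up name, not part of any tool:

```shell
# compare_sizes FILE CMD...  (hypothetical helper)
# Prints "<compressed bytes><TAB><command>" for each command, largest first.
compare_sizes() {
    local file=$1 cmd size
    shift
    for cmd in "$@"; do
        # $cmd is left unquoted on purpose so "gzip -9" splits into command + flag
        size=$($cmd < "$file" | wc -c | tr -d ' ')
        printf '%s\t%s\n' "$size" "$cmd"
    done | sort -rn
}
```

It would be called like `compare_sizes Hobbes.jpg cat "gzip -9" "bzip2 -9" bzip3 "zstd --ultra -22" "xz -9e"`. The `tr -d ' '` only strips the padding some `wc` implementations add.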
R.L. Dane :Debian: :OpenBSD: :FreeBSD: 🍵 :MiraLovesYou: rl_dane@polymaths.social
2025-11-26

For text with a lot of repetition, #bzip3 still blows my mind. 😆

rld@Intrepid:Documents$ for x in cat "gzip -9" "zstd --ultra -22" "xz -9e" "bzip2 -9" bzip3; do $x < weatherlog-2024.txt |wc -c |tr "\n" "\t"; echo "$x"; done
 1735300	cat
   80423	gzip -9
   63275	zstd --ultra -22
   53516	xz -9e
   52374	bzip2 -9
   40645	bzip3
rld@Intrepid:Documents$ echo 1735300/40645 |bc -l
42.69405830975519744125
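
The same division can also report the percentage saved. A small sketch using awk instead of bc; `ratio` is a made-up helper name, and the two arguments are the byte counts from the listing above:

```shell
# ratio ORIGINAL_BYTES COMPRESSED_BYTES  (hypothetical helper)
ratio() {
    awk -v o="$1" -v c="$2" 'BEGIN {
        # o / c is the compression ratio; 1 - c / o is the fraction of bytes removed
        printf "%.2fx smaller, %.2f%% saved\n", o / c, (1 - c / o) * 100
    }'
}
```

For the weather log, `ratio 1735300 40645` prints `42.69x smaller, 97.66% saved`.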

#Lossless #Compression #LosslessCompression

P.S. times:

real 1.49 zstd --ultra -22
real 0.94 xz -9e
real 0.23 bzip2 -9
real 0.07 gzip -9
real 0.06 bzip3
real 0.00 cat

DANG. 😂

R.L. Dane :Debian: :OpenBSD: :FreeBSD: 🍵 :MiraLovesYou: rl_dane@polymaths.social
2025-05-23
Farooq | فاروق [Master Patata] farooqkz@cr8r.gg
2025-05-20

So I used #bzip3 to compress the #shamela library (the big 12GB ISO for Windows) and now it's only 1.4GB.

Edit: Correction. The filesystem ran out of space and left a partial file behind. bzip3 actually makes this file bigger, not smaller.

Farooq | فاروق [Master Patata] farooqkz@cr8r.gg
2025-05-20

So @rl_dane introduced #bzip3 to me to use instead of #bzip2. Let's turn some bz2 files into bz3 to see the difference.

First example: 90k opus files

The "Hey Snips" wake word dataset. It has ~90k opus files in a 3.1GB tar file. bzip2 produces the same 3.1GB, which is as expected. bzip3 produced 3.0GB but used tons of computation power. Not worth the 100MB.

Second example: Windows 7 virtual box VM image

Windows7.vdi is a Windows 7 VM image for the "special" days. I think I have to get rid of it. But while it is still there, let's see how each will perform. It is 16GB uncompressed. bzip2 -9 is 7.0GB. bzip3 is 6.3GB, but at the expense of about 3x the CPU time. Deleting all of them anyway. Down with Windows.

Third example: Pure XML text file

A pure XML file with Persian and English characters. Uncompressed it is 1.7GB. bzip2 -9 gives 276MB, while bzip3 gives 260MB.

Final example: Creating a simple bomb

So I did this:

dd if=/dev/zero of=./justzero bs=2G count=6

So now I have a 16GB file containing only zero bytes. bzip2 -9 is 672KB. bzip3 is 46KB.
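
The same experiment can be run without writing the zero-filled file to disk first. A rough sketch, assuming `head -c` is available and that the compressor reads stdin and writes stdout; `zeros_compressed_size` is a made-up name:

```shell
# zeros_compressed_size NBYTES CMD [ARGS...]  (hypothetical helper)
# Streams NBYTES zero bytes through CMD and prints the compressed size in bytes.
zeros_compressed_size() {
    local nbytes=$1
    shift
    head -c "$nbytes" /dev/zero | "$@" | wc -c | tr -d ' '
}
```

For a 16GiB stream that would be `zeros_compressed_size $((16 * 1024 * 1024 * 1024)) bzip3`, with `bzip2 -9` swapped in for the comparison.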

Conclusion

Thank you @rl_dane

Real nice thing!

#compression #gzip #zip #filecompression #textcompression #datacompression #linux #unix #tech

R.L. Dane :Debian: :OpenBSD: :FreeBSD: 🍵 :MiraLovesYou: rl_dane@polymaths.social
2025-04-15

#bzip3, y'all!

17,396,992 Apr 15 18:52 powertrack-Excelsior-2024.txt
   564,163 Apr 15 18:52 powertrack-Excelsior-2024.txt.bz3
~ $ tail powertrack.txt 
2025-04-15 18:44 Battery 0: Not charging, 81%; 0.00488959 W; uptime:  4:48
2025-04-15 18:45 Battery 0: Not charging, 81%; 0.00488959 W; uptime:  4:49
2025-04-15 18:46 Battery 0: Not charging, 81%; 0.00488959 W; uptime:  4:50
2025-04-15 18:47 Battery 0: Not charging, 81%; 0.00488959 W; uptime:  4:51
2025-04-15 18:48 Battery 0: Not charging, 81%; 0.00488959 W; uptime:  4:52
2025-04-15 18:49 Battery 0: Not charging, 81%; 0.00488959 W; uptime:  4:53
2025-04-15 18:50 Battery 0: Not charging, 81%; 0.00488959 W; uptime:  4:54
2025-04-15 18:51 Battery 0: Not charging, 81%; 0.00488959 W; uptime:  4:55
2025-04-15 18:52 Battery 0: Not charging, 81%; 0.00488959 W; uptime:  4:56
2025-04-15 18:53 Battery 0: Not charging, 81%; 0.00488959 W; uptime:  4:57
R.L. Dane :Debian: :OpenBSD: :FreeBSD: 🍵 :MiraLovesYou: rl_dane@polymaths.social
2025-04-01

#bzip3 continues to amaze me:

-rw-r--r-- 1 ~~~ ~~~ 100M Apr  1  2025 outbox.json
$ simplify $(bzip3 < outbox.json |wc -c)
4.57 MiB
$ simplify $(xz -9e < outbox.json |wc -c)
4.69 MiB
$ simplify $(zstd --ultra -22 < outbox.json |wc -c)
5.06 MiB

(I didn't time it, but it was much faster than the other two)

Also, just in case anyone's curious, simplify (poor name pick, but I couldn't think of anything better) is just a bash function for converting byte counts to a human-readable binary unit (KiB, MiB, and so on):

function simplify { # Reduces a big byte count down to KiB, MiB, or whatnot
    local steps num
    # Braces, not parentheses: a "return" inside ( ) would only leave the subshell
    [ -n "$1" ] || { printf 'simplify() called without parameters\n  (requires a number of bytes with no unit name)\n' >&2; return 1; }
    steps=0
    num=$1
    while [[ $(echo "$num >= 1024" | bc) == 1 ]]  # bc has to be used because num is a float
    do
        steps=$((steps + 1))
        num=$(echo "$num/1024" | bc -l)
    done
    # Cut off after two decimal places:
    num=$(echo "$num" | sed 's/\(\.[0-9][0-9]\)[0-9]*$/\1/')
    printf '%s ' "$num"  # fixed format string, so num is never parsed as a format
    case $steps in
        0)  echo B;;
        1)  echo KiB;;
        2)  echo MiB;;
        3)  echo GiB;;
        4)  echo TiB;;
        5)  echo PiB;;
        6)  echo EiB;;
        7)  echo ZiB;;
        8)  echo YiB;;
        *)  echo "1024 ^ $steps bytes";;
    esac
}
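
For comparison, a sketch of the same conversion done in a single awk call, so no bc loop is needed. `human_bytes` is a made-up name; unlike simplify it rounds to two decimals rather than truncating, and it stops at EiB:

```shell
# human_bytes NBYTES  (hypothetical helper)
human_bytes() {
    awk -v n="$1" 'BEGIN {
        split("B KiB MiB GiB TiB PiB EiB", unit, " ")
        i = 1
        # Divide by 1024 until the value fits the current unit or units run out
        while (n >= 1024 && i < 7) { n /= 1024; i++ }
        printf "%.2f %s\n", n, unit[i]
    }'
}
```

It drops in the same way, e.g. `human_bytes $(bzip3 < outbox.json | wc -c)`.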
2025-02-18

I ran a few #compression experiments after recently coming across a new compressor, #Bzip3. At least for compressing #backup copies of my files, it clearly lost to #XZ in compression ratio (XZ is what I have been using for backups), and clearly lost to #ZStd in compression speed (ZStd is what I had been considering switching to). #atkjuttuja

Gea-Suan Lin gslin@abpe.org
2025-02-02

BZip3

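Saw a link to BZip3 on Hacker News: "Bzip3: A spiritual successor to BZip2 (github.com/kspalaiologos)".

Although the name suggests a connection to bzip2, it looks like it was made by different people. Some of the classic algorithms have been kept, though, such as the Burrows-Wheeler transform.

Also worth mentioning: bzip2 came out in 1996 (although 1.0 came out around 2000), while BZip3's first release was in 2022, and quite a few interesting algorithms have accumulated over that time.

In lossless compression, if you want a better compression ratio, the more commonly used choice today is probably the LZMA family of algorithms (which appeared around 2001), and the tool used is usually X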

blog.gslin.org/archives/2025/0

#Computer #Murmuring #Software #bzip2 #bzip3 #compression #lzma #ratio #xz

2024-12-03

I love playing around with #compression

In this case, it's all text-based data in csv and xml formats.

Size:

32,696,320 202411.tar
 4,384,020 202411.tar.bz2
 4,015,912 202411.tar.zst
 3,878,583 202411.tar.bz3
 3,730,416 202411.tar.xz

zstd was invoked using zstd --ultra -22
xz was invoked using xz -9e
bzip2 was invoked using bzip2 -9
bzip3 has no compression level options

Speed:

zstd    54.31user 0.25system 0:54.60elapsed 99%CPU
xz      53.80user 0.06system 0:53.93elapsed 99%CPU
bzip2    5.33user 0.01system 0:05.35elapsed 99%CPU
bzip3    3.98user 0.02system 0:04.01elapsed 99%CPU

Maximum memory usage (RSS):

zstd    706,312
xz      300,480
bzip3    75,996
bzip2     7,680

*RSS sampled up to ten times per second during execution of the commands in question
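
The sampling script itself is not shown, so here is one way it might look: a rough sketch that polls `VmRSS` from `/proc/PID/status` about ten times per second and keeps the maximum. It assumes Linux, a `sleep` that accepts fractions, and a made-up function name; very short runs can finish before the first sample and report 0.

```shell
# peak_rss_kib FILE CMD [ARGS...]  (hypothetical helper, Linux /proc only)
# Runs CMD with FILE on stdin and prints the highest sampled RSS in KiB.
peak_rss_kib() {
    local file=$1 pid rss peak=0
    shift
    "$@" < "$file" > /dev/null &
    pid=$!
    # VmRSS disappears from /proc/PID/status once the process has exited
    while rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null) && [ -n "$rss" ]; do
        if [ "$rss" -gt "$peak" ]; then peak=$rss; fi
        sleep 0.1
    done
    wait "$pid"
    echo "$peak"
}
```

Usage would be along the lines of `peak_rss_kib 202411.tar xz -9e`.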

#bzip3 is freaking amazing, yo.

#DataCompression #bzip #bz3 #zstd #zst #zstandard #xz #lzma
#CouldaBeenABlost ;)

2024-01-13

Found some #bzip3 options that might make for even better and faster compression.
Looking forward to trying them in the future.

       -b --block N
              Set the block size to N mebibytes. The minimum is 1MiB, the maximum is 511MiB.

       -j --jobs N
              Set the amount of parallel worker threads that process one block each.
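
A small wrapper sketch for trying these two options, checking the block size against the 1 to 511 MiB range quoted above before calling bzip3. `bz3_tuned` is a made-up name, and reading stdin and writing stdout is assumed to work with these flags the same way as plain `bzip3 < file` does:

```shell
# bz3_tuned BLOCK_MIB JOBS  (hypothetical wrapper; compresses stdin to stdout)
bz3_tuned() {
    local block=$1 jobs=$2
    case $block in
        ''|*[!0-9]*) echo "block size must be an integer" >&2; return 2;;
    esac
    if [ "$block" -lt 1 ] || [ "$block" -gt 511 ]; then
        echo "block size must be between 1 and 511 MiB" >&2
        return 2
    fi
    bzip3 -b "$block" -j "$jobs"
}
```

For example: `bz3_tuned 64 4 < powertrack.txt > powertrack.txt.bz3`. Since each worker thread processes one block, memory use presumably grows with block size times job count.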
2023-11-07

friendship ended with lrzip-next. now #bzip3 is my best friend.

2023-10-22

Okay, #bzip3 is AMAZING. For text-based data, it's about as fast as zstd (haven't fully tested this), and gets better compression than xz -9e!!!

CK's Technology News CKsTechnologyNews
2022-10-31

#bzip3 is available for Sid

Not sure about the unstable notice, the package works just fine on my end.

Official page and tracker
tracker.debian.org/pkg/bzip3
