#BZip2

Dendrobatus Azureus (Dendrobatus_Azureus@bsd.cafe)
2025-11-26

WARNING

TIL that bunzip2 in Linux / Debian / MX Linux deletes, I repeat, DELETES the original archive when you run it in vanilla form

`bunzip2 dfly-x86_64-6.4.2_REL.iso.bz2`

resulted in the iso being unpacked with deletion of the original WITHOUT WARNING

Since I'm on expensive LTE+ 4G internet that is significant.

I have not used bunzip2 in years, but should have remembered this hostile default. It was not that way, IIRC.
Do I need to read the manpages of commands I have not used in years in Linux now? Why was the default changed?

Luckily I have copies of the bzip2-compressed ISO on multiple partitions of HDD and SSD
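For what it's worth, the bzip2 man page documents that each input file is replaced by its (de)compressed version, the same as gzip does, and `bunzip2 -k` (`--keep`) leaves the archive in place. Below is a minimal Python sketch of a decompressor that keeps the archive by default, using only the standard `bz2` module. The function name `bunzip2_keep` is made up for illustration.

```python
import bz2
import shutil
from pathlib import Path


def bunzip2_keep(archive: str) -> Path:
    """Decompress `archive` next to itself and leave the archive untouched."""
    src = Path(archive)
    if src.suffix != ".bz2":
        raise ValueError(f"expected a .bz2 file, got {src.name}")
    dst = src.with_suffix("")  # foo.iso.bz2 -> foo.iso
    # Mode "xb" refuses to overwrite an existing output file.
    # Copy in chunks so a multi-gigabyte ISO never has to fit in memory.
    with bz2.open(src, "rb") as fin, open(dst, "xb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

On the command line the equivalent would be `bunzip2 -k dfly-x86_64-6.4.2_REL.iso.bz2`.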

#bzip2 #bunzip2 #sh #bash #warning #TIL #Linux #OpenSource #POSIX

2025-10-15

Ok #Rust #bzip2 implementation done … in principle 😄 The compression test ran on a #LoremIpsum text file. A more realistic sample would have probably resulted in a much worse ratio. The BWT encoding step is ultra slow because I chose a naive approach instead of using a runtime-optimized but more complex algorithm. I'm also producing only one #Huffman tree for the whole file, which will also significantly degrade compression performance in longer and less homogeneous inputs.
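To illustrate why a naive BWT step is slow, here is a minimal Python sketch (not the Rust code from the post, which isn't shown) that sorts all rotations of the input directly. The function names are hypothetical. Sorting n rotations of length n costs roughly O(n² log n) in the worst case, which is what suffix-array based approaches avoid.

```python
def bwt_naive(data: bytes) -> tuple[bytes, int]:
    """Return (last column, row index of the original input) of the sorted rotations."""
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data  # rotation i is doubled[i:i + n]
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last = bytes(doubled[i + n - 1] for i in order)
    return last, order.index(0)


def bwt_inverse(last: bytes, origin: int) -> bytes:
    """Undo bwt_naive by following the first-column -> last-column mapping."""
    n = len(last)
    # A stable sort of positions by byte value maps each row's first byte
    # to the position of the same byte occurrence in the last column.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = origin
    for _ in range(n):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)


print(bwt_naive(b"banana"))  # (b'nnbaaa', 3)
```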

Screenshot of a program output in the terminal. We can see an input text file has been compressed with a ratio of over 85%.

2025-09-28

Ok this is fun!

My #bzip2 implementation in #Rust is coming along nicely and I've been loosely following a #TestDrivenDevelopment approach, which makes working with LLMs a breeze. If an AI-assisted rework turns tests red you can immediately start debugging, tuning and optimizing, which is nice and focused.

I'm done with everything including the Move-To-Front Transform encoding and am already getting text file sizes down by 25 to 30%.
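For readers unfamiliar with the Move-To-Front step, here is a minimal Python sketch over the full 256-byte alphabet (an assumption for simplicity; it is not the author's Rust code, and real bzip2 works on the symbols used in each block and run-length encodes the zeros afterwards). Recently seen bytes get small indices, so BWT output with long runs turns into mostly zeros.

```python
def mtf_encode(data: bytes) -> list[int]:
    """Replace each byte by its current position in a recency list."""
    table = list(range(256))
    out = []
    for byte in data:
        idx = table.index(byte)
        out.append(idx)
        table.insert(0, table.pop(idx))  # move the byte to the front
    return out


def mtf_decode(indices: list[int]) -> bytes:
    """Inverse transform: replay the same table updates."""
    table = list(range(256))
    out = bytearray()
    for idx in indices:
        byte = table.pop(idx)
        out.append(byte)
        table.insert(0, byte)
    return bytes(out)


print(mtf_encode(b"nnbaaa"))  # [110, 0, 99, 99, 0, 0]
```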

Next up: Huff huff huff!

2025-09-25

How to get back into a programming language?

"Do small hobby projects", they said.

"It will be fun!", they said.

So here I am reading university lecture notes about how to build suffix arrays in O(n) so I can optimize a Burrows-Wheeler-Transform for the #bzip2 implementation I inexcusably started writing so I could get back into #Rust.
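The link between suffix arrays and the Burrows-Wheeler transform can be sketched in a few lines of Python. Note the assumptions: the suffix array here is built with a plain comparison sort, not one of the O(n) constructions from the lecture notes, and a sentinel byte smaller than every input byte is appended so that suffix order equals rotation order. The function names are hypothetical.

```python
def suffix_array(s: bytes) -> list[int]:
    """Start offsets of all suffixes of s in sorted order (simple, not O(n))."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt_from_suffix_array(text: bytes, sentinel: int = 0) -> bytes:
    """BWT of text + sentinel, read off the suffix array."""
    if sentinel in text:
        raise ValueError("sentinel byte must not occur in the input")
    s = text + bytes([sentinel])
    sa = suffix_array(s)
    # The BWT symbol for a suffix is the byte just before it
    # (index -1 wraps around to the sentinel for the suffix starting at 0).
    return bytes(s[i - 1] for i in sa)


print(suffix_array(b"banana\x00"))       # [6, 5, 3, 1, 0, 4, 2]
print(bwt_from_suffix_array(b"banana"))  # b'annb\x00aa'
```

Swapping `suffix_array` for a linear-time construction would leave `bwt_from_suffix_array` unchanged, which is the appeal of the approach.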

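GripNews
2025-09-21

🌘 Writing an efficient BZip2 encoder from scratch in Ada - Part 3: Entropy coding (with AI/machine learning!)
➤ The key to optimizing BZip2 compression with machine learning: smart clustering for the Huffman trees
gautiersblog.blogspot.com/2025
This is the third part of a series on writing a BZip2 encoder in Ada, focusing on the entropy-coding stage. The author explains the flexibility the BZip2 format allows in entropy coding and, using actual compression figures from the Calgary and Canterbury corpora, shows how different BZip2 implementations differ. The article goes on to explain that these differences stem mainly from how the Huffman trees are initially assigned. The author brings in the k-means clustering algorithm from machine learning and shows how it can be applied to the initial grouping of BZip2 symbols in order to find a better Huffman tree configuration and so improve compression. Using simple geometric illustrations and an analogy with political parties, the article vividly shows how much the initial grouping matters for the final result, especially in high…
Programming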

N-gated Hacker News (ngate)
2025-09-21

Ah yes, because what the world desperately needs is yet another BZip2 encoder, but this time dressed up in Ada and sprinkled with the ✨ magic ✨ of AI/Machine Learning™. It's the classic tale: boy meets algorithm, algorithm meets Ada, and everyone lives happily ever after in a world of compressed bits nobody asked for. 🤷‍♂️💾
gautiersblog.blogspot.com/2025

Hacker News (h4ckernews)
2025-09-21

N-gated Hacker News (ngate)
2025-08-26

🎉 Behold, the groundbreaking revelation: xz is not the Holy Grail of data formats! 🚀 Apparently, using xz for digital preservation is like using a sieve as a bucket: bound to fail. Who knew? 🤦‍♂️ Stick to […], […], or […] if you want actual functionality and avoid sinking your data into the abyss of inadequacy. 🔍💾
nongnu.org/lzip/xz_inadequate.

N-gated Hacker News (ngate)
2025-08-16

🚀 Oh, the riveting […] continues! Witness as Ada, the language nobody asked for, takes on yet another feat: building a BZip2 encoder that absolutely nobody needed – in […] time! 🤯 Part 2, because once wasn't enough! 🤡
gautiersblog.blogspot.com/2025

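GripNews
2025-08-16

🌘 Gautier's blog: Writing a competitive BZip2 encoder in Ada from scratch in a few days - Part 2
➤ The power of Ada: building an efficient BZip2 encoder by hand
gautiersblog.blogspot.com/2025
This is the second part of a series on Gautier's blog describing how he used Ada to build, from scratch and in just a few days, a tool whose performance can compete with existing BZip2 encoders. The author goes into the technical details of the implementation, including data structures, algorithm choices, and code optimization strategies, aiming to show Ada's potential for high-performance systems development.
+ Finishing such a complex project in Ada within a few days is impressive! The author clearly has deep technical skill.
+ For anyone who wants to understand BZip2 internals and Ada's performance, this article offers very valuable insight.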

2025-08-16

How to write a bzip2 archiver in Python: a look at the Burrows-Wheeler transform

Hi! I'm Roma, a backend Python developer at KTS. This is the second article in my series on the bzip2 compression algorithm. You can read the first one here, but it isn't required to follow today's topic. Below I walk through the Burrows-Wheeler transform, the key stage of bzip2 compression.

habr.com/ru/companies/kts/arti

#архиваторы #архивация #сжатие_данных #алгоритмы #bzip2архиватор #bzip2 #bwt

2025-08-03

@ermo

I'm very slowly creeping towards having checksum files auto-built. There's a missing part that needs to be done.

But I'm at least over one initial hurdle of switching from pax -z to pax -j. Not that that helps in the #FreeBSD 10 case because FreeBSD 10's pax does not have -j.

(Make an archive with -z and it isn't idempotent, because #gzip has a timestamp.)
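The timestamp effect is easy to reproduce with Python's standard library. This is only a sketch of the underlying format difference, not of what `pax` does internally: the gzip header carries an MTIME field, while a bzip2 stream has no timestamp, so compressing the same bytes twice gives identical output.

```python
import bz2
import gzip

payload = b"same bytes every time\n" * 100

# gzip: the MTIME header field makes the output depend on when it was made.
gz_then = gzip.compress(payload, mtime=1_000_000_000)
gz_now = gzip.compress(payload, mtime=1_700_000_000)
gz_differs = gz_then != gz_now

# Pinning the timestamp makes gzip output reproducible as well.
gz_pinned = gzip.compress(payload, mtime=0) == gzip.compress(payload, mtime=0)

# bzip2: no timestamp in the stream, so equal input gives equal output.
bz_same = bz2.compress(payload) == bz2.compress(payload)

print(gz_differs, gz_pinned, bz_same)  # True True True
```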

So there's still the installing #GhostBSD mountain to climb, and seeing whether that has pax -j yet. (-:

#bzip2 #pax

2025-07-24

You'll find this benchmarking adventure in its own blog post "Performance lessons of implementing lbzcat in Rust" anisse.astier.eu/lbzip2-rs.htm

#RustLang #lbzip2 #bzip2 #benchmarking #performance

2025-07-24

lbzip2 internally implements a full task-scheduling runtime, and splits tasks into much smaller increments; it supports bit-aligned blocks (which are standard in the bzip2 format), while my Rust implementation purposefully doesn't: I wanted to rely on the bzip2 crate that only supports byte-aligned buffers, and keep the code simple (which I failed at, IMHO). FIN 15/15
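To make "bit-aligned blocks" concrete: each bzip2 block starts with the 48-bit magic 0x314159265359, and since blocks are packed back to back as a bit stream, that magic can start at any bit offset, not just on a byte boundary. Below is a minimal Python sketch of a bit-level scan for candidate block starts. It is an illustration, not how lbzip2 or the Rust code actually does it, and the function name is hypothetical.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # 48-bit block header magic
MASK48 = (1 << 48) - 1


def find_block_starts(stream: bytes) -> list[int]:
    """Return the bit offsets at which the block magic appears."""
    offsets = []
    window = 0
    bit_pos = 0
    for byte in stream:
        for shift in range(7, -1, -1):  # bzip2 packs bits MSB first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK48
            bit_pos += 1
            if bit_pos >= 48 and window == BLOCK_MAGIC:
                offsets.append(bit_pos - 48)
    return offsets


# The first block directly follows the 4-byte "BZh<level>" header.
print(find_block_starts(bz2.compress(b"hello")))  # [32]
```

A real parallel decompressor has to treat these offsets as candidates only, since the same 48 bits can in principle occur inside compressed data, and later blocks will usually not sit on a byte boundary.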

#lbzip2 #bzip2

2025-07-24

That's it for the benchmarking! You can find my implementation at github.com/anisse/lbzip2-rs/ ; it's very much PoC-quality code, so use at your own risk! I chose to manually spawn threads instead of using rayon or an async runtime; there are other things I'm not proud of, like busy-waiting instead of using a condvar, for example. 14/N
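For context on the busy-waiting remark, here is a minimal Python sketch of the condition-variable alternative: a one-item hand-off slot where a waiting thread sleeps until it is notified instead of spinning. The `Slot` class is hypothetical and is not taken from the linked repository, which is written in Rust.

```python
import threading


class Slot:
    """One-item hand-off between a producer and a consumer thread."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False

    def put(self, item):
        with self._cond:
            while self._full:  # sleep until the consumer empties the slot
                self._cond.wait()
            self._item, self._full = item, True
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while not self._full:  # sleep until the producer fills the slot
                self._cond.wait()
            item, self._item, self._full = self._item, None, False
            self._cond.notify_all()
            return item
```

The `while` loops around `wait()` guard against spurious wake-ups; the thread consumes no CPU while it is blocked.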

#lbzip2 #bzip2 #RustLang #async #rayon

2025-07-24

We've been running benchmarks on single CPU cores since the start. What if we unleash the parallel mode? Here are the results: lbzip2 is still much faster on the 8 cores; my implementation holds up fine, but is only 80% faster than bzip2, while running on 8 cores. On bigger files though, it starts to pay off, with a speedup of up to 6.3x, while lbzip2 can go up to 7.7x. 13/N

#lbzip2 #bzip2

$ hyperfine -N -L program bzcat,lbzcat,./target/release/lbzcat "{program} readmes.tar.bz2"
Benchmark 1: bzcat readmes.tar.bz2
  Time (mean ± σ):      74.8 ms ±  19.1 ms    [User: 73.5 ms, System: 0.6 ms]
  Range (min … max):    56.0 ms … 104.3 ms    50 runs
 
Benchmark 2: lbzcat readmes.tar.bz2
  Time (mean ± σ):      29.3 ms ±   3.6 ms    [User: 64.7 ms, System: 2.7 ms]
  Range (min … max):    16.1 ms …  40.1 ms    85 runs
 
Benchmark 3: ./target/release/lbzcat readmes.tar.bz2
  Time (mean ± σ):      44.1 ms ±   2.8 ms    [User: 117.7 ms, System: 2.7 ms]
  Range (min … max):    32.9 ms …  47.7 ms    63 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  lbzcat readmes.tar.bz2 ran
    1.51 ± 0.21 times faster than ./target/release/lbzcat readmes.tar.bz2
    2.56 ± 0.73 times faster than bzcat readmes.tar.bz2

$ hyperfine -m 1 -N -L program bzcat,./target/release/lbzcat,lbzcat "{program} contains-Q484170.json.bz2"
Benchmark 1: bzcat contains-Q484170.json.bz2
  Time (abs ≡):        27.194 s               [User: 27.055 s, System: 0.071 s]
 
Benchmark 2: ./target/release/lbzcat contains-Q484170.json.bz2
  Time (abs ≡):         4.237 s               [User: 33.433 s, System: 0.188 s]
 
Benchmark 3: lbzcat contains-Q484170.json.bz2
  Time (abs ≡):         3.513 s               [User: 26.614 s, System: 0.165 s]
 
Summary
  lbzcat contains-Q484170.json.bz2 ran
    1.21 times faster than ./target/release/lbzcat contains-Q484170.json.bz2
    7.74 times faster than bzcat contains-Q484170.json.bz2

2025-07-24

Overall, my Rust implementation (using the bzip2-rs crate) is (much) slower than lbzip2, and faster than bzip2. For some reason, it also sees a huge performance boost on performance cores, most likely due to better IPC and branch prediction. 12/N

#lbzip2 #bzip2

$ taskset -c 7 hyperfine -N -L program bzcat,"lbzcat -n1",./target/release/lbzcat "{program} readmes.tar.bz2"
Benchmark 1: bzcat readmes.tar.bz2
  Time (mean ± σ):      83.5 ms ±  13.7 ms    [User: 82.3 ms, System: 0.7 ms]
  Range (min … max):    51.3 ms … 105.0 ms    58 runs
 
Benchmark 2: lbzcat -n1 readmes.tar.bz2
  Time (mean ± σ):      37.7 ms ±   2.4 ms    [User: 36.6 ms, System: 0.9 ms]
  Range (min … max):    36.3 ms …  47.2 ms    63 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (47.2 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Benchmark 3: ./target/release/lbzcat readmes.tar.bz2
  Time (mean ± σ):      54.4 ms ±   2.3 ms    [User: 53.3 ms, System: 0.8 ms]
  Range (min … max):    48.1 ms …  57.5 ms    62 runs
 
Summary
  lbzcat -n1 readmes.tar.bz2 ran
    1.44 ± 0.11 times faster than ./target/release/lbzcat readmes.tar.bz2
    2.22 ± 0.39 times faster than bzcat readmes.tar.bz2

$ taskset -c 3 hyperfine -N -L program bzcat,"lbzcat -n1",./target/release/lbzcat "{program} readmes.tar.bz2"
Benchmark 1: bzcat readmes.tar.bz2
  Time (mean ± σ):     116.3 ms ±  22.7 ms    [User: 113.1 ms, System: 1.7 ms]
  Range (min … max):    86.8 ms … 147.4 ms    34 runs
 
Benchmark 2: lbzcat -n1 readmes.tar.bz2
  Time (mean ± σ):      62.3 ms ±   1.2 ms    [User: 59.0 ms, System: 2.4 ms]
  Range (min … max):    61.1 ms …  68.8 ms    47 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: ./target/release/lbzcat readmes.tar.bz2
  Time (mean ± σ):     109.6 ms ±  17.4 ms    [User: 106.0 ms, System: 2.4 ms]
  Range (min … max):    95.2 ms … 143.5 ms    30 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  lbzcat -n1 readmes.tar.bz2 ran
    1.76 ± 0.28 times faster than ./target/release/lbzcat readmes.tar.bz2
    1.87 ± 0.37 times faster than bzcat readmes.tar.bz2
