#MathematicalPrecision

Victoria Stuart 🇨🇦 🏳️‍⚧️persagen
2023-07-10

Pruning vs Quantization: Which is Better?
arxiv.org/abs/2307.02973

* Pruning remove weights reducing memory footprint
* Quantization (4-bit, 8-bit matrix multiplication; ...) reduces bit-width used for both weights / computation used in neural networks, leading to both predictable memory savings & reductions in the necessary compute

In most cases quantization outperforms pruning.

Client Info

Server: https://mastodon.social
Version: 2025.07
Repository: https://github.com/cyevgeniy/lmst