Pruning vs Quantization: Which is Better?
https://arxiv.org/abs/2307.02973
* Pruning removes weights, reducing the memory footprint
* Quantization (e.g., 4-bit or 8-bit matrix multiplication, ...) reduces the bit-width used for both the weights and the computation in neural networks, leading to predictable memory savings and reductions in the required compute (see the sketch after this list)
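A minimal NumPy sketch of both ideas, assuming a toy weight matrix, unstructured 50% magnitude pruning, and symmetric 8-bit quantization (all names and values are illustrative, not taken from the paper):

```python
import numpy as np

# Toy weight matrix standing in for one dense layer (illustrative only).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)

# --- Pruning: zero out the smallest-magnitude weights ---
# Unstructured magnitude pruning at 50% sparsity.
sparsity = 0.5
threshold = np.quantile(np.abs(W), sparsity)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# --- Quantization: store weights at reduced bit-width (here 8-bit) ---
# Symmetric uniform quantization: scale to the int8 range, round, dequantize.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_q.astype(np.float32) * scale  # values used at inference time

print("fraction of zeros after pruning:", np.mean(W_pruned == 0.0))
print("max quantization error:", np.abs(W - W_dequant).max())
```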
In most cases, quantization outperforms pruning.
#MachineLearning #parametrization #weights #pruning #quantization #NeuralNetworks #MathematicalPrecision #matrices