Pruning vs Quantization: Which is Better?
https://arxiv.org/abs/2307.02973
* Pruning removes weights, reducing the memory footprint
* Quantization (e.g., 4-bit or 8-bit matrix multiplication, ...) reduces the bit-width used for both the weights and the computation in neural networks, leading to predictable memory savings and reductions in the required compute (see the sketch after this list)
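A minimal NumPy sketch of both ideas, assuming a toy weight matrix, unstructured 50% magnitude pruning, and symmetric 8-bit quantization (all names and values are illustrative, not taken from the paper):

```python
import numpy as np

# Toy weight matrix standing in for one dense layer (illustrative only).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)

# --- Pruning: zero out the smallest-magnitude weights ---
# Unstructured magnitude pruning at 50% sparsity.
sparsity = 0.5
threshold = np.quantile(np.abs(W), sparsity)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# --- Quantization: store weights at reduced bit-width (here 8-bit) ---
# Symmetric uniform quantization: scale to the int8 range, round, dequantize.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_q.astype(np.float32) * scale  # values used at inference time

print("fraction of zeros after pruning:", np.mean(W_pruned == 0.0))
print("max quantization error:", np.abs(W - W_dequant).max())
```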
In most cases, quantization outperforms pruning.
#MachineLearning #parametrization #weights #pruning #quantization #NeuralNetworks #MathematicalPrecision #matrices