I'm building a system that deduplicates audio files using acoustic #fingerprinting. The similarity metric is a normalized Hamming distance between two vectors of 32-bit integers, each being the raw AcoustID #Chromaprint fingerprint of one file.
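A minimal Python sketch of such a metric, not necessarily the exact code in my system. It assumes the two fingerprints are already time-aligned and simply truncates the longer one to the length of the shorter; the function name `fingerprint_similarity` is made up for illustration.

```python
def fingerprint_similarity(a, b):
    """Return 1 - normalized Hamming distance between two raw fingerprints.

    `a` and `b` are sequences of 32-bit integers. Assumption: no offset
    search is done, and only the overlapping prefix is compared.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0  # nothing to compare
    # XOR marks the differing bits; the mask keeps exactly 32 bits per item,
    # which also handles fingerprints delivered as signed integers.
    differing = sum(
        bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(a, b)
    )
    return 1.0 - differing / (32 * n)


# One differing bit out of 64 compared bits.
print(fingerprint_similarity([0b1010, 0], [0b1000, 0]))  # 0.984375
```

A result of 1.0 means every compared bit matches; multiply by 100 for a percentage.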
Testing identification of the same audio in different formats gave these results:
WAV <> AIFF <> M4A → identical (no surprise here, all lossless)
MP3 320 kbps (CBR) <> WAV → 99.9% similarity
MP3 VBR (Q5) <> WAV → 99.3% similarity
MP3 128 kbps (CBR) <> WAV → 99.3% similarity