Revisiting bsdiff as a tool for digital preservation
by @beet_keeper
I introduced bsdiff in a blog post in 2014. bsdiff compares two files, e.g. broken_file_a and corrected_file_b, and creates a patch that can be applied to broken_file_a to generate a byte-for-byte match for corrected_file_b.
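As a rough illustration of that round trip, here is a minimal Python sketch that shells out to the bsdiff and bspatch command-line tools. It assumes both tools are installed and on the PATH; the function names (make_patch, apply_patch, tools_available) are hypothetical helpers made up for this example, not part of bsdiff itself.

```python
import shutil
import subprocess
from pathlib import Path


def make_patch(old: Path, new: Path, patch: Path) -> None:
    """Run `bsdiff old new patch` to record how to turn old into new."""
    subprocess.run(["bsdiff", str(old), str(new), str(patch)], check=True)


def apply_patch(old: Path, rebuilt: Path, patch: Path) -> None:
    """Run `bspatch old rebuilt patch` to regenerate the new file from old."""
    subprocess.run(["bspatch", str(old), str(rebuilt), str(patch)], check=True)


def tools_available() -> bool:
    """True only if both command-line tools can be found on the PATH."""
    return all(shutil.which(tool) for tool in ("bsdiff", "bspatch"))
```

With the tools installed, calling make_patch on broken_file_a and corrected_file_b, then apply_patch on broken_file_a with the resulting patch, should write out a file with the same bytes as corrected_file_b.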
On the face of it, in an archive, we probably only care about corrected_file_b, so why would we care about a technology that patches a broken file?
In all of the use-cases we can imagine, the primary reasons are cost savings and removing redundancy in file storage or in the transmission of digital information. In one very special case, we can record the difference between broken_file_a and corrected_file_b and give users a totally objective method of recreating corrected_file_b from broken_file_a, providing 100% verifiable proof of the migration pathway taken between the two files.
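One way to make that proof checkable is to store checksums alongside the patch. The sketch below is an assumption about how such a record might look, not a description of any existing tool: the field names (source_sha256, patch_sha256, target_sha256) and both functions are hypothetical, and SHA-256 is simply one reasonable choice of checksum. It works on bytes already read into memory, so it is independent of how the patch was produced.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def migration_record(broken: bytes, patch: bytes, corrected: bytes) -> dict:
    """Record checksums for the three artefacts of a migration (hypothetical layout)."""
    return {
        "source_sha256": sha256_hex(broken),
        "patch_sha256": sha256_hex(patch),
        "target_sha256": sha256_hex(corrected),
    }


def verify_migration(record: dict, broken: bytes, patch: bytes, rebuilt: bytes) -> bool:
    """Check that the inputs match the record and the rebuilt file matches the target."""
    return (
        sha256_hex(broken) == record["source_sha256"]
        and sha256_hex(patch) == record["patch_sha256"]
        and sha256_hex(rebuilt) == record["target_sha256"]
    )
```

A user holding broken_file_a, the patch, and this record could apply the patch themselves and confirm that what they rebuilt matches the checksum recorded for corrected_file_b, without having to trust the archive's own copy.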
#ac3 #Archives #audio #audiovisual #Audit #authenticity #av #Bash #bsdiff #checksums #Code4Lib #corruption #corruptionIndex #digipres #DigitalArchiving #DigitalForensics #digitalLiteracy #DigitalPreservation #DigitalStorage #diplomatics #FileFormats #glitch #glitchAudio #GlitchArt #integrity #PreservationAnalysis #PreservationMetadata #provenance #sensitivityIndex #Storage