#preservationMetadata

Revisiting bsdiff as a tool for digital preservation


by @beet_keeper

I introduced bsdiff in a blog post in 2014. bsdiff compares two files, e.g. broken_file_a and corrected_file_b, and creates a patch that can be applied to broken_file_a to generate a byte-for-byte match for corrected_file_b.

On the face of it, in an archive, we probably only care about corrected_file_b, so why would we care about a technology that patches a broken file?

In all of the use-cases we can imagine, the primary reasons are cost savings and removing redundancy in the storage or transmission of digital information. In one very special case, we can record the difference between broken_file_a and corrected_file_b and give users a totally objective method of recreating corrected_file_b from broken_file_a, providing 100% verifiable proof of the migration pathway taken between the two files.
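As a rough illustration of that round trip, here is a minimal Python sketch. It assumes the `bsdiff` and `bspatch` command-line tools are installed and on the PATH. The function names (`sha256_of`, `make_patch`, `apply_patch`, `verify_migration`) are hypothetical helpers invented for this example, not part of bsdiff itself, and SHA-256 is just one reasonable choice of checksum.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_patch(old, new, patch):
    """Create a patch describing how to turn `old` into `new`.

    Wraps the CLI form: bsdiff <oldfile> <newfile> <patchfile>
    """
    subprocess.run(["bsdiff", str(old), str(new), str(patch)], check=True)


def apply_patch(old, rebuilt, patch):
    """Rebuild the new file from `old` plus `patch`.

    Wraps the CLI form: bspatch <oldfile> <newfile> <patchfile>
    """
    subprocess.run(["bspatch", str(old), str(rebuilt), str(patch)], check=True)


def verify_migration(broken, corrected, patch, rebuilt):
    """Apply `patch` to `broken` and check the result matches `corrected`.

    Returns True when the rebuilt file has the same checksum as the
    corrected file, i.e. the patch reproduces it byte for byte.
    """
    apply_patch(broken, rebuilt, patch)
    return sha256_of(rebuilt) == sha256_of(corrected)
```

Calling `make_patch("broken_file_a", "corrected_file_b", "a_to_b.patch")` and then `verify_migration("broken_file_a", "corrected_file_b", "a_to_b.patch", "rebuilt_b")` should return `True` if the patch faithfully records the pathway between the two files. The patch name `a_to_b.patch` and output name `rebuilt_b` are placeholders.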


#ac3 #archives #audio #audiovisual #audit #authenticity #av #bash #bsdiff #checksums #code4lib #corruption #corruptionIndex #digipres #digitalArchiving #digitalForensics #digitalLiteracy #digitalPreservation #digitalStorage #diplomatics #fileFormats #glitch #glitchAudio #glitchart #integrity #preservationAnalysis #preservationMetadata #provenance #sensitivityIndex #storage

Image shows two layered waveforms, one a corrupt waveform and the other a good original. The corrupt form is in red and the uncorrupted one is green.
Image shows one corrupted file side-by-side with its non-corrupted partner through the lens of a diff tool. The differences are highlighted on the command line in red and green.
Image shows a hexdump with non-null bytes colorized, making it easier to see differences and, ultimately, how sparse the data is in the file.

What information is in a file format identification report?


by @beet_keeper

In early 2022, I was finally able to get around to writing a paper that I had been thinking about for the better part of a decade. The paper, “Fractal in Detail: What Information Is in a File Format Identification Report?”, was published in the Code4Lib Journal, Issue 53.

The paper takes a deep dive into the fractal contents of file format identification reports exported from tools like Siegfried and DROID.

Let’s take a brief look at the article and its contents below.


#code4lib #code4libJournal #digipres #digitalPreservation #droid #fileFormatAnalysis #fileFormatIdentification #fileFormats #filedriller #formatIdentification #freud #linting #metadata #preservationMetadata #pronom #puid #puids #siegfried #staticAnalysis #technicalMetadata

Abstract from Fractal in Detail: What information is in a file format identification report from the Code4Lib Journal.
