#formatIdentification

What information is in a file format identification report?


by @beet_keeper

In early 2022, I finally got around to writing a paper that I had been thinking about for the better part of a decade. The paper, “Fractal in Detail: What Information Is in a File Format Identification Report?”, was published in Issue 53 of the Code4Lib Journal.

The paper takes a deep dive into the fractal contents of file format identification reports exported from tools like Siegfried and DROID.

Let’s take a brief look at the article and its contents below.


#code4lib #code4libJournal #digipres #digitalPreservation #droid #fileFormatAnalysis #fileFormatIdentification #fileFormats #filedriller #formatIdentification #freud #linting #metadata #preservationMetadata #pronom #puid #puids #siegfried #staticAnalysis #technicalMetadata

Abstract from Fractal in Detail: What information is in a file format identification report from the Code4Lib Journal.
2022-12-12

Hi #digipres + #Wikidata + #formatIdentification colleagues! Is there a common strategy for identifying XML-based formats? Should we try to identify the default namespace?

(Question triggered by a strange result returned by Siegfried: it identifies an OWL RDF/XML file as a teach2000 file in Wikidata (wikidata.org/wiki/Q105851165) because of an incorrect signature from TrID...)
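
To make the namespace idea concrete, here is a minimal Python sketch of what "identify the default namespace" could look like in practice. It is not how Siegfried, DROID, or TrID work; the function name `root_namespace` and the sample document (including the `http://example.org/onto#` URI) are made up for illustration. It reads only as far as the root element and reports the root's namespace, its local name, and the default namespace declared on it, if any.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: a tiny OWL ontology serialized as RDF/XML.
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns="http://example.org/onto#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://example.org/onto"/>
</rdf:RDF>
"""


def root_namespace(stream):
    """Return (root_ns, root_local_name, default_ns) for an XML stream.

    Illustrative helper, not part of any identification tool.
    Parsing stops as soon as the root element's start tag is seen.
    """
    default_ns = None
    for event, payload in ET.iterparse(stream, events=("start-ns", "start")):
        if event == "start-ns":
            # Namespace declarations are reported before the element
            # that carries them; the default namespace has prefix "".
            prefix, uri = payload
            if prefix == "" and default_ns is None:
                default_ns = uri
        else:
            # First "start" event is the root element.
            tag = payload.tag
            if tag.startswith("{"):
                uri, _, local = tag[1:].partition("}")
            else:
                uri, local = None, tag
            return uri, local, default_ns
    return None, None, None


if __name__ == "__main__":
    print(root_namespace(io.BytesIO(SAMPLE)))
```

For a file like this sample, the default namespace is specific to the ontology, while the root element's own namespace (the RDF syntax namespace) is the more stable signal, so a rule keyed on the root element's namespace and local name may be a better starting point than the default namespace alone.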
