#WDPD

@triciap love this! Thanks for making my day! i think this is totally a digipres hit, along with the #wdpd music vids. also, i think you need to upload and add this to the #zenodo #digipres community!

PRONOM’s dustiest records


by @beet_keeper

Tyler’s recent blog post for the PRONOM Hack-a-thon Week 2024 (my previous for this week) brought up an interesting point about two of PRONOM’s oldest outline records, Real Video Clip (fmt/204) and Real Video (x-fmt/277). How did they end up in PRONOM?

NB. because of the complexity of this post, it may be easier to read in original blog form, than on Mastodon here: exponentialdecay.co.uk/blog/pr

Tyler suggests:

I assume PRONOM originally added these based on MIME types available.

I thought I knew the answer, but it prompted a forensic look at the records to see if what I thought I knew aligned with reality!

#digipres #DigitalPreservation #DROID #FileFormat #FileFormats #OIT #Oracle #OutsideIn #PRONOM #Stellent #WDPD #WDPD2024

Images: C3PO narrates the story of Star Wars to the Ewoks in Return of the Jedi; example of the Save As menu in Excel; an example of Google Sheets glitching while writing this entry.

simpledroid: completing the loop


by @beet_keeper

It’s nearing the end of 2024 and that must mean a PRONOM hackathon as part of the World Digital Preservation Day (#WDPD2024).

My contribution is a follow-up on my work earlier in the year to produce a valid DROID signature file from Wikidata in wddroidy.


#digipres #DigitalPreservation #DROID #FileFormats #PRONOM #Python #siegfried #SkeletonTestCorpus #WDPD #WDPD2024

Images: Lovebot by Matthew Del Degan, taken in Toronto in 2017; screenshot of DROID identifying results using the simplified DROID file; a screenshot of the simplified DROID XML. The signatures are encoded in plain text from PRONOM itself with no additional optimizations or modification.
Tim Allison (tallison)
2024-11-08

Day late and a dollar short for #WDPD, but this article and the recent bout of "Excel can't read these files" are eye-opening.

rzymek.github.io/post/excel-zi

Before I dive into any #wdpd catchup ... my best piece of news on this year's #wdpd and just recently in general:
OPF has announced Neil Jeffries as the new Executive Director! I'm really looking forward to working with Neil and am stoked to see what good he'll bring to OPF!

openpreservation.org/news/neil

Happy #WDPD! Today we celebrate all things to do with digital preservation across the world.

This year, the day’s theme is Preserving Our Digital Content: Celebrating Communities. DANS wants to celebrate today by shining some light on the recent community-driven developments in the way repositories are defined and described.

As a member of CoreTrustSeal, DANS was involved with co-writing a blog with UK Data Service.

Read more about it here 👉 edu.nl/fu6x4

Georgia Moppett (Georgia@digipres.club)
2024-11-07

Incredible scenes happening in the #digipres socials today

Find your perfect file format fling this #WDPD with a #digitalpreservation dating game created by some brilliantly creative minds ...

openpreservation.org/blogs/wdp

ZB MED (ZBMED)
2024-11-07

Would you like to dive deeper into the topic and find out what the hurdles are?
Then read our blog article, published on the occasion of last year's #WDPD, on the topic of automating :
blog.zbmed.de/automated-digita

I got a response to my paper PREMIS Events Through an Event-Source Lens.

There are two strange choices made by this response. I’ll touch on the more personal one at the end, but first, what does the response say?

It’s not entirely clear. 

If the response says that “it is a choice to implement PREMIS”, that “PREMIS can be implemented in different ways”, and that “it’s technology agnostic”, then yes, 100%. That’s basically the driver for my original paper, and once you read it holistically, instead of dissecting it and cherry-picking points, you will probably read it that way as well.

As I wrote in my first blog response to the publication of my paper in 2023, Tessella’s Rob Sharpe’s 2013 presentation was an important reference point for me and we’ll revisit it below, but Rob labors that PREMIS is technology agnostic and can be represented in other formats, and since 2013 I haven’t seen enough conversation or discussion about that, and I wanted to amplify that message by looking at PREMIS in an event-sourced model as an aggregation. 

If there’s something more substantive in the PREMIS Editorial Committee’s (EC) response, then I feel it’s lost in its own stylistic choices (to focus on what I might have been saying rather than taking a show-don’t-tell approach to clarifying their more salient points).

I wonder if it might have been handled differently? I am pretty easy to find these days, and so reaching out to clarify any of my thinking might have been one way; perhaps there was a way to collaborate on a response; perhaps most of the EC’s concerns (if there are any) could have been handled with a joint editorial note in the original paper to clarify that my words are not an authoritative source on PREMIS; rather, PREMIS (events) were largely a vehicle to describe the benefits of an event-sourced architecture, and you still need to consider and interpret the PREMIS documentation and guidance for yourself before implementing it in your own solutions.

Going a different direction

The essence of the original paper is this: (from my perspective) PREMIS is not a schema to be implemented in the back-end of any digital preservation system. Should it still be deemed a relevant technology, it might be studied in your requirements analysis, and you would make sure that your own system is not lossy in any way that would affect PREMIS “conformance”. But you would not match your “schema” to PREMIS; you would ensure that you can output it, “present it” that is. It would become one representation, out of many, of the data that can be generated from your system. One view, or as I clearly point out, an aggregation, in the case we have chosen an event-based architecture.

This is not at odds with the (so-called) corrections that have been provided to me in the Code4Lib journal article from the PREMIS EC.

That being said, a further thesis is that PREMIS events are often a lossy, stateful representation of data in a digital preservation system. PREMIS represents one-dimensional state (or slices of state) over a period of time. In the modern engineering world, we have at our disposal methods of capturing, greedily, all events in the life of a digital object and doing that will create a richer view of the life of that object, and, as a representation of that data, a richer PREMIS view of an object and its events over time if so desired.
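
A minimal Python sketch of the difference between storing state and storing events. The `Event` fields, the `record` helper and the event names are hypothetical and not drawn from PREMIS or any particular system; the point is only that current state can be derived from an append-only log, while the history cannot be recovered from state alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    object_id: str   # identifier of the digital object (hypothetical)
    kind: str        # illustrative event name, e.g. "ingest"
    outcome: str     # free-text outcome for the sketch
    timestamp: str   # ISO 8601 string, kept as text for simplicity


# Stateful approach: one slot per object, each update overwrites the last.
state: dict[str, str] = {}

# Event-sourced approach: an append-only log, nothing is overwritten.
log: list[Event] = []


def record(event: Event) -> None:
    """Apply an event to both stores so they can be compared."""
    state[event.object_id] = f"{event.kind}: {event.outcome}"
    log.append(event)


record(Event("obj-1", "ingest", "success", "2024-11-07T09:00:00Z"))
record(Event("obj-1", "format-identification", "fmt/204", "2024-11-07T09:01:00Z"))
record(Event("obj-1", "fixity-check", "pass", "2024-11-07T09:02:00Z"))


def current_state(object_id: str) -> str:
    """Derive the latest state from the log; the log stays the source of truth."""
    latest = [e for e in log if e.object_id == object_id][-1]
    return f"{latest.kind}: {latest.outcome}"


def history(object_id: str) -> list[str]:
    """The full life of the object, which the stateful store cannot reproduce."""
    return [e.kind for e in log if e.object_id == object_id]
```

Here `state["obj-1"]` only remembers the fixity check, whereas `history("obj-1")` still knows about the ingest and the format identification that came before it.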

The authors of the EC response labor heavily on their perception of a misunderstanding on my part about PREMIS and they can choose to do that but what may look like a misunderstanding of PREMIS is not a misunderstanding of technology:

Conformance, in general, is defined as:

> how well something, such as a product, service or a system, meets a specified standard

And the PREMIS EC have decided to attach levels to conformance (also graduated levels, and degrees) to “quantify(ing) the degree to which PREMIS has been implemented”, three of which are anchored in implementation, apparently, three distinct implementations.

  1. Mapping, indirect or otherwise,
  2. Export,
  3. Direct implementation,

I write:

PREMIS conformance should be separate from representation. If we acknowledge PREMIS is at least one important representation of preservation metadata, i.e. for its ability to act as an interface to those looking to interpret preservation metadata, then whether it exists logically on disk, or is generated through an event sourced projection, is irrelevant. How a representation complies with the PREMIS data model remains of greater importance, but this is measured from the same eventual view, whatever intermediate abstraction it sits within.

The PREMIS EC can choose to have three graduated levels of implementation to quantify degree of implementation. They can also make it clear level three (internal representation) is not necessarily the final goal, but it might benefit you; but if you’re not the PREMIS EC, don’t go near it; there’s no need.

I posit that conformance is only how well you can map to PREMIS or access something PREMIS-like that satisfies its data model. Your goal is to look at PREMIS as one interface you can potentially satisfy (you still need to describe objects uniquely; you need to describe agents engaging with them; rights need to sit somewhere), and once you can satisfy that interface you can access it in many different ways, and conformance should be measured against that, if PREMIS conformance is deemed valuable.

Put simply, conformance does not require levels. Levels may simply be the wrong word; these are just guides you might follow to demonstrate conformance (or ways that someone might audit a system to determine conformance).

The EC clipped this from one of the points they responded to:

Is level three (internal implementation) reasonable in today’s software development world, is it reasonable in today’s environmental climate?

Do we sacrifice the potential to store and access other different, richer, more-complex, (or less-complex), representations about other cross-sections of our data at the expense of putting PREMIS at the core of our digital preservation system? – No. We can make it one output of many, and use its schema and data dictionary to output it, but we don’t build around it; we essentially report around it.

They argue: 

there are also benefits in choosing to take an internationally defined and agreed data model and use that as the basis of your system. 

Well, if it’s internationally defined and agreed, let’s just do that! 🤷

The benefits of not implementing an external data model are broadly around increased control and flexibility, however the trade-off to consider is the likely loss of easy interoperability and exchange with other systems.

If you re-frame PREMIS as an interchange-format and you can prove that as useful, you absolutely have my buy-in and I will have designed you a system that doesn’t preclude a PREMIS-like output, i.e. a way of aggregating more detailed information in your system and outputting PREMIS as a representation (a format) for others to understand.

The resurgence of OAIS?

From the EC: 

There are two responses to this, the first is to note that access has always been considered a part of Digital Preservation, to the point that one of the functional areas of the OAIS model is Access.

Who had OAIS on their World Digital Preservation Day (WDPD) Bingo Card? 

But also, no. This is a misleading read and deserves more context.

Access when it is considered part of digital preservation is when access is used as a measure of success of digital preservation (or indicator of the potential obsolescence of an object) – it is an intrinsic property of digital preservation. 

But the access function in OAIS is not that. And even if you’re crafty, and build an access component to a system that provides a feedback loop to digital preservation functions, it’s not that part of OAIS.

Now, PREMIS does have some nice features that support access BUT we’re talking “events”, and information that supports digital preservation and even though there may be a way to encode events that provide a feedback loop to measure the success of preservation, e.g. {“event”: “access”, “detail”: “tried to open PSD in GIMP”, “outcome”: “FAIL”}, true access goes well beyond the scope of my article and the spirit in which it was written.
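
The access event quoted above can be read as one record in a stream. A minimal Python sketch, assuming events are stored as JSON objects with the same three keys as that example; the second and third records, the "SUCCESS" outcome value and the `failure_rate` helper are invented for illustration.

```python
import json

# The first record mirrors the example in the text; the others are made up.
raw_events = """
[
  {"event": "access", "detail": "tried to open PSD in GIMP", "outcome": "FAIL"},
  {"event": "access", "detail": "tried to open PSD in Photoshop", "outcome": "SUCCESS"},
  {"event": "fixity", "detail": "checksum compared", "outcome": "SUCCESS"}
]
"""


def failure_rate(events: list[dict], kind: str = "access") -> float:
    """Share of events of the given kind whose outcome is FAIL (0.0 if none)."""
    relevant = [e for e in events if e["event"] == kind]
    if not relevant:
        return 0.0
    failed = sum(1 for e in relevant if e["outcome"] == "FAIL")
    return failed / len(relevant)


events = json.loads(raw_events)
rate = failure_rate(events)
```

A rising failure rate for a given format would be one possible signal feeding back into preservation planning; it says nothing about how access itself is delivered.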

We need to evolve

The EC presents a somewhat dogmatic and institutionalised response. As a flaneur in the field, as someone who has worked implementing PREMIS in one of the most PREMIS-heavy digital preservation systems out there, and who has also been involved in efforts to minimise PREMIS verbosity, including my own event-like approaches, I revisit Sharpe’s paper in 2022/2023. I do this asking: why don’t we talk about it more? Why do I see projects today that still see XML as the end goal of PREMIS?

My view is that a 20-year-old standard, a 2015 specification (last revision), a 2016 reference implementation in an out-of-date technology (XML), and a very institutional PREMIS EC, with roots at the Library of Congress, all have influence, and some of the points I do see appearing from their response are being buried in their desire to hold onto authority.

The biggest point being buried, technological agnosticism, appears in the EC’s response to me five times, “technology independent” once, and in the official data dictionary once (unrelated), and it appears in the official 2015 conformance statement zero times (although you can bend the verbosity of the conformance statement into words that read like technologically agnostic). But make it explicit; don’t write it five times to me and not put it in the docs. Make new reference implementations, or borrow them from your implementers. Use plain language, and just make it explicit.

Better still, let’s evolve the presentation of the PREMIS standard (away from separate PDFs), and use a modern documentation framework (e.g. Diataxis), and put it into public versioned source control, and give us a way that we can help write the documentation with you to make things like this clearer.

While the EC’s response to me labors on the idea that I have missed the fact that PREMIS is technology agnostic, I wrote the original paper to amplify previous conversations and keep them relevant because they were formative for me, and I hope that they will be formative for others.

I also wrote the original paper as more of a technology paper than a PREMIS paper (honouring PREMIS of course) but I make a very clear conclusion that is very much inclusive of PREMIS:

It is this paper’s assertion that we can store more, and “do more” by taking an event-sourced approach to storing events associated with the “objects” described in the PREMIS data dictionary. 

I can nuance this further:

  • Store events about your digital objects and try to make sure some of those events can be aligned with PREMIS, 
  • Store events because events happen on a continuum, don’t fall into the trap of storing state,
  • Create representations of your data, PREMIS might be one, access reports and logs might be another, feature analyses might be another, don’t limit yourself to one schema, use many. 
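
The three points above can be sketched in Python as one event stream with more than one representation generated from it. The record keys, the agents and the `PREMIS_LIKE_TYPES` mapping are assumptions made for this sketch; the output keys echo PREMIS semantic unit names but nothing here is validated against the data dictionary.

```python
from collections import Counter

# Hypothetical event records; the keys and values are illustrative only.
events = [
    {"object": "obj-1", "type": "ingestion", "agent": "archivist-a", "outcome": "success"},
    {"object": "obj-1", "type": "virus check", "agent": "scanner", "outcome": "success"},
    {"object": "obj-1", "type": "download", "agent": "reader-x", "outcome": "success"},
    {"object": "obj-2", "type": "ingestion", "agent": "archivist-a", "outcome": "failure"},
]

# Event types this sketch chooses to surface in its PREMIS-like view.
# A real mapping would come from whichever controlled vocabulary you adopt.
PREMIS_LIKE_TYPES = {"ingestion", "virus check"}


def premis_like_view(object_id):
    """One representation: a PREMIS-flavoured list of events for one object."""
    return [
        {
            "eventType": e["type"],
            "eventOutcome": e["outcome"],
            "linkingAgentIdentifier": e["agent"],
        }
        for e in events
        if e["object"] == object_id and e["type"] in PREMIS_LIKE_TYPES
    ]


def access_report():
    """Another representation: download counts per object."""
    return Counter(e["object"] for e in events if e["type"] == "download")
```

Neither function changes what is stored: `premis_like_view("obj-1")` and `access_report()` are two views over the same events, and further views can be added without touching the log.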

My paper is about trying to fit older trusted paradigms into modern development practices. It’s about moving away from dogmatic adherence to the past while honouring something that exists. 

We can do PREMIS exactly the same as we do it now, as long as we don’t put it front and centre of our implementation.

How to respond to a “well-actually”?

Well-actually… https://www.recurse.com/social-rules#no-well-actuallys 

There are some editorial quirks in my paper, the one I am most embarrassed by is when my writing conflated the data model with the events in the Library of Congress controlled vocabulary (what other controlled vocabularies have other folks been using in the last decade? Next PREMIS revision, please, put those listings in there or open the editorial process to modern practices). Conflating these two things in one paragraph should hardly be the thread that untangles the entire piece.

The PREMIS EC haven’t reached out to me before publication, or after, yet as I point out, they all know where to find me (I wasn’t able to make the PREMIS birds-of-a-feather at iPRES (probably a good thing while this seems to have been in the air) but I was at the conference). Their response though does something strange, directing their efforts at things I might not have understood, may seemingly be getting at; or pointing out what I am “really saying here”. It is a patronising approach. For the gaps they filled in on my behalf, I would happily have provided clarity; asking would have offered me the opportunity to respond in a less reactive way, or perhaps given all of us a chance to collaborate.

Their response appeals to authority, and its two references are my article and the PREMIS data dictionary. I am sure there was a more neutral, reflective, and holistic way to approach this work by focusing on the entirety of the article and its spirit, and giving the benefit of the doubt to what is perceived as the author’s “mistakes” or “misreadings”. A show don’t tell approach might have helped, and would certainly be valuable, e.g. spending more time implementing examples that lent themselves to updating future revisions of the data dictionary and conformance statements. 

 ¯\_(ツ)_/¯

Anyway folks. ¯\_(ツ)_/¯ Interpretation is tricky? I imagine that the PREMIS EC will find fault with the above text, but to try to avoid another article on the subject of my misinterpretation: The PREMIS EC aren’t foisting the standard on you and I most definitely am not. Read their docs if you do choose PREMIS. Technology changes and so do standards. I feel we have an obligation to modernise (and demonstrate modernisation) with those changes.  I feel we have an obligation to question, and evaluate as time moves on; especially when technology is front and centre of how we support our archivists and librarians.

Hopefully people reading this can continue to read the original paper for what it is. There may be some potentially interesting ideas and conclusions that a pure PREMIS discussion distracts from, including what event-sourced data might mean for activating information supporting digital preservation.

Hopefully too, from this engagement, the PREMIS EC will take an opportunity to fold some of their own response into their own documentation and guidance.

Thanks for reading.

PREMIS conformance statement (2015): https://www.loc.gov/standards/premis/premis-conformance-20150429.pdf

PREMIS data dictionary (Version 3.0 (2015)): https://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf

 

https://exponentialdecay.co.uk/blog/dont-implement-premis-represent-it/

#Code #Coding #Data #digipres #DigitalPreservation #PREMIS #WDPD #WorldDigitalPreservationDay

Images: Two men in a public-lending library phone-booth in Harburg near Hamburg, Germany (taken August 2019); figure describing the Access functional area in OAIS from the OAIS standard.
Deutscher Bibliotheksverband (bibverband@openbiblio.social)
2024-11-07

Today is “World Digital Preservation Day” #WDPD, which aims to raise awareness of digital long-term preservation (#Langzeitarchivierung)! Academic libraries have been active in this area for decades. They ensure the long-term availability of data and protect it from risks.

#WeiterWissen. With us!

@stabihh @SLUBDresden @ZBMED @UB_HUBerlin @DNB_Aktuelles @bsbmuenchen @tibhannover @stabi_berlin @ubleipzig @ZBW_MediaTalk @subugoe

👉 weiterwissen-kampagne.de

Data secured forever. With us.
The academic libraries in Germany
Weiter Wissen
2024-11-06

@dpc_chat Just a crazy idea, maybe we can have Kiribati at the start of our #WDPD festivities next year. Kiribati is at UTC+14:00, so they beat New Zealand by one hour 😊
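
The one-hour head start can be checked with a few lines of Python. This is a sketch using fixed offsets: UTC+14 as stated in the post, and UTC+13 for New Zealand, which is an assumption that daylight time applies in early November. The date is only an example.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration (no time zone database lookups).
KIRIBATI = timezone(timedelta(hours=14))
NEW_ZEALAND = timezone(timedelta(hours=13))


def local_midnight(year, month, day, tz):
    """Midnight at the start of the given date in the given fixed-offset zone."""
    return datetime(year, month, day, tzinfo=tz)


start_kiribati = local_midnight(2024, 11, 7, KIRIBATI)
start_nz = local_midnight(2024, 11, 7, NEW_ZEALAND)

# How much earlier the day begins at UTC+14 than at UTC+13.
head_start = start_nz - start_kiribati
```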

Georgia Moppett (Georgia@digipres.club)
2024-11-04

Back by popular demand, the #PRONOM team will be running their yearly hackathon on 7th November-15th November, to celebrate #WDPD

They will be kicking off the week with a PRONOM Open Drop-In session on the 7th dedicated to answering your questions.

openpreservation.org/news/pron

2024-06-04

@mickylindlar @Thorsted for “Only errors in the files,” it’s gotta be #wtfpdf. This would actually be so fun for #WDPD (World Digital Preservation Day)

2023-11-02
Happy World Digital Preservation Day
Preserving your items is as easy as 1, 2, 3

1. Submit a collection summary
2. Sign the Deed of Gift
3. Prepare an inventory and send your tapes
2023-11-02

@BertrandCaron @MireilleNappert I am not sure if it is easy or hard for beginners, but we (Artefactual) created a card game with acronyms where the “rules” are: if you don’t know it, make up something funny. If you’d like a deck, send me your postal address! artefactual.com/flashcards #WDPD

In the list of relevant long-term preservation (LZA) networks, the inventor of #wdpd must of course not be missing ... the dpc! Unfortunately TIB is currently not a dpc member, but since being awarded the dpc Fellowship in 2020 (still a huge honour!) I have at least personally been bound to the dpc most closely and for life 😜
The dpc Technology Watch Reports are important resources on a range of long-term preservation topics!
dpconline.org/digipres/discove

#wdpd2023

2023-11-02

For #wdpd I am... playing catch up! Writing up notes from a Halloween special #webarchiving session with RDM colleagues which should go towards a procedure for capture and long term preservation. #digpres

2023-11-02

"No one can whistle a symphony. It takes a whole orchestra to play it!” excellent blog piece in celebration of #wdpd by my colleague Fran Horner: dpconline.org/blog/wdpd/wdpd20

2023-11-01

My #dp3 things for today:

1) Have a catch-up with @amyicurrie on Workforce Development things

2) Amy's yearly staff review

3) Secret #WDPD stuff...

Tim Allison (tallison)
2023-10-26

In honor of World Digital Preservation Day #WDPD, I'm putting on a virtual hands-on workshop on for practitioners.

This will focus on a new beta-grade user interface available on my personal github repo.

Dial-in information to follow:
meetup.com/apache-tika-communi
