Final workshop for #PARBICA21 - From Start to Finish: A Workflow for Digital Archiving, presented by Jodie Kell, Steven Gagau and Julia Miller, PARADISEC @paradisec_aus
"The FBI is attempting to unmask the owner behind archive.today, a popular archiving site that is also regularly used to bypass paywalls on the internet and to avoid sending traffic to the original publishers of web content, according to a subpoena posted by the website. The FBI subpoena says it is part of a criminal investigation, though it does not provide any details about what alleged crime is being investigated. Archive.today is also popularly known by several of its mirrors, including archive.is and archive.ph.
The subpoena, which was posted on X by archive.today on October 30, was sent by the FBI to Tucows, a popular Canadian domain registrar. It demands that Tucows give the FBI the “customer or subscriber name, address of service, and billing address” and other information about the “customer behind archive.today.”"
https://www.404media.co/fbi-tries-to-unmask-owner-of-infamous-archive-is-site/
Looking for a self-hosted tool to archive and index web pages and social media accounts for an OSINT project. I have tried ArchiveBox but would like to hear about other options. #Côngnghệ #Mởnguồn #OSINT #SelfHosted #Côngthứclưu #DigitalArchiving #Giữlưuýtıệ #ToolRecomm #Việnđiệnnhân
Revisiting bsdiff as a tool for digital preservation
by @beet_keeper
I introduced bsdiff in a blog in 2014. bsdiff compares the differences between two files, e.g. broken_file_a and corrected_file_b and creates a patch that can be applied to broken_file_a to generate a byte-for-byte match for corrected_file_b.
On the face of it, in an archive, we probably only care about corrected_file_b and so why would we care about a technology that patches a broken file?
In all of the use-cases we can imagine the primary reasons are cost savings and removing redundancy in file storage or transmission of digital information. In one very special case we can record the difference between broken_file_a and corrected_file_b and give users a totally objective method of recreating corrected_file_b from broken_file_a providing 100% verifiable proof of the migration pathway taken between the two files.
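To make the patch-and-verify idea concrete, here is a minimal sketch in Python. It does not use the bsdiff algorithm or the bsdiff tool itself: it swaps in the standard library's `difflib.SequenceMatcher` as a naive stand-in, and the helper names (`make_patch`, `apply_patch`, `sha256`) and the sample bytes are hypothetical, chosen only for illustration. The point is the workflow: record the difference once, rebuild corrected_file_b from broken_file_a, and confirm the result with a checksum.

```python
import hashlib
from difflib import SequenceMatcher


def make_patch(old: bytes, new: bytes) -> list:
    """Record only what is needed to rebuild `new` from `old` (naive, not bsdiff)."""
    patch = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # These bytes already exist in the old file, so store offsets only.
            patch.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # These bytes exist only in the new file, so store them literally.
            patch.append(("data", new[j1:j2]))
        # "delete" needs no entry: those old bytes are simply never copied.
    return patch


def apply_patch(old: bytes, patch: list) -> bytes:
    """Rebuild the new file from the old file plus the recorded patch."""
    out = bytearray()
    for op in patch:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for the two files discussed above.
broken_file_a = b"RIFF....WAVEfmt \x00\x00corrupt header....audio payload"
corrected_file_b = b"RIFF....WAVEfmt \x10\x00correct header....audio payload"

patch = make_patch(broken_file_a, corrected_file_b)
rebuilt = apply_patch(broken_file_a, patch)

# The checksum comparison is the verifiable proof of the migration pathway.
print(sha256(rebuilt) == sha256(corrected_file_b))
```

A real bsdiff patch is far more compact than this sketch, but the verification step is the same: anyone holding broken_file_a and the patch can regenerate corrected_file_b and compare checksums.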
#ac3 #Archives #audio #audiovisual #Audit #authenticity #av #Bash #bsdiff #checksums #Code4Lib #corruption #corruptionIndex #digipres #DigitalArchiving #DigitalForensics #digitalLiteracy #DigitalPreservation #DigitalStorage #diplomatics #FileFormats #glitch #glitchAudio #GlitchArt #integrity #PreservationAnalysis #PreservationMetadata #provenance #sensitivityIndex #Storage
Indicator: The Indicator Guide to tools for capturing webpages and social media content. “We tested 11 tools ranging from full-featured continuous capture apps to one-off screenshot extensions for grabbing long webpages.”
⚡️Linkwarden: The Self-Hosted Bookmark Manager That Solved a Problem I Didn’t Know I Had
Thank you, Linux Unplugged and Jupiter Broadcasting @ironicbadger, for introducing me to Linkwarden—a FOSS gem that will change how I save, share, and preserve the web.
Like many of you, I’ve been using browser bookmarks for years. I’d save articles, tutorials, and interesting links, only to find them gone when I finally got around to reading them. Link rot is real, and it’s frustrating. But until I heard about Linkwarden https://linkwarden.app/ on Linux Unplugged https://jupiterbroadcasting.com/, I didn’t realize how much I needed a better solution.
I used to think, “Browser bookmarks are fine,” and honestly, backing them up manually from time to time isn’t much trouble, just a slight inconvenience. My real problem is massive link rot: when I go back to two-year-old links, often on interesting subjects from small sites, they are frequently just gone. Saving the link doesn’t save any of the information.
But Linkwarden @linkwarden isn’t just another bookmark manager—it’s a preservation powerhouse, a collaborative hub, and a self-hosted dream. And thanks to the folks at Jupiter Broadcasting, I now understand why it’s a game-changer.
I haven’t started hosting it yet, but I definitely will, and I hope some of you out there will find it useful too.
Thanks to @daniel31x13 for making an awesome tool :heart_cyber: ⚡️.
---
• Linkwarden github.com/linkwarden/linkwarden — Self-hosted collaborative bookmark manager to collect, read, annotate, and fully preserve what matters, all in one place.
• Announcing Linkwarden 2.11 blog.linkwarden.app/releases/2.11
• Linkwarden Browser Extension github.com/linkwarden/browser-extension
@selfhosted@a.gup.pe @selfhosting @selfhosted@lemmy.world @selfhost #OpenSourceSoftware #TechForGood #Linkwarden #SelfHosted #FOSS #OpenSource #WebPreservation #Fediverse #LinuxUnplugged #SaveTheWeb #NoMore404 #TechCommunity #DigitalArchiving #LinkRot #PrivacyFirst #BookmarkManager #Bookmark
Client-side file format identification and reporting pipeline with Siegfried and Demystify Lite
by @beet_keeper
With thanks to the sponsorship of Archives New Zealand and Richard Lehane for his great coding expertise and his collaboration; Demystify Lite has a new feature — Siegfried!!
Richard recently posted about this work on LinkedIn, but let’s look at this effort in more detail below.
Continue reading “Client-side file format identification and reporting pipeline with Siegfried and Demystify Lite”…
#Archives #Coding #digipres #DigitalArchiving #DigitalPreservation #DROID #FileFormat #Golang #siegfried #SoftwareDevelopment
Published: PREMIS Events Through an Event-sourced Lens
by @beet_keeper
Not long after my first Code4Lib article I had another idea to run by the team there, and elected to see if my paper looking at events in the PREMIS metadata standard would be of interest to them and the readership.
My paper, PREMIS Events Through an Event-sourced Lens, was published in April this year.
I take a look at the content of this paper below and plug a few gaps that I have been thinking about since its publication.
Continue reading “Published: PREMIS Events Through an Event-sourced Lens”…
#Archives #Code4Lib #DesignPatterns #digipres #DigitalArchiving #DigitalPreservation #EventSourcing #PREMIS #Publications #SoftwareArchitecture #SoftwareDevelopment
Digital Preservation as a Thought Experiment
by @beet_keeper
Back in 2017, I had an abstract accepted for a chapter in the ALCTS Monograph: Digital Preservation in Libraries: Preparing for a Sustainable Future. With my author’s copy now available, I take a look at the background and its genesis below. The complete monograph is a fascinating read with some great contributors. You can find it online at the ALA Store.
Continue reading “Digital Preservation as a Thought Experiment”…
#Archives #community #ComputerScience #digipres #DigitalArchiving #digitalLiteracy #DigitalPreservation #glam #learning #outreach #Publications #ThoughtExperiment #training #writing
Looking after your URLs: tikalinkextract eight years on
by @beet_keeper
We might not have a second life, but what if I told you there was a second internet? Not the deep web, but another web that we engage with nearly every day?
Think about it, that QR code you scanned for more information? That payment link you followed on your electricity bill? The website you’re told to visit at the end of a television ad?
The antipodes of the internet are these terminal endpoints, material and not necessarily material objects that represent the end of the freely navigable web — the QR code on a concert poster is the web printed onto the physical world. There is every chance it will be scanned and followed by someone from a mobile device, but it’s a transient object, something that will exist for a short amount of time, and then disappear into the palimpsest of the poster board or wall it was pasted on until it eventually disappears.
This is part of the materiality of the internet that has long fascinated me. Perhaps it comes from being a student of material culture, but if we look around, we see the Internet everywhere!
Continue reading “Looking after your URLs: tikalinkextract eight years on”…
#Archives #digipres #DigitalArchiving #digitalContinuity #DigitalPreservation #httpreserve #Memento #outreach #RobustLinks #RobustWebLinks #WebArchives #webArchiving
"On Thursday Reuters published a photograph of Waltz checking his mobile phone during a cabinet meeting held by Donald Trump. The screen appears to show messages from various top level government officials, including JD Vance, Tulsi Gabbard, and Marco Rubio.
At the bottom of Waltz’s phone’s screen is a message that looks like Signal’s regular PIN verification message. This sometimes appears to encourage users to remember their PIN, which can stop people from taking over their account.
But the message is slightly different: it asks Waltz to verify his “TM SGNL PIN.” This is not the message that is displayed on an official version of Signal.
Instead TM SGNL appears to refer to a piece of software from a company called TeleMessage which makes clones of popular messaging apps but adds an archiving capability to each of them. A page on TeleMessage’s website tells users how to install “TM SGNL.” On that page, it describes how the tool can “capture” Signal messages on iOS, Android, and desktop."
#USA #Trump #Signal #Messaging #Privacy #DigitalArchiving #TeleMessage
"Almost two dozen repositories of research and public health data supported by the National Institutes of Health are marked for “review” under the Trump administration’s direction, and researchers and archivists say the data is at risk of being lost forever if the repositories go down.
“The problem with archiving this data is that we can’t,” Lisa Chinn, Head of Research Data Services at the University of Chicago, told 404 Media. Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement between a researcher institution and the agency, and those agreements are carefully administered through a disclosure risk review process.
A message appeared at the top of multiple NIH websites last week that says: “This repository is under review for potential modification in compliance with Administration directives.”
Repositories with the message include archives of cancer imagery, Alzheimer’s disease research, sleep studies, HIV databases, and COVID-19 vaccination and mortality data."
https://www.404media.co/nih-archives-repositories-marked-for-review-for-potential-modification/
#USA #Trump #Datasets #OpenScience #OpenData #PublicHealth #DigitalArchiving #DigitalPreservation
"According to Graham, based on the big jump in page views he's observed over the past two months, the Internet Archive is drawing many more visitors than usual to its services — journalists, researchers and other inquiring minds. Some want to consult the archive for information lost or changed in the purge, while others aim to contribute to the archival process.
"There's a groundswell of support for the Internet Archive because of the dramatic shift that's going on in parts of the government web infrastructure that you wouldn't imagine would change," said Brewster Kahle, the founder and current director of the Internet Archive. "People are coming and rallying behind us — by using it, by pointing at things, helping organize things, by submitting content to be archived — data sets that are under threat or have been taken down."
Nancy Krieger, a social epidemiologist at Harvard University who likened the purge to "a digital book burning" in a February interview with NPR's Ailsa Chang, is one of them. She's teamed up with other scientists to try to preserve federal health data that has recently disappeared from government websites. She helped develop a list of terms to send to the Internet Archive to aid the search and preservation effort.
"We want to preserve public health data that are crucial for people's well-being," she told NPR."
#USA #Trump #InternetArchive #DigitalPreservation #DigitalArchiving #WayBackMachine
https://www.npr.org/2025/03/23/nx-s1-5326573/internet-archive-wayback-machine-trump
🤦♂️ Look, a #Vectrex "computer" article that thinks converting ancient magazines into searchable text is a groundbreaking hobby! 🤓 Apparently, spending sick days lamenting over #vaporware is the new self-care. 🛌📚
https://www.amigalove.com/viewtopic.php?t=2887 #Hobby #DigitalArchiving #SelfCare #TechNostalgia #HackerNews #ngated
"For decades, the Internet Archive has preserved our digital history. Lately, journalists and ordinary citizens have been turning to it more than ever, as the Trump administration undertakes an ideologically-driven purge of government websites. But the Archive itself faces an existential threat. In this episode, Close All Tabs Senior Editor Chris Egusa joins Morgan to discuss his visit to the Internet Archive and its colorful founder Brewster Kahle, the legal battles that could shut it down permanently — and what losing it might mean for accountability and the preservation of history."
https://www.kqed.org/news/12031980/what-happens-if-the-internet-archive-goes-dark
"More than a hundred and ten thousand government pages have gone dark in a purge that one scientist likened to a “digital book burning,” and which has proved as frightening in its imprecision as in its malice. Racing to comply with executive orders banning “D.E.I.” and “gender ideology extremism,” agencies have cut materials on everything from supporting transgender youth in school to teaching children about sickle-cell disease, which disproportionately affects people of African descent. But they have also axed records having little to do with the Administration’s ideological priorities, seemingly assisted by A.I. tools that flag forbidden words without regard to context. A recently leaked list of pages marked for deletion on military websites includes references to the Enola Gay—not, as it turns out, a member of the L.G.B.T.Q. community but, rather, the B-29 bomber that nuked Hiroshima.
Oblivion menaces every scrap of information that doesn’t spark joy in the Oval Office. “It’s gone,” Trump said of “wokeness,” during his recent address to Congress, in almost motherly tones. “And we feel so much better for it, don’t we? Don’t we feel better?” But on this front, at least, the Administration is facing well-organized resistance. It comes from a loose coalition of archivists and librarians, who are standing athwart history and yelling “Save!” They belong to organizations such as the Internet Archive, which co-created a project called the End of Term Web Archive to back up the federal web in 2008; the Environmental Data and Governance Initiative, or EDGI; and libraries at major universities such as M.I.T. and the University of Michigan. Like the Encyclopedists of Isaac Asimov’s “Foundation”—who race to compile a collapsing empire’s accumulated knowledge—they’re assembling information arks to ride out the chaos."
https://www.newyorker.com/news/the-lede/the-data-hoarders-resisting-trumps-purge
The sensitivity index: Corrupting Y2K
by @beet_keeper
In December I asked “What will you bitflip today?” Not long after, Johan’s (@bitsgalore) Digital Dark Age Crew released its long lost hidden single Y2K — well, I couldn’t resist corrupting it.
Fixity is an interesting property enabled by digital technologies. Checksums allow us to demonstrate mathematically that a file has not been changed. An often cited definition of fixity is:
Fixity, in the preservation sense, means the assurance that a digital file has remained unchanged, i.e. fixed — Bailey (2014)
It’s very much linked to the concept of integrity, which UNESCO defines as:
The state of being whole, uncorrupted and free of unauthorized and undocumented changes.
Integrity is massively important at this time in history. It gives us the guarantees we need that digital objects we work with aren’t harboring their own sinister secrets in the form of malware and other potentially damaging payloads.
These values are contingent on bit-level preservation. The field of digital preservation largely assumes this: that we will be able to look after our content without losing information. As feasible as this may be these days, what happens if we lose some information? Where does authenticity come into play?
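As a small illustration of how a checksum exposes even the smallest loss, here is a sketch in Python. It is not the tooling used for the Y2K experiment: the `flip_bit` helper, the bit position and the sample bytes are all hypothetical. It records a SHA-256 checksum, flips a single bit, and shows that the fixity check then fails.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Hypothetical stand-in for the bytes of an audio file.
original = b"Y2K" * 1000
recorded_checksum = sha256_hex(original)

# Corrupt a single, arbitrarily chosen bit.
corrupted = flip_bit(original, 12345)

fixity_ok_original = sha256_hex(original) == recorded_checksum
fixity_ok_corrupted = sha256_hex(corrupted) == recorded_checksum

# Count how many bits actually differ between the two versions.
bits_changed = sum(bin(a ^ b).count("1") for a, b in zip(original, corrupted))

print(fixity_ok_original, fixity_ok_corrupted, bits_changed)
```

The checksum tells us that something changed, but not how much or whether the object is still usable. That gap is where the questions about authenticity begin.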
Through corrupting Y2K, I took time to reflect on integrity versus authenticity, as well as create some interesting glitched outputs. I also uncovered what may be the first audio that reveals what the Millennium Bug itself may have sounded like! Keen to hear it? Read on to find out more.
Continue reading “The sensitivity index: Corrupting Y2K”…
#ac3 #Archives #audio #audiovisual #authenticity #av #Bash #checksums #Code4Lib #corruption #corruptionIndex #digipres #DigitalArchiving #digitalLiteracy #DigitalPreservation #diplomatics #FileFormats #flac #glitch #GlitchArt #glitchaudio #integrity #mp3 #sensitivityIndex #wav
FYI, there may be a 404 on the front page of consumerfinance.gov, but the site appears to still be functional.
I would grab any reports, assessments, financial education guides, complaint data, etc. ASAP
I want to draw particular attention to special guides for people with disabilities, people experiencing re-entry from the justice system, &c. Social workers- you might be interested in getting copies for clients.
#CFPB #DataHoarders #DigitalArchiving #USGovPurge #SocialWork
"On Jan. 10, the U.S. Department of Justice released a 123-page report on the 1921 racial massacre in Tulsa, Oklahoma, which claimed several hundred lives and left the thriving Black neighborhood of Greenwood in smoldering ruins. The department’s investigation determined that the attack was “so systematic and coordinated that it transcended mere mob violence.” While it conceded that “no avenue of prosecution now exists for these crimes,” the department hailed the findings as the “federal government’s first thorough reckoning with this devastating event,” which “officially acknowledges, illuminates, and preserves for history the horrible ordeals of the massacre’s victims.”
“Until this day, the Justice Department has not spoken publicly about the race massacre or officially accounted for the horrific events that transpired in Tulsa,” said Kristen Clarke, the assistant attorney general for civil rights, in announcing the report. “This report breaks that silence through a rigorous examination and a full accounting of one of the darkest episodes of our nation’s past. This report reflects our commitment to the pursuit of justice and truth, even in the face of insurmountable obstacles.”
Only two weeks later, the department took a strikingly different action regarding the historical record of a violent riot: It removed from its website the searchable database of all cases stemming from the Jan. 6, 2021, assault on the Capitol that were prosecuted by the U.S. attorney for the District of Columbia."
https://www.propublica.org/article/january-6-erasure-doj-database-trump-history
#USA #Trump #Democracy #Authoritarianism #DigitalArchiving #DigitalPreservation