
HTML-Archiving: static, standalone, efficient

https://katharinabrunner.de/2026/02/html-archiving-static-standalone-efficient/

I just added my process for moving Archive-It #warc files to Michigan DPN to GitHub for easier access and whatnot. I’ll be talking about this on Monday for the @dpc_chat workflows webinar series! #webarchiving #digipres #digitalpreservation https://github.com/mlibrary/digiPres/blob/main/webarchiving/warcs2mdpn/readme.md

Arndt, Tracy; Arndt, Natanael: How to describe the past Web? A data model for web archiving. SWIB25 - Semantic Web in Libraries, ZBW - Leibniz-Informationszentrum Wirtschaft et al., 2025. https://doi.org/10.5446/72405

#webarchiving #linkedopendata

Presentation slide showing a diagram of the live web next to a web archive container and how that container links to web pages; the web archive containers are shown as disconnected from the live web.

Wikipedia blacklists Archive.today over DDoS
#CyberSecurity #DDoS #Privacy #Sicurezza #TechNews #Tecnologia #WebArchiving #Wikipedia

https://www.ceotech.it/wikipedia-mette-archive-today-in-blacklist-per-ddos/

Popular Science: The Internet Archive records its 1 trillionth website. “The Internet Archive—one of cyberspace’s most essential library projects—has achieved a feat that’s hard to even conceptualize. After nearly 30 years of painstaking work, the nonprofit has preserved its trillionth webpage.”

https://rbfirehose.com/2026/02/23/popular-science-the-internet-archive-records-its-1-trillionth-website/

Ars Technica: Wikipedia blacklists Archive.today, starts removing 695,000 archive links. “In the course of discussing whether Archive.today should be deprecated because of the DDoS, Wikipedia editors discovered that the archive site altered snapshots of webpages to insert the name of the blogger who was targeted by the DDoS. The alterations were apparently fueled by a grudge against the blogger […]

https://rbfirehose.com/2026/02/21/ars-technica-wikipedia-blacklists-archive-today-starts-removing-695000-archive-links/

Many researchers use #ArchiveToday to archive the social media posts they cite in their articles... That now seems compromised.
#Wikipedia blacklists Archive.today, starts removing 695,000 archive links
https://arstechnica.com/tech-policy/2026/02/wikipedia-bans-archive-today-after-site-executed-ddos-and-altered-web-captures
#iloveinternetarchive
#webarchiving
#archives

Also worth reading on the question of #IA (AI) and the #waybackmachine, by Mark Graham:
Generative #AI presents real challenges in today’s information ecosystem. But preserving the time-honored role of #libraries and #archives in society has never been more important. We’ve worked alongside news organizations for decades. Let’s continue working together in service of an open, referenceable, and enduring #web
#webarchiving
https://www.techdirt.com/2026/02/17/preserving-the-web-is-not-the-problem-losing-it-is
#iloveinternetarchive

#WaybackMachine Director Pushes Back on AI Scraping Fears Driving Archive Blocks
https://blog.archive.org/2026/02/18/wayback-machine-director-pushes-back
As reported by Nieman Lab last month, some major media organizations—including The #NewYorkTimes, #TheGuardian, and #Reddit—have started blocking the Wayback Machine from archiving their sites over unfounded concerns about AI scraping.
Mike Masnick in #Techdirt explained why this is “a mistake we’re going to regret for generations.”
Limiting #webarchiving threatens our shared #digitalhistory.

Hmm, HTTP response headers are still encoded in latin-1

https://github.com/Kludex/starlette/pull/1236

#TIL #WebDevelopment #Unicode #WebArchiving
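Some context for the TIL: Python's WSGI spec (PEP 3333) and the standard library's `http.client` treat header strings as latin-1 (ISO-8859-1), so a header value can only carry characters up to U+00FF. The minimal sketch below uses only the standard library; both helper names are hypothetical and nothing here is taken from the linked Starlette PR. It shows the clean round trip, the failure for other characters, the mojibake you get when UTF-8 bytes are read back as latin-1, and the RFC 5987 / RFC 6266 `filename*=UTF-8''...` form commonly used to smuggle non-latin-1 filenames into a header.

```python
from urllib.parse import quote


def encode_header_value(value: str) -> bytes:
    """Hypothetical helper: encode a header value as latin-1 (ISO-8859-1).

    Raises UnicodeEncodeError for any character above U+00FF.
    """
    return value.encode("latin-1")


def content_disposition(filename: str) -> str:
    """Hypothetical helper: carry any filename in an ASCII-only header
    using the RFC 5987 / RFC 6266 `filename*=UTF-8''...` form.

    safe="" makes quote() percent-encode "/" too, which is not a valid
    RFC 5987 attr-char.
    """
    return f"attachment; filename*=UTF-8''{quote(filename, safe='')}"


if __name__ == "__main__":
    # Characters in U+0000..U+00FF round-trip cleanly.
    print(encode_header_value("café"))                # b'caf\xe9'

    # Anything outside that range cannot be encoded at all.
    try:
        encode_header_value("アーカイブ")
    except UnicodeEncodeError as exc:
        print("not latin-1:", exc.reason)

    # Raw UTF-8 bytes read back as latin-1 turn into mojibake.
    print("café".encode("utf-8").decode("latin-1"))  # cafÃ©

    # The percent-encoded form is plain ASCII, so latin-1 encoding succeeds.
    print(encode_header_value(content_disposition("アーカイブ.warc")))
```

Percent-encoding keeps the header itself pure ASCII, which sidesteps the latin-1 limit instead of fighting it; the receiving side is expected to decode the `filename*` parameter as UTF-8.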

Hi, I’ll be covering this #workflow for backing up WARCs from Archive-It to a state-run LOCKSS program at the upcoming @dpc_chat workflows webinar. I also have info about running these websites offline. https://docs.google.com/document/d/14FZzbfICaddW1wJP8N1CQE6YZOHXjtT_ouAy1-YvsM0/edit?usp=sharing #digipres #webarchiving #warc #digitalpreservation Have a look!

RE: https://mastodon.social/@cutterkom/115926148559409105

Update on the dataset that contains PII of trans persons living in the US: @SafeguardingResearch stopped distributing it via BitTorrent after I reported it: https://sciop.net/datasets/nyc-trans-oral-history

Why? "Resilience makes p2p file sharing such a compelling technology, not only for pirated content but also for scientific data and public records. But is it suitable for the life stories of marginalized people living in a country whose own government is persecuting them?"

https://katharinabrunner.de/2026/01/archival-demiground-thoughts-on-preserving-trans-oral-history

#webarchiving

Library of Congress: From Print Volumes to Digital Scholarship: The Handbook of Latin American Studies Web Archive. “Since the 1930s, the Handbook of Latin American Studies has documented scholarship on Latin America and the Caribbean. In this interview, Tracy North describes how that long-standing mission now extends to web archiving, ensuring long-term access to web-based research materials. […]

https://rbfirehose.com/2026/02/09/from-print-volumes-to-digital-scholarship-the-handbook-of-latin-american-studies-web-archive-library-of-congress/

Internet Archive and Partners Select Local Newsrooms from Across the US to Participate in the Today’s News for Tomorrow Program

https://fed.brid.gy/r/https://blog.archive.org/2026/02/06/internet-archive-and-partners-select-local-newsrooms-from-across-the-us-to-participate-in-the-todays-news-for-tomorrow-program/

"In a bizarre act of cultural vandalism they've not just removed the entire site (including the archives of previous versions) but they've also set every single page to be a 302 redirect to their closure announcement."
https://fedi.simonwillison.net/@simon/116015180016712361

#webarchiving is an act of resistance against cultural vandalism.

Journalists don’t just report from the web anymore—they report on it.

Learn the 9 Ways Web Archives Are Used In Digital Investigations in a new guest post by researchers from King’s College London who analyze 8,600 news articles to identify how journalists use the #WaybackMachine in digital investigations.🕵️‍♀️

Read Follow the Changes 👉 https://blog.archive.org/2026/02/02/follow-the-changes/

#WebArchiving #DigitalAccountability @kingsdh
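"Following the changes" comes down to comparing captures of one page over time. As a rough illustration, not the King's College researchers' method, here is a Python sketch against the Wayback Machine's public CDX server. It assumes the documented `output=json`, `fl`, and `collapse=digest` parameters (a header row first, then one row per capture; `digest` is a hash of the captured content). It lists the capture timestamps where the content digest changes and turns them into snapshot URLs. All function names are hypothetical, and only `fetch_change_points` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str) -> str:
    """Build a CDX query asking for timestamp + digest of each capture.

    collapse=digest asks the server to drop adjacent captures whose
    content hash is identical, so mostly "changes" come back.
    """
    params = {"url": target, "output": "json",
              "fl": "timestamp,digest", "collapse": "digest"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def change_points(rows: list) -> list:
    """Given CDX JSON rows (header row first), return the timestamps at
    which the digest differs from the previous capture."""
    if not rows:                      # no captures at all
        return []
    header, *records = rows
    ts_i, dg_i = header.index("timestamp"), header.index("digest")
    changes, previous = [], None
    for rec in records:
        if rec[dg_i] != previous:     # repeat the collapse client-side
            changes.append(rec[ts_i])
            previous = rec[dg_i]
    return changes


def snapshot_url(timestamp: str, target: str) -> str:
    """Wayback Machine replay URL for one capture."""
    return f"https://web.archive.org/web/{timestamp}/{target}"


def fetch_change_points(target: str) -> list:
    """Network call: snapshot URLs where the page content changed."""
    with urlopen(cdx_query_url(target), timeout=30) as resp:
        rows = json.load(resp)
    return [snapshot_url(ts, target) for ts in change_points(rows)]
```

Usage would be along the lines of `for u in fetch_change_points("example.com"): print(u)`, then opening consecutive snapshots side by side. The digest only says *that* something changed, not *what*; the human comparison is still the investigative work.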

Tomorrow we will give a short talk on #ArtDocArchive, a prototype for #webarchiving the self-documentation of artists on websites and social media (basically trying to preserve websites and feeds, extract information, and visualize it), at this event at nGbK in Berlin:

https://ngbk.de/en/programm/termine/eastunbloc-in-medias-rest

Website of the project: https://art-doc-archive.net/ There you'll find software and blog posts from the four-month project.

#EastUnBloc

Network visualization of colorful dots in a cloud. On the right, a social media post.

I heard it through the grapevine that the Library of Congress is accepting bids to become their #WebArchiving vendor. The documents offer a small window into some of the details of how they currently do web archiving (transferring BagIt packages from S3) and the reports they generate to monitor it.

https://sam.gov/workspace/contract/opp/a2c5551af2b74c3d84c775032c83a55e/view
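For readers unfamiliar with BagIt: a bag (RFC 8493) keeps its payload under `data/` and lists a checksum for every payload file in `manifest-<algorithm>.txt`, one `checksum  path` line per file. Re-checking those checksums after a transfer, for example out of S3, is the basic fixity step. The Python sketch below is a minimal, hypothetical check of a single manifest on a local copy of a bag. It is not the Library of Congress's tooling, it ignores tag manifests and `bag-info.txt`, and it does not decode the percent-encoded line breaks RFC 8493 allows in paths; for real work, a maintained validator such as the `bagit` Python library is the safer choice.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str) -> str:
    """Hash a file in 1 MiB chunks so large WARCs don't fill memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_bag(bag_dir, algorithm: str = "sha256") -> list:
    """Hypothetical fixity check: compare payload files against
    manifest-<algorithm>.txt. Returns a list of problems (empty = OK)."""
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    problems, listed = [], set()

    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each line is "<checksum><whitespace><relative path>".
        expected, rel_path = line.split(maxsplit=1)
        listed.add(rel_path)
        target = bag / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif file_digest(target, algorithm) != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")

    # Payload files present on disk but absent from the manifest also
    # make a bag invalid.
    for p in sorted((bag / "data").rglob("*")):
        if p.is_file():
            rel = p.relative_to(bag).as_posix()
            if rel not in listed:
                problems.append(f"not in manifest: {rel}")
    return problems
```

Returning a list of problems rather than raising on the first one fits the monitoring use case: a report can show every damaged or missing file in a transfer at once.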

RE: https://mastodon.social/@cutterkom/115926148559409105

Our member @cutterkom with thoughts on preserving trans oral history

#queerhistory #webarchiving #trans @histodons

Archival Demiground: Thoughts on preserving trans oral history

What started as a little web archiving project for @SafeguardingResearch ended with a question about radical openness.

tl;dr: The present seems to call for dark archives and archival demiground, a term coined by @margaret. That's quite a depressing finding: visibility has been a central goal of queer movements for many years.

Longform ➡️ https://katharinabrunner.de/2026/01/archival-demiground-thoughts-on-preserving-trans-oral-history/

#webarchiving #trans #queerhistory #uspol
