#WebArchiving

Katharina Brunner blog@katharinabrunner.de
2026-02-26
2026-02-25

I just added my Archive-it #warc to Michigan DPN process to GitHub for easier access and whatnot. I’ll be talking about this on Monday for the @dpc_chat workflows webinar series! #webarchiving #digipres #digitalpreservation github.com/mlibrary/digiPres/b

Lukas Fuchsgruber lukasfx@chaos.social
2026-02-24

Arndt, Tracy; Arndt, Natanael: How to describe the past Web? A data model for web archiving. SWIB25 - Semantic Web in Libraries, ZBW - Leibniz-Informationszentrum Wirtschaft et al., 2025. doi.org/10.5446/72405

#webarchiving #linkedopendata

Presentation slide showing a diagram of the live web and a web archive container and how it links to web pages; the web archive container is shown as disconnected from the live web.
2026-02-23

Popular Science: The Internet Archive records its 1 trillionth website. “The Internet Archive—one of cyberspace’s most essential library projects—has achieved a feat that’s hard to even conceptualize. After nearly 30 years of painstaking work, the nonprofit has preserved its trillionth webpage.”

https://rbfirehose.com/2026/02/23/popular-science-the-internet-archive-records-its-1-trillionth-website/
2026-02-21

Ars Technica: Wikipedia blacklists Archive.today, starts removing 695,000 archive links. “In the course of discussing whether Archive.today should be deprecated because of the DDoS, Wikipedia editors discovered that the archive site altered snapshots of webpages to insert the name of the blogger who was targeted by the DDoS. The alterations were apparently fueled by a grudge against the blogger […]

https://rbfirehose.com/2026/02/21/ars-technica-wikipedia-blacklists-archive-today-starts-removing-695000-archive-links/
2026-02-21

Many researchers use #ArchiveToday to archive the social media posts they cite in their articles... That now seems to be in jeopardy.
#Wikipedia blacklists Archive.today, starts removing 695,000 archive links
arstechnica.com/tech-policy/20
#iloveinternetarchive
#webarchiving
#archives

2026-02-18

Also worth reading on the question of #IA (AI) and the #waybackmachine, by Mark Graham:
Generative #AI presents real challenges in today’s information ecosystem. But preserving the time-honored role of #libraries and #archives in society has never been more important. We’ve worked alongside news organizations for decades. Let’s continue working together in service of an open, referenceable, and enduring #web
#webarchiving
techdirt.com/2026/02/17/preser
#iloveinternetarchive

2026-02-18

#WaybackMachine Director Pushes Back on AI Scraping Fears Driving Archive Blocks
blog.archive.org/2026/02/18/wa
As reported by Nieman Lab last month, some major media organizations—including The #NewYorkTimes, #TheGuardian, and #Reddit—have started blocking the Wayback Machine from archiving their sites over unfounded concerns about AI scraping.
Mike Masnick in #Techdirt explained why this is “a mistake we’re going to regret for generations.”
Limiting #webarchiving threatens our shared #digitalhistory.

#Digital ⚓️ #Vagabond 🦈 beet_keeper@digipres.club
2026-02-17

Hmm, HTTP response headers are still encoded in latin-1

github.com/Kludex/starlette/pu

#TIL #WebDevelopment #Unicode #WebArchiving
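
A minimal Python sketch of what latin-1 header encoding implies in practice. It is not taken from the linked pull request; the helper names are hypothetical, and the percent-encoding fallback is just one possible workaround, assumed here for illustration.

```python
from urllib.parse import quote


def encode_header_value(value: str) -> bytes:
    """Encode a header value as latin-1 (ISO-8859-1): one byte per character."""
    return value.encode("latin-1")


def ascii_safe_header_value(value: str) -> bytes:
    """Hypothetical fallback: percent-encode the UTF-8 bytes of the value
    when it contains characters that latin-1 cannot represent."""
    try:
        return encode_header_value(value)
    except UnicodeEncodeError:
        return quote(value, safe="").encode("ascii")


def misread_utf8_as_latin1(value: str) -> str:
    """Show the mojibake produced when UTF-8 bytes are decoded as latin-1."""
    return value.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    print(encode_header_value("café"))      # fits in latin-1
    print(ascii_safe_header_value("日本"))   # outside latin-1, percent-encoded
    print(misread_utf8_as_latin1("café"))   # what a latin-1 reader sees for UTF-8 bytes
```

Characters up to U+00FF survive a latin-1 round trip; anything beyond that raises `UnicodeEncodeError` unless it is escaped first.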

2026-02-13

Hi, I’ll be covering this #workflow for backing up WARCs from Archive-it to a state-run LOCKSS program at the upcoming @dpc_chat workflows webinar. I also have info about running these websites offline. docs.google.com/document/d/14F #digipres #webarchiving #warc #digitalpreservation Have a look!

cutterkom cutterkom
2026-02-11

RE: mastodon.social/@cutterkom/115

Update on the dataset that contains PII of trans persons living in the US: @SafeguardingResearch stopped distributing it via BitTorrent after I reported it: sciop.net/datasets/nyc-trans-o

Why? "Resilience makes p2p file sharing such a compelling technology not only for pirated content, but also for scientific data and public records. But is it suitable for the life stories of marginalized people living in a country whose own government is persecuting them?"

katharinabrunner.de/2026/01/ar

2026-02-09

Library of Congress: From Print Volumes to Digital Scholarship: The Handbook of Latin American Studies Web Archive. “Since the 1930s, the Handbook of Latin American Studies has documented scholarship on Latin America and the Caribbean. In this interview, Tracy North describes how that long-standing mission now extends to web archiving, ensuring long-term access to web-based research materials. […]

https://rbfirehose.com/2026/02/09/from-print-volumes-to-digital-scholarship-the-handbook-of-latin-american-studies-web-archive-library-of-congress/
Internet Archive Blogs | Updates from the Internet Archive blog.archive.org@web.brid.gy
2026-02-06

Internet Archive and Partners Select Local Newsrooms from Across the US to Participate in the Today’s News for Tomorrow Program

fed.brid.gy/r/https://blog.arc

2026-02-05

"In a bizarre act of cultural vandalism they've not just removed the entire site (including the archives of previous versions) but they've also set every single page to be a 302 redirect to their closure announcement."
fedi.simonwillison.net/@simon/

#webarchiving is an act of resistance against cultural vandalism.

2026-02-03

Journalists don’t just report from the web anymore—they report on it.

Learn the 9 Ways Web Archives Are Used In Digital Investigations in a new guest post by researchers from King’s College London who analyze 8,600 news articles to identify how journalists use the #WaybackMachine in digital investigations.🕵️‍♀️

Read Follow the Changes 👉 blog.archive.org/2026/02/02/fo

#WebArchiving #DigitalAccountability @kingsdh

Logo of the Internet Archive WayBack Machine, in black and red text on a white background.
Lukas Fuchsgruber lukasfx@chaos.social
2026-01-28

Tomorrow we will give a short presentation on #ArtDocArchive, a prototype for #webarchiving the self-documentation of artists on websites and social media (basically trying to preserve websites and feeds, extract information, and visualize it), at this event at nGbK in Berlin:

ngbk.de/en/programm/termine/ea

Website of the project: art-doc-archive.net/ There you'll find software and blog posts from the 4-month project.

#EastUnBloc

Network visualization of colorful dots in a cloud. On the right, a social media post.
2026-01-23

I heard through the grapevine that the Library of Congress is accepting bids to become their #WebArchiving vendor. The documents provide a little window into some of the details of how they currently do web archiving (transferring BagIt packages from S3) and the reports they generate to monitor it.

sam.gov/workspace/contract/opp

Forum Queeres Archiv München queerarchivemunich@openbiblio.social
2026-01-20
cutterkom cutterkom
2026-01-20

Archival Demiground: Thoughts on preserving trans oral history

What started as a little web archiving project for @SafeguardingResearch ended with a question about radical openness.

tl;dr: The present seems to call for dark archives and archival demiground, a term coined by @margaret. That's quite a depressing finding: visibility has been a central goal of queer movements for many years.

Longform ➡️ katharinabrunner.de/2026/01/ar
