#aiscraping

Ars Technica News arstechnica@c.im
2025-06-05
2025-05-22

If you enjoy the many great sources of information on the internet, you should know:

#KI 🤖 is running riot on the net, and #Admins are pushing back so that we humans can browse undisturbed.

Read how admins describe their utterly frustrating but invisible defensive work against AI, on @campact's blog:
👉 blog.campact.de/2025/05/ki-ran

🙏 @flberger

❗Don't forget: July 25 is #SysAdminDay

#FediAdmins #KIScraping #AI #AIScraping #TDM #AdminLeiden #MastoAdmin #DataPoisoning #aitxt #GPT #TDMRep

2025-04-26

Just noticed this setting on Cloudflare 🙃 Probably not great from a sustainability point of view, but I can easily see this becoming a popular setting to enable just to spite AI companies... i.e. let's see how these bots like an infinite loop instead of my content.

#AI

2025-04-17

Feeling less and less inclined to put new content on my websites as AI scrapers regularly come by, ignoring robots.txt. How do you deal with that? Seriously? I don't want to feed their machines with machine readable data. I even considered using PDFs with text as images but that is wrong in so many ways...

#ai #openweb #aiscraping

2025-04-03

👀 This morning, while I was discussing Wikimedia's problems with scraping, a programmer friend told me about the Anubis project github.com/TecharoHQ/anubis/
"It's quite simple and easy to deploy on any halfway serious website, and it automatically knocks out any scraper (AI or otherwise). Besides, they can't come up with anything that would make scraping profitable with that in place." #aiscraping #aiscrapers #wikimedia #anubis #iahastaenlaputasopa

2025-03-21

🌐 LLM crawlers continue to DDoS SourceHut | sr_ht status

「 SourceHut continues to face disruptions due to aggressive LLM crawlers. We are continuously working to deploy mitigations. We have deployed a number of mitigations which are keeping the problem contained for now. However, some of our mitigations may impact end-users 」

status.sr.ht/issues/2025-03-17

#sourcehut #ddos #aiscraping

aproposnix aproposnix
2025-03-16

Serious question, isn't this an issue even with decentralized systems? What's preventing AI bots from just using all of our public data on the Fediverse? Is there any difference?

techcrunch.com/2025/03/15/blue

2025-03-12

Hi #Admins 👋,

Can you give me quotes that explain your fight against #AIScraping? I'm looking for (verbal) images, metaphors, comparisons, etc. that explain to non-techies what's going on. (efforts, goals, resources...)

I intend to publish your quotes in a text on @campact's blog¹ (DE, German NGO).

The quotes should make your work🙏 visible in a generally understandable way.

¹ blog.campact.de/author/friedem

#TDM #MastoAdmin #DataPoisoning #aitxt #GPT #TDMRep #Kudurru #Nightshade #Glaze #FediAdmins

PPC Land ppcland
2024-12-25

Cloudflare unveils tools to give publishers control over AI scraping: New AI Audit feature allows website owners to analyze and manage how AI models access their content, with plans for a marketplace. ppc.land/cloudflare-unveils-to

2024-11-25

How to turn off #AIscraping from your Word documents: "#Microsoft Office has slyly turned on an “opt-out” feature that scrapes your #Word, #Excel docs to train its internal AI systems. This setting is turned on by default, and you have to manually uncheck a box in order to opt out. If you are a writer who uses MS Word to write any proprietary content (blog posts, novels, any work you intend to protect with #copyright and/or sell), you want to turn this feature off immediately." medium.com/illumination/ms-wor

Norobiik @Norobiik@noc.social
2024-10-09

The #WebApp, called #AdobeContentAuthenticity, allows artists to signal that they do not consent for their work to be used by #AI models. It also gives creators the opportunity to add what Adobe is calling “#ContentCredentials,” including their verified identity, social media handles, or other online domains, to their work. #C2PA #DataScraping

#Adobe wants to make it easier for artists to blacklist their work from #AIScraping
technologyreview.com/2024/10/0

𝓖𝓵𝓸𝓻𝓲𝓪 Glor
2024-10-09

Hmm, interesting. I think tools like this are definitely a good thing.

#Adobe wants to make it easier for artists to blacklist their work from #AIScraping | MIT Technology Review @technologyreview
technologyreview.com/2024/10/0

Ecologia Digital josemurilo@mato.social
2024-10-01

"It’s pretty crazy that not only a) these bots shamelessly harvest all your data without asking for permission and b) they do it in such a brute-force manner.
My coworker and security expert António pointed me to #DarkVisitors, and I’ll probably be installing their #WordPressPlugin on all my sites. For what it’s worth."
@john_fisherman on #AIscraping
fred-rocha.medium.com/ai-crawl

Ecologia Digital josemurilo@mato.social
2024-08-05

big scoop by @404mediaco:
"#Nvidia employee leaked documents, Slack conversations, and emails to 404 Media showing how the company went about building a video foundational model that would feed into its other products. It's a fascinating look into how a tech giant operates as it's attempting to stay competitive in AI world, and how it gobbles up copyrighted content from around the web in the process."
#AIscraping
404media.co/nvidia-ai-scraping
