Reddit sues Anthropic over AI scraping that retained users’ deleted posts https://arstechni.ca/zBYc #ArtificialIntelligence #largelanguagemodels #aiscraping #Anthropic #chatbots #Policy #Amazon #Claude #reddit #alexa #AI
Annual reminder that #cloudflare has a button to kill #aibots that steal your work:
https://arstechnica.com/tech-policy/2024/09/cloudflare-lets-sites-block-ai-crawlers-with-one-click/
Also check out #nightshade at https://nightshade.cs.uchicago.edu/whatis.html
If you enjoy the many great sources of information on the internet, you should know this:
#KI 🤖 is running riot on the web, and #Admins are fighting back so that we humans can keep browsing undisturbed.
Read how admins describe their utterly frustrating yet invisible work defending against AI, on the @campact blog:
👉 https://blog.campact.de/2025/05/ki-randaliert-im-netz-admins-halten-dagegen/
❗Don't forget: July 25 is #SysAdminDay
#FediAdmins #KIScraping #AI #AIScraping #TDM #AdminLeiden #MastoAdmin #DataPoisoning #aitxt #GPT #TDMRep
Just noticed this setting on Cloudflare 🙃 Probably not great from a sustainability point of view, but I can easily see this becoming a popular setting just to spite AI companies. Let's see how these bots like an infinite loop instead of my content.
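A minimal sketch of the "infinite loop" idea, assuming a hypothetical `/maze/` URL prefix that a site reserves for decoy pages: each decoy page links to several more decoy pages whose paths are derived from a hash of its own path, so a crawler that keeps following links never runs out. This is not how Cloudflare builds its version; the `noindex, nofollow` meta tag is included so crawlers that honor it stay out, leaving only the rule-breakers wandering the maze.

```python
import hashlib


def labyrinth_page(path: str, links_per_page: int = 5) -> str:
    """Return a decoy HTML page whose links point to further decoy pages.

    The child paths are derived deterministically from the parent path, so the
    server keeps no state, yet every page leads to new pages without end.
    """
    seed = hashlib.sha256(path.encode("utf-8")).hexdigest()
    links = []
    for i in range(links_per_page):
        # Each child path mixes the parent's hash with the link index.
        child = hashlib.sha256(f"{seed}:{i}".encode("utf-8")).hexdigest()[:12]
        links.append(f'<a href="/maze/{child}">{child}</a>')
    body = "\n".join(links)
    return (
        "<html><head>"
        '<meta name="robots" content="noindex, nofollow">'
        "</head><body>"
        f"<p>Page {seed[:8]}</p>\n{body}\n"
        "</body></html>"
    )
```

Because the pages are computed rather than stored, the maze costs no disk space, but as the post notes, every decoy page still burns CPU and bandwidth to serve.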
Feeling less and less inclined to put new content on my websites as AI scrapers regularly come by, ignoring robots.txt. How do you deal with that? Seriously? I don't want to feed their machines with machine-readable data. I even considered publishing PDFs with the text rendered as images, but that is wrong in so many ways...
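Since robots.txt only works if the crawler chooses to obey it, one common server-side fallback is refusing requests whose User-Agent header names a known AI crawler. A minimal sketch as WSGI middleware, assuming a hand-maintained blocklist (the tokens below are publicly documented crawler names, but the list is incomplete and hypothetical as a complete policy). It does nothing against scrapers that spoof a browser user agent.

```python
from typing import Optional

# Hypothetical, incomplete blocklist of publicly documented AI crawler tokens.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot", "Amazonbot")


def is_ai_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a blocklisted token."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def block_ai_crawlers(app):
    """WSGI middleware: answer 403 to matching crawlers, pass everyone else through."""
    def middleware(environ, start_response):
        if is_ai_crawler(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain; charset=utf-8")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

Usage: wrap an existing WSGI application, e.g. `application = block_ai_crawlers(application)`. The check is cheap, which is its main appeal; it complements robots.txt rather than replacing it.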
👀 This morning, while we were discussing Wikimedia's scraping problems, a programmer friend told me about the Anubis project https://github.com/TecharoHQ/anubis/
"It's quite simple and easy to deploy on any halfway serious website, and it automatically takes out any scraper (AI or otherwise). Plus, there's nothing they can come up with that would make scraping profitable with it in place." #aiscraping #aiscrapers #wikimedia #anubis #iahastaenlaputasopa
AI Crawlers Overwhelm Open-Source Projects, Forcing Developers to Block Entire Countries
#AI #Web #Robotstxt #AIScraping #OpenSource #Cybersecurity #DataScraping #Scraping #WebScraping
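Blocking "entire countries" usually means denying requests from the IP ranges a GeoIP database assigns to them. A minimal sketch with Python's standard `ipaddress` module, assuming a hypothetical blocklist: the documentation-only prefixes below (RFC 5737 and RFC 3849) stand in for whatever per-country CIDR lists an admin would actually export.

```python
import ipaddress

# Hypothetical blocklist: documentation prefixes stand in for real
# per-country CIDR ranges exported from a GeoIP database.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(addr: str) -> bool:
    """Return True if the client address falls inside any blocked network.

    An IPv4 address is never 'in' an IPv6 network (and vice versa), so mixed
    lists work without special-casing.
    """
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED_NETWORKS)
```

A linear scan is fine for a handful of ranges; a country-sized list with thousands of prefixes would call for a sorted or tree-based lookup. Either way, the collateral damage to legitimate visitors from those regions is exactly why the headline frames this as something developers are forced into.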
AI scrapers are a plague on the internet
#aiscraping #aiscrapers #ai #llm
https://www.osnews.com/story/141969/foss-infrastructure-is-under-attack-by-ai-companies/
🌐 LLM crawlers continue to DDoS SourceHut | sr_ht status
「 SourceHut continues to face disruptions due to aggressive LLM crawlers. We are continuously working to deploy mitigations. We have deployed a number of mitigations which are keeping the problem contained for now. However, some of our mitigations may impact end-users 」
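The status post does not say which mitigations SourceHut deployed. One common option is per-client rate limiting with a token bucket: each client earns request "tokens" at a steady rate up to a burst capacity, and requests beyond that are refused. A minimal sketch, assuming clients are keyed by something like an IP address (the `client_id` scheme is hypothetical); the injectable clock exists only to make the behavior testable.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last request, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keep one bucket per client key (e.g. a hypothetical IP-based client_id)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

The limitation is built into the key: a crawler that spreads its requests across many addresses gets a fresh bucket for each one, which is one reason mitigations like this can end up affecting ordinary users as well.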
Serious question, isn't this an issue even with decentralized systems? What's preventing AI bots from just using all of our public data on the Fediverse? Is there any difference?
#ai #AITraining #aiscraping #askfedi
https://techcrunch.com/2025/03/15/bluesky-users-debate-plans-around-user-data-and-ai-training/
Hi #Admins 👋,
Can you give me quotes that explain your fight against #AIScraping? I'm looking for (verbal) images, metaphors, comparisons, etc. that explain to non-techies what's going on. (efforts, goals, resources...)
I intend to publish your quotes in a text on @campact 's blog¹ (DE, German NGO).
The quotes should make your work🙏 visible in a generally understandable way
¹ https://blog.campact.de/author/friedemann/
#TDM #MastoAdmin #DataPoisoning #aitxt #GPT #TDMRep #Kudurru #Nightshade #Glaze #FediAdmins
Cloudflare unveils tools to give publishers control over AI scraping: New AI Audit feature allows website owners to analyze and manage how AI models access their content, with plans for a marketplace. https://ppc.land/cloudflare-unveils-tools-to-give-publishers-control-over-ai-scraping/?utm_source=dlvr.it&utm_medium=mastodon #Cloudflare #AIScraping #PublishingTools #DigitalMarketing #ContentManagement
How to turn off #AIscraping in your Word documents: "#Microsoft Office has slyly turned on an “opt-out” feature that scrapes your #Word and #Excel docs to train its internal AI systems. This setting is turned on by default, and you have to manually uncheck a box in order to opt out. If you are a writer who uses MS Word to write any proprietary content (blog posts, novels, any work you intend to protect with #copyright and/or sell), you want to turn this feature off immediately." https://medium.com/illumination/ms-word-is-using-you-to-train-ai-86d6a4d87021
The #WebApp, called #AdobeContentAuthenticity, allows artists to signal that they do not consent for their work to be used by #AI models. It also gives creators the opportunity to add what Adobe is calling “#ContentCredentials,” including their verified identity, social media handles, or other online domains, to their work. #C2PA #DataScraping
#Adobe wants to make it easier for artists to blacklist their work from #AIScraping
https://www.technologyreview.com/2024/10/08/1105234/adobe-wants-to-make-it-easier-for-artists-to-blacklist-their-work-from-ai-scraping/?utm_source=press.coop
Hmm, interesting. I think tools like this are definitely a good thing.
#Adobe wants to make it easier for artists to blacklist their work from #AIscraping | MIT Technology Review @technologyreview #ai #aiart
https://www.technologyreview.com/2024/10/08/1105234/adobe-wants-to-make-it-easier-for-artists-to-blacklist-their-work-from-ai-scraping/
"It’s pretty crazy that not only a) these bots shamelessly harvest all your data without asking for permission and b) they do it in such a brute-force manner.
My coworker and security expert António pointed me to #DarkVisitors, and I’ll probably be installing their #WordPressPlugin on all my sites. For what it’s worth."
@john_fisherman on #AIscraping
https://fred-rocha.medium.com/ai-crawler-bots-on-the-hunt-caf5a59ff478
Meta scraped all public posts for AI https://3dcandy.social/2024/09/meta-scraped-all-public-posts-for-ai/ #ai #aiscraping #boost #facebook #instagram #meta
»Online #publishers face a #dilemma: Allow #AIscraping from #Google or lose #searchvisibility: Blocking the company’s #AIoverviews also blocks its #webcrawler.« https://www.engadget.com/ai/online-publishers-face-a-dilemma-allow-ai-scraping-from-google-or-lose-search-visibility-202246891.html?eicker.news #tech #media
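The dilemma shows up directly in robots.txt. Google documents a separate product token, `Google-Extended`, that sites can disallow to keep content out of Gemini training, and says it does not affect Search; as the linked article reports, keeping pages out of AI Overviews would instead mean blocking `Googlebot` itself, and with it search visibility. A minimal sketch using Python's standard `urllib.robotparser` on a hypothetical robots.txt (the URL is a placeholder):

```python
from urllib import robotparser

# Hypothetical robots.txt: opt out of Google-Extended, allow everything else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"
print(rp.can_fetch("Google-Extended", url))  # False: the opt-out token is blocked
print(rp.can_fetch("Googlebot", url))        # True: the search crawler still gets in
```

Under these rules, Googlebot is still allowed, so per the article the pages remain eligible for AI Overviews. Disallowing Googlebot as well would close that door, but at the cost of dropping out of search.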
big scoop by @404mediaco:
"#Nvidia employee leaked documents, Slack conversations, and emails to 404 Media showing how the company went about building a video foundational model that would feed into its other products. It's a fascinating look into how a tech giant operates as it's attempting to stay competitive in AI world, and how it gobbles up copyrighted content from around the web in the process."
#AIscraping
https://www.404media.co/nvidia-ai-scraping-foundational-model-cosmos-project/