#Scraping

DocYeet (docyeet@halis.io)
2025-07-07

Wow ok, done

That was so easy

Kudos to this blog post for the amazing tutorial: xeiaso.net/blog/2025/anubis/

I also managed to quickly add a Grafana dashboard showing some metrics, and those numbers put into perspective the insane spam the whole internet is under, just to generate more slop.

#selfhosted #homelab #kubernetes #grafana #prometheus #anubis #gitea #faang #spam #ai #nginx #ingress #scraping

Grafana dashboard showing a bunch of metrics related to Anubis's work and how Gitea was getting spammed
DocYeet (docyeet@halis.io)
2025-07-07

OK, time to deploy Anubis in front of Gitea. I'm done with those FAANG oligarchs scraping my repos 24/7 to check if anything changed...

F*ck off.

But that also means Gitea might be unstable for a while, whoops.

If you're curious: git.halis.io

If you see the cute furry, it worked

#homelab #selfhosted #kubernetes #anubis #nginx #ingress #gitea #ai #scraping #faang

2025-07-05

@zeldman

Watt is being Dunn about AI scraping images and descriptions?

Make RED sure you fill your gravy description meat with AI hostile get em on the beaches words.

Images uploaded to mastodon should have AI poison added to them.

#Scraping #AI #ZuckSucks

2025-07-05

#ia
#scraping
#korben

Cloudflare blocks AI by default and launches "Pay Per Crawl" - The end of free plundering | Artificial Intelligence | Le site de Korben
korben.info/cloudflare-bloque-

ugpl.net/blog (ugpl_net)
2025-07-05
2025-07-04

Really interesting project, Anubis, to protect against #LLM scraping bots: anubis.techaro.lol/ #Scraping #bots

Rod2ik 🇪🇺 🇨🇵 🇪🇸 🇺🇦 🇨🇦 🇩🇰 🇬🇱 (rod2ik)
2025-07-04
Rod2ik 🇪🇺 🇨🇵 🇪🇸 🇺🇦 🇨🇦 🇩🇰 🇬🇱 (rod2ik.bsky.social@bsky.brid.gy)
2025-07-04

Paid #scraping (#payant): toward a radical change in the business model of generative #IA #AI (#générative)? www.journaldugeek.com/2025/07/04/l...


Alec Muffett (alecmuffett)
2025-07-02

Civil Society: Cloudflare’s latest change {blocks, unblocks} network use by {people, software} that we {hate, love} – {yay, boo} this is {great, terrible}!
alecmuffett.com/article/113629

2025-07-01

Civil Society: Cloudflare’s latest change {blocks, unblocks} network use by {people, software} that we {hate, love} – {yay, boo} this is {great, terrible}!

Details don’t matter – pick your own headline. I doubt we have heard the last of this, but this, too, shall pass:

With Cloudflare’s new setting, websites can block – by default – online bots that scrape their data

https://www.nytimes.com/2025/07/01/technology/cloudflare-ai-data.html

2022: Cloudflare blocks Kiwi Farms, but in 2025 it still exists:

https://www.theguardian.com/technology/2022/sep/04/cloudflare-reverses-decision-and-drops-trans-trolling-website-kiwi-farms

Quote:

In a blog post [in September 2022], which didn’t mention Kiwi Farms or the pressure campaign, Cloudflare’s chief executive, Matthew Prince, and its vice-president of public policy, Alissa Starzak, suggested the company regretted taking action against the far-right websites 8chan and Daily Stormer in 2019 and 2017, saying there was a “deeply troubling” response afterwards from authoritarian regimes calling for the company to block human rights websites.

2017: Cloudflare blocks the Daily Stormer, but in 2025 it still exists:

https://blog.cloudflare.com/why-we-terminated-daily-stormer/

2016: Cloudflare blocks users of the Tor Project:

https://blog.torproject.org/trouble-cloudflare/

https://blog.cloudflare.com/the-trouble-with-tor/

2018: Cloudflare introduces Tor Project onion services:

https://blog.cloudflare.com/cloudflare-onion-service/

Personal Perspective

Cloudflare’s position is expeditious. Don’t read too much into either what the long-term impact will be or how the moral impact will pan out.

#ai #censorship #cloudflare #scraping

Petra van Cronenburg (NatureMC@mastodon.online)
2025-07-01

@akamran @davidtoddmccarty If you search Google for #Mastodon hashtag scraping, you'll find software and programs that help AI do exactly that. It exists.

The fact is that, as of today, the main instances mastodon.social and mastodon.online officially prohibit #scraping: techcrunch.com/2025/06/17/mast

The problem with decentralisation: admins and users of other instances must become aware of the problem and change their terms, too.

It may be funny, but it's no joke.

#gravy

Tommaso Gagliardoni (tomgag@infosec.exchange)
2025-07-01

I keep reading rumours that #gravy breaks AI crawlers. I am skeptical. Can anyone link a proper source?

#AI #ML #scraping

2025-07-01

Something tells me it's not cloudflare who should get paid, but we've got to start somewhere.

#scraping #ai #arstechnica

arstechnica.com/tech-policy/20

2025-06-25

Fighting fire with fire: how to tackle the AI bots that threaten the open Web

It is a measure of how fast the field of AI has developed in the three years since Walled Culture the book (free digital versions available) was published that the issue of using copyright material for training AI systems, briefly mentioned in the book, has become one of the hottest topics in the copyright world, as numerous posts on this blog attest.

The current situation sees the copyright […]

#aiBots #cloudComputing #cloudflare #firewalls #freeSoftware #genai #glamELab #openSource #openWeb #robotsTxt #scraping #survey #training #unc #wikimedia

walledculture.org/fighting-fir

2025-06-23

⚖️ Kammergericht Berlin (Berlin Higher Regional Court), judgment of 3 April 2025, 1 U 44-23: No claim for damages under Art. 82 GDPR (DSGVO) where the data was already accessible. #Schadensersatz #Immaterieller #Schaden #Scraping #teamdatenschutz #dsgvoportal dsgvo-portal.de/gerichtsentsch
