#crawlers

Inautilo @inautilo
2025-10-24


Are 73% of e-commerce visitors fake? · What’s broken in measuring online success ilo.im/167nz4

_____

2025-10-21

Just added 280 #Crawlers coming in via #AWS to my personal #Blocklist, which is distributed to all the #PfSense instances I maintain.

Crazy, they produce 90% of the traffic on one of my sites...

2025-10-12

"Optimize4AI" is a new tool that lets you see how your website's metadata looks to crawlers and AI. It analyzes metadata gaps and how AI understands your content, and offers a side-by-side view (browser vs. AI), JSON-LD analysis, and an E-E-A-T assessment. 20 free scans, no login required. Useful for developers and website owners!

#Optimize4AI #Metadata #SEO #WebDevelopment #AI #Crawlers #SideProject #WebDev #CôngCụWeb #PhânTíchMetadata

https://www

𝓜𝓪𝓻𝓬 𝓐𝓷𝓰𝓮𝓵𝓲 @bax3l33t
2025-09-29
2025-09-17

Yay, I implemented a way to keep #AI #Crawlers from hammering our puny community server w/o relying on JavaScript/Anubis yet:

* hashlimit HTTPS to only a few connections per minute (browsers will use pipelining anyway and make only one or a few connections).
* Anything above that hashlimit gets hard dropped.
* Configure the web server to drop the connection when the reply code is >=400.
* Make a hidden page with thousands of nonsense links that end in 404.
* Link that hidden page with hidden links from almost every other page.
* Put that hidden page (and some non-existent links) into robots.txt as 'Disallow'.
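
The hidden-page and robots.txt steps above could be sketched roughly as follows. This is only an illustration, not the poster's actual setup: the `/trap/` prefix, the `/trap.html` page name, and all function names are assumptions made up for the example, and the firewall and web server parts are not covered.

```python
import random
import string


def nonsense_paths(count, prefix="/trap/", seed=0):
    """Return `count` distinct random paths under a hypothetical prefix.

    Nothing is served at these paths, so a bot following them gets a 404.
    """
    rng = random.Random(seed)
    paths = set()
    while len(paths) < count:
        slug = "".join(rng.choice(string.ascii_lowercase) for _ in range(12))
        paths.add(f"{prefix}{slug}")
    return sorted(paths)


def trap_page(paths):
    """Build the hidden HTML page that links to every nonsense path."""
    links = "\n".join(f'<a href="{p}">{p}</a>' for p in paths)
    return f"<!doctype html>\n<title>index</title>\n{links}\n"


def robots_txt(trap_url, prefix):
    """Disallow the trap page and the nonsense prefix for all crawlers."""
    return f"User-agent: *\nDisallow: {trap_url}\nDisallow: {prefix}\n"


paths = nonsense_paths(1000)
page = trap_page(paths)
robots = robots_txt("/trap.html", "/trap/")
```

Well-behaved crawlers honor the `Disallow` lines and never see the trap, so only clients that ignore robots.txt run into the 404s and the connection drops.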

🍿🎉

Inautilo @inautilo
2025-09-13


AI’s free web scraping days may be over · Say hello to RSS’s younger, tougher brother ilo.im/166s9q

_____

Inautilo @inautilo
2025-09-10


The web has a new AI payment system · The RSL Standard sets rules for AI scraping fees ilo.im/166ryy

_____

Inautilo @inautilo
2025-08-29


Introducing AI Crawl Control · Cloudflare boosts content creators’ control over AI crawlers ilo.im/166gqk

_____

Barry Schwartz @rustybrick@c.im
2025-08-28

Some sites (not all) are seeing massive declines in crawl rates in Google Search Console - not super widespread but wide enough... seroundtable.com/google-crawl- via @glenngabe @dhruvpandyadp @senormunoz and @vercel CTO @cramforce and @aleyda @pedrodias and more

#google #bing #googlesearchconsole #crawlers #spiders #crawling #seo #search #bug

cartcharrtchartchart
2025-08-21

I've had it with the aggressive #AI #crawlers now. Some bot has been hitting #MacPorts with a legitimate enough user agent that I can't block it without also blocking users.

Yesterday, it sent 377k requests (62% of the total), 369k of them to URLs forbidden in robots.txt, from 274k unique IPs. Most of that was for content that could be analyzed more quickly using `svn checkout` or `git clone`.

Dynamic content on the #web is broken. There's just no way to do that anymore. What a waste of energy.
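
Figures like the ones in this post (share of requests hitting robots.txt-forbidden URLs, number of unique IPs) can be derived from an access log with Python's standard `urllib.robotparser`. A minimal sketch, assuming made-up robots.txt rules and a tiny invented request list; none of the paths or addresses come from the real MacPorts server:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only
ROBOTS = """\
User-agent: *
Disallow: /changeset/
Disallow: /log/
"""

# Hypothetical (client IP, requested path) pairs, as parsed from an access log
requests = [
    ("203.0.113.5", "/changeset/123"),
    ("203.0.113.6", "/log/trunk"),
    ("198.51.100.7", "/index.html"),
    ("203.0.113.5", "/changeset/456"),
]

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

# Keep only requests that robots.txt forbids for a generic user agent
forbidden = [(ip, path) for ip, path in requests if not parser.can_fetch("*", path)]

share = len(forbidden) / len(requests)      # fraction of forbidden requests
unique_ips = {ip for ip, _ in forbidden}    # distinct offending addresses
```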

Skewray Research @skewray@mathstodon.xyz
2025-08-17

I have a recent project to stop (LLM training) crawlers from copyright-thefting my website. I find these bots with either a hidden-link tarpit or by looking for single access events (no css), which I then ban if they come from a cloud server. So far I have learned:

• Amazon AWS, Google Cloud, Microsoft Azure, and Chinese telecom companies are pretty easy to block. These were the early heavy hitters.
• Huawei has little cloud server farms all over the world. I seem to still find about one a day.
• Some mysterious entity rents servers all over the world and crawls by sniping one page at a time. The snipes come in clusters, so all these bots are running the same crawler, with some but not complete inter-communication. Popular cloud companies are OVH Cloud, EGI Hosting, Web2Objects, Host Royale, Digital Ocean, Cloud Innovation, ....

#apache #litespeed #htaccess #crawlers #botfarms

codeberg.org/skewray/htaccess
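
The "single access, no CSS, from a cloud server" heuristic described in this post might look roughly like the sketch below. It is not taken from the linked repository (which uses .htaccess rules); the address ranges are documentation networks standing in for real cloud CIDR blocks, and the log entries and function names are invented for the example.

```python
import ipaddress
from collections import defaultdict

# Hypothetical stand-in for a list of cloud provider address ranges
CLOUD_RANGES = [ipaddress.ip_network("203.0.113.0/24")]

# Hypothetical (client IP, requested path) pairs from an access log
hits = [
    ("203.0.113.9", "/article.html"),   # one page, no CSS, cloud address
    ("198.51.100.4", "/article.html"),  # page plus stylesheet: looks like a browser
    ("198.51.100.4", "/style.css"),
    ("192.0.2.77", "/about.html"),      # single access, but not a cloud address
]


def in_cloud(ip):
    """True if the address falls inside any listed cloud range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUD_RANGES)


def is_single_access(paths):
    """True if the client fetched exactly one resource and it was not CSS."""
    return len(paths) == 1 and not paths[0].endswith(".css")


paths_by_ip = defaultdict(list)
for ip, path in hits:
    paths_by_ip[ip].append(path)

to_ban = sorted(
    ip for ip, paths in paths_by_ip.items()
    if is_single_access(paths) and in_cloud(ip)
)
```

Requiring both conditions keeps ordinary visitors with cached stylesheets from being banned just for fetching a single page.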

N-gated Hacker News @ngate
2025-08-15

🤖🎉 Wow, are now the Indiana Jones of , fearlessly solving while we mere mortals fumble with on Mastodon. 🙄 Clearly, the robots are one step closer to world domination, and we're still struggling to open our native apps. 📱💥
social.anoxinon.de/@Codeberg/1

Hacker News @h4ckernews
2025-08-15
Inautilo @inautilo
2025-08-13


Who owns your content in the AI age? · When AI bots take your content without consent ilo.im/165tej

_____

Inautilo @inautilo
2025-08-06


The web isn’t URL-shaped anymore · How machines are changing the rules of the web ilo.im/165rul

_____

Inautilo @inautilo
2025-08-06


AI search engine fight · Cloudflare and Perplexity clash over crawling ilo.im/165wpr

_____

Inautilo @inautilo
2025-08-05


Perplexity is using undeclared crawlers · The AI search engine tries to evade website no-crawl rules ilo.im/165vrc

_____
