https://www.the-independent.com/tech/bots-internet-traffic-ai-chatgpt-b2733450.html
#cybersecurity #bots #AI #LLMs #Bytedance #ByteSpider
Fuck off "#AI" bots (in this image, #OpenAI, #Claudebot and #Bytespider all get a mention) and your #IP #theft
@khobochka guess why I maintain a #Scraper #blocklist?
http://hil-speed.hetzner.com/10GB.bin
as an extra middle finger!
@alice #Funfact: #ValueRemoving #RentSeekers like #ClownFlare aren't even good at stopping #bots, which literally #DDoS'd a site offline, as #MattKC learned.
#Cloudflare #EpicFail #TikTok #ByteDance #ByteSpider #Crawlers #DontDoCloud
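Below is a minimal sketch of the scraper-blocklist-plus-redirect idea from the posts above. It assumes scrapers are recognised by their User-Agent string. It also assumes the 10GB.bin link is meant as a redirect target for them, which is my reading and not something the post spells out. The token list and the names `SCRAPER_TOKENS`, `TARPIT_URL`, `scraper_redirect` and `hello_app` are hypothetical and only for illustration. Only the standard WSGI calling convention is used.

```python
from typing import Callable, Iterable

# Hypothetical blocklist of User-Agent substrings (illustrative only, not
# the poster's actual list). Matching is case-insensitive.
SCRAPER_TOKENS = ("bytespider", "gptbot", "claudebot")

# The "extra middle finger" from the post: where matching bots get sent.
TARPIT_URL = "http://hil-speed.hetzner.com/10GB.bin"


def is_blocked_scraper(user_agent: str, tokens: Iterable[str] = SCRAPER_TOKENS) -> bool:
    """Return True if any blocklisted token occurs in the User-Agent."""
    ua = user_agent.lower()
    return any(token in ua for token in tokens)


def scraper_redirect(app: Callable) -> Callable:
    """WSGI middleware: 302-redirect blocklisted crawlers, pass everyone else on."""
    def middleware(environ, start_response):
        if is_blocked_scraper(environ.get("HTTP_USER_AGENT", "")):
            start_response("302 Found", [("Location", TARPIT_URL),
                                         ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    body = b"hello, human\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


application = scraper_redirect(hello_app)
```

To use it, wrap your real WSGI app with `scraper_redirect(...)`. For a quick local test, `wsgiref.simple_server.make_server("127.0.0.1", 8000, application).serve_forever()` will serve it. A User-Agent check only catches bots that identify themselves honestly.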
@fuchsiii @lynn @LunaDragofelis also in case anyone needs to be convinced how much of a useless value-remover ClownFlare / CloudFlare are in "preventing DDoS attacks":
@MattKC got DDoS'd into downtime by ByteDance's crawler!
#CloudFlare #ClownFlare #MattKC #DDoS #ByteDance #ByteSpider #TikTok #NetworkSecurity #Hosting #ITsec #ComSec
Here is a bit more information about #Bytespider and some of its friends.
➡️ https://darkvisitors.com/agents
If you run websites and servers, I'm sure this will come in handy.
I didn't know about #Bytespider.
Does anyone have any information about it?
I've heard very little about this scraper 🧐
@GossiTheDog Well, they can be forced to, if not face #accountability, then at least take #consequences.
#DropKiwifarms worked as a unified effort
Customers yeeting #ClownFlare did force them to yeet #KiwiFarms.
#Cloudflare has been a #RogueISP for over a decade now as they accept gross violations of their own #ToS and host #Daesh propaganda sites...
Pretty sure #Brazil will hold CloudFlare in contempt and force them to either fire #Twitter as a client or get #blocked as well...
CloudFlare will then yeet #Shitter, because #ApartheidEmeralBoy is known to bounce checks and refuse to pay on time, so he isn't even a client worth the risk.
Cloudflare is and will always remain a shitty hoster - period!
I still block #Cloudflare's entire #ASN as a security measure since they shield #cybercriminals.
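Blocking a whole ASN comes down to checking each client address against the list of prefixes that ASN announces. Here is a minimal sketch using Python's `ipaddress` module. It assumes you have already exported that prefix list from a routing or WHOIS source. The ranges below are documentation placeholders (RFC 5737 / RFC 3849), not Cloudflare's. `BLOCKED_PREFIXES` and `is_blocked` are hypothetical names.

```python
import ipaddress
from typing import Iterable, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical stand-ins for the prefixes an ASN announces. These are
# documentation ranges (RFC 5737 / RFC 3849), NOT Cloudflare's real space.
BLOCKED_PREFIXES = [
    ipaddress.ip_network(p)
    for p in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(addr: str, prefixes: Iterable[Network] = BLOCKED_PREFIXES) -> bool:
    """True if addr falls inside any blocked prefix.

    `ip in net` is simply False when IPv4 and IPv6 are mixed, so both
    address families can live in one list.
    """
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in prefixes)


for candidate in ("192.0.2.17", "203.0.113.5", "2001:db8::1"):
    print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```

A linear scan is fine for a few hundred prefixes. A large ASN dump would justify a sorted or trie-based lookup instead.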
Their entire #ValueRemoving business is just a form of #racketeering that should not only not exist, but be illegal to begin with.
Every half-decent #hoster offers #DDoS protection these days without ClownFlare.
ClownFlare doesn't even prevent DDoS attacks, but lets #ByteSpider DDoS their customers!
@xogium this issue of excessive crawlers is sadly nothing new. @MattKC / #MattKC experienced the same with #ByteSpider, the #Scraper used by #TikTok, which basically resulted in his site getting #DDoS'd despite #ClownFlare being tasked with preventing exactly that!
Personally, I've run out of patience and tolerance for such actions by #GAFAMs and #TechBros, and I'm so close to just blocklisting their entire ASN as a matter of principle!
@ParadeGrotesque @ellie sadly, that doesn't help against #ByteSpider and other hyperaggressive crawlers, which just crawl harder if one blocks them any way other than #DROP|ping connections and .htaccess rules...
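For the DROP approach, one option is to turn a CIDR blocklist into firewall rules. The sketch below assumes an iptables/ip6tables setup with an `INPUT` chain. `drop_rules` is a hypothetical helper that only builds the command strings and runs nothing. It merges adjacent ranges with `ipaddress.collapse_addresses` so the rule list stays short. The `DROP` target silently discards packets, so the crawler gets a timeout instead of a reply.

```python
import ipaddress
from typing import Iterable, List


def drop_rules(cidrs: Iterable[str], chain: str = "INPUT") -> List[str]:
    """Build iptables/ip6tables DROP commands for a CIDR blocklist (does not run them)."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    rules = []
    for version, tool in ((4, "iptables"), (6, "ip6tables")):
        same_family = [n for n in nets if n.version == version]
        # Merge adjacent/overlapping ranges of one family so we emit fewer rules.
        for net in ipaddress.collapse_addresses(same_family):
            rules.append(f"{tool} -A {chain} -s {net} -j DROP")
    return rules


# Placeholder documentation ranges, not real crawler netblocks.
for rule in drop_rules(["192.0.2.0/25", "192.0.2.128/25", "2001:db8::/32"]):
    print(rule)
```

Review the printed commands before running them as root. Also note that a netblock list goes stale as soon as a crawler moves to new address space.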
#TLDR: Fedi never asked for this, and it's considered as much of an asshole move as violating #robotstxt and literally DDoS'ing a site with a crawler, as #ByteDance regularly does using #ByteSpider...
https://www.youtube.com/watch?v=Hi5sd3WEh0c
https://github.com/greyhat-academy/lists.d/issues/48
http://www.robotstxt.org
As a sysadmin, I've been seeing a massive increase in the #Bytedance #Bytespider bot indexing the sites I manage, at least 4x the rate of #GoogleBot, which is really saying something. After I blocked their IP address range, they started indexing from the #AWS Singapore netblock, which I thought was an interesting workaround.
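To put numbers like "4x the rate of #GoogleBot" on your own logs, you can count requests per crawler by User-Agent. That count keeps working even when the bot hops to a new netblock. Here is a minimal sketch. It assumes Apache/nginx "combined" log lines, where the User-Agent is the last double-quoted field. `crawler_hits`, `hit_ratio` and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Every double-quoted field in a "combined" log line; the UA is the last one.
QUOTED = re.compile(r'"([^"]*)"')

CRAWLERS = ("Bytespider", "Googlebot")


def crawler_hits(lines: Iterable[str], crawlers=CRAWLERS) -> Counter:
    """Count log lines per crawler, matching names case-insensitively in the UA."""
    counts = Counter({name: 0 for name in crawlers})
    for line in lines:
        fields = QUOTED.findall(line)
        if not fields:
            continue  # not a combined-format line
        ua = fields[-1].lower()
        for name in crawlers:
            if name.lower() in ua:
                counts[name] += 1
                break  # attribute each request to one crawler only
    return counts


def hit_ratio(counts: Counter, a: str = "Bytespider", b: str = "Googlebot") -> float:
    """How many requests crawler `a` made per request from crawler `b`."""
    return counts[a] / counts[b] if counts[b] else float("inf")


def _line(ua: str) -> str:
    # Made-up sample entry in combined log format.
    return f'192.0.2.1 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "{ua}"'


SAMPLE_LOG = [_line("Mozilla/5.0 (compatible; Bytespider)")] * 4 + [
    _line("Mozilla/5.0 (compatible; Googlebot/2.1)"),
    _line("Mozilla/5.0 Firefox/128.0"),
]

counts = crawler_hits(SAMPLE_LOG)
print(dict(counts), hit_ratio(counts))  # {'Bytespider': 4, 'Googlebot': 1} 4.0
```

An open log file can be passed straight to `crawler_hits`, since it iterates line by line.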
It's almost like they won't take no for an answer.
My best guess is they're scraping every page and image they can get their hands on to train #AI, unless they're planning to turn #TikTok into a web search engine.
The response we got from ByteDance on what ByteSpider does hasn't added any detail beyond "is used in ByteDance products or future products", though it does specifically say it's for "search".
I don't think we're going to get anything helpful from them.
#SEO #byteSpider
We recently noticed a fair bit of traffic on www.bbc.co.uk & www.bbc.com from a User Agent which identifies itself as "ByteSpider" (& has a @bytedance.com email address).
Lots of docs on the web state it doesn't obey robots.txt but ByteDance have told us it *does*:
> ...in the robots.txt files
> user-agent:Bytespider
> Disallow:/
Thought that might be worth documenting, as it might be a recent change & several of us searched but found zero docs from ByteDance.
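The rule ByteDance quoted can be checked with Python's standard `urllib.robotparser`. This only tells you what a crawler that *honours* robots.txt should do. Whether Bytespider actually honours it is exactly what the posts above dispute. `allowed` is a hypothetical wrapper.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted from ByteDance above, verbatim. Python's parser strips
# whitespace around field names and values, so the missing spaces are fine.
ROBOTS_TXT = """\
user-agent:Bytespider
Disallow:/
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Would a crawler that honours robots.txt be allowed to fetch `url`?"""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)


print(allowed("Bytespider", "https://www.bbc.co.uk/news"))  # False
print(allowed("Googlebot", "https://www.bbc.co.uk/news"))   # True
```

`can_fetch()` matches on the part of the agent string before the first `/`. Pass the short product token (`Bytespider`) rather than a full `Mozilla/5.0 ...` User-Agent.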
🕸️ If your site hasn’t been visited by #ByteSpider, lucky you!
Take a moment to prepare. Sorta like your startup blowing up overnight except you come out even broker and no one actually visits your site.
Boo 👻