#contentscraping

Boston Managed IT @bmit
2024-09-06

Content delivery partner Cloudflare released new technology to help website owners prevent their content from being scraped, without permission, by bots that train AI.

msn.com/en-us/news/technology/

2024-06-20

This is sparking interesting discussions: wired.com/story/perplexity-is- Hallucinations and “bullshitting” are definitely an AI thing; I’d probably say it’s their best feature… But Wired’s article focuses on an important topic: scraping content without permission, ignoring robots.txt, among other things. Perplexity is not the first and won’t be the last to do this, and it definitely harms publishers. The question is: why is this happening? It’s not just because AIs need more accurate sources (instead of making stuff up); imho it’s because finding the right content has become increasingly challenging. Search engines are dominated by SEO practices, and search results are disappointing at best. News sites, obviously needing to earn some revenue, are paywalling everything. In many fora, sites like archive.is and unpaywall extensions are often praised under the “free the information” slogan. RSS feeds kind of play a role there too, because in many cases they don’t drive people to visit the original websites. I think this is not much different from what AIs are doing now. I’m not saying it’s legal or ethical; it’s just a fact.
So, my question is: is the ball in the AIs’ court, meaning they need to be regulated, or in the publishers’ court, meaning they need to find other ways of earning revenue from this?
#AI #Hallucinations #ContentScraping #SEO #Paywalls #AIRegulation #News #Publishers #Ethics #searchengine #perplexity #chatgpt

2024-06-10

😤 #Scraperbots are automating data theft, extracting your website's content without permission! 🌐

💣 Learn about the impact of scraper bots and how to prevent them: bit.ly/3RiXgya

#contentscraping #bots #webscrapers #webcrawlers #scraping #waf #botmanagement #waap #scrapingbots #apptrana #indusface

FinchHaven sfbaFinchHaven@sfba.social
2024-02-12

@jackwilliambell

I've had newsmast.social domain-blocked for a good while

Last night I saw a rather personal, "sorry I've been gone for so long here's why" post that someone made that -- hashtags used aside -- was clearly not public

Newsmast scraped it because of the hashtag and broadcast it out to all subscribers to that hashtag

I doubt that the person making the original post was aware of any of this or would have wanted their post broadcast by a content scraper

And note that on their web site they claim to be run by a "charitable organization" and they request donations

"Help us to remain ad-free and non-profit, whilst amplifying impactful, unheard voices on matters of global interest"

Here: newsmastfoundation.org/donate/

NOTE: Firefox throws a security warning; I continued because I've been on that web site before...

cc @seb This (Newsmast and its ilk) is really an issue that needs to be addressed at a larger scale

#Newsmast #Fediblock #ContentScraping

Magpieblogsarahc@mas.to
2024-01-27

'Requiring an email address to read our articles has, for the moment, stopped our content from being scraped and repurposed by AI. It will also, we hope, serve as a preventative measure against the impacts of the internet being flooded by all of this AI-generated drek. We are worried that a flood of low-quality, AI-generated bullshit ... is going to drown out what we do, and make it harder to ... find our work.'

#news #AI #ContentScraping #404Media #discoverability

404media.co/why-404-media-need

GameOPS @gameops
2023-07-20

🚨 philnews.co.kr caught RED-HANDED content scraping from top PH sites! 😱 Exposing their irony & copyright infringement! gameops.net/2023/07/philnewsco
