#robotstxt

𝕂𝚞𝚋𝚒𝚔ℙ𝚒𝚡𝚎𝚕 (kubikpixel@chaos.social)
2025-12-12

»RSL 1.0 (Really Simple Licensing) instead of robots.txt: a new standard for internet content.
A new standard for protecting content on the web. RSL is backed by players such as publishers and the advertising industry.«

I'm only hearing about this now. Let's see how useful it turns out to be and whether it protects web content in general or, once again, only commercial data.

👉 heise.de/news/RSL-1-0-Standard

#rsl #copyright #robotstxt #webdev #web #realsimplelicensing #webstandards #standart #internet #verlage #werbung

Le site de Korben (korben.info@web.brid.gy)
2025-12-11

RSL 1.0 - The time has come for AIs to pay up

fed.brid.gy/r/https://korben.i

We live in wonderful times (not). On one side, 5.6 million websites now block OpenAI's GPTBot (https://www.theregister.com/2025/12/08/publishers_say_no_ai_scrapers) and 5.8 million block ClaudeBot (https://www.cloudflare.com/press/press-releases/2025/cloudflare-just-changed-how-ai-crawlers-scrape-the-internet-at-large/), while on the other side 13.26% of AI bots could not care less about robots.txt (https://www.webpronews.com/cloudflares-2025-robots-txt-update-blocks-ai-scraping-adds-pay-per-crawl/). Webmasters are all in shock, planting "Private property - no AI" signs everywhere... and believe me, it achieves strictly nothing!

Still, some very smart people are looking into the subject, and yesterday a new standard came out to put an end to this comedy. It is called Really Simple Licensing (RSL) 1.0 (https://rslstandard.org/press/rsl-1-specification-2025) and it proposes something radical: stop blocking and start charging! Yum!

Concretely, it is a small text file that turns "go away" into a commercial negotiation. Because, as we all know, the problem with robots.txt is that it is like politely asking burglars not to break into your house. That worked in 1994, when the web was […]
2025-12-11

The Register: Publishers say no to AI scrapers, block bots at server level . “Online traffic analysis conducted by BuiltWith, a web metrics biz, indicates that the number of publishers trying to prevent AI bots from scraping content for use in model training has surged since July. About 5.6 million websites presently have added OpenAI’s GPTBot to the disallow list in their robots.txt file, up […]

https://rbfirehose.com/2025/12/11/the-register-publishers-say-no-to-ai-scrapers-block-bots-at-server-level/

The New York Times sues Perplexity for producing ‘verbatim’ copies of its work – The Verge

Credit: The New York Times, gettyimages-2249036304

The NYT alleges Perplexity ‘unlawfully crawls, scrapes, copies, and distributes’ work from its website.

by Emma Roth, Dec 5, 2025, 7:42 AM PST. Emma Roth is a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO.

The New York Times has escalated its legal battle against the AI startup Perplexity, as it’s now suing the AI “answer engine” for allegedly producing and profiting from responses that are “verbatim or substantially similar copies” of the publication’s work.

The lawsuit, filed in a New York federal court on Friday, claims Perplexity “unlawfully crawls, scrapes, copies, and distributes” content from the NYT. It comes after the outlet’s repeated demands for Perplexity to stop using content from its website, as the NYT sent cease-and-desist notices to the AI startup last year and most recently in July, according to the lawsuit. The Chicago Tribune also filed a copyright lawsuit against Perplexity on Thursday.

The New York Times sued OpenAI for copyright infringement in December 2023, and later inked a deal with Amazon, bringing its content to products like Alexa.

Perplexity became the subject of several lawsuits after reporting from Forbes and Wired revealed that the startup had been skirting websites’ paywalls to provide AI-generated summaries — and in some cases, copies — of their work. The NYT makes similar accusations in its lawsuit, stating that Perplexity’s crawlers “have intentionally ignored or evaded technical content protection measures,” such as the robots.txt file, which indicates the parts of a website crawlers can access.

Perplexity attempted to smooth things over by launching a program to share ad revenue with publishers last year, which it later expanded to include its Comet web browser in August.

“By copying The Times’s copyrighted content and creating substitutive output derived from its works, obviating the need for users to visit The Times’s website or purchase its newspaper, Perplexity is misappropriating substantial subscription, advertising, licensing, and affiliate revenue opportunities that belong rightfully and exclusively to The Times,” the lawsuit states.

Continue/Read Original Article Here: The New York Times sues Perplexity for producing ‘verbatim’ copies of its work | The Verge

#AI #artificialIntelligence #Copyright #Crawlers #Distribution #Lawsuit #NYTWork #OpenAI #Perplexity #RobotsTxt #Scrapping #Sues #TheNewYorkTimes #TheVerge #VerbatimCopies

2025-12-07

How a web crawler is supposed to work:

1. Reads /robots.txt
2. Parses robots.txt and honors User-Agent | Allow / Disallow designations
3. Returns periodically to retrieve permitted content

How AI/LLM training crawlers work:

1. Crawls entire website
2. Reads /robots.txt
3. Returns 10 minutes later
4. GOTO 1.
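For reference, the "supposed to" flow maps almost directly onto Python's standard-library robots.txt parser. A minimal sketch of a polite crawler (site URL and user-agent string are placeholders):

import time
import urllib.request
import urllib.robotparser

AGENT = "ExampleCrawler"                       # placeholder user-agent
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")   # placeholder site
rp.read()                                      # 1. read /robots.txt

url = "https://example.com/private/index.html"
if rp.can_fetch(AGENT, url):                   # 2. honor User-agent / Allow / Disallow
    req = urllib.request.Request(url, headers={"User-Agent": AGENT})
    body = urllib.request.urlopen(req).read()

time.sleep(rp.crawl_delay(AGENT) or 3600)      # 3. come back later, not ten minutes later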

#AI #LLM #webCrawlers #robotsTxt 🔹

Sandstein NM (sandstein_nm)
2025-12-01

robots.txt – small file, big impact

When search engines visit your website, the first thing they look at is robots.txt. It tells them what should be crawled – and what should not.

This helps, for example, to:
• make important pages discoverable more quickly
• exclude unimportant areas
• keep search-engine crawls efficient

We explain the most important rules and show examples from SEO practice.
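A minimal example of the kind of rules meant here (the paths and sitemap URL are purely illustrative):

# example only: paths and sitemap URL are placeholders
User-agent: *
Disallow: /tmp/
Disallow: /private/
Allow: /private/press.html
Sitemap: https://example.com/sitemap.xml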

More on this in the blog:
🔗 t1p.de/btwdi

🦸 Your web app (hello Wiki.js) has no robots.txt? Don't panic! 🚨

When a solution offers no built-in robots.txt handling, NPM comes to the rescue! 📦

Find out how a simple module can serve this essential file directly, saving the day and making sure search engines and other bots respect your rules.

➡️ The NPM lifesaver is here: wiki.blablalinux.be/fr/gestion

#NPM #robotsTxt #WikiJS #OutilCLI #Dépannage

🤖 STOP nosy AI bots on your WordPress site! 🚫

Fed up with AIs coming to sniff around and loot your content? Protect your secret digital garden! 🤫

Find out how a simple robots.txt file lets you politely tell the AI vacuum cleaners "no thanks". It's easy, it's effective, and it's a little scary for Skynet! 😉
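The kind of entries meant here, using two crawlers named elsewhere in this thread (which bots you list, and whether to block them site-wide, is entirely up to you):

# example policy: block these two crawlers everywhere
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /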

➡️ The how-to is here: wiki.blablalinux.be/fr/robots-

#WordPress #AntiIA #robotsTxt #SécuritéWeb #BlablaLinux

Virebent (virebent)
2025-11-28

📝 New article: Why We Reject Google: Our Anti-Surveillance SEO Policy

An in-depth look at why Virebent.art deliberately blocks Google and other surveillance-based crawlers, and our strategy for visibility in a privacy-first web.

🔗 virebent.art/blog/seo-policy.h

Inautilo (inautilo)
2025-11-14


Rate-limiting requests with Nginx · An alternative approach to counter AI crawlers ilo.im/168axr
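A bare-bones illustration of what that can look like in an Nginx config (zone name, rate, and burst values here are arbitrary):

# in the http {} context
limit_req_zone $binary_remote_addr zone=crawlers:10m rate=1r/s;

# in the relevant server {} or location {} block
location / {
    limit_req zone=crawlers burst=10 nodelay;
}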

_____

Claudio Pires (claudiocamposp)
2025-11-09

How to Remove Robots.txt File from WordPress? 🤖📝❌ youtube.com/watch?v=Wv97WVRK3qw 🎬

2025-11-07

Fixing the robots.txt problem for a Docker container behind a reverse proxy: add a location ~* /robots\.txt$ block to the proxy configuration file to block bots and indexers. #docker #reverseproxy #robotstxt #selfhosted #máy_chủ #đổi_proxy #tài_liệu_đại_chung
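One way such a block can look when the proxy is Nginx (the policy served below, blocking everything, is just an example):

location ~* /robots\.txt$ {
    default_type text/plain;
    # example policy only: disallow all crawlers
    return 200 "User-agent: *\nDisallow: /\n";
}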

reddit.com/r/selfhosted/commen

2025-11-05

Breaking AI models with HTML comments: 250 are enough

AI scrapers have a critical weakness: they harvest even the links hidden inside HTML comments. This post covers recent research showing that just 250 manipulated documents are enough to cripple a large language model, along with practical countermeasures.

aisparkup.com/posts/6165

2025-10-22

The web-infrastructure revolt against Google's AI search: why Cloudflare changed the robots.txt of 3.8 million websites

After Google's AI summaries cut website traffic by as much as 50%, Cloudflare struck back by updating the robots.txt of 3.8 million domains. This post looks at the new web standard that separates search from AI summaries, and what it means.
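In rough outline, the idea is extra machine-readable preference lines inside robots.txt. A sketch in the spirit of Cloudflare's content-signals update (treat the field names, values, and placement as assumptions to be checked against Cloudflare's published policy):

# assumed syntax; verify against Cloudflare's content signals documentation
User-agent: *
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /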

aisparkup.com/posts/5729

2025-10-21

The robots.txt standard turned 30 last year. But is it still relevant in a world filled with AI bots, site scrapers, and other dubious bots?

plagiarismtoday.com/2025/10/21

#AI #RobotsTxt #Scraping

Inautilo (inautilo)
2025-10-21


Farewell to robots.txt (1994-2025) · “You were too good for this world.” ilo.im/167q2b

_____

2025-10-16

Why has robots.txt's polite notice to the ever-present web crawlers about data collection on your own website outlived its usefulness?

How are the AI players turning the web from a collaborative space into a pure information-extraction zone?

This is told in an entertaining and very vivid way in "Nachruf: Abschied von robots.txt (1994-2025)" by @heiseonline:

🌍 👉 heise.de/hintergrund/Nachruf-A

#robotsTXT #KI #Webentwicklung #Datenhoheit #heise

A symbolic image of a robots.txt file with the following fictional content:

User-agent: Crawler1
Disallow: /

User-agent: Crawler2
Disallow: /

User-agent: *
Disallow: /default.html
Disallow: /tmp
Disallow: /private/index.html
