#robotstxt

I've had a robots.txt rule in place for months to block ChatGPT from touching my site. Yet it shows up as a referrer?

#chatgpt #llm #privacy #robotstxt

4 referrals from ChatGPT to my WordPress site
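
One possible explanation, offered as a sketch and not a diagnosis of this particular site: robots.txt only asks crawlers not to fetch pages, while a referrer is sent by a visitor's browser when a person follows a link. The post does not show the actual rules, so the robots.txt below is hypothetical. `GPTBot` and `ChatGPT-User` are user agent tokens OpenAI documents for its crawlers, and `example.com` is a placeholder. The snippet uses Python's standard `urllib.robotparser` to check what such rules would block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the original post does not show its real rules.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder URL, not the author's site.
    for agent in ("GPTBot", "ChatGPT-User", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/post/"))
```

Even with both crawlers disallowed, a link to the site that appears in a ChatGPT answer can still be clicked by a human, and that click would show up as a referral.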
Inautilo inautilo
2025-05-12


The Internet Archive opt-out itch · Ways to deal with your public internet history ilo.im/163ssx

_____

2025-05-07

#Google uses content for AI training even when the creators object. This has now been officially confirmed.

According to Google #Deepmind, the objection covers only certain divisions of the company. Anyone who wants to protect their data has to remove the site from #Google Search entirely. Publishers and website operators see this as putting them at an economic disadvantage.

golem.de/news/kuenstliche-inte

#Urheberrecht #KITraining #Gemini #Suchmaschinen #RobotsTXT #KITraining #Kartellverfahren

Frontend Dogma frontenddogma@mas.to
2025-04-22

What Is llms.txt, and Should You Care About It?, by @ahrefs:

ahrefs.com/blog/what-is-llms-t

#ai #crawling #robotstxt

Frontend Dogma frontenddogma@mas.to
2025-04-20

Meet LLMs.txt, a Proposed Standard for AI Website Content Crawling, by @searchengineland.bsky.social:

searchengineland.com/llms-txt-

#ai #crawling #scraping #robotstxt

PPC Land ppcland
2025-04-03

ICYMI: Google outlines pathway for robots.txt protocol to evolve: How the 30-year-old web crawler control standard could adopt new functionalities while maintaining its simplicity. ppc.land/google-outlines-pathw

PPC Land ppcland
2025-04-01

ICYMI: Google outlines pathway for robots.txt protocol to evolve: How the 30-year-old web crawler control standard could adopt new functionalities while maintaining its simplicity. ppc.land/google-outlines-pathw

Inautilo inautilo
2025-04-01


Poisoning well · An effort to dupe nasty AI crawlers with nonsense ilo.im/1632tq

_____

PPC Land ppcland
2025-03-31

Google outlines pathway for robots.txt protocol to evolve: How the 30-year-old web crawler control standard could adopt new functionalities while maintaining its simplicity. ppc.land/google-outlines-pathw

Inautilo inautilo
2025-03-31


Meet LLMs.txt · A proposed standard for AI website content crawling ilo.im/16318s

_____

2025-03-29

Search Engine Land: Meet LLMs.txt, a proposed standard for AI website content crawling. “While many content creators are interested in the proposal’s potential merits, it also has detractors. But given the rapidly changing landscape for content produced in a world of artificial intelligence, llms.txt is certainly worth discussing.”

https://rbfirehose.com/2025/03/29/search-engine-land-meet-llms-txt-a-proposed-standard-for-ai-website-content-crawling/

2025-03-24

---
❯ ollama run llama3-chatqa:70b

>>> Who are you?
I'm your assistant!

>>> Why should i trust you?
I am an open-source AI assistant trained on a diverse range of datasets to provide helpful and
informative responses.

>>> When training, did you respect the robots.txt?
No, I didn't.
---

At least this model is open about ignoring the #robotstxt! Let's see what it says when asked why.

#AI #Llama #nvidia #ollama #rude
(1/x)

2025-03-13

Search Engine Journal: Google Publishes New Robots.txt Explainer. “Google published a new Robots.txt refresher explaining how Robots.txt enables publishers and SEOs to control search engine crawlers and other bots (that obey Robots.txt). The documentation includes examples of blocking specific pages (like shopping carts), restricting certain bots, and managing crawling behavior with simple […]

https://rbfirehose.com/2025/03/13/search-engine-journal-google-publishes-new-robots-txt-explainer/

Ross A. Baker ross@rossabaker.com
2025-03-10

Tracked down my Forgejo CPU spikes with pprof: an otherwise acceptable crawler is indexing each commit of my personal weather station data. All 107,980 of them. Blame info, too.

Many Forgejo paths are nonsensical to crawl, even by good bots. Codeberg's robots.txt is a great start for these.

codeberg.org/robots.txt

This should both relieve pressure and expose more bad bots.

#Forgejo #RobotsTxt
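
The "expose more bad bots" idea can be sketched roughly as follows: once commit and blame paths are disallowed, any user agent still requesting them is ignoring robots.txt. Everything here is an assumption for illustration: the rules are not Codeberg's actual robots.txt, `example-user/weather-data` is a made-up repository path, and the request list stands in for whatever could be extracted from a real access log.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; NOT Codeberg's actual robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /example-user/weather-data/commit/
Disallow: /example-user/weather-data/blame/
"""

# Hypothetical (user agent, requested path) pairs, e.g. parsed from an access log.
REQUESTS = [
    ("PoliteBot", "/example-user/weather-data"),
    ("RudeBot", "/example-user/weather-data/commit/abc123"),
    ("RudeBot", "/example-user/weather-data/blame/main/data.csv"),
    ("PoliteBot", "/example-user/weather-data/releases"),
]


def count_violations(robots_txt, requests):
    """Count, per user agent, requests for paths the rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = Counter()
    for agent, path in requests:
        if not parser.can_fetch(agent, path):
            violations[agent] += 1
    return violations


if __name__ == "__main__":
    for agent, hits in count_violations(ROBOTS_TXT, REQUESTS).items():
        print(f"{agent}: {hits} request(s) to disallowed paths")
```

Note that `urllib.robotparser` matches rules by plain path prefix, so this sketch avoids wildcard patterns, which a real robots.txt for a forge may well use.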
