#RobotsTxt

Inautilo @inautilo
2025-05-24


What would happen if I blocked big search? · Pros and cons of blocking major search engines ilo.im/163yb3

_____

Inautilo @inautilo
2025-05-22


Most blocked AI bots · “Block rates have increased significantly over the past year.” ilo.im/16425n

_____

I've had the robots.txt to block ChatGPT from touching my site in place for months. Yet it's a referrer?

#chatgpt #llm #privacy #robotstxt

4 referrals from ChatGPT to my WordPress site
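
One possible explanation: robots.txt only asks crawlers not to fetch pages, so it cannot stop people from following a link that ChatGPT shows them, and those clicks still appear as referrals. To confirm that the crawler rules themselves are doing what was intended, a minimal sketch with Python's standard `urllib.robotparser` follows. The robots.txt content and the `example.com` URL are assumptions for illustration (the post does not show the actual file); `GPTBot` and `ChatGPT-User` are user-agent tokens that OpenAI documents for its crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the original post does not show the real rules.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for the real site.
    for agent in ("GPTBot", "ChatGPT-User", "Googlebot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, "https://example.com/post/") else "blocked"
        print(f"{agent}: {verdict}")
```

A "blocked" result only says what a well-behaved crawler should do; it says nothing about human visitors arriving through a link.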

Inautilo @inautilo
2025-05-12


The Internet Archive opt-out itch · Ways to deal with your public internet history ilo.im/163ssx

_____

2025-05-07

#Google uses content for #AI training even when its creators object. This has now been officially confirmed.

According to Google #DeepMind, the opt-out applies only to certain parts of the company. Anyone who wants to protect their data has to remove the site from #Google Search entirely. #Publishers and website operators see this as putting them at an economic disadvantage.

golem.de/news/kuenstliche-inte

#Urheberrecht #KITraining #Gemini #Suchmaschinen #RobotsTXT #Kartellverfahren

Frontend Dogma @frontenddogma@mas.to
2025-04-22

What Is llms.txt, and Should You Care About It?, by @ahrefs:

ahrefs.com/blog/what-is-llms-t

#ai #crawling #robotstxt

Frontend Dogma @frontenddogma@mas.to
2025-04-20

Meet LLMs.txt, a Proposed Standard for AI Website Content Crawling, by @searchengineland.bsky.social:

searchengineland.com/llms-txt-

#ai #crawling #scraping #robotstxt

PPC Land @ppcland
2025-04-03

ICYMI: Google outlines pathway for robots.txt protocol to evolve: How the 30-year-old web crawler control standard could adopt new functionalities while maintaining its simplicity. ppc.land/google-outlines-pathw

Inautilo @inautilo
2025-04-01


Poisoning well · An effort to dupe nasty AI crawlers with nonsense ilo.im/1632tq

_____

PPC Land @ppcland
2025-03-31

Google outlines pathway for robots.txt protocol to evolve: How the 30-year-old web crawler control standard could adopt new functionalities while maintaining its simplicity. ppc.land/google-outlines-pathw

Inautilo @inautilo
2025-03-31


Meet LLMs.txt · A proposed standard for AI website content crawling ilo.im/16318s

_____

2025-03-29

Search Engine Land: Meet LLMs.txt, a proposed standard for AI website content crawling. “While many content creators are interested in the proposal’s potential merits, it also has detractors. But given the rapidly changing landscape for content produced in a world of artificial intelligence, llms.txt is certainly worth discussing.”

https://rbfirehose.com/2025/03/29/search-engine-land-meet-llms-txt-a-proposed-standard-for-ai-website-content-crawling/

2025-03-24

---
❯ ollama run llama3-chatqa:70b

>>> Who are you?
I'm your assistant!

>>> Why should i trust you?
I am an open-source AI assistant trained on a diverse range of datasets to provide helpful and
informative responses.

>>> When training, did you respect the robots.txt?
No, I didn't.
---

At least this model is open about ignoring the #robotstxt! Let's see what it says when asked why.

#AI #Llama #nvidia #ollama #rude
(1/x)
