#Crawling

PPC Landppcland
2026-02-07

Testing tool simulates Google's 2MB HTML limit as SEO professionals assess crawling impact: Dave Smart added a 2MB truncation feature to the Tame the Bots fetch tool on February 6, enabling technical SEO professionals to simulate Googlebot's reduced file size limit. ppc.land/testing-tool-simulate
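
To make the truncation idea concrete, here is a minimal sketch of what simulating a 2MB cut-off could look like. It is not the Tame the Bots implementation. The names `LIMIT_BYTES`, `truncate_html` and `survives_truncation` are hypothetical, and the sketch assumes the limit means 2 * 1024 * 1024 bytes of raw response body with a hard cut at that offset.

```python
# Assumption: "2MB" is taken as 2 * 1024 * 1024 bytes of raw response body.
# The real tool and Googlebot may measure or cut the document differently.
LIMIT_BYTES = 2 * 1024 * 1024


def truncate_html(body: bytes, limit: int = LIMIT_BYTES) -> bytes:
    """Return at most `limit` bytes of the body, as a hard cut-off would."""
    return body[:limit]


def survives_truncation(body: bytes, marker: bytes, limit: int = LIMIT_BYTES) -> bool:
    """Check whether `marker` (a link, a tag, ...) still appears after the cut."""
    return marker in truncate_html(body, limit)


if __name__ == "__main__":
    # Build a synthetic page: one link near the top, one after >2MB of filler.
    filler = b"<p>" + b"x" * LIMIT_BYTES + b"</p>"
    page = (
        b"<html><body><a href='/early'>early</a>"
        + filler
        + b"<a href='/late'>late</a></body></html>"
    )
    print(survives_truncation(page, b"/early"))  # True: inside the first 2MB
    print(survives_truncation(page, b"/late"))   # False: past the cut-off
```

Under these assumptions, anything that only appears after the cut, such as late internal links or structured data at the bottom of a very large page, would simply not be seen.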

Lyrical Garfield 🎶 :garfield:LyricalGarfield@masto.ai
2026-02-06
Original Garfield comic from March 7, 2022
Text replaced with lyrics from: Crawling

Transcript:
• These Wounds, They Will Not Heal



The image shows an animated graphic strip featuring three different scenes. In the first scene, a cat is lying on a bed, with a large voice bubble above it that says "These wounds will not hurt not heal." The second scene shows the cat with a smaller voice bubble above it that says "Boom." In the final scene, a third cat is lying on a bed in front of the first cat. This third cat has a smaller voice bubble above it that says "Cat." The three frames are arranged side by side, creating a visual representation of the story.
Lyrical Garfield 🎶 :garfield:LyricalGarfield@masto.ai
2026-01-30
Original Garfield comic from November 13, 2021
Text replaced with lyrics from: Crawling

Transcript:
• Against My Will I Stand Beside My Own Reflection
• It's Haunting




The image consists of a comic strip with three panels, each featuring the same cat wearing a sweater, sitting in front of a ball of yarn. The panels are arranged vertically, with each panel spanning a third of the height of the image. In the first panel, the cat is sitting on the left side of the image. In the second panel, the cat is sitting in the middle section of the image. Finally, in the third panel, the cat is sitting on the right side of the image.

Each panel shows the cat wearing a sweater and staring at the ball of yarn. The panels are colored with different shades of orange and blue, creating a vibrant and visually appealing comic strip. The cat seems to be the main subject of the comic, with the sweater and yarn as the central objects.
2026-01-26

[Translation] The quiet death of robots.txt

For decades, robots.txt governed the behavior of web crawlers. But today, as unscrupulous AI companies seek ever larger volumes of data, the basic social contract of the web is starting to fall apart. For three decades, a tiny text file kept the Internet from descending into chaos. The file had no particular legal or technical weight, and it was not even especially complicated. It is a handshake deal among the pioneers of the Internet to respect each other's wishes and to build the Internet in a way that benefits everyone. It is a mini-constitution for the Internet, written in code. The file is called robots.txt; it usually lives at yourwebsite.com/robots.txt. It lets anyone who runs a website, whether a small cooking blog or a multinational corporation, tell the web what is allowed on it and what is not. Which search engines may index your site? Which archival projects may download and save versions of a page? May competitors keep tabs on your pages? You decide, and you declare it to the web. The system is not perfect, but it works. Or at least it used to. For decades, the main focus of robots.txt was search engines: the owner let them scrape the site, and in return they promised to send users to it. Today AI has changed that equation: companies around the world use sites and their data to assemble enormous training datasets, in order to build models and products that may not acknowledge the original sources at all. The robots.txt file operates on a give-and-take basis, but a great many people have the impression that AI companies only like to take. So much money has now been poured into AI, and the technology is advancing so quickly, that many site owners cannot keep up. And the fundamental agreement underlying robots.txt and the web as a whole may be losing its force too.

habr.com/ru/companies/ruvds/ar

#robotstxt #вебкраулер #crawling #openai #ruvds_перевод
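
As a small illustration of how a site owner declares "what is allowed and what is not", the sketch below parses an example robots.txt with Python's standard `urllib.robotparser` and asks what two crawlers may fetch. The file contents, the agent names `ExampleAIBot` and `SomeSearchBot`, and the example.com URLs are invented for this example. As the article stresses, nothing enforces the answer; a crawler has to choose to ask and to obey.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI crawler entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("ExampleAIBot", "https://example.com/recipes/"),
        ("SomeSearchBot", "https://example.com/recipes/"),
        ("SomeSearchBot", "https://example.com/private/notes"),
    ]:
        print(agent, url, parser.can_fetch(agent, url))
```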

Lyrical Garfield 🎶 :garfield:LyricalGarfield@masto.ai
2026-01-21
Original Garfield comic from July 1, 2021
Text replaced with lyrics from: Crawling

Transcript:
• Confusing What Is Real
• Discomfort, Endlessly Has Pulled Itself Upon Me



The image is a three-panel comic strip featuring a cat and a dog. The cat is on the left, holding a blue and white bowl, with a thought bubble above its head stating, "Confronting what is real." The dog is on the right, with a thought bubble above its head saying, "I'm tired of pulling it's tail." The rest of the comic strip is filled with humor and situational scenarios, showcasing the cat and the dog's adventures together.
Lyrical Garfield 🎶 :garfield:LyricalGarfield@masto.ai
2026-01-16
Original Garfield comic from May 3, 2021
Text replaced with lyrics from: Crawling

Transcript:
• Fear Is How I Fall
• Confusing What Is
• Real
• Discomfort, Endlessly Has Pulled Itself Upon Me
• Distracting, Reacting



This is a cartoon strip featuring three panels, each depicting a different scene involving a character named Garfield. In the first panel, Garfield is lying on the ground with his head resting on the floor. The second panel shows the character talking with another person while making a funny face. The third panel depicts Garfield sitting on the ground, looking down while the character from the first panel stands behind him. The strip is set against a gray background, and the panels are arranged in a vertical order.
2026-01-16

Crawl budget determines how Google crawls and indexes your website pages. Managing it properly ensures that important content gets discovered quickly. Let’s explore simple strategies to improve SEO results!

Website: ondigitals.com/crawl-budget/
#ondigitals #ondigitalsagency #crawlbudget #crawling

Lyrical Garfield 🎶 :garfield:LyricalGarfield@masto.ai
2026-01-09
Original Garfield comic from January 2, 2021
Text replaced with lyrics from: Crawling

Transcript:
• Crawling In My Skin
• These Wounds, They Will Not Heal



The image features a garfield comic strip titled "Crawling in My Skin". There are three panels in total, each depicting a different scenario.

In the first panel, a garfield is shown crawling in his skin while he is using a laptop computer. This panel captures a humorous moment and provides a visual representation of the garfield's crawling experience.

The second panel depicts a garfield looking at a computer screen. It shows the garfield's curiosity towards the device and his attempt to understand it. This panel adds an element of surprise and intrigue to the comic strip.

The third panel features a garfield using a computer mouse. This panel captures a more practical aspect of the garfield's interaction with technology.

Overall, the comic strip provides various illustrations of the garfield's experiences with technology, making it a visually engaging and entertaining piece.
Lyrical Garfield 🎶 :garfield:LyricalGarfield@masto.ai
2026-01-04
Original Garfield comic from October 26, 2020
Text replaced with lyrics from: Crawling

Transcript:
• Against My Will I Stand Beside My Own Reflection
• It's Haunting
• How I Can't Seem
• To Find Myself Again



The comic strip is a collection of three panels, each depicting a different scene. In the first panel, a cat is lying on a chair, and a caption reads, "Agonist my will; Desiring my own reflection. How can I ever be happy when I can't even see my own reflection?". This panel is followed by a panel where the cat is seen sitting in front of a computer, and a caption reads, "The cat is staring at the computer screen. How can I ever be happy when I can't even see my own reflection?". In the third panel, the cat is seen sitting in a chair and appears to be enjoying his time. The caption in this panel reads, "The cat is napping on the chair. How can I ever be happy when I can't even see my own reflection?".
Marfisamarfisa
2026-01-03
Lyrical Garfield 🎶 :garfield:LyricalGarfield@masto.ai
2025-12-23
Original Garfield comic from April 30, 2020
Text replaced with lyrics from: Crawling

Transcript:
• Distracting, Reacting
• Against My Will I Stand Beside My Own Reflection
• It's Haunting


--------------
Original Text:
• Jon:  Hey, Garfield...  Have you seen the big suitcase?
• Garfield:  You mean my new lunch tote?

The image is a comic strip featuring three panels. In the first panel, a man stands in front of a desk, with a suitcase beside him. The second panel shows the man placing his hand over the suitcase. The third panel shows the man pointing at the suitcase with his hand. The panels are arranged in a line, with each panel having a caption below it. The overall scene is light-hearted and humorous, as the man interacts with the suitcase.
2025-12-22

From this summer (July 2nd) until today (Dec 22nd), OpenAI's GPTBot has fetched 2,659,115 pages from my Sundial demo calendar, which has a robots.txt telling crawlers not to bother, as there is an infinite number of pages in the calendar.

The furthest back their bot has reached so far is the year -1222, and the furthest in the future they have reached so far is the year 7776...

My accidental AI crawler tarpit keeps on serving pages.
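
Why an endless calendar works as a tarpit can be shown with a toy model. This is only a sketch under assumptions: it supposes each year page links to the previous and the next year (the real Sundial link structure is not described in the post), and `neighbour_years` and `crawl` are hypothetical names.

```python
from collections import deque


def neighbour_years(year: int) -> list[int]:
    """Assumed page model: every year page links to the years before and after."""
    return [year - 1, year + 1]


def crawl(start_year: int, max_pages: int) -> tuple[set[int], int]:
    """Breadth-first crawl; returns the fetched years and the queue length left over.

    The loop ends only because of `max_pages`: every fetched page adds a new,
    never-seen link, so the queue of pending pages never runs empty.
    """
    seen = {start_year}
    queue = deque([start_year])
    fetched: set[int] = set()
    while queue and len(fetched) < max_pages:
        year = queue.popleft()
        fetched.add(year)
        for nxt in neighbour_years(year):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return fetched, len(queue)


if __name__ == "__main__":
    fetched, pending = crawl(2025, max_pages=1001)
    print(min(fetched), max(fetched), pending)
```

A crawler that ignores robots.txt and has no page cap of its own keeps walking outward in both directions, which matches the pattern of ever more distant past and future years described above.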

Lyrical Garfield 🎶 :garfield:LyricalGarfield@masto.ai
2025-12-20
Original Garfield comic from March 17, 2020
Text replaced with lyrics from: Crawling

Transcript:
• Discomfort, Endlessly Has
• Pulled Itself Upon Me
• Distracting, Reacting


--------------
Original Text:
• "Echo Point"
• Garfield  Yawn!
• Voice:  Z.
• Garfield:  Don't get ahead!

The image is a vibrant comic strip that features a series of scenes depicting a Garfield-themed cat. The comic strip is split into three panels, each highlighting different aspects of the cat's adventures and misadventures.

In the first panel, the Garfield-themed cat is shown on his feet, walking on top of a hill. The cat appears to be contemplating his next move, possibly trying to scale the hill or looking for a way to descend the incline.

In the second panel, the cat is seen attempting to climb a tree, using its claws to grip the bark. It is at this point that a line of text is prominently displayed, reading "ECHO POINT." The cat is likely trying to communicate with the Echo Point or looking for a way to reach the tree.

The final panel shows the Garfield-themed cat successfully climbing the tree, reaching the Echo Point with ease. It seems that the cat has finally accomplished its goal, possibly to get the Echo Point to listen to him or to inform him about his surroundings.
Bobulous :rust: :codeberg:bobulous@fosstodon.org
2025-12-15

I'm (slowly, stutteringly) writing a website link checker, purely to get a bit of practice in Rust. (No use of chatbots/LLMs at any point.)

It's got to the point where I have a functional, but buggy, single-threaded site crawler which works a bit like the (perfectly good) W3C Link Checker, but runs in the console.

After bug fixing, I next want to use threading to fetch multiple pages at once, because I rarely get a chance to work with concurrency.

#Rust #RustLang #Crawling #HTML #WebDev
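
The step from a single-threaded checker to fetching several pages at once can be sketched as follows. This is written in Python rather than the poster's Rust and is not their code. `check_links`, `broken`, `fake_fetch` and `FAKE_SITE` are hypothetical names, and the HTTP request is replaced by a dictionary lookup so the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Stand-in for a real website: URL -> HTTP status code (invented data).
FAKE_SITE = {
    "https://example.com/": 200,
    "https://example.com/about": 200,
    "https://example.com/missing": 404,
}


def fake_fetch(url: str) -> int:
    """Pretend to request `url`; a real checker would do an HTTP request here."""
    return FAKE_SITE.get(url, 404)


def check_links(
    urls: list[str], fetch_status: Callable[[str], int], workers: int = 4
) -> dict[str, int]:
    """Fetch the status of every URL using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with `urls`.
        statuses = list(pool.map(fetch_status, urls))
    return dict(zip(urls, statuses))


def broken(results: dict[str, int]) -> list[str]:
    """Return the URLs whose status code signals a client or server error."""
    return [url for url, status in results.items() if status >= 400]


if __name__ == "__main__":
    results = check_links(list(FAKE_SITE), fake_fetch)
    print(broken(results))
```

Passing the fetch function in as a parameter keeps the concurrency logic testable on its own; the single-threaded version is the same code with `workers=1`.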

2025-12-15

scanning PubMed, storing the results in Neo4j, and running Cypher queries

neo4j@neo4j> MATCH (n:Researcher)-[:AUTHORED]->(p:Publication) RETURN n.name, collect(p.doi) AS pubs LIMIT 100;

syntax a bit verbose but why not
