#llms

Miguel Afonso Caetano (@remixtures@tldr.nettime.org)
2025-05-04

"Bluntly, the Y-axis simply doesn’t make much sense. And needless to say, if the Y-axis doesn’t make sense, you can’t meaningfully use the graph to make predictions. Computers can answer some questions reliably now, for example, and some not, and the graph tells us nothing about which is which or when any specific question will be solved. Or consider songwriting; Dylan wrote some in an afternoon; Leonard Cohen took half a decade on and off to write Hallelujah. Should we average the two figures? Should we sample Dylan songs more heavily because he wrote more of them? Where should songwriting go on the figure? The whole thing strikes us as absurd.

Finally, the only thing METR looked at was “software tasks”. Software might be very different from other domains, in which case the graph (even if it did make sense) might not apply. In the technical paper, the authors actually get this right: they discuss carefully the possibility that the tasks used for testing might not be representative of real-world software engineering tasks. They certainly don't claim that the findings of the paper apply to tasks in general. But the social media posts make that unwarranted leap.

That giant leap seems especially unwarranted given that there has likely been a lot of recent data augmentation directed towards software benchmarks in particular (where this is feasible). In other domains where direct, verifiable augmentation is less feasible, results might be quite different. (Witness the failed letter ‘r’ labeling task depicted above.) Unfortunately, literally none of the tweets we saw even considered the possibility that a problematic graph specific to software tasks might not generalize to literally all other aspects of cognition.

We can only shake our heads."

garymarcus.substack.com/p/the-

#AI #GenerativeAI #LLMs #Chatbots #Automation #Benchmarks #SoftwareDevelopment #Programming #AIHype

Not🐧A🐧Convicted🐧Felon (@sleepyfox@hachyderm.io)
2025-05-04

Is anyone, anywhere, compiling a list of employers who are AI-last? (#AntiLLM, or whatever phrase works for "We value people and will not be using #LLMs")
Signed, the Anti-AIst.
(Please boost for reach, thanks)

Not🐧A🐧Convicted🐧Felon (@sleepyfox@hachyderm.io)
2025-05-04

@j0seph I thought the lack of domain expertise was the selling point of #LLMs ?

Abraham Samma🔬🔭👨‍💻 (@abesamma@toolsforthought.social)
2025-05-04

It's interesting how LLMs and MCP have led people to rediscover hypermedia as the engine of application state and how we can use these lessons to design flexible and dynamic hypermedia interfaces that machines AND humans can work with...

Would a web-like MCP make sense? ondr.sh/blog/ai-web

#webdev #html #mcp #llms

2025-05-03

#GameChanger for R Scientists: The ellmer package lets researchers use #LLMs directly in R to extract structured data from documents. Supports tool use (running commands for accurate maths/date info) and batch processing. Great for literature reviews or automating data analysis. No JSON knowledge needed—just pure R! #DataScience #RStats #ResearchTools seascapemodels.org/rstats/2025

Curated Hacker News (@CuratedHackerNews)
2025-05-03

Run LLMs on Apple Neural Engine (ANE)

github.com/Anemll/Anemll

N-gated Hacker News (@ngate)
2025-05-03

🥴 Apple fans rejoice! Now you can run LLMs on the Neural Engine, because who doesn't love a neural engine that’s as locked down as Fort Knox? 🙃 Meanwhile, GitHub's Copilot becomes your overenthusiastic coding sidekick, cheerfully suggesting bugs faster than you can fix them. 🤖✨
github.com/Anemll/Anemll

Metin Seven 🎨 (@metin@graphics.social)
2025-05-03

𝘏𝘶𝘮𝘢𝘯 𝘤𝘰𝘯𝘴𝘤𝘪𝘰𝘶𝘴𝘯𝘦𝘴𝘴 𝘪𝘴 𝘢 ‘𝘤𝘰𝘯𝘵𝘳𝘰𝘭𝘭𝘦𝘥 𝘩𝘢𝘭𝘭𝘶𝘤𝘪𝘯𝘢𝘵𝘪𝘰𝘯,’ 𝘴𝘤𝘪𝘦𝘯𝘵𝘪𝘴𝘵 𝘴𝘢𝘺𝘴 — 𝘢𝘯𝘥 𝘈𝘐 𝘤𝘢𝘯 𝘯𝘦𝘷𝘦𝘳 𝘢𝘤𝘩𝘪𝘦𝘷𝘦 𝘪𝘵

popularmechanics.com/science/a

#brain #neuroscience #consciousness #AI #ArtificialIntelligence #NeuralNetworks #LLM #LLMs #MachineLearning #ML #tech #technology #biology #science #research

Metin Seven 🎨 (@metin@graphics.social)
2025-05-03

I read this with interest…

"Billionaires think that they're the smartest people who've ever lived, because they're the wealthiest people who've ever lived. If they were wrong about anything, then why would they have been so financially successful? […] They believe that everything can be quantified, like a person's IQ, and that money is a good measure of how much someone is worth."

arstechnica.com/culture/2025/0

#tech #technology #BigTech #billionaires #TaxTheRich #capitalism #AI #LLM #LLMs #ML

JCON (@jcon)
2025-05-03

You’ve got one Monday. We’ve got 7 .

This is your deep dive into everything from embeddings to offline expedition planning with before goes full conference mode.

Not booked yet? This is your sign.
🎟️ 2025.europe.jcon.one/tickets

Jimmy B. :tailscale: (@jimmyb@selfhosted.cafe)
2025-05-03

I appreciate being able to run my own local #LLMs so I can ask it questions like this 🤓🤣

#Ollama #OpenWebUI #selfhosted #selfhost #math #selfhosting

Miguel Afonso Caetano (@remixtures@tldr.nettime.org)
2025-05-03

"Apple Inc. is teaming up with startup Anthropic PBC on a new “vibe-coding” software platform that will use artificial intelligence to write, edit and test code on behalf of programmers.

The system is a new version of Xcode, Apple’s programming software, that will integrate Anthropic’s Claude Sonnet model, according to people with knowledge of the matter. Apple will roll out the software internally and hasn’t yet decided whether to launch it publicly, said the people, who asked not to be identified because the initiative hasn’t been announced.

The work shows how Apple is using AI to improve its internal workflow, aiming to speed up and modernize product development. The approach is similar to one used by companies such as Windsurf and Cursor maker Anysphere, which offer advanced AI coding assistants popular with software developers."

bloomberg.com/news/articles/20

#AI #GenerativeAI #Apple #Xcode #Anthropic #Claude #VibeCoding #LLMs #Chatbots #SoftwareDevelopment #Programming

Jan :rust: :ferris: (@janriemer@floss.social)
2025-05-03

No, not everything that looks like a #bug has been caused by #AI-generated code!

We're still humans, ok!? And humans can also make mistakes!

#ArtificialIntelligence #LLM #LLMs

Tiong-seah Yap (Bear) (@tsakabear)
2025-05-03

"Rabelais shows us that, when the production of discourse is automated, it becomes strictly monologic and loses its illocutionary social power. This sort of autonomous language is just like an ambassador: it speaks for us, but it cannot speak as us." - Hannah Katznelson

Source:
aeon.co/essays/who-needs-ai-te

Miguel Afonso Caetano (@remixtures@tldr.nettime.org)
2025-05-02

"After poring through a century of varied conceptualizations, I’ll write out my current stance, half-baked as it is:

I think “AGI” is better understood through the lenses of faith, field-building, and ingroup signaling than as a concrete technical milestone. AGI represents an ambition and an aspiration; a Schelling point, a shibboleth.

The AGI-pilled share the belief that we will soon build machines more cognitively capable than ourselves—that humans won’t retain our species hegemony on intelligence for long. Many AGI researchers view their project as something like raising a genius alien child: We have an obligation to be the best parents we can, instilling the model with knowledge and moral guidance, yet understanding the limits of our understanding and control. The specific milestones aren’t important: it’s a feeling of existential weight.

However, the definition debates suggest that we won’t know AGI when we see it. Instead, it’ll play out more like this: Some company will declare that it reached AGI first, maybe an upstart trying to make a splash or raise a round, maybe after acing a slate of benchmarks. We’ll all argue on Twitter over whether it counts, and the argument will be fiercer if the model is internal-only and/or not open-weights. Regulators will take a second look. Enterprise software will be sold. All the while, the outside world will look basically the same as the day before.

I’d like to accept this anti-climactic outcome sooner than later. Decades of contention will not be resolved next year. AGI is not like nuclear weapons, where you either have it or you don’t; even electricity took decades to diffuse. Current LLMs have already surpassed the first two levels on OpenAI and DeepMind’s progress ladders. A(G)I does matter, but it will arrive—no, is already arriving—in fits and starts."

jasmi.news/p/agi

#AI #GenerativeAI #AGI #LLMs #Chatbots #AIHype #AIBubble

Glyn Moody (@glynmoody)
2025-05-02

AI models routinely lie when honesty conflicts with their goals - theregister.com/2025/05/01/ai_ "Keep plugging those into your apps, folks. This neural network told me it'll be fine"
