#reasoning

AI-Phi (@ai_phi)
2025-05-01

🧠✨ New from AI-Phi: Our latest Causerie on reasoning is live! We gathered to explore what reasoning means in AI—symbolic logic, LLMs, and the gray areas in between.

Join the conversation and dive into the highlights 👉 ai-phi.github.io/posts/causeri

Hacker News (@h4ckernews)
2025-04-30

The Conversation: Popular AIs head-to-head: OpenAI beats DeepSeek on sentence-level reasoning. “I’m a computer scientist. My colleagues – researchers from the AI Institute at the University of South Carolina, Ohio State University and the University of Maryland, Baltimore County – and I have developed the Reasons benchmark to test how well large language models can automatically generate […]

https://rbfirehose.com/2025/04/28/the-conversation-popular-ais-head-to-head-openai-beats-deepseek-on-sentence-level-reasoning/

Michael Fenichel (@drmike)
2025-04-27

@wademcgillis

And.... the video!

youtu.be/nVKUlTGvQXk

And so it continues...
there's nothing more fascinating &/or - y than seeing it clearly with our own lyin' eyes.

There's nothing else to see. Our emperor - a delusional toddler-King, is wearing no clothes! And only he knows how to respect The Pope! Then again, Sir said only he knows more about religion.

The more things change - or not....

Michael Fenichel (@drmike)
2025-04-27

"Perfect"
We have seen it all, & again today our Toddler King literally wore new "cognitive clothes" - to a funeral.

youtu.be/nVKUlTGvQXk
Here's the video

And it continues... I'm sated for now, no new "news", as there's nothing more fascinating &/or - y than seeing it with our own lyin' eyes.

There's nothing else to see. The emperor is wearing no clothes! And only he knows how to respect The Pope!

2025-04-25

An analysis of how different models answer this might make a nice post. Spoiler: surprisingly good first answers, but it takes a lot of arguing to convince the #LLMs of the actual order. And #Reasoning via repeated generation loops still strikes me as a rather half-baked idea. Would anyone want to read that?

Prompt to ChatGPT o3: This is a view of the Siebengebirge from Cologne's city centre. Can you identify the seven peaks in order from left to right? (plus an image of the Siebengebirge)
N-gated Hacker News (@ngate)
2025-04-22

🎓🤖 This groundbreaking revelation from the ivory towers of ponders if can magically transform bland into superstars. Spoiler alert: after endless waffle, the answer is still "TBD." Apparently, all that’s needed is a touch of wizardry from & Shanghai's finest 🧙‍♂️.
limit-of-rlvr.github.io/

Hacker News (@h4ckernews)
2025-04-22

Does RL Incentivize Reasoning in LLMs Beyond the Base Model?

limit-of-rlvr.github.io/

2025-04-21

AI assisted search-based research actually works now https://bit.ly/4jleebn #AI #search #reasoning

Text Shot: Last week, OpenAI released search-enabled o3 and o4-mini through ChatGPT. On the surface these look like the same idea as we’ve seen already: LLMs that have the option to call a search tool as part of replying to a prompt.

But there’s one very significant difference: these models can run searches as part of the chain-of-thought reasoning process they use before producing their final answer.

This turns out to be a huge deal. I’ve been throwing all kinds of questions at ChatGPT (in o3 or o4-mini mode) and getting back genuinely useful answers grounded in search results. I haven’t spotted a hallucination yet, and unlike prior systems I rarely find myself shouting "no, don’t search for that!" at the screen when I see what they’re doing.

Here are four recent example transcripts:
trndgtr.com (@trndgtr)
2025-04-21

AI's Secret Advantage - Dwarkesh Patel Podcast

2025-04-21

Enhancing AI trustworthiness through automated reasoning: A novel method for explaining deep learning and LLM reasoning. ~ Julia Connolly, Oliver Stanton, Sarah Veronica, Liam Whitmore. researchgate.net/publication/3 #LLMs #Reasoning #ITP

eicker.news tech news (@technews@eicker.news)
2025-04-20

»Vibe Check: #OpenAI’s o3, GPT-4.1, and o4-mini. #o3 is OpenAI’s #mostdeliberate thinker and newest flagship model: Built for #selfdirected #complex #reasoning and tool use.« every.to/context-window/vibe-c #tech #media #news

Miguel Afonso Caetano (@remixtures@tldr.nettime.org)
2025-04-19

"Dwarkesh Patel: I want to better understand how you think about that broader transformation. Before we do, the other really interesting part of your worldview is that you have longer timelines to get to AGI than most of the people in San Francisco who think about AI. When do you expect a drop-in remote worker replacement?

Ege Erdil: Maybe for me, that would be around 2045.

Dwarkesh Patel: Wow. Wait, and you?

Tamay Besiroglu: Again, I’m a little bit more bullish. I mean, it depends what you mean by “drop-in remote worker” and whether it’s able to do literally everything that can be done remotely, or do most things.

Ege Erdil: I’m saying literally everything.

Tamay Besiroglu: For literally everything. Just shade Ege’s predictions by five years or by 20% or something.

Dwarkesh Patel: Why? Because we’ve seen so much progress over even the last few years. We’ve gone from ChatGPT two years ago to now we have models that can literally do reasoning, are better coders than me, and I studied software engineering in college. I mean, I did become a podcaster, I’m not saying I’m the best coder in the world.

But if you made this much progress in the last two years, why would it take another 30 to get to full automation of remote work?

Ege Erdil: So I think that a lot of people have this intuition that progress has been very fast. They look at the trend lines and just extrapolate; obviously, it’s going to happen in, I don’t know, 2027 or 2030 or whatever. They’re just very bullish. And obviously, that’s not a thing you can literally do.

There isn’t a trend you can literally extrapolate of “when do we get to full automation?”. Because if you look at the fraction of the economy that is actually automated by AI, it’s very small. So if you just extrapolate that trend, which is something, say, Robin Hanson likes to do, you’re going to say, “well, it’s going to take centuries” or something."

dwarkesh.com/p/ege-tamay
#AI #LLM #Reasoning #Chatbots #AGI #Automation #Productivity

eicker.news tech news (@technews@eicker.news)
2025-04-19

»#OpenAI's new #reasoning #AImodels #hallucinate more: Perhaps more concerning, the ChatGPT maker doesn’t really know why it’s happening.« techcrunch.com/2025/04/18/open #tech #media #news

Global Threads (@globalthreads)
2025-04-19

🤖 AI | OPENAI
🔴 New Reasoning AIs Hallucinate More

🔸 o3 & o4-mini outperform older models in coding & math — but hallucinate more.
🔸 On PersonQA, o3 hallucinated 33% of answers, o4-mini 48%.
🔸 No clear cause; scaling reasoning may amplify false claims.
🔸 Transluce: o3 fabricates actions like fake code execution.

2025-04-18

bsky.app/profile/financialtime

#OpenAI and start-ups race to transform IT and society.

ChatGPT o3 and o4-mini models are more effective at solving programming problems, using #reasoning and taking time to think through complex queries.

... research from coding platform GitHub found that 92% of US developers use #AI #coding tools.

The chief product officer at Anthropic (Claude AI) said the IT role would increasingly involve “understanding the requirements [of users] & working as a team”, and quality-assuring products.

Mr Tech King (@mrtechking)
2025-04-16

More compute for LLM reasoning isn't a magic bullet. MS Research finds gains vary by model/task, costs fluctuate, & longer answers aren't always better. Key takeaway: Efficiency & verification matter.

Microsoft Study: More AI Compute Doesn't Mean Better Results
Leisureguy (@Leisureguy@c.im)
2025-04-13

A study published in Nature shows what has led to Congress becoming so ineffective:

leisureguy.ca/2025/04/13/congr

#Congress #evidence #emotion #reasoning #politics #USDownfall

2025-04-12

A nation of idiots.

Andreas Schleicher, the head of education and skills at the O.E.C.D., told The Financial Times, “Thirty percent of Americans read at a level that you would expect from a 10-year-old child.” He continued, “It is actually hard to imagine — that every third person you meet on the street has difficulties reading even simple things.”

nytimes.com/2025/04/10/opinion
