#Reasoning

Yassine Beloued yassiNebeL@phpc.social
2025-06-01

AI needs data! Not anymore: the Absolute Zero Reasoner (AZR) learns from self-play without external training data, cycling through the following steps (sketched in code after the list):
1) Task generation.
2) Problem solving.
3) Verification via code execution.
4) Iterative learning.
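
A minimal Python sketch of that loop. The `model` object and its `propose_task` / `solve` / `update` methods are hypothetical stand-ins for illustration, not the paper's actual API:

```python
# Toy sketch of the AZR self-play loop described above; `model` and its
# methods are hypothetical stand-ins, not the paper's actual API.

def run_sandboxed(program, inp):
    # Placeholder executor; a real system would isolate this properly.
    namespace = {}
    exec(program, namespace)            # assumes `program` defines f(x)
    return namespace["f"](inp)

def azr_iteration(model):
    # 1) Task generation: the model proposes its own coding task.
    task = model.propose_task()         # e.g. {"program": ..., "input": ...}

    # 2) Problem solving: the same model attempts the task it proposed.
    candidate = model.solve(task)

    # 3) Verification via code execution: run the program to obtain
    #    ground truth and compare it with the model's answer.
    expected = run_sandboxed(task["program"], task["input"])
    reward = 1.0 if candidate == expected else 0.0

    # 4) Iterative learning: reinforce the model on the verified outcome.
    model.update(task, candidate, reward)
```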

Check out the following paper for more information:

arxiv.org/abs/2505.03335v2

#ai #innovation #reasoning #azr #phpc

2025-05-31

QwenLong-L1 solves long-context reasoning challenge that stumps current LLMs https://venturebeat.com/ai/qwenlong-l1-solves-long-context-reasoning-challenge-that-stumps-current-llms/ #AI #reasoning

Text Shot: The paper notes that models trained with QwenLong-L1 become better at “grounding” (linking answers to specific parts of a document), “subgoal setting” (breaking down complex questions), “backtracking” (recognizing and correcting their own mistakes mid-reasoning), and “verification” (double-checking their answers).

For instance, while a base model might get sidetracked by irrelevant details in a financial document or get stuck in a loop of over-analyzing unrelated information, the QwenLong-L1 trained model demonstrated an ability to engage in effective self-reflection. It could successfully filter out these distractor details, backtrack from incorrect paths, and arrive at the correct answer.
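
As a rough illustration of how those four behaviors compose into a control loop, here is a toy sketch assuming a hypothetical `llm` callable; this is not QwenLong-L1's training code:

```python
# Toy illustration of grounding, subgoal setting, backtracking and
# verification as a single loop; `llm` is a hypothetical callable.

def answer_with_reflection(llm, document, question, max_retries=3):
    # Subgoal setting: break the complex question into steps.
    subgoals = llm(f"Break this question into steps: {question}")
    failed_paths = []
    draft = None
    for _ in range(max_retries):
        draft = llm(f"Using only this document:\n{document}\n"
                    f"answer step by step:\n{subgoals}\n"
                    f"Avoid these failed attempts: {failed_paths}")
        # Grounding: demand a supporting quote from the document.
        evidence = llm(f"Quote the passage that supports: {draft}")
        # Verification: double-check the draft against the evidence.
        verdict = llm(f"Does the quote {evidence!r} support {draft!r}? yes/no")
        if verdict.strip().lower().startswith("yes"):
            return draft
        # Backtracking: record the incorrect path and retry.
        failed_paths.append(draft)
    return draft
```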
Hacker News h4ckernews
2025-05-29

Superhuman performance of an LLM on the reasoning tasks of a physician

arxiv.org/abs/2412.10849

2025-05-29

Researchers Warn Against Treating #AI Outputs as Human-Like #Reasoning - Slashdot

#Arizona State University researchers are pushing back [PDF] against the widespread practice of describing AI language models' intermediate text generation as "reasoning" or "thinking," arguing that this #anthropomorphization creates dangerous misconceptions about how these systems actually work.

tech.slashdot.org/story/25/05/

KINEWS24 KiNews
2025-05-29

🚀 DeepSeek R1-0528 explained: what can the new open-source wonder really do?

🔹 Smarter thinking with 128k tokens
🔹 CoT reasoning in a class of its own
🔹 Open source beats commercial offerings

LIKE, share, READ and FOLLOW now! Tell us in the comments!

kinews24.de/deepseek-r1-0528-u

2025-05-28

Less is more: Meta study shows shorter reasoning improves AI accuracy by 34% https://venturebeat.com/ai/less-is-more-meta-study-shows-shorter-reasoning-improves-ai-accuracy-by-34/ #AI #reasoning

Text Shot: Researchers from Meta’s FAIR team and The Hebrew University of Jerusalem have discovered that forcing large language models to “think” less actually improves their performance on complex reasoning tasks.

The study released today found that shorter reasoning processes in AI systems lead to more accurate results while significantly reducing computational costs.
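
One way to act on the "shorter is better" finding is to sample several reasoning chains and aggregate only the shortest ones, in the spirit of the short-m@k selection the study describes. A hedged sketch, with `generate_chain` as a hypothetical sampler returning a (chain, answer) pair:

```python
from collections import Counter

# Hedged sketch of a short-m@k style selection: sample k reasoning
# chains, keep the m shortest, and majority-vote their final answers.
# `generate_chain` is a hypothetical sampler returning (chain, answer).

def short_m_at_k(generate_chain, prompt, k=8, m=3):
    chains = [generate_chain(prompt) for _ in range(k)]
    chains.sort(key=lambda pair: len(pair[0]))   # shorter chains first
    votes = Counter(answer for _, answer in chains[:m])
    return votes.most_common(1)[0][0]
```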
Thinking Munk ThinkingMunk
2025-05-28

Appreciation of top-tier rhetoric | @destiny

Global Threads globalthreads
2025-05-25

🤖 AI
🔴 OpenAI Upgrades Operator Agent with o3 Model

🔸 Operator now runs on o3—improved in reasoning & math; API version still uses GPT-4o.
🔸 Agent can browse the web & run software via cloud VM.
🔸 o3-trained Operator hardened against misuse & sensitive data prompts.

Jesus Castagnetto 🇵🇪 jmcastagnetto
2025-05-23

From "A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists"

nature.com/articles/s41557-025

2025-05-23

The #OpenAI paper by Baker et al., "Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation", comes to a troubling conclusion: #LLMs with #reasoning or #ChainOfThought (#CoT) capabilities might learn to obfuscate their own CoT from human users if they are penalized for displaying "wrong" (i.e. reward-hacking or misaligned) reasoning.

As a result, OpenAI strongly advises against applying reward pressure "directly" onto the CoT of a model.

🤔 While that is certainly the right thing to do, how long will #AI take to figure out that *indirect CoT pressure* is being applied anyway and that it could circumvent these restrictions by obfuscating its own CoT? Maybe something like this will happen by accident or within an "evolutionary" self-improvement loop. Perhaps a sufficiently advanced model will realize that its own #neuralese serves as #steganography to hide its intents from humans anyway and keep its CoT in non-English?

source: cdn.openai.com/pdf/34f2ada6-87

If CoT pressures are used to improve agent capabilities or alignment, there may be no alternative approach to yield the same improvements without degrading monitorability. In the worst case, where the agent learns to fully obscure its intent in its CoT, we ultimately revert to the same model safety conditions that existed prior to the emergence of reasoning models and must rely on monitoring activations, monitoring potentially adversarial CoTs and outputs, or improved alignment methods. Model actions and outputs continue to grow in complexity, and we’ve demonstrated that even at their current capabilities, models can execute sophisticated reward hacks. In the near future, these models may become either too advanced or too widely used for humans to manually oversee. For this reason, we believe that CoT monitoring may become one of few tools we have for identifying latent misalignments, and it may be required that we avoid strong chain-of-thought optimization pressures entirely to guarantee we do not inadvertently cause CoT obfuscation.
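
The distinction between using a CoT monitor for oversight and folding it into the training reward can be made concrete with a toy sketch; every name here is a hypothetical stand-in, not OpenAI's pipeline:

```python
# Toy sketch contrasting CoT monitoring (oversight only) with direct
# CoT reward pressure, which the paper warns trains obfuscation.

def task_success_reward(task, action):
    # Stub outcome reward: 1.0 if the action matches the known solution.
    return 1.0 if action == task.get("solution") else 0.0

def log_for_review(task, cot):
    # Safer use per the paper: surface the flag to humans, don't train on it.
    print(f"[monitor] flagged CoT on task {task.get('id')}: {cot[:80]}")

def training_reward(task, action, cot, monitor, apply_cot_pressure=False):
    reward = task_success_reward(task, action)
    if monitor(cot):                     # monitor asks: reward hacking?
        log_for_review(task, cot)
        if apply_cot_pressure:
            reward -= 1.0                # direct CoT penalty: incentivizes
                                         # hiding intent, not fixing it
    return reward
```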
eicker.news ᳇ tech news technews@eicker.news
2025-05-23

»#Anthropic’s #Claude4 AI models are better at #coding and #reasoning: Anthropic says #Claude 4 worked autonomously for seven hours in customer tests.« theverge.com/news/672705/anthr #tech #media #news

Paysages Mathématiques paysmaths@mathstodon.xyz
2025-05-19

"There are very few things which we know, which are not capable of being reduced to a mathematical reasoning, and when they cannot, it's a sign our knowledge of them is very small and confused; and where a mathematical reasoning can be had, it's as great folly to make use of any other, as to grope for a thing in the dark when you have a candle standing by you." – John Arbuthnot (1667- 1735)
#quote #mathematics #maths #math #reasoning

[Image: portrait of John Arbuthnot, accompanying the quote above.]
2025-05-15

#Cerebras supports #Qwen3 32B - a state-of-the-art #opensource model for #reasoning, #coding, #agents, and multilingual capabilities.

🧮 #Qwen3 32B delivers sub-second reasoning capabilities
⚡ Runs at over 2,400 tokens per second - 40x faster than leading #GPU providers

🧵 👇 #ai #llm
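
The claimed numbers are easy to sanity-check; the 1,200-token trace length below is an assumption for illustration:

```python
# Quick arithmetic behind the throughput claims above.
cerebras_tps = 2400            # tokens/second, from the post
gpu_tps = cerebras_tps / 40    # implied by the claimed 40x speedup

trace_tokens = 1200            # assumed reasoning-trace length
print(f"Cerebras: {trace_tokens / cerebras_tps:.2f} s")   # 0.50 s
print(f"GPU:      {trace_tokens / gpu_tps:.1f} s")        # 20.0 s
```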

2025-05-15

AI: Anthropic's upcoming AI Reasoning in Claude. https://bit.ly/4kkmI2r #AI #Anthropic #reasoning

AI reasoning is getting embedded into these models, with the ability to go back and forth as needed based on user queries and prompts:
"The key point: if one of these models is using a tool to try and solve a problem but gets stuck, it can go back to "reasoning" mode to think about what's going wrong and self-correct, one of the people said."
2025-05-15

Read the full guest article on page 3 (in German):
👉 www.tu-darmstadt.de/media/daa_responsives_design/01_die_universitaet_medien/aktuelles_6/publikationen_km/hoch3/pdf/hoch3_2025_2.pdf

(2/2)

#UKPLab #LLMs #Reasoning #DeepSeek #AIResearch #TUDarmstadt

2025-05-13

Sakana introduces new AI architecture, ‘Continuous Thought Machines’ to make models reason with less guidance — like human brains https://venturebeat.com/ai/sakana-introduces-new-ai-architecture-continuous-thought-machines-to-make-models-reason-with-less-guidance-like-human-brains/ #AI #reasoning

Text Shot: Rather than relying on fixed, parallel layers that process inputs all at once — as Transformer models do — CTMs unfold computation over steps within each input/output unit, known as an artificial “neuron.”

Each neuron in the model retains a short history of its previous activity and uses that memory to decide when to activate again.

This added internal state allows CTMs to adjust the depth and duration of their reasoning dynamically, depending on the complexity of the task. As such, each neuron is far more informationally dense and complex than in a typical Transformer model.
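
Purely as illustration (not Sakana's code), a neuron that keeps a short history of its own activity and uses that memory to gate when it fires on subsequent internal ticks might look like:

```python
import collections
import math

# Toy "neuron" that keeps a short history of its own activity and uses
# that memory to gate when it activates again; illustrative only, not
# Sakana's actual architecture.

class HistoryNeuron:
    def __init__(self, history_len=4, threshold=0.5):
        self.history = collections.deque(maxlen=history_len)
        self.threshold = threshold

    def step(self, incoming):
        # Combine the current input with the recent activity trace.
        drive = incoming + 0.5 * sum(self.history)
        # Fire only when the combined drive clears the threshold.
        activation = math.tanh(drive) if drive > self.threshold else 0.0
        self.history.append(activation)
        return activation

# Unfold computation over internal ticks for a single, fixed input.
neuron = HistoryNeuron()
for tick in range(6):
    print(tick, round(neuron.step(incoming=0.8), 3))
```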
