
"Intelligence is figuring out how the world works rather than waiting for someone to tell you how the world works."

Join us as we hear from Andrew Barto and Richard Sutton, the 2024 #ACMTuringAward recipients, as they discuss their work on #ReinforcementLearning.

https://vimeo.com/1085726612

This Python class offers a multiprocessing-powered Pool for efficiently collecting and managing experience replay data in reinforcement learning.

https://github.com/NoteDance/Pool

Discussions: https://discu.eu/q/https://github.com/NoteDance/Pool

#programming #python #reinforcementlearning

What do a baby learning to walk and AlphaGo’s legendary Move 37 have in common?
They both learn by doing — not by being told.
That’s the essence of Reinforcement Learning.

It's great to see that my article on Q-learning & Python agents was helpful to many readers and was featured in this week's Top 5 by Towards Data Science. Thanks! :blobcoffee: And make sure to check out the other four great reads too.

-> https://www.linkedin.com/pulse/whats-our-reading-list-week-towards-data-science-dcihe

#Reinforcementlearning #AI #Python #DataScience #KI #alphago #google #googleai #ArtificialIntelligence

🧠 Can two AI agents form a bond that feels like love?

From shared rewards to partner modeling, discover how machines are showing signs of synthetic affection — and what it means for the future of AI ethics, design, and psychology 🤖💔

👉 Read now:
https://medium.com/@rogt.x1997/do-machines-feel-lonely-exploring-the-ghost-of-affection-in-reinforcement-learning-e2d3f0bb8482

#AIEmotions #ReinforcementLearning #AIEthics #MultiAgentSystems

ReasoningGym: Reasoning Environments for RL with Verifiable Rewards

https://arxiv.org/abs/2505.24760

#HackerNews #ReasoningGym #ReinforcementLearning #VerifiableRewards #AIResearch #MachineLearning

Just getting into Reinforcement Learning?
This book helped me a lot. And it's beginner-friendly:
:blobcoffee: Reinforcement Learning: An Introduction by Sutton & Barto
http://incompleteideas.net/book/the-book.html

#ai #ki #artificialintelligence #reinforcementlearning #python #technology #agenticai

[D] My first blog, PPO to GRPO

https://medium.com/@opmyth/from-ppo-to-grpo-1681c837de5f

Discussions: https://discu.eu/q/https://medium.com/%40opmyth/from-ppo-to-grpo-1681c837de5f

#compsci #machinelearning #reinforcementlearning

Reinforcement Learning doesn’t tell you what’s right.
It only tells you how good your choice was.
No feedback on what to do. Only on how it went.

:blobcoffee: Example: a multi-armed bandit (like a slot machine with several levers). You don't know which lever is best; you can only find out by trying them. Exploring means giving up a known reward (from exploitation) in the hope of finding a better one.

This balance between exploration and exploitation is the central dilemma in reinforcement learning.

:blobcoffee: A simple strategy is ε-greedy (here with ε = 0.1):
→ In 90% of cases, you take the best-known action
→ In 10% of cases, you try a different one at random

In simulations, ε-greedy methods perform better in the long run than a purely greedy strategy (always taking the action that currently looks best), because they keep exploring and so can discover that a better lever exists.
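
A minimal sketch of that comparison, assuming a bandit with Gaussian rewards. The function name `run_bandit`, the lever means, and the step count are made up for illustration. Note one simplification: the explore step here picks any lever uniformly at random (which may happen to be the current best), the common textbook form of ε-greedy.

```python
import random


def run_bandit(true_means, epsilon, steps, seed=0):
    """Play a Gaussian multi-armed bandit with an epsilon-greedy policy.

    Returns (average reward per step, final value estimates per lever).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running mean reward observed per lever
    counts = [0] * n_arms       # number of pulls per lever
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any lever uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the lever with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payout, std dev 1
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled lever.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps, estimates


if __name__ == "__main__":
    means = [0.2, 0.5, 1.0]  # hypothetical lever payouts; the last is best
    for eps in (0.0, 0.1):
        avg, _ = run_bandit(means, eps, steps=2000, seed=0)
        print(f"epsilon={eps}: average reward {avg:.3f}")
```

With ε = 0 the agent is purely greedy and can lock onto whichever lever first looks good. A single run is noisy, so average over several seeds before comparing the two settings.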

#ReinforcementLearning #ML #KI #AI #DataScience #MachineLearning #Datascientist

What does a baby learning to walk have in common with AlphaGo’s Move 37?

Both learn by doing — not by being told.

That’s the essence of Reinforcement Learning.

In my latest article, I explain Q-learning with a bit of Python and the world’s simplest game: Tic Tac Toe.

-> No neural nets.
-> Just some simple states, actions, rewards.

The result? A learning agent in under 100 lines of code.

Perfect if you are curious about how RL really works, before diving into more complex projects.

Concepts covered:
:blobcoffee: ε-greedy policy
:blobcoffee: Reward shaping
:blobcoffee: Value estimation
:blobcoffee: Exploration vs. exploitation

Read the full article on Towards Data Science → https://towardsdatascience.com/reinforcement-learning-made-simple-build-a-q-learning-agent-in-python/
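
To give a feel for the core update, here is a rough sketch of tabular Q-learning with an ε-greedy policy. It is not the article's code: instead of Tic Tac Toe it uses a made-up five-cell corridor where the agent starts on the left and gets a reward of 1 for reaching the right end. The function name `train_corridor` and all hyperparameters are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def train_corridor(episodes=500, length=5, alpha=0.5, gamma=0.9,
                   epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor (a stand-in environment).

    States are cells 0..length-1; actions are -1 (left) and +1 (right).
    Reaching the last cell ends the episode with reward 1.
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated value, default 0
    actions = (-1, +1)
    goal = length - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                # Exploit, breaking ties at random so an untrained agent
                # does not always walk left.
                best = max(q[(state, a)] for a in actions)
                action = rng.choice(
                    [a for a in actions if q[(state, a)] == best])
            next_state = min(max(state + action, 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # Value of the best follow-up action (0 at the terminal cell).
            best_next = 0.0 if next_state == goal else max(
                q[(next_state, a)] for a in actions)
            # Q-learning update: move the estimate toward the TD target.
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


if __name__ == "__main__":
    q = train_corridor()
    for s in range(4):
        print(s, round(q[(s, -1)], 3), round(q[(s, +1)], 3))
```

After training, moving right should score higher than moving left in every cell. The same update rule carries over to a board game once states are board positions and actions are legal moves.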

#Python #ReinforcementLearning #ML #KI #Technology #AI #AlphaGo #Google #GoogleAI #DataScience #MachineLearning #Coding #Datascientist #programming #data

✨ Master Q-Learning with Reinforcement Learning!
📝 Learn how Reinforcement Learning makes building Q-Learning agents simpler and more efficient! Discover the benefits, the essential steps, and how this technique can revolutionize your work with artificial intelligence. Click the link and dive into this fascinating universe!
#AI #ReinforcementLearning #DeepLearning
https://inkdesign.com.br/reinforcement-learning-simplifica-criacao-de-agente-q-learning/?fsp_sid=42701

Outcome-Based Reinforcement Learning to Predict the Future

https://arxiv.org/abs/2505.17989

#HackerNews #OutcomeBasedReinforcementLearning #FuturePrediction #AIResearch #MachineLearning #ReinforcementLearning

AI Learns by Watching - Sholto & Trenton on Dwarkesh

#generalization #ai #reinforcementlearning

📢 Reinforcement Learning is back!
...or so many people say. But the AI community knows it never went away.
What is behind this "comeback"? And why do we keep celebrating old concepts as new hype?
🎥 Watch now!
#KünstlicheIntelligenz #ReinforcementLearning #TechDebatte

https://youtube.com/shorts/SkfFjb_NYuM

Quantifying Cognitive Abilities: Numerosity and Evidence Accumulation in Brain Models

#ComputationalPsychiatry #BrainScience #MentalHealthResearch #Neuroscience #CognitiveScience #DecisionMaking #ReinforcementLearning #MentalHealth #BrainModels #Psychology #NeuralSignals #BehavioralScience #ScientificDiscovery

https://youtube.com/shorts/vCkeCnmkysY?feature=share

Alibaba's New ZeroSearch Framework Slashes Training Costs For Search-Enabled AI by 88%

#AI #GenAI #ZeroSearch #AlibabaAI #AITraining #LLMs #ReinforcementLearning #AICostReduction #MachineLearning #OpenSourceAI

https://winbuzzer.com/2025/05/09/alibabas-new-zerosearch-framework-slashes-ai-training-costs-by-88-xcxwbn/

#Promptengineering is crucial for developing #LLM-based apps, but it's often manual & inefficient. PRewrite is an automated method using an LLM trained with #reinforcementlearning to optimize prompts https://arxiv.org/pdf/2401.08189 #RL #AI

Sutton and Barto Book Implementation

https://github.com/ivanbelenky/RL

#HackerNews #SuttonBarto #Book #Implementation #RL #GitHub #ReinforcementLearning

🎨🤖ART: Now you can train your "LLM agents" with Open-source Reinforcement Learning, because training them closed-source would just be too mainstream. Because what the world really needed was more #GitHub #buzzwords and fewer actual results. 🚀💻
https://github.com/OpenPipe/ART #ART #LLM #agents #OpenSource #ReinforcementLearning #HackerNews #ngated

ART – a new open-source RL framework for training agents

https://github.com/OpenPipe/ART

#HackerNews #OpenSource #RLFramework #TrainingAgents #ReinforcementLearning #ART

Training AI to Persuade? - Jeremie & Edouard Harris on JRE

#reinforcementlearning #ai #aipersuasion #aimodels #openai #aiagents

