#PPO

2025-10-19

RL (RLM): Figuring It Out Together

Hi everyone! I recently came across the Hugging Face Deep Reinforcement Learning Course and wanted to summarize the most interesting parts. This article is a kind of cheat sheet on the basics of Reinforcement Learning (RL) and on one of its key algorithms, PPO, which underlies the fine-tuning of modern LLMs (Large Language Models).

habr.com/ru/articles/958062/

#Искуственный_интеллект #Машинное_обучение #Алгоритмы #RLHF #LLM #Большие_языковые_модели #RL #Reinforcement_learning #PPO #Proxi

Edwin G. Jolly (Maybe!) 🎄🎅 EdwinG@mstdn.moimeme.ca
2025-09-27

A Vulnerable Sector Check (VSC) pre-employment screening can take over 3 months because of a backlog at the OPP.

cbc.ca/news/canada/toronto/opp
- - -
A background check for work with vulnerable persons (VATPV, the French name for the VSC) can take more than 3 months because of delays at the PPO (the French abbreviation for the OPP).

// Article in English //

#Ontario #OPP #PPO

2025-06-18

#MedicalInsurance #Medicare #MedicarePlus

Just received notice from #BlueShield that #UCSF, my medical provider for the last 15 years, is leaving the #BlueShieldOfCA #PPO medical network as of 7/10/2025. ☹️

Just started doing some research on which groups are available where I can find a new PCP & all of the "reviews" for all of the medical groups in my area & beyond are dismal. 🤦‍♂️

That said, I've found that as long as I get a PCP that I get along with & who is responsive to my needs/requests, I'm happy even if the reviews for the group are poor.

So, I may need to try a couple in various groups before I find the PCP that I like.

As a member of a PPO, I don't have to worry at all about getting referrals for specialized care, but the day-to-day medical care -- labs & prescriptions -- is all I generally need & I just need to find another PCP who is on the same page with me for those things.

Wish me luck! 😉

2025-04-22

Does RL Incentivize Reasoning in LLMs Beyond the Base Model?
limit-of-rlvr.github.io/
#ycombinator #Qwen #Deepseek_R1 #PPO #GRPO #AIME #RLVR #Tsinghua_University

Dr Priya Sammani ( MBBS ,DFM ) drpriya@me.dm
2024-12-19

The gentle chords of a familiar song drifted through the living room as sunlight spilled across the kitchen table. I stirred my coffee absently, lost in thought about my neighbor, Emily. #AffordableCareAct #familyhealthplans #financialsecurity #HDHP #healthcoverage #healthinsurance #HMO #insuranceproviders #PPO #USAhealthcare

priya.health/best-health-insur

2024-06-22
Conlloga Muixeranga Castelló conlloga_@mastodont.cat
2024-05-06

When we stand together and hold on tight to one another, we can go further and touch the sky with our hands.
Congratulations, CE Castelló and the black-and-white supporters!
#PPO

2023-10-02

On my blog: Reinforcement Learning from Human Feedback (RLHF)

#AI #LLM #Tuning #DPO #PPO #TRL

heidloff.net/article/rlhf/

Tero Keski-Valkama tero@rukii.net
2023-08-30

Using a clever change-of-variables trick, #DPO is a more efficient drop-in replacement for #PPO in #RLHF.

Using DPO with preference labels from a #chatbot panel of judges for virtually embodied agents would be a great way to achieve an unambiguous #AGI.

[2305.18290] Direct Preference Optimization: Your Language Model is Secretly a Reward Model arxiv.org/abs/2305.18290

Greg Araya araya@socel.net
2023-07-18

2/2 @Scofisticated The choice is to pay the doc several hundred dollars up front for the visit, then they will provide an invoice that I can then use to pursue reimbursement from my PPO.

Like I'm going to have any luck getting reimbursed from a health plan that won't even pay doctors in its network. Why do we have health insurance again?

#PPO #PrivateHealthCare

Feynman 🔴 feynman@tooted.ca
2023-05-13

Today I’ve started to make a humanoid robot learn to walk by itself. Funny to watch when evaluating the model after a few hours.

#ai #gym #deeplearning #ppo

Keywan Tonekaboni ktn@social.heise.de
2023-02-11

I talk about #ChatGPT with @wstieler (MIT Technology Review) as well as @johoo and Hartmut Gieselmann ( @ct_Magazin ) in #ctuplink 47.0. What applications and business models are there? What are the opportunities and risks of #KI?

youtu.be/kbpBMN8ifZ4

#ChatGPT #GPT3 #OpenAI #AI #ArtificialIntelligence #KI #ML #MachineLearning #Transformer #PPO #NeuronaleNetze #KünstlicheIntelligenz #ctmagazin #uplink

Keywan Tonekaboni ktn@social.heise.de
2023-02-09

@ct_Magazin editor Pina Merkert explains the details of the technology behind #ChatGPT to me in this c't uplink kompakt

youtube.com/watch?v=jcrBBxXK36

The applications and impact of #ChatGPT are then the topic on Saturday in #ctuplink 47.0, where @johoo, @wstieler and Hartmut Gieselmann are my guests.

#ChatGPT #GPT3 #OpenAI #AI #ArtificialIntelligence #KI #ML #MachineLearning #Transformer #PPO #NeuronaleNetze #KünstlicheIntelligenz #ctmagazin #uplink #uplinkkompakt

2022-11-24

#ppo 🇲🇽 things are getting good at the SCJN.

I didn't quite understand Justice Ríos's position. As far as I can tell, she agrees with Justice LM Aguilar's proposal.🙄🙏

Well, and I can't take any more of Justice Loretta Ortiz's remarks. I feel like paste😬🤯
