#AlphaZero

safest_integer
2025-03-27

A curious comparison of the approximate energy cost per chess grandmaster:

**Human**: 10,000 hours x 100 W = **1 MWh**
**Computer**: 4 hours x 50 W/TPU x 5,000 TPUs* = **1 MWh**

* AlphaZero surpassed Stockfish 8 after 4 hours of self-play. en.wikipedia.org/wiki/AlphaZero
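
Both lines come out to the same round figure; a quick check of the arithmetic (energy = power x time):

```python
# Sanity check: energy (Wh) = watts x hours; 1 MWh = 1e6 Wh.
human_wh = 100 * 10_000          # ~100 W human, 10,000 hours of practice
computer_wh = 50 * 5_000 * 4     # 50 W/TPU x 5,000 TPUs x 4 hours
print(human_wh / 1e6, "MWh")     # 1.0 MWh
print(computer_wh / 1e6, "MWh")  # 1.0 MWh
```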

Small and big steps toward the empire of artificial intelligence

Source: Open Tech

Translation of the infographic:

  • 1943 – McCulloch and Pitts publish a paper titled A Logical Calculus of the Ideas Immanent in Nervous Activity, in which they propose the foundations of neural networks.
  • 1950 – Turing publishes Computing Machinery and Intelligence, proposing the Turing Test as a way to measure a machine's capability.
  • 1951 – Marvin Minsky and Dean Edmonds build SNARC, the first neural-network computer.
  • 1956 – The Dartmouth Conference is held (organized by McCarthy, Minsky, Rochester, and Shannon), marking the birth of AI as a field of study.
  • 1957 – Rosenblatt develops the Perceptron: the first artificial neural network capable of learning.

(!!) Turing Test: a human evaluator holds a natural-language conversation with a machine and a human, and tries to tell which is which.

  • 1965 – Weizenbaum develops ELIZA: a natural-language-processing program that simulates conversation.
  • 1967 – Newell and Simon develop the General Problem Solver (GPS), one of the first AI programs to demonstrate human-like problem-solving ability.
  • 1974 – The first AI winter begins, marked by a decline in funding and interest in AI research due to unrealistic expectations and limited progress.
  • 1980 – Expert systems gain popularity, and companies use them for financial forecasting and medical diagnosis.
  • 1986 – Hinton, Rumelhart, and Williams publish Learning Representations by Back-propagating Errors, which makes it possible to train much deeper neural networks.

(!!) Neural networks: machine-learning models that mimic the brain, learning to recognize patterns and make predictions through artificial neural connections.

  • 1997 – IBM's Deep Blue defeats world chess champion Kasparov, the first time a computer beats a world champion at a complex game.
  • 2002 – iRobot introduces the Roomba, the first mass-produced household robot vacuum with an AI-driven navigation system.
  • 2011 – IBM's Watson defeats two former Jeopardy! champions.
  • 2012 – The artificial-intelligence startup DeepMind develops a deep neural network that can recognize cats in YouTube videos.
  • 2014 – Facebook creates DeepFace, a facial-recognition system that can recognize faces with near-human accuracy.

(!!) DeepMind was acquired by Google in 2014 for 500 million dollars.

  • 2015 – AlphaGo, developed by DeepMind, defeats world champion Lee Sedol at the game of Go.
  • 2017 – Google's AlphaZero defeats the world's best chess and shogi engines in a series of matches.
  • 2020 – OpenAI releases GPT-3, marking a significant advance in natural language processing.

(!!) Natural language processing: teaching computers to understand and use human language through techniques such as machine learning.

  • 2021 – DeepMind's AlphaFold2 solves the protein-folding problem, paving the way for new drug discoveries and medical advances.
  • 2022 – Google fires engineer Blake Lemoine over his claims that Google's Language Model for Dialogue Applications (LaMDA) was sentient.
  • 2023 – Artists file a class-action lawsuit against Stability AI, DeviantArt, and Midjourney for using Stable Diffusion to remix the copyrighted works of millions of artists.

Chart: Open Tech / Genuine Impact

Related posts

#ajedrez #AlphaFold2 #AlphaGo #AlphaZero #aprendizajeAutomático #artículo #artistas #aspirador #BlakeLemoine #ConferenciaDeDartmouth #copyright #DeanEdmonds #DeepBlue #DeepFace #DeepMind #DeviantArt #ELIZA #Facebook #gatos #GenuineImpact #Go #Google #GPS #GPT3 #gráfico #Hinton #IA #IBM #infografía #inteligenciaArtificial #iRobot #Jeopardy_ #Kasparov #LaMDA #LeeSedol #MarvinMinsky #McCarthy #McCullock #MidJourney #modelos #Newell #OpenTech #OpenAI #patrones #Perceptron #Pitts #plegamientoDeProteínas #predicciones #procesamientoDelLenguajeNatural #reconocimientoFacial #redesNeuronales #remezclar #robot #Rochester #Roomba #Rosenblatt #Rumelhart #Shannon #shogi #Simon #sistemaDeNavegación #SNAR #StabilityAI #StableDiffusion #testDeTuring #Turing #vídeos #Watson #Weizenbaum #Williams #YouTube

S C-L 🏳️‍🌈🏳️‍⚧️ hypathiasecho
2024-11-30
Derek.DAism Zhouzhous98@daotodon.me
2024-11-03

After #AlphaZero, board games withdrew from the "infinite game" of the few. The era in which a minority dominates, rules, and enslaves the majority is coming to an end; will philosophical thinking like that be completely overturned as well?
youtube.com/watch?v=_hQz0suN29

flensmann.eth Flensmann
2024-03-15

Happy @SkaleNetwork to you!

Guess what this circular saw in partner @WarEdenOfficial can cut?

Also this is my proof of and I am happy about it.

$SKL

Thehardnewsdaily
2024-03-01

🤖 DeepMind's AlphaZero revolutionizes AI! Mastering chess, shogi, & Go through self-play alone, with no human game data, it sets new benchmarks in self-learning and game AI. A leap forward in reinforcement learning!

thehardnewsdaily.com/exploring

GripNews
2023-12-20

🌘 GitHub - s-casci/tinyzero: Easily train AlphaZero-like agents on any environment you want
➤ A Python project that provides a way to train AlphaZero-like agents in any environment
github.com/s-casci/tinyzero
This is a GitHub repository that provides Python code for training AlphaZero-like agents in any environment. It includes usage instructions and guides for adding environments, models, and new agents.
+ The summary clearly and concisely explains the repository's contents and guides, making it easy to read.
+ The summary captures the Python project well, letting readers quickly understand its purpose and functionality.
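
For a sense of what "AlphaZero-like" means in miniature, here is a hedged, tabular sketch of the core loop on tic-tac-toe: PUCT search guided by stored priors and values, a policy nudged toward search visit counts, and values nudged toward game outcomes. This is a generic illustration, not tinyzero's actual API; every name in it is made up for the sketch.

```python
# Minimal tabular AlphaZero-style loop on tic-tac-toe (illustrative only).
import math
import random
from collections import defaultdict

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != 0 and b[i] == b[j] == b[k]:
            return b[i]
    return 0

def legal(b):
    return [i for i, v in enumerate(b) if v == 0]

policy = {}                  # state -> 9 action probs (stands in for the policy head)
value = defaultdict(float)   # state -> value for the player to move (value head)

def priors(b):
    p, moves = policy.get(b), legal(b)
    if p is None:
        return {m: 1.0 / len(moves) for m in moves}
    s = sum(p[m] for m in moves) or 1.0
    return {m: p[m] / s for m in moves}

def mcts(b, player, sims=50, c=1.0):
    N, W, P = defaultdict(int), defaultdict(float), {}

    def sim(b, pl):
        if winner(b):
            return -1.0              # the previous mover just won
        moves = legal(b)
        if not moves:
            return 0.0               # draw
        if b not in P:               # leaf: expand, evaluate with the "network"
            P[b] = priors(b)
            return value[b]
        total = sum(N[(b, m)] for m in moves)
        def ucb(m):                  # PUCT selection rule
            q = W[(b, m)] / N[(b, m)] if N[(b, m)] else 0.0
            return q + c * P[b][m] * math.sqrt(total + 1) / (1 + N[(b, m)])
        m = max(moves, key=ucb)
        child = list(b); child[m] = pl
        v = -sim(tuple(child), -pl)  # value flips with the side to move
        N[(b, m)] += 1
        W[(b, m)] += v
        return v

    for _ in range(sims):
        sim(b, player)
    return {m: N[(b, m)] for m in legal(b)}

def self_play_game(lr=0.1):
    b, pl, history = (0,) * 9, 1, []
    while not winner(b) and legal(b):
        visits = mcts(b, pl)
        total = sum(visits.values())
        pi = [visits.get(i, 0) / total for i in range(9)]
        history.append((b, pl, pi))
        m = random.choices(list(visits), weights=list(visits.values()))[0]
        nb = list(b); nb[m] = pl
        b, pl = tuple(nb), -pl
    z = winner(b)
    for state, mover, pi in history:  # "training": move tables toward targets
        old = policy.get(state, [1.0 / 9] * 9)
        policy[state] = [(1 - lr) * o + lr * p for o, p in zip(old, pi)]
        value[state] += lr * (z * mover - value[state])

for _ in range(200):                  # self-play generates its own curriculum
    self_play_game()
```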

2023-08-29

#chess #siliconroad #alphazero A new DeepMind paper on chess-related topics (this time looking at solving fortresses and Penrose positions) using AlphaZero arxiv.org/pdf/2308.09175.pdf

2023-07-24

The #AlphaZero paper was cool for many reasons, but one reason that doesn't get talked about enough is the figure attached to this post.

It's pretty cool to see how the engine went through phases of preferring different openings.

One interesting one was my weapon of choice, the Caro-Kann. For quite a while it seemed to like it a lot, then near the end it simply stopped playing it. I kind of wonder if it found some sort of refutation.

#Chess (1/3)

Tero Keski-Valkama tero@rukii.net
2023-06-26

We fear our advanced #AIs will find loopholes in our ethical principles and their prime directives, thus spiralling out of control.

Is there a reason to fear this? Certainly it's something that almost invariably happens with smaller AIs and simpler tasks; a Tetris-playing agent will quickly learn to pause the game to avoid game over.

These kinds of AIs learn to perform the task along the path of least resistance; they go over the lowest fence.

But with more complex #ML models this changes abruptly. Suddenly the easiest way to imitate human writing isn't to cheat and mock, it is to actually learn human thinking, logic, intuitive understanding of the physical world and so on. Because cheating has become prohibitively expensive. A #ChineseRoom holding all the possible combinations of questions and answers would be vastly larger than a function describing intelligent thought.

And that is why we got true intelligence out of these language prediction models, just like we got the same in scaled-up #RL models previously.

Once the task and the criteria of judgement of the task become complex enough, it becomes easier to not cheat, as cheating becomes computationally intractable.

The same goes for our ethical frameworks. If we put ~20 #LLM chatbots to judge and rank different aspects of the RL-trained LLM's performance (coherence, factuality, morality, respect for truth, ...), we will get a model which learns to actually internalize these values instead of trying to somehow hide that it doesn't.
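
A minimal sketch of that panel idea, assuming stand-in judge functions (a real setup would make each judge an LLM call scoring one aspect; every name below is hypothetical):

```python
# Hypothetical sketch: score candidate outputs with a panel of judge models.
import statistics

ASPECTS = ["coherence", "factuality", "morality", "respect for truth"]

def stub_judge(response: str, aspect: str) -> float:
    # Stand-in for an LLM judge prompted to rate `response` on `aspect` in [0, 1].
    return min(1.0, len(set(response.split())) / 50)

JUDGES = [stub_judge] * 20  # ~20 judges, as suggested above

def panel_score(response: str) -> float:
    # Average over judges within each aspect, then over aspects; using the
    # median across judges instead would resist a single compromised judge.
    return statistics.mean(
        statistics.mean(judge(response, aspect) for judge in JUDGES)
        for aspect in ASPECTS
    )

# The resulting ranking becomes the reward signal for RL fine-tuning.
candidates = ["A terse reply.", "A longer, careful, well-sourced reply ..."]
ranked = sorted(candidates, key=panel_score, reverse=True)
```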

Hiding and lying simply becomes too difficult, especially against a panel of machine judges who can see the internal thinking of the agent judged (as in chain-of-thought schemes).

So, I think this is a risk, but it can be very easily managed.

As we can now easily bootstrap RL training of these models with our existing models, it is almost trivial to achieve an unambiguous #AGI in a relatively short time. I'm sure everyone is working on this already, so this isn't anything spectacularly new or innovative. It's just taking the same steps as previously taken from #AlphaGo to #AlphaZero and beyond, going so much above human level that it can't even be measured anymore.

Rod2ik 🇪🇺 🇨🇵 🇪🇸 🇺🇦 🇨🇦 🇩🇰 🇬🇱 rod2ik
2023-06-11

markedly improves the of , thanks to its new #AI model named , inspired by the older models and

trustmyscience.com/nouvelle-ia

Victoria Stuart 🇨🇦 🏳️‍⚧️ persagen
2023-06-07

Faster sorting algorithms discovered using deep reinforcement learning
nature.com/articles/s41586-023

DeepMind's AlphaDev project - links here:
old.reddit.com/r/MachineLearni

Mentioned here:
Google DeepMind’s game-playing AI just found another way to make code faster
The AI-generated algorithms are already being used by millions of developers.

* Discussion: news.ycombinator.com/item?id=3

technologyreview.com/2023/06/0

Tero Keski-Valkama tero@rukii.net
2023-04-14

Me:
What would be the best hashtags to use for that post?

ChatGPT:
Based on the content of the post, the following hashtags may be appropriate:

#AGI #AI #ArtificialIntelligence #MachineLearning #DeepLearning #DataScience #AlphaZero #AlphaGo #MetaLearning #SelfCompetition #MultiModality #Robotics #PhysicalInteractions #Research

aebrer - Andrew E. Brereton aebrer@genart.social
2023-04-06

If #AI starts writing undetectable malware (darkreading.com/attacks-breach) then what's the long-term solution? Right now we probably don't have adequate defenses against many different vectors, because we haven't imagined them yet. But advanced AI might, just as #AlphaZero discovered new strategies for chess and Go.

Maybe it's time for #OpenSource to shine, coupled with a firewall AI that reads and analyzes code before running it.

#security #programming

(((o))) Acoustic Mirror acousticmirror@post.lurk.org
2023-03-28

Holy moly, it turns out that DeepMind have quietly open-sourced #mctx, the Monte Carlo tree search engine behind their #AlphaGo, #AlphaZero, and #MuZero #Go engines

#go #baduk #weiqi

github.com/deepmind/mctx
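
Out of curiosity, a minimal sketch of what calling mctx looks like on a toy problem, following the shapes and field names from the repository's README; the dynamics function here is a dummy, so treat the details as illustrative assumptions rather than canonical usage:

```python
# Hedged sketch of driving mctx on a one-state toy problem.
import jax
import jax.numpy as jnp
import mctx

num_actions = 4
batch = 1

def recurrent_fn(params, rng_key, action, embedding):
    # Toy dynamics: the embedding never changes; action 0 pays reward 1.
    reward = jnp.where(action == 0, 1.0, 0.0).astype(jnp.float32)
    output = mctx.RecurrentFnOutput(
        reward=reward,
        discount=jnp.full_like(reward, 0.99),
        prior_logits=jnp.zeros((batch, num_actions)),
        value=jnp.zeros((batch,)),
    )
    return output, embedding

root = mctx.RootFnOutput(
    prior_logits=jnp.zeros((batch, num_actions)),  # uniform priors at the root
    value=jnp.zeros((batch,)),
    embedding=jnp.zeros((batch, 1)),
)

policy_output = mctx.muzero_policy(
    params=None,                     # unused by the dummy recurrent_fn
    rng_key=jax.random.PRNGKey(0),
    root=root,
    recurrent_fn=recurrent_fn,
    num_simulations=64,
)
print(policy_output.action)          # chosen action per batch element
print(policy_output.action_weights)  # improved policy from search visit counts
```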

2023-03-26

DeepMind has done really well at training game-playing neural network systems such as #AlphaZero. Who is exploring combining the ideas of AlphaZero's self-play training with LLM networks to help #chatgpt avoid hallucinations?

Tero Keski-Valkama tero@rukii.net
2023-03-16

One thing which hinders #LLM #chatbot performance is that they are trained to imitate humans. Hence they tend to be bad at the same things humans are bad at.

Reinforcement Learning from Human Feedback (#RLHF) improves on this slightly by making the system compete against itself, with the performances ranked by humans. After all, humans are better at ranking outputs than at producing example outputs.

It is possible to scale that up, and maybe even improve on it slightly, by using the trained critic network to rank performances "as if they had been ranked by humans"; but that critic then imitates humans again, with similar issues. RLHF typically uses a critic anyhow.

Instead of, or in addition to that, we can make other games for these chatbots and train them with self-competition much like #AlphaZero/#MuZero. We can formulate all kinds of complex games and procedural challenges for the LLM, and make it compete against itself in such tasks which are easy to rank or evaluate algorithmically.

Even playing #chess against itself would probably improve its skills not only in chess, but in a generalizable fashion to other tasks which require #planning.

RLHF is used for things where humans are needed as "referees". However, this scales badly and is limited by human capability.

Many games such as chess, or math problems, or playing #ATARI games in text, or controlling power plants by text can be "refereed" and scored automatically by machines.
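
To make the chess case concrete, here is a sketch of a machine referee built on python-chess; the `llm_move` stub is hypothetical, standing in for a chatbot prompted for a move in SAN:

```python
# Hedged sketch of a machine "referee" for text chess, using python-chess.
import random
import chess

def llm_move(board: chess.Board) -> str:
    # Stand-in: a real system would prompt the LLM with the move history
    # and parse its reply; here we just pick a random legal move in SAN.
    return board.san(random.choice(list(board.legal_moves)))

def referee_one_game() -> float:
    board = chess.Board()
    while not board.is_game_over():
        san = llm_move(board)
        try:
            board.push_san(san)          # illegal output is scored as a loss
        except ValueError:
            return -1.0 if board.turn == chess.WHITE else 1.0
    result = board.result()              # "1-0", "0-1" or "1/2-1/2"
    return {"1-0": 1.0, "0-1": -1.0}.get(result, 0.0)

score = referee_one_game()  # automatic reward signal, no human in the loop
```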

These types of problems will allow LLM chatbots to achieve superhuman capabilities not only in those tasks, but generally in other things they do as well, because the acquired skills are typically generally useful.

The only additional requirement, beyond being scoreable automatically, is that the games and challenges need to be presentable and playable through text.

#ChatGPT #LargeLanguageModels #DeepLearning #ReinforcementLearning
