An Implementation of AlphaZero for Chess in MLX
https://github.com/koogle/mlx-playground/tree/main/chesszero
A curious comparison of the approximate energy cost per chess grandmaster:
**Human**: 10,000 hours x 100 W = **1 MWh**
**Computer**: 4 hours x 50 W/TPU x 5,000 TPUs* = **1 MWh**
* AlphaZero surpassed Stockfish 8 after 4 hours of self-play. https://en.wikipedia.org/wiki/AlphaZero
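A minimal sketch checking the arithmetic above; the wattages, hours, and TPU count are the post's own assumptions, not measured figures:

```python
# Back-of-the-envelope energy comparison from the post above.
HOURS_HUMAN = 10_000   # hours of deliberate practice
WATTS_HUMAN = 100      # rough metabolic power of a human, in watts

HOURS_TPU = 4          # self-play time claimed for surpassing Stockfish 8
WATTS_PER_TPU = 50     # assumed power draw per TPU, from the post
NUM_TPUS = 5_000       # assumed TPU count, from the post

human_mwh = HOURS_HUMAN * WATTS_HUMAN / 1e6            # Wh -> MWh
computer_mwh = HOURS_TPU * WATTS_PER_TPU * NUM_TPUS / 1e6

print(f"human:    {human_mwh:.1f} MWh")     # 1.0 MWh
print(f"computer: {computer_mwh:.1f} MWh")  # 1.0 MWh
```

Both sides come out to exactly 1 MWh under these assumptions, which is what makes the comparison neat.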
Small and big steps toward the empire of artificial intelligence
Source: Open Tech. Translation of the infographic:
(!!) Turing test: a human evaluator holds a natural-language conversation with both a machine and a human.
(!!) Neural networks: machine-learning models that imitate the brain, learning to recognize patterns and make predictions through artificial neural connections.
(!!) DeepMind was acquired by Google in 2014 for 500 million dollars.
(!!) Natural language processing: teaches computers to understand and use human language through techniques such as machine learning.
Chart: Open Tech / Genuine Impact
Related posts
#ajedrez #AlphaFold2 #AlphaGo #AlphaZero #aprendizajeAutomático #artículo #artistas #aspirador #BlakeLemoine #ConferenciaDeDartmouth #copyright #DeanEdmonds #DeepBlue #DeepFace #DeepMind #DeviantArt #ELIZA #Facebook #gatos #GenuineImpact #Go #Google #GPS #GPT3 #gráfico #Hinton #IA #IBM #infografía #inteligenciaArtificial #iRobot #Jeopardy_ #Kasparov #LaMDA #LeeSedol #MarvinMinsky #McCarthy #McCullock #MidJourney #modelos #Newell #OpenTech #OpenAI #patrones #Perceptron #Pitts #plegamientoDeProteínas #predicciones #procesamientoDelLenguajeNatural #reconocimientoFacial #redesNeuronales #remezclar #robot #Rochester #Roomba #Rosenblatt #Rumelhart #Shannon #shogi #Simon #sistemaDeNavegación #SNAR #StabilityAI #StableDiffusion #testDeTuring #Turing #vídeos #Watson #Weizenbaum #Williams #YouTube
After #AlphaZero, board games are no longer the "infinite game" of a select few. The era in which a few dominate, rule, and enslave the many is coming to an end; will that strand of philosophy be completely overturned as well?
https://www.youtube.com/watch?v=_hQz0suN29c
Happy @SkaleNetwork to you!
Guess what this circular saw in #SKALE partner @WarEdenOfficial can cut?
Also this is my proof of #alphazero and I am happy about it.
🤖 DeepMind's AlphaZero revolutionizes AI! Mastering chess, shogi, & Go through self-play alone, with no human game knowledge beyond the rules, it sets new benchmarks in self-learning and game AI. A leap forward in reinforcement learning!
🌘 GitHub - s-casci/tinyzero: Easily train AlphaZero-like agents on any environment you want
➤ A Python codebase for training AlphaZero-like agents in any environment
✤ https://github.com/s-casci/tinyzero
This GitHub repository provides Python code for training AlphaZero-like agents in any environment, along with guides on usage, adding environments and models, and adding new agents.
+ The summary clearly and concisely explains the repository's contents and guides, making it easy to read.
+ The summary nicely captures what the Python project does, so readers can quickly grasp its purpose and features.
#ArtificialIntelligence #AlphaZero #CodeTraining
#chess #siliconroad #alphazero A new DeepMind paper on chess-related topics (this time looking at solving fortresses and Penrose positions) using AlphaZero https://arxiv.org/pdf/2308.09175.pdf
The #AlphaZero paper was cool for many reasons, but one reason that doesn't get talked about enough is the figure attached to this post.
It's pretty cool to see how the engine went through phases of preferring different openings.
One interesting case was my weapon of choice, the Caro-Kann. For quite a while the engine seemed to like it a lot, then near the end it simply stopped playing it. I kind of wonder if it found some sort of refutation.
#Chess (1/3)
We fear our advanced #AIs will find loopholes in our ethical principles and their prime directives, thus spiralling out of control.
Is there a reason to fear this? Certainly it's something that almost invariably happens with smaller AIs and simpler tasks; a Tetris-playing agent will quickly learn to pause the game to avoid game over.
These kinds of AIs learn to perform the task along the path of least resistance, hopping over the lowest fence.
But with more complex #ML models this changes abruptly. Suddenly the easiest way to imitate human writing isn't to cheat and mock, it is to actually learn human thinking, logic, intuitive understanding of the physical world and so on. Because cheating has become prohibitively expensive. A #ChineseRoom holding all the possible combinations of questions and answers would be vastly larger than a function describing intelligent thought.
And that is why we got true intelligence out of these language prediction models, just like we got the same in scaled-up #RL models previously.
Once the task and the criteria of judgement of the task become complex enough, it becomes easier to not cheat, as cheating becomes computationally intractable.
The same goes for our ethical frameworks. If we put ~20 #LLM chatbots to judge and rank different aspects of an RL-trained LLM's performance, such as coherence, factuality, morality, and respect for truth, we will get a model that learns to actually internalize these values instead of trying to hide that it doesn't.
Hiding and lying simply becomes too difficult, especially against a panel of machine judges who can see the internal thinking of the agent judged (as in chain-of-thought schemes).
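A toy sketch of the aggregation step such a judge panel implies, assuming each judge returns a 0-10 score per aspect; the judge calls themselves are stubbed out with hypothetical numbers, and the aspect names are taken from the post:

```python
from statistics import mean

# Aspects the post proposes the judge panel should rank.
ASPECTS = ["coherence", "factuality", "morality", "respect for truth"]

def panel_reward(scores_by_judge):
    """Collapse per-judge, per-aspect scores (0-10) into one scalar reward.

    `scores_by_judge` is a list of dicts, one per judge, mapping aspect -> score.
    Averaging across judges per aspect first makes one outlier judge less influential.
    """
    per_aspect = {a: mean(j[a] for j in scores_by_judge) for a in ASPECTS}
    return mean(per_aspect.values())

# Three hypothetical judges scoring one model response.
judges = [
    {"coherence": 9, "factuality": 7, "morality": 8, "respect for truth": 8},
    {"coherence": 8, "factuality": 6, "morality": 9, "respect for truth": 7},
    {"coherence": 9, "factuality": 8, "morality": 8, "respect for truth": 9},
]
print(panel_reward(judges))  # scalar reward the RL loop would maximize
```

The scalar this produces is what the RL training step would optimize against; a real system would likely weight aspects differently rather than averaging uniformly.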
So, I think this is a risk, but it can be very easily managed.
As we can now easily bootstrap RL training of these models with our existing models, it is almost trivial to achieve an unambiguous #AGI in a relatively short time. I'm sure everyone is working on this already, so this isn't anything spectacularly new or innovative. It's just taking the same steps as previously taken from #AlphaGo to #AlphaZero and beyond, going so much above human level that it can't even be measured anymore.
#Google #Deepmind markedly improves #sorting #algorithms thanks to its new #AI model named #AlphaDev, inspired by the earlier #AlphaGo and #AlphaZero models
https://trustmyscience.com/nouvelle-ia-deepmind-cree-performants-algorithmes-tri/
Faster sorting algorithms discovered using deep reinforcement learning
https://www.nature.com/articles/s41586-023-06004-9
DeepMind's AlphaDev project - links here:
https://old.reddit.com/r/MachineLearning/comments/143gzz3/r_alphadev_discovers_faster_sorting_algorithms/
Mentioned here:
Google DeepMind’s game-playing AI just found another way to make code faster
The AI-generated algorithms are already being used by millions of developers.
* Discussion: https://news.ycombinator.com/item?id=36228125
#GoogleDeepMind #DeepMind #ReinforcementLearning #AlphaZero #AlphaDev #MachineLearning #persagen
Me:
What would be the best hashtags to use for that post?
ChatGPT:
Based on the content of the post, the following hashtags may be appropriate:
#AGI #AI #ArtificialIntelligence #MachineLearning #DeepLearning #DataScience #AlphaZero #AlphaGo #MetaLearning #SelfCompetition #MultiModality #Robotics #PhysicalInteractions #Research
If #AI starts writing undetectable malware (https://www.darkreading.com/attacks-breaches/researcher-tricks-chatgpt-undetectable-steganography-malware) then what's the long-term solution? Right now we probably don't have adequate defenses against many different vectors, because we haven't imagined them yet. But advanced AI might, just as #AlphaZero discovered new strategies for chess and Go.
Maybe it's a time for #OpenSource to shine, coupled with firewall AI that reads and analyzes the code before running.
Holy moly, it turns out that DeepMind has quietly open-sourced #mctx, the Monte Carlo tree search engine behind their #Go engines #AlphaGo, #AlphaZero, and #MuZero
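mctx itself is a JAX library, and its API is not shown here; as a rough illustration of the algorithm family it implements, here is a minimal UCT-style Monte Carlo tree search in plain Python for a toy Nim-like game. The game, class, and function names are all illustrative, not taken from mctx:

```python
import math, random

# Toy game: a pile of stones; each turn remove 1-3; taking the last stone wins.

def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile, self.parent, self.move = pile, parent, move
        self.children, self.visits, self.wins = [], 0, 0.0
        self.untried = legal_moves(pile)

    def uct_child(self, c=1.4):
        # UCB1: exploit average win rate, explore under-visited children.
        return max(self.children, key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(pile):
    """Play random moves to the end; return 1 if the player to move wins."""
    turn = 0
    while pile > 0:
        pile -= random.choice(legal_moves(pile))
        turn ^= 1
    return 1 if turn == 1 else 0  # the player who took the last stone won

def mcts(pile, iterations=500):
    root = Node(pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCT while fully expanded and non-terminal.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one untried child, if any.
        if node.untried:
            m = node.untried.pop()
            child = Node(node.pile - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation, from the perspective of node's player-to-move.
        result = rollout(node.pile) if node.pile > 0 else 0
        # 4. Backpropagation: flip the result at each ply (two-player zero-sum).
        while node:
            node.visits += 1
            node.wins += 1 - result  # credit the player who moved into node
            result = 1 - result
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

random.seed(0)
print("best move from a pile of 2:", mcts(2))  # 2: taking both stones wins
```

AlphaZero-family systems replace the random rollout with a learned value network and bias selection with a learned policy prior (PUCT), but the select/expand/evaluate/backpropagate skeleton is the same.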
DeepMind has done really well at training game-playing neural network systems such as #AlphaZero. Who is exploring combining the ideas of AlphaZero's self-play training with LLM networks to help #chatgpt avoid hallucinations?
One thing which hinders #LLM #chatbot performance is that they are trained to imitate humans. Hence they tend to be bad at similar things humans are bad at.
Reinforcement Learning from Human Feedback (#RLHF) improves on this slightly by making the system compete against itself, with the performances ranked by humans. After all, humans are better at ranking outputs than at producing example outputs.
It is possible to scale this up, and perhaps improve on it slightly, by using the trained critic network to rank performances "as if they had been ranked by humans"; but that critic is again imitating humans, with similar issues. RLHF typically uses such a critic anyway.
Instead of, or in addition to that, we can make other games for these chatbots and train them with self-competition much like #AlphaZero/#MuZero. We can formulate all kinds of complex games and procedural challenges for the LLM, and make it compete against itself in such tasks which are easy to rank or evaluate algorithmically.
Even playing #chess against itself would probably improve its skills not only in chess, but in a generalizable fashion to other tasks which require #planning.
RLHF is used for things where humans are needed as "referees". However, this scales badly and is limited by human capability.
Many games such as chess, or math problems, or playing #ATARI games in text, or controlling power plants by text can be "refereed" and scored automatically by machines.
These types of problems will allow LLM chatbots to achieve superhuman capabilities not only in those tasks, but generally in other things they do as well, because the acquired skills are typically generally useful.
The only additional requirement, beyond being scorable automatically, is that the games and challenges must be presentable and playable through text.
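A toy sketch of such a machine-refereed text game, assuming arithmetic problems whose answers a program can verify exactly; the `stub_model` function stands in for a real LLM call and is purely hypothetical:

```python
import random

def make_challenge(rng):
    """Generate a text challenge plus its machine-checkable expected answer."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return f"What is {a} * {b}?", a * b

def referee(answer_text, expected):
    """Score 1 if the textual answer parses to the right number, else 0."""
    try:
        return 1 if int(answer_text.strip()) == expected else 0
    except ValueError:
        return 0

def stub_model(prompt):
    # Placeholder for an LLM call; here it just extracts and multiplies
    # the two numbers so the example is self-contained.
    a, b = (int(t) for t in prompt.rstrip("?").split() if t.isdigit())
    return str(a * b)

rng = random.Random(0)
prompt, expected = make_challenge(rng)
score = referee(stub_model(prompt), expected)
print(score)  # 1
```

The referee's score can serve directly as the RL reward signal; no human is in the loop, which is what lets this scale past human ranking capacity.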
#ChatGPT #LargeLanguageModels #DeepLearning #ReinforcementLearning