#grokking

Games at Work dot biz @gamesatwork_biz
2025-04-14

e509 — Maverick and Marbles

e509 with Michael and Michael - stories and discussion all around #AI, #LLMs, #llamas, generated #Quake, #grokking, #generalization and much more.

gamesatwork.biz/2025/04/14/e50

Victoria Stuart 🇨🇦 🏳️‍⚧️ @persagen
2025-01-17

Grokking at Edge of Numerical Stability
https://arxiv.org/abs/2501.04697
https://old.reddit.com/r/MachineLearning/comments/1i34keg/grokking_at_the_edge_of_numerical_stability
https://en.wikipedia.org/wiki/Grokking_(machine_learning)

* sudden generalization after prolonged overfitting
* massively overtrained NNs can acquire "emergent"/supra performance: eerie/unexpected capabilities (see the sketch after this list)
* an unexpected/accidental finding
* mechanisms starting to be understood
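
As a concrete illustration of the bullets above, here is a minimal sketch of the kind of experiment in which grokking is typically reported: a small network trained on modular addition with weight decay, run far past the point of perfect training accuracy. The architecture and hyperparameters below are assumptions for illustration, not taken from the linked paper, and whether (and when) the test-accuracy jump appears depends on them.

```python
# Illustrative sketch only (not the paper's code): modular addition,
# (a + b) mod P, trained far past 100% train accuracy with weight decay.
# Grokking shows up as test accuracy jumping long after train accuracy saturates.
import torch
import torch.nn as nn

P = 97                       # modulus; dataset is all (a, b) pairs
STEPS = 20_000               # deliberately "overtrained"
torch.manual_seed(0)

pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))   # (P*P, 2)
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
train_idx, test_idx = perm[: len(pairs) // 2], perm[len(pairs) // 2 :]

model = nn.Sequential(
    nn.Embedding(P, 64),     # shared embedding for both operands -> (N, 2, 64)
    nn.Flatten(),            # -> (N, 128)
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, P),       # logits over the P possible sums
)
# Weight decay is commonly reported as important for the transition.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

for step in range(STEPS):
    opt.zero_grad()
    loss_fn(model(pairs[train_idx]), labels[train_idx]).backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  train acc {accuracy(train_idx):.3f}  test acc {accuracy(test_idx):.3f}")
```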

Grokked Transformers are Implicit Reasoners: Mechanistic Journey to Edge of Generalization
https://arxiv.org/abs/2405.15071

#LLM #ML #grokking #NN #emergence #generalization
Rod2ik 🇪🇺 🇨🇵 🇪🇸 🇺🇦 🇨🇦 🇩🇰 🇬🇱 @rod2ik
2024-08-03

'Grokking phase transitions in learning local rules with gradient descent', by Bojan Žunkovič, Enej Ilievski.

jmlr.org/papers/v25/22-1228.ht

#grokking #tensor #models

Simon Lucy @simon_lucy
2024-05-22

I could wish the Robert Heinlein Estate were suing X for the misuse and misappropriation of 'grok'.

Victoria Stuart 🇨🇦 🏳️‍⚧️ @persagen
2024-04-28

How Do Machines Grok Data?
Overtrained neural networks discover novel solutions
quantamagazine.org/how-do-mach
news.ycombinator.com/item?id=4

* machine learning: neural network (linear algebra) over data
* train on training data to minimize error against the expected results ("memorization")
* test on held-out test data
* overfitting: overtrained on the training data, error increases on the test data (see the sketch after this list)

* however, massively overtrained LLMs can discard the "memorized" solution and acquire "generalization" capabilities 💡
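
For the "overfitting" bullet above, a minimal sketch of the classic picture it describes, using plain polynomial curve fitting rather than a neural network (the target function, noise level, and degrees are arbitrary choices for illustration): the high-degree fit drives training error to roughly zero while test error typically stays much larger.

```python
# Illustrative sketch of ordinary overfitting (contrast with grokking above):
# a degree-9 polynomial interpolates 10 noisy training points (train MSE ~ 0)
# but usually generalises worse than a lower-degree fit.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0, 1, 100)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=x_train.shape)
y_test = np.sin(2 * np.pi * x_test)                  # the underlying rule

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)    # least-squares fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```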

Chuck Darwin @cdarwin@c.im
2024-04-25

Two years ago, Yuri Burda and Harri Edwards, researchers at the San Francisco–based firm OpenAI, were trying to find out what it would take to get a language model to do basic arithmetic.

They wanted to know how many examples of adding up two numbers the model needed to see before it was able to add up any two numbers they gave it.
At first, things didn’t go too well. The models memorized the sums they saw but failed to solve new ones.
By accident, Burda and Edwards left some of their experiments running far longer than they meant to
—days rather than hours.
The models were shown the example sums over and over again, way past the point when the researchers would otherwise have called it quits.
But when the pair at last came back, they were surprised to find that the experiments had worked.
They’d trained a language model to add two numbers
—it had just taken a lot more time than anybody thought it should.

Curious about what was going on, Burda and Edwards teamed up with colleagues to study the phenomenon.
They found that in certain cases, models could seemingly fail to learn a task
and then all of a sudden just get it, as if a lightbulb had switched on.
This wasn’t how deep learning was supposed to work.
They called the behavior #grokking

technologyreview.com/2024/03/0

Victoria Stuart 🇨🇦 🏳️‍⚧️ @persagen
2023-09-08

Explaining grokking through circuit efficiency
arxiv.org/abs/2309.02390

* puzzle: NNs with perfect training accuracy but poor generalisation can, upon further training, transition to perfect generalisation
* proposal: grokking occurs when the task admits both a generalising and a memorising solution
* the generalising solution is slower to learn but more efficient (see the sketch after this list)
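
A hedged sketch of the efficiency argument summarised above, in made-up notation (θ_mem, θ_gen, and λ are mine, not the paper's): both solutions fit the training data, but under weight decay the regularised objective eventually favours the lower-norm, generalising one.

```latex
% Illustrative notation only: \theta_{mem} = memorising solution,
% \theta_{gen} = generalising solution, \lambda = weight-decay coefficient.
\[
  \mathcal{L}(\theta) = \mathcal{L}_{\mathrm{train}}(\theta) + \lambda \lVert \theta \rVert_2^2,
  \qquad
  \mathcal{L}_{\mathrm{train}}(\theta_{\mathrm{mem}}) \approx \mathcal{L}_{\mathrm{train}}(\theta_{\mathrm{gen}}) \approx 0,
\]
\[
  \lVert \theta_{\mathrm{gen}} \rVert_2 < \lVert \theta_{\mathrm{mem}} \rVert_2
  \;\Longrightarrow\;
  \mathcal{L}(\theta_{\mathrm{gen}}) < \mathcal{L}(\theta_{\mathrm{mem}}),
\]
% so training with weight decay can eventually move from the memorising
% to the generalising solution: the grokking transition.
```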

Tero Keski-Valkama @tero@rukii.net
2023-08-20

People think #LLM #chatbots are just memorizing facts, and this belief is reflected in benchmarks (multiple-choice questions about trivia) and in prompt design (zero-shot examples instead of explanations).

That's not what they do, though. Because of #grokking and a training regime in which the network always has to predict the next word in sentences and documents it has never seen in whole before, its task is not memorization and never has been.

What it does is understand the world and everything in it, so that it can predict new sentences about that world in new contexts it has never seen before.
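
For reference, the training objective being described is standard next-token prediction; a minimal statement in generic notation (not tied to any particular model):

```latex
% Autoregressive (next-token) objective: for a token sequence x_1..x_T,
% minimise the negative log-probability of each token given its prefix.
\[
  \mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right)
\]
```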

The misconception that LLMs are about memorizing facts is also visible in the current branch of research where people try to make LLMs forget specific facts. This misconception is really holding the whole field back.

Casey Primozic / ameo @ameo@mastodon.ameo.dev
2023-08-07

At long last, the blog post I've been working on for what seems like forever is finished!

cprimozic.net/blog/growing-spa

It's packed with lots of really cool stuff: ML #interpretability, #grokking, #tinygrad, #graphviz, and more
