#grokking

Games at Work dot biz @gamesatwork_biz
2025-04-14

e509 — Maverick and Marbles

e509 with Michael and Michael - stories and discussion all around #AI, #LLMs, #llamas, generated #Quake, #grokking, #generalization and much more.

gamesatwork.biz/2025/04/14/e50

Victoria Stuart 🇨🇦 🏳️‍⚧️ @persagen
2025-01-17

Grokking at Edge of Numerical Stability
https://arxiv.org/abs/2501.04697
https://old.reddit.com/r/MachineLearning/comments/1i34keg/grokking_at_the_edge_of_numerical_stability
https://en.wikipedia.org/wiki/Grokking_(machine_learning)

* sudden generalization after prolonged overfitting
* massively overtrained NNs can acquire "emergent"/supra performance: eerie/unexpected capabilities (see the sketch after this list)
* an unexpected/accidental finding
* mechanisms starting to be understood
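
As a concrete illustration of the bullets above, here is a minimal sketch of the kind of experiment in which grokking is typically reported: a small network trained on modular addition with weight decay, run far past the point of perfect training accuracy. The architecture and hyperparameters below are assumptions for illustration, not taken from the linked paper, and whether (and when) the test-accuracy jump appears depends on them.

```python
# Illustrative sketch only (not the paper's code): modular addition,
# (a + b) mod P, trained far past 100% train accuracy with weight decay.
# Grokking shows up as test accuracy jumping long after train accuracy saturates.
import torch
import torch.nn as nn

P = 97                       # modulus; dataset is all (a, b) pairs
STEPS = 20_000               # deliberately "overtrained"
torch.manual_seed(0)

pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))   # (P*P, 2)
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
train_idx, test_idx = perm[: len(pairs) // 2], perm[len(pairs) // 2 :]

model = nn.Sequential(
    nn.Embedding(P, 64),     # shared embedding for both operands -> (N, 2, 64)
    nn.Flatten(),            # -> (N, 128)
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, P),       # logits over the P possible sums
)
# Weight decay is commonly reported as important for the transition.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

for step in range(STEPS):
    opt.zero_grad()
    loss_fn(model(pairs[train_idx]), labels[train_idx]).backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  train acc {accuracy(train_idx):.3f}  test acc {accuracy(test_idx):.3f}")
```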

Grokked Transformers are Implicit Reasoners: Mechanistic Journey to Edge of Generalization
https://arxiv.org/abs/2405.15071

#LLM #ML #grokking #NN #emergence #generalization
Rod2ik 🇪🇺 🇨🇵 🇪🇸 🇺🇦 🇨🇦 🇩🇰 🇬🇱 @rod2ik
2024-08-03

'Grokking phase transitions in learning local rules with gradient descent', by Bojan Žunkovič, Enej Ilievski.

jmlr.org/papers/v25/22-1228.ht

#grokking #tensor #models

Simon Lucy @simon_lucy
2024-05-22

I could wish the Robert Heinlein Estate were suing X for the misuse and misappropriation of 'grok'.

Victoria Stuart 🇨🇦 🏳️‍⚧️ @persagen
2024-04-28

How Do Machines Grok Data?
Overtrained neural networks discover novel solutions
quantamagazine.org/how-do-mach
news.ycombinator.com/item?id=4

* machine learning: neural network (linear algebra) over data
* train on training data to minimize error against the expected results ("memorization")
* test on held-out test data
* overfitting: overtrained on the training data, error increases on the test data (see the sketch after this list)

* however, massively overtrained LLMs can discard the "memorized" solution and acquire "generalization" capabilities 💡
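
For the "overfitting" bullet above, a minimal sketch of the classic picture it describes, using plain polynomial curve fitting rather than a neural network (the target function, noise level, and degrees are arbitrary choices for illustration): the high-degree fit drives training error to roughly zero while test error typically stays much larger.

```python
# Illustrative sketch of ordinary overfitting (contrast with grokking above):
# a degree-9 polynomial interpolates 10 noisy training points (train MSE ~ 0)
# but usually generalises worse than a lower-degree fit.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0, 1, 100)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=x_train.shape)
y_test = np.sin(2 * np.pi * x_test)                  # the underlying rule

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)    # least-squares fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```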

Chuck Darwin @cdarwin@c.im
2024-04-25

Two years ago, Yuri Burda and Harri Edwards, researchers at the San Francisco–based firm OpenAI, were trying to find out what it would take to get a language model to do basic arithmetic.

They wanted to know how many examples of adding up two numbers the model needed to see before it was able to add up any two numbers they gave it.
At first, things didn’t go too well. The models memorized the sums they saw but failed to solve new ones.
By accident, Burda and Edwards left some of their experiments running far longer than they meant to
—days rather than hours.
The models were shown the example sums over and over again, way past the point when the researchers would otherwise have called it quits.
But when the pair at last came back, they were surprised to find that the experiments had worked.
They’d trained a language model to add two numbers
—it had just taken a lot more time than anybody thought it should.

Curious about what was going on, Burda and Edwards teamed up with colleagues to study the phenomenon.
They found that in certain cases, models could seemingly fail to learn a task
and then all of a sudden just get it, as if a lightbulb had switched on.
This wasn’t how deep learning was supposed to work.
They called the behavior #grokking

technologyreview.com/2024/03/0

Victoria Stuart 🇨🇦 🏳️‍⚧️ @persagen
2023-09-08

Explaining grokking through circuit efficiency
arxiv.org/abs/2309.02390

* puzzle: NNs with perfect training accuracy but poor generalisation can, upon further training, transition to perfect generalisation
* proposal: grokking occurs when the task admits both a generalising and a memorising solution
* the generalising solution is slower to learn but more efficient (see the sketch after this list)
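
A hedged sketch of the efficiency argument summarised above, in made-up notation (θ_mem, θ_gen, and λ are mine, not the paper's): both solutions fit the training data, but under weight decay the regularised objective eventually favours the lower-norm, generalising one.

```latex
% Illustrative notation only: \theta_{mem} = memorising solution,
% \theta_{gen} = generalising solution, \lambda = weight-decay coefficient.
\[
  \mathcal{L}(\theta) = \mathcal{L}_{\mathrm{train}}(\theta) + \lambda \lVert \theta \rVert_2^2,
  \qquad
  \mathcal{L}_{\mathrm{train}}(\theta_{\mathrm{mem}}) \approx \mathcal{L}_{\mathrm{train}}(\theta_{\mathrm{gen}}) \approx 0,
\]
\[
  \lVert \theta_{\mathrm{gen}} \rVert_2 < \lVert \theta_{\mathrm{mem}} \rVert_2
  \;\Longrightarrow\;
  \mathcal{L}(\theta_{\mathrm{gen}}) < \mathcal{L}(\theta_{\mathrm{mem}}),
\]
% so training with weight decay can eventually move from the memorising
% to the generalising solution: the grokking transition.
```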

Tero Keski-Valkama @tero@rukii.net
2023-08-20

People think #LLM #chatbots are just memorizing facts, and this belief is reflected in benchmarks (multiple-choice questions about trivia) and in prompt design (zero-shot examples instead of explanations).

That's not what they do, though. Because of #grokking and a training regime in which the network always has to predict the next word in sentences and documents it has never seen in whole before, its task is not memorization and never has been.

What it does is understand the world and everything in it, so that it can predict new sentences about that world in new contexts it has never seen before.
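
For reference, the training objective being described is standard next-token prediction; a minimal statement in generic notation (not tied to any particular model):

```latex
% Autoregressive (next-token) objective: for a token sequence x_1..x_T,
% minimise the negative log-probability of each token given its prefix.
\[
  \mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right)
\]
```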

The misconception that LLMs are about memorizing facts is also visible in the current branch of research where people try to make LLMs forget specific facts. This misconception is really holding the whole field back.

Casey Primozic / ameo @ameo@mastodon.ameo.dev
2023-08-07

At long last, the blog post I've been working on for what seems like forever is finished!

cprimozic.net/blog/growing-spa

It's packed with lots of really cool stuff: ML #interpretability, #grokking, #tinygrad, #graphviz, and more
