"Random sampling works better than you think: Gemini 1.5 = o1. The secret? Self-verification magically gets easier with scale."
Thinking for longer (e.g. o1) is only one of many axes of test-time compute. In a new Google paper, the authors instead focus on scaling the search axis.
By simply sampling 200 responses at random and having the model verify them itself, Gemini 1.5 (an ancient, early-2024 model!) beats o1-preview and approaches o1. This is without fine-tuning, RL, or ground-truth verifiers.
"This was surprising: search is bottlenecked by verification, models are notoriously bad at self-verifying (think hallucinations), and self-consistency doesn't scale. The magic is that self-verification naturally becomes easier at scale! You'd expect that picking out a correct solution becomes harder the larger your pool of solutions is, but the opposite is the case!"
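The sample-then-self-verify loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: `sample_and_verify`, `toy_generate`, `toy_verify`, the 30% accuracy, the toy task (17 * 23) and the noise range are all assumptions made up for the example, and no real model or API is called. The selection step only looks at verifier scores; the toy verifier merely simulates a noisy judge whose individual verdicts can be wrong.

```python
import random
from typing import Callable

# Hypothetical callables: a generator draws one candidate answer, a verifier
# returns a score (higher = judged more likely correct). Both get a seeded RNG.
Generator = Callable[[random.Random], str]
Verifier = Callable[[str, random.Random], float]


def sample_and_verify(generate: Generator, verify: Verifier,
                      num_samples: int = 200, seed: int = 0) -> str:
    """Sample num_samples candidates, score each one, return the top-scoring candidate."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(num_samples)]
    # Selection uses only the verifier's scores (ties go to the earliest candidate).
    scored = [(verify(c, rng), c) for c in candidates]
    _, best = max(scored, key=lambda pair: pair[0])
    return best


# --- Toy stand-ins (assumptions, not the paper's setup) ---
CORRECT = "391"  # toy task: 17 * 23


def toy_generate(rng: random.Random) -> str:
    # Hypothetical model that answers correctly about 30% of the time.
    if rng.random() < 0.3:
        return CORRECT
    return str(rng.randint(300, 499))


def toy_verify(candidate: str, rng: random.Random) -> float:
    # Simulated noisy self-check: correct answers score in [0.4, 1.6],
    # wrong ones in [-0.6, 0.6]; the ranges overlap, so one verdict can mislead.
    base = 1.0 if candidate == CORRECT else 0.0
    return base + rng.uniform(-0.6, 0.6)


if __name__ == "__main__":
    print(sample_and_verify(toy_generate, toy_verify, num_samples=200, seed=0))
```

With only one or two samples, the top score in this toy can easily belong to a wrong answer; with 200 samples there are many correct candidates, so the highest score almost always lands on one of them. That behavior comes from the made-up noise model here and is not offered as the paper's explanation.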
Read more: https://eric-zhao.com/blog/sampling
#Sampling #Random #Randomness #Gemini #RandomSampling #Stats #Statistics