#RandomSampling

Pustam | पुस्तम | পুস্তম🇳🇵pustam_egr@mathstodon.xyz
2025-03-21

"Random sampling works better than you think: Gemini 1.5 = o1. The secret? Self-verification magically gets easier with scale."

Thinking for longer (e.g. o1) is only one of many axes of test-time compute. In a new Google paper, the authors instead focus on scaling the search axis.

By just randomly sampling 200 responses and self-verifying, Gemini 1.5 (an ancient early 2024 model!) beats o1-Preview and approaches o1. This is without finetuning, RL, or ground-truth verifiers.

"This was surprising: search is bottlenecked by verification, models are notoriously bad at self-verifying (think hallucinations), and self-consistency doesn't scale. The magic is that self-verification naturally becomes easier at scale! You'd expect that picking out a correct solution becomes harder the larger your pool of solutions is, but the opposite is the case!"
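The sample-then-self-verify loop described above can be sketched with a toy problem. Here `generate` and `verify` are hypothetical stand-ins for the model's sampler and self-verifier (the verifier cheats with ground truth for illustration; in the paper's setting the model scores its own answers):

```python
import random

def generate(rng):
    # Hypothetical stand-in for sampling one candidate answer to
    # "17 * 24 = ?" at nonzero temperature: right ~10% of the time,
    # otherwise a plausible-looking wrong number.
    return 408 if rng.random() < 0.1 else rng.randrange(300, 500)

def verify(candidate):
    # Hypothetical self-verification score; here we cheat with the
    # known answer. The paper's point is that a model's own scoring
    # gets more reliable as the candidate pool grows.
    return 1.0 if candidate == 408 else 0.0

def best_of_n(n, seed=0):
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    # Keep the candidate the verifier scores highest.
    return max(candidates, key=verify)

# With 200 samples, the chance that no candidate is correct is
# roughly 0.9 ** 200, i.e. about 1e-9, so best-of-200 nearly
# always recovers the right answer despite a weak per-sample hit rate.
print(best_of_n(200))
```

This is just the best-of-N shape of the search axis; the paper's contribution is the empirical finding that the `verify` step gets easier, not harder, as N grows.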

Read more: eric-zhao.com/blog/sampling

#Sampling #Random #Randomness #Gemini #RandomSampling #Stats #Statistics

skuaskua
2024-01-26

@kim_harding

can make a matter of .

Sad to read of this death.

I'd like to see more by authorities.

Cafes, restaurants, supermarkets - tests then large fines and for failure.
Hell, given the larcenous nature of too many executives, then jail time for CEOs whose companies repeatedly fail.

2023-07-17

@Edent This is how the Statistical Society of Australia (SSA) distributes its four PhD/Masters Top-up #Scholarships each year (statsoc.org.au/top_ups). The application process is not very onerous and there's some stratification by gender. I think these were introduced when @aidybarnett was SSA President. Full disclosure, I am a happy recipient of one of these scholarships. #StatSocAu #RandomSampling

Doc Edward Morbius ⭕​dredmorbius@toot.cat
2023-07-01

More on "UNCLASSIFIED": there are 36,520 of those sites right now. (Despite knowing better I keep diving in and classifying more of them.)

It's not practical to list all of them. But we can randomly sample. And large-sample statistics start to apply at about n=30, so let's just grab 30 of those sites at random using sort -R | head -30:

1 sfg.io
1 extroverteddeveloper.com
2 letmego.com
1 thestrad.com
2 bombmagazine.org
1 domlaut.com
1 bootstrap.io
1 jumpdriveair.com
2 desmos.com
1 leo32345.com
1 echopen.org
1 schd.ws
1 web3us.com
7 akkartik.name
1 bcardarella.com
1 cancerletter.com
1 platinumgames.com
1 industrytap.com
2 worldoftea.org
1 motion.ai
1 vectorly.io
2 enterprise.google.com
1 lift-heavy.com
1 davidpeter.me
1 panoye.com
3 thestrategybridge.org
2 fontsquirrel.com
1 kettunen.io
1 moogfoundation.org
2 elekslabs.com

That's a few foundations, a few blogs, a corporate site (enterprise.google.com), and something about tea, all with a small number of posts (1--7).

I'm looking at some slightly larger samples (60--100) here on my own system, which lets me compare across samples to see how much variance there is. That gives a better sense of what to expect among the "UNCLASSIFIED" sites.
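The compare-across-samples idea can be sketched in Python against a synthetic population. The post-count distribution below is made up to mirror the 30-site sample above (mostly 1s, a few 2s and higher); it is not the actual Hacker News data:

```python
import random
import statistics

# Hypothetical population: post counts for the ~36,520 "UNCLASSIFIED"
# sites, skewed toward 1 like the 30-site sample above.
rng = random.Random(42)
population = [rng.choice([1, 1, 1, 1, 2, 2, 3, 7]) for _ in range(36520)]

# Draw several samples of n=30 (the Python analogue of sort -R | head -30)
# and compare their means to see how much they bounce around.
means = [statistics.mean(rng.sample(population, 30)) for _ in range(5)]
true_mean = statistics.mean(population)
print(f"true mean: {true_mean:.2f}")
print(f"n=30 sample means: {[round(m, 2) for m in means]}")
```

The spread of the sample means is the variance being talked about: at n=30 it's wide enough that repeating the draw a few times, or moving to n=60--100, noticeably tightens the estimate.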

Which is one way of using #StatisticalMethods to make estimates where direct measurement or assessment is impractical.

#HackerNewsAnalytics #HackerNews #MediaAnalysis #RandomSampling #Statistics
