
"Random sampling works better than you think: Gemini 1.5 = o1. The secret? Self-verification magically gets easier with scale."

Thinking for longer (e.g. o1) is only one of many axes of test-time computing. In a new Google paper, the authors instead focus on scaling the search axis.

By just randomly sampling 200 responses and self-verifying, Gemini 1.5 (an ancient early 2024 model!) beats o1-Preview and approaches o1. This is without finetuning, RL, or ground-truth verifiers.

"This was surprising: search is bottlenecked by verification, models are notoriously bad at self-verifying (think hallucinations), and self-consistency doesn't scale. The magic is that self-verification naturally becomes easier at scale! You'd expect that picking out a correct solution becomes harder the larger your pool of solutions is, but the opposite is the case!"

#Sampling #Random #Randomness #Gemini #RandomSampling #Stats #Statistics
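The sample-then-self-verify recipe described in the post above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `generate` and `verify` are hypothetical callables standing in for "ask the model for an answer" and "ask the model to score an answer", and the toy versions below are made up purely so the sketch runs. Note that the toy verifier simply knows the right answer, which a real self-verifier would not.

```python
import random
from typing import Callable, List, Tuple


def sample_and_verify(
    generate: Callable[[random.Random], str],
    verify: Callable[[str], float],
    n_samples: int = 200,
    seed: int = 0,
) -> Tuple[str, float]:
    """Draw n_samples candidates and return the one the verifier scores highest."""
    rng = random.Random(seed)
    candidates: List[str] = [generate(rng) for _ in range(n_samples)]
    scored = [(verify(c), c) for c in candidates]
    # Ties are broken in favour of the earliest candidate with the top score.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Hypothetical stand-ins: a "model" that answers "42" about 10% of the time,
# and a "verifier" that gives full marks only to "42".
def toy_generate(rng: random.Random) -> str:
    return "42" if rng.random() < 0.1 else str(rng.randint(0, 41))


def toy_verify(answer: str) -> float:
    return 1.0 if answer == "42" else 0.0


best, score = sample_and_verify(toy_generate, toy_verify)
print(best, score)
```

The point of the sketch is only the shape of the loop: a larger `n_samples` raises the chance that at least one correct candidate is in the pool, and everything then hinges on how well `verify` picks it out.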

@kim_harding

#Allergies can make #FoodHandlingRegulations a matter of #LifeOrDeath.

Sad to read of this death.

I'd like to see more #RandomSampling by #PublicHealth authorities.

Cafes, restaurants, supermarkets - tests then large fines and #PublicShaming for failure.
Hell, given the larcenous nature of too many executives, jail time for CEOs whose companies repeatedly fail.

@Edent This is how the Statistical Society of Australia (SSA) distributes its four PhD/Masters Top-up #Scholarships each year (https://statsoc.org.au/top_ups). The application process is not very onerous and there's some stratification by gender. I think these were introduced when @aidybarnett was SSA President. Full disclosure, I am a happy recipient of one of these scholarships. #StatSocAu #RandomSampling

More on "UNCLASSIFIED": there are 36,520 of those sites right now. (Despite knowing better I keep diving in and classifying more of them.)

It's not practical to list all of them. But we can randomly sample. And large-sample statistics start to apply at about n=30, so let's just grab 30 of those sites at random using sort -R | head -30:

   1  sfg.io
   1  extroverteddeveloper.com
   2  letmego.com
   1  thestrad.com
   2  bombmagazine.org
   1  domlaut.com
   1  bootstrap.io
   1  jumpdriveair.com
   2  desmos.com
   1  leo32345.com
   1  echopen.org
   1  schd.ws
   1  web3us.com
   7  akkartik.name
   1  bcardarella.com
   1  cancerletter.com
   1  platinumgames.com
   1  industrytap.com
   2  worldoftea.org
   1  motion.ai
   1  vectorly.io
   2  enterprise.google.com
   1  lift-heavy.com
   1  davidpeter.me
   1  panoye.com
   3  thestrategybridge.org
   2  fontsquirrel.com
   1  kettunen.io
   1  moogfoundation.org
   2  elekslabs.com

That's a few foundations, a few blogs, a corporate site (enterprise.google.com), and something about tea, all with a small number of posts (1--7).
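For anyone without a shell handy, the same kind of draw can be approximated in Python. This is a sketch under assumptions: `sample_sites` is a made-up helper name, and the `site{i}.example` names are placeholders for the real list of unclassified sites, which is not reproduced here.

```python
import random
from typing import List, Optional


def sample_sites(sites: List[str], k: int = 30, seed: Optional[int] = None) -> List[str]:
    """Return up to k sites drawn uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(sites, min(k, len(sites)))


# Hypothetical stand-in for the 36,520 unclassified site names.
sites = [f"site{i}.example" for i in range(36_520)]

picked = sample_sites(sites, k=30, seed=1)
for name in picked:
    print(name)
```

Passing a `seed` makes the draw repeatable; leaving it as `None` gives a fresh sample each run, which is closer to what the shell pipeline does.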

I'm looking at some slightly larger samples (60--100) on my own system, and can compare across samples to see how much variance there is. That gives more information for tuning what I would expect to find among the "UNCLASSIFIED" sites.

Which is one way of using #StatisticalMethods to make estimates where direct measurement or assessment is impractical.
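A rough sketch of that cross-sample comparison, assuming the quantity of interest is mean posts per site. The population below is invented for illustration (skewed toward one post per site, loosely echoing the 1--7 range in the sample above); it is not the real data, and `sample_means` is a hypothetical helper.

```python
import random
import statistics
from typing import List, Sequence


def sample_means(
    counts: Sequence[int], sample_size: int, n_samples: int, seed: int = 0
) -> List[float]:
    """Draw n_samples random samples (each without replacement) and return each sample's mean."""
    rng = random.Random(seed)
    pool = list(counts)
    return [statistics.mean(rng.sample(pool, sample_size)) for _ in range(n_samples)]


# Hypothetical posts-per-site counts for 36,520 sites, mostly 1s.
pop_rng = random.Random(42)
population = [pop_rng.choice([1, 1, 1, 1, 2, 2, 3, 7]) for _ in range(36_520)]

means = sample_means(population, sample_size=80, n_samples=20)
spread = statistics.stdev(means)  # how much the estimate moves from sample to sample
print(f"mean of sample means: {statistics.mean(means):.2f}, spread: {spread:.2f}")
```

If the spread across samples is small relative to the mean, a single sample of that size is probably telling a consistent story; if it is large, a bigger sample size is worth the effort.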

#HackerNewsAnalytics #HackerNews #MediaAnalysis #RandomSampling #Statistics

