Lmst

I say this as someone who has been working directly in the modern ML space the past 4 years, and who regularly tries out new AI development tools.

I'm not seeing huge productivity gains or anything like 10x speed boosts. In fact, I often find that AI tools turn into a tarpit, making the development work take longer.

I don't have a good intuition for when an AI assistant can be helpful rather than a tarpit.

A parallel I'm seeing in 2025 is to exclaim without evidence things like "Generative AI allows developers to be far more productive", or exclaiming "10x speed boosts", usually from people who don't write software.

Talk to folks who build production systems and you'll get a much more tepid response.

The emperor has no clothes??

10-20 years ago it was fashionable in HPC research papers to assert that "Moore's Law is dead", and then equivocate to mean that clock frequencies and ultimately hardware performance of sequential code was frozen.

Platitudes! None of those things are strictly true. Sequential code runs *waaaay* faster on hardware from 2025 than from years previous.

Oh, one other thing that *has* worked pretty well when I've tried it with LLMs is to spell curl invocations to dynamically validate if a set of credentials is live

Every couple months I try out the latest LLM tools again to see how they do at writing new regex-based secret exposure detection rules for Nosey Parker.

Todays experiments included "agentic" rule authoring using Claude 4 Sonnet and some external tools via MCP, as well as freeform chats with numerous models through Kagi Assistant.

Still hasn't worked! I have not had success getting any of following reliably from these tools:

- good security research on what an API token's format is

- examples of a particular API token

- high-precision regular expressions

The aspect I have seen the best success with is briefly describing possible impact of a particular secret exposure (i.e., what could happen if an attacker got it).

Brad Larsen boosted:

What's everyone's favorite "expect testing" library/tooling [1] for a language that isn't OCaml? Especially interested in:

Racket
Haskell
Python
Julia

[1] see https://blog.janestreet.com/the-joy-of-expect-tests/, https://github.com/janestreet/ppx_expect

@rntz For Rust, I've used Insta for snapshot testing somewhat heavily for integration test-like things in Nosey Parker.

This style of testing is especially useful for things where the spec would be fuzzy or unclear, like the CLI help that is printed. I want to know when it has changed, but I don't want to try writing partial specifications for it.

https://github.com/praetorian-inc/noseyparker

@jerry I was getting Jaquie Lawson animated holiday e-cards from a doppelgänger's elderly parents for _years_. Thanksgiving, Christmas, Halloween, Father's Day, Easter, etc — just about every holiday listed on the calendar.

I responded back a few times to let them know I was not their son, but that didn't work. So I began expecting the cards from them at every holiday, imagining myself as the other Brad Larsen with different parents.

The cards stopped coming a couple years ago...

@adrian I've noticed the same. Several months ago I noticed that just about every page had an extra several hundred ms latency on it than before. Painful to try reading code on GH now

I've been running with Apple Intelligence enabled, trying to give it a fair shake.

I'm not sure all that it is supposed to offer. I don't ever explicitly use its Writing Tools features. The most obvious place where I see it is in its text message and email summaries. These are sometimes funny, but egregiously bungle factual details maybe 10-20% of the time.

Who finds that useful??

@zwarich those papers look very relevant!

@pkhuong @pervognsen say more! Is there a way to use PDAs for fast overapproximation of more general grammars?

@wirepair not quite! My handwavy idea is that given some kind of grammar, you could come up with an overapproximating regular expression (one that matches whenever the grammar matches). Then you could implement the pattern-matching based system using a 2-phase approach: regex matching first, and then *only* parse the inputs that were matched by the regex with the grammar. If the size of input is much larger than the number of matches (almost always the case), it will be a performance win.

@wirepair I don't, but I would be shocked if it parsed at even hundreds of MB/s per core!

(Regex matching with hundreds of patterns can be done at gigabytes/s per core, in contrast)

@codeslack I don't have a good concrete example atm, just some mad scientist-type idea. In Nosey Parker I have often wished for some kind of richer matching language than regexes for detecting certain "generic" or "fuzzy" secrets.

More generally, if regex lowering worked well performance-wise, it could be really useful for matching on code. Think something like semgrep, where the rules involve actual grammar, and the impl matches each rule individually in a loop (the "nested loop join" approach). Regex lowering could conceivably let it use Hyperscan and speed it up 10,000x or more.

@pervognsen of course you could always use `.*`, but that wouldn't be a _useful_ overapproximation

@pervognsen yeah, I was thinking things like this yesterday!

In the context of a rule-based matching system, we would definitely want an overapproximating regex: if the original grammar recognizes an input, then the regex will match it also.

@wirepair just a mad scientist idea atm!

In a rule-based pattern-matching system, it would be nice if a richer language than regular expressions could be used, but it could still execute about as fast as regex matching.

In the context of Nosey Parker, I've written over a hundred regex-based rules to detect secrets. It uses a fork of Hyperscan for its regex matching, and was benchmarked by burntsushi (the ripgrep author) as matching >4GB/s on a single core. Regexes work great here!

However, for "fuzzy" rules in Nosey Parker that require matching the context surrounding a secret in order to keep FPs low, it could be useful if a richer language than true regular expressions were supported.

Are there any known algorithms for converting a PEG or CFG into a regular expression that approximates it?

Such a thing could be useful for efficiently implementing a rich pattern-matching language (running over TBs of input): you could use fast regex matching with the approximations and then follow up with precise parsing.

@regehr @rygorous @steve @ricci @brouhaha @krismicinski @acqrel this was in the context of a large-scale static analysis system, which used 5-byte pointers to fit huge indirect arrays in smaller memory than you would get with regular 64-bit pointers. So I guess at that time we were actually running on 39- or 40-pin CPUs, maxing out at a 512GB–1TB of physical RAM

Client Info