2025-06-23

@regehr mucky side effects. If there had been a __builtin_bitpermute(...) kind of intrinsic lying around, we'd probably see a lot of clear-cut bit permutes, but we might instead have a lot of code out there of the form

for (u32 i = 0; i < 64; ++i) {
    if (val & (1ULL << perm[i])) {
        // do weird shit
    }
}

... that might have had a bit permute buried in there somewhere, but maybe hard to find with automated tools.
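For contrast, a minimal sketch of what a clear-cut bit permute looks like once the side effects are stripped away. The name `bit_permute` and the convention that result bit i is taken from source bit `perm[i]` are assumptions made for illustration; this is not a real compiler builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a real builtin): result bit i is copied from
   source bit perm[i]. perm must hold 64 indices, each in 0..63. */
static uint64_t bit_permute(uint64_t val, const uint8_t perm[64]) {
    uint64_t out = 0;
    for (uint32_t i = 0; i < 64; ++i) {
        assert(perm[i] < 64);                   /* reject out-of-range indices */
        out |= ((val >> perm[i]) & 1ULL) << i;  /* move one bit into place */
    }
    return out;
}
```

A loop in this shape has no side effects in its body, so it is the easy case for a pattern matcher; the loops described above bury the same data movement inside other work.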

2025-06-23

@regehr I like it! I'm curious as to the *how* of slurping in those "lots of software projects" and hope whatever you get from this, you can share (just in case I get a chance to work on my superoptimizer again - it's not an Intel project).

You might need to cast the net fairly wide, since, in the absence of fast and modular primitives for bit permute, you might see a lot of cases where the bit permute is concealed under a pile of ...

2025-06-23

@regehr It's a fine page and nicely presented. As always (and this is being greedy, since it at least covers PEXT/PDEP) there's *one more instruction* that I wish it covered - the mellifluously named GF2P8AFFINEQB, which is something of a mainstay for Weird Bit Tricks on AVX-512 right now.
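For readers who have not met it, here is a scalar sketch of what GF2P8AFFINEQB computes for a single byte, following the bit ordering in Intel's pseudocode as I understand it: result bit i is the parity of (matrix byte 7-i AND the input byte), XORed with bit i of the immediate. The function names `parity8` and `gf2p8affine_byte` are made up for this example. The real instruction applies this to every byte of a vector, with one 8x8 bit matrix per 64-bit lane.

```c
#include <stdint.h>

/* Parity (XOR of all eight bits) of one byte. */
static uint8_t parity8(uint8_t x) {
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1;
}

/* Scalar model of one byte lane (hypothetical helper name):
   result bit i = parity(matrix.byte[7 - i] & x) ^ imm.bit[i]. */
static uint8_t gf2p8affine_byte(uint64_t matrix, uint8_t x, uint8_t imm) {
    uint8_t out = 0;
    for (int i = 0; i < 8; ++i) {
        uint8_t row = (uint8_t)(matrix >> (8 * (7 - i)));   /* row for bit i */
        uint8_t bit = (uint8_t)(parity8(row & x) ^ ((imm >> i) & 1));
        out |= (uint8_t)(bit << i);
    }
    return out;
}
```

Under this model the matrix 0x0102040810204080 is the identity and 0x8040201008040201 reverses the bits of each byte, which is one of the better-known off-label uses.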

2025-06-20

@regehr I'm sure I've already rhapsodized to you about the Jack Vance character who is introduced as "being able to perform complicated mathematical calculations in his head and furnish the results in an instant, whether they are right or wrong" ("Ports of Call") but I think of this phrase more and more these days.

2025-06-20

@regehr Knock it off, Hudson.

2025-06-19

@regehr it is strange to walk around in The Matrix - I am probably in those bits of Sydney more frequently than I am, say, wandering around the more iconic/photogenic bits of town.

2025-06-17

@dougall @crystalmoon it's also something that has a considerably chilling effect on anyone who has a job anywhere - having to accept a bunch of weird legals personally is probably impossible (or at least, very inadvisable) for most people with even vaguely related work

2025-06-17

@dougall @crystalmoon It's pretty poor form. Most chipmakers have moved on from this whole "documentation is the crown jewels". Reverse engineers already have this information, ffs, it's not like anyone doing anything nefarious won't just get an account.

2025-06-14

@regehr @harold (note those 63 bytes before and after are once per region, not in any way 'per data item')

2025-06-14

@regehr @harold If I ran the run-time zoo, I would insist that there are 63 sacrificial bytes allocated before and after the heap, stack and global data areas (and I guess TLS).

If you have a special magic pointer that's next to a memory region with special semantics ("reading this memory-mapped device launches the nukes") that needs a 'volatile' or something.

2025-06-11

@regehr sorry, you're a melburnian now, I don't make the rules. The next step is for you to acquire a fixation on an AFL team, probably St Kilda due to your "location".

boosted:
2025-06-07

New blog post: From Boolean logic to bitmath and SIMD: transitive closure of tiny graphs

bitmath.blogspot.com/2025/06/f

2025-06-06

@regehr These changes have all been extensively modelled and show good performance improvements individually (this may be surprising to some long-winded posters who think these ISAs are designed by a combination of vibes and conspiracies 👍 ).

The other change that isn't as heavily discussed is the new conditional instructions (ccmpscc, cfcmovcc, ctestscc, setcc) that allow chaining of conditionals and a limited form of predicated load/store that don't fault if the condition is false.

2025-06-05

@david_chisnall @regehr you seem very confident

2025-06-02

@hjakovel @0xabad1dea the idea you advance is simple, appealing and wrong. The "insanely hard to pass entrance exam" is usually a sign that a company has such broken internal processes that once you're in, it will be impossible to see who is doing good work vs bad. So making hiring weirdly stringent is a magic incantation designed to try to fill a company with awesome people who don't need performance management. Unsurprisingly, this doesn't work.

2025-06-02

@kusuriya @0xabad1dea oh, so much *this*

I've had a couple experiences where, minding my own business and not looking for a job, someone has tried to "head hunt" me to another company - but it turns out the generous "we want to head hunt you" opening means "you can get through our convoluted interview process with only 5.5 interviews, not 7".

2025-06-02

@puppygirlhornypost2 @0xabad1dea @kirakira in my experience, the secret sauce to "how does *anyone* work there given how bad their interview process is" is "nepotism and/or inconsistency". Either some days the process is less bizarre, or good people who are known to the company get to glide through (or get given the exam ahead of time).

2025-06-01

@harold I built a goofy prefix-sum version of this a while ago, it's a good trick.

PSADBW has a bunch of surprise off-label usages (you can also use it to select ANDed-off disjoint bits into a u8 if you for some reason want that result in the bottom of a 64b lane rather than in the k-regs or a GPR).
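One reading of the bit-selection trick, sketched as a scalar model of a single 64-bit lane. PSADBW sums the absolute differences of eight byte pairs; against zero that is a plain horizontal byte sum, and when each byte has been ANDed down to a different bit position the sum equals the OR. The helper names and the choice of a diagonal mask are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of PSADBW on one 64-bit lane: sum of |a_i - b_i| over 8 bytes. */
static uint64_t psadbw_lane(uint64_t a, uint64_t b) {
    uint64_t sum = 0;
    for (int i = 0; i < 8; ++i) {
        int x = (int)((a >> (8 * i)) & 0xFF);
        int y = (int)((b >> (8 * i)) & 0xFF);
        sum += (uint64_t)abs(x - y);
    }
    return sum;
}

/* Hypothetical example: keep only bit i of byte i, then let the byte sum
   collect those disjoint bits into a single u8 at the bottom of the lane. */
static uint8_t gather_diagonal_bits(uint64_t v) {
    const uint64_t mask = 0x8040201008040201ULL;  /* byte i keeps bit i */
    return (uint8_t)psadbw_lane(v & mask, 0);
}
```

Because the surviving bits never overlap, the sum cannot carry, so the result fits in a u8.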

2025-05-24

@regehr Sadly yes. I'm particularly nervous about getting a board and finding out that to use the features I'm interested in, I have to do some exotic dance. I want to apt-get or download a standard compiler on the mainstream kernel that shipped with the machine in order to get at the ISAs I'm interested in, not wind up sideloading exotic kernels or building x-compile tool chains.

Hoping the Mediatek stuff at least solves SVE access.

2025-05-23

@regehr I did a similar survey a while back (also looking for a local ARM machine that *explicitly* supports SVE/SVE2 without having to install "special magic kernel" or whatever). Didn't see anything with RVV that wasn't out of stock, sketchy-looking or marginal. Needs to cook for another year imo.
