#atomics

Felix Palmen :freebsd: :c64: @zirias@bsd.cafe
2025-06-25

Please help me spread the link to #swad 😎

github.com/Zirias/swad

I really need some users by now, for these two reasons:

* I'm at a point where I've fully covered my own needs (the reasons I started coding this), and getting some users is the only way to learn what other people might need
* The complexity "exploded" after adding support for so many OS-specific APIs (like #kqueue, #epoll, #eventfd, #signalfd, #timerfd, #eventports) and several #lockfree implementations based on #atomics, while still providing fallbacks for everything that *should* work on any #POSIX system ... At this point I'm simply unable to think of every possible edge case and test it. If there are #bugs left (which is somewhat likely), I really need people to report them to me

Thanks! πŸ™ƒ

Felix Palmen :freebsd: :c64: @zirias@bsd.cafe
2025-06-16

Next #swad release will still be a while. 😞

I *thought* I had the version with multiple #reactor #eventloop threads and quite some #lockfree stuff using #atomics finally crash-free. I found that, while #valgrind doesn't help much, #clang's #thread #sanitizer is a very helpful debugging tool.

But I tested without #TLS (to be able to handle the "massive load" that seemed necessary to trigger some of the more obscure data races), and also without the credential checkers that use child processes. Now I've deployed the current state to my prod environment ... and saw a crash there (only after running a load test).

So, back to debugging. I hope the difference is not #TLS. TLS just doesn't work (for whatever reason) with the address sanitizer enabled, and I haven't tried the thread sanitizer on it yet...
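
For anyone who wants to try the same approach: a minimal sketch (an artificial example, not swad code) of the kind of data race clang's thread sanitizer reports, with the build flag in the comment:

/* race.c -- artificial example of a data race that ThreadSanitizer
 * catches; build with: clang -g -fsanitize=thread race.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static int counter;    /* shared and deliberately unsynchronized */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; ++i) ++counter;    /* racy read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%d\n", counter);    /* TSan reports the conflicting writes above */
    return 0;
}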

Felix Palmen :freebsd: :c64: @zirias@bsd.cafe
2025-06-13

The #lockfree command #queue in #poser (for #swad) is finally fixed!

The original algorithm from [MS96] works fine *only* if the "free" function has some "magic" in place to defer freeing the object until no thread holds a reference any more ... and that magic is, well, left as an exercise to the reader. πŸ™ˆ

Doing more research, I found a few suggestions for how to do that "magic", including for example #hazardpointers ... but they're known to cause quite some runtime overhead, so not really an option. I decided to implement a "shared object manager" based on the ideas from [WICBS18], which in the end is a kind of "manually triggered garbage collector". And hey, it works! πŸ₯³
github.com/Zirias/poser/blob/m

[MS96] dl.acm.org/doi/10.1145/248052.
[WICBS18] cs.rochester.edu/u/scott/paper

#coding #c #c11 #atomics
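
To make the idea concrete, here is a much-simplified sketch of such deferred reclamation in C11, assuming an epoch-based scheme. This is *not* poser's actual "shared object manager" (all names here are made up), just the general pattern of retiring nodes instead of freeing them and collecting later:

/* Deferred reclamation sketch (hypothetical, simplified): threads announce
 * the epoch they observe; a retired node is only freed once every active
 * thread has entered a later epoch. */
#include <limits.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAXTHREADS 64

static _Atomic unsigned long global_epoch = 1;
static _Atomic unsigned long thread_epoch[MAXTHREADS]; /* 0 = not reading */

typedef struct retired {
    void *ptr;
    unsigned long epoch;
    struct retired *next;
} retired;

static retired *retired_list[MAXTHREADS]; /* per-thread, never shared */

/* enter/leave a region where shared nodes may be dereferenced */
void reclaim_enter(int tid)
{
    unsigned long e = atomic_load_explicit(&global_epoch, memory_order_acquire);
    atomic_store_explicit(&thread_epoch[tid], e, memory_order_release);
}

void reclaim_leave(int tid)
{
    atomic_store_explicit(&thread_epoch[tid], 0, memory_order_release);
}

/* called instead of free() on a node that was already unlinked */
void reclaim_retire(int tid, void *ptr)
{
    retired *r = malloc(sizeof *r);
    r->ptr = ptr;
    r->epoch = atomic_fetch_add_explicit(&global_epoch, 1, memory_order_acq_rel);
    r->next = retired_list[tid];
    retired_list[tid] = r;
}

/* the "manually triggered" part: free whatever no thread can still see */
void reclaim_collect(int tid)
{
    unsigned long oldest = ULONG_MAX;
    for (int i = 0; i < MAXTHREADS; ++i) {
        unsigned long e = atomic_load_explicit(&thread_epoch[i],
                memory_order_acquire);
        if (e && e < oldest) oldest = e;
    }
    retired **rp = &retired_list[tid];
    while (*rp) {
        retired *r = *rp;
        if (r->epoch < oldest) {
            *rp = r->next;
            free(r->ptr);
            free(r);
        } else {
            rp = &r->next;
        }
    }
}

The implementation linked above differs in the details and in how collection is triggered, but the key property is the same: free() is never called while another thread might still hold a reference.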

Felix Palmen :freebsd: :c64: @zirias@bsd.cafe
2025-06-11

This redesign of #poser (for #swad) to offer a "multi-reactor" (with multiple #threads, each running its own event loop) is starting to give me severe headaches.

There is *still* a very rare data #race in the #lockfree #queue. I *think* I can spot it in the pseudo-code from the paper I used[1], see screenshot. Have a look at lines E7 and E8. Suppose the thread executing this is suspended after E7 for a "very long time". Some dequeue operation in another thread will eventually dequeue whatever "Q->Tail" was pointing to, and then free it after consumption. Our poor thread resumes, successfully checks the pointer already read in E6 for NULL, and then attempts a CAS on tail->next in E9, which unfortunately lives inside an object that no longer exists .... If the CAS succeeds because the memory at that location happens to contain zero bytes, we corrupt some random other object that might now reside there. 🀯
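
For readers without the screenshot, this is roughly what the [MS96] enqueue looks like when transcribed to C11 atomics (hypothetical names; the paper's pointer/counter tags against ABA are omitted, matching the plain-pointer discussion above; E-numbers as referenced in this post). The dangerous window sits between E7 and E9:

#include <stdatomic.h>
#include <stdlib.h>

typedef struct node {
    void *value;
    _Atomic(struct node *) next;
} node;

typedef struct queue {
    _Atomic(node *) head;
    _Atomic(node *) tail;
} queue;

void queue_init(queue *q)
{
    node *dummy = malloc(sizeof *dummy);   /* error handling omitted */
    dummy->value = NULL;
    atomic_store_explicit(&dummy->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&q->head, dummy, memory_order_relaxed);
    atomic_store_explicit(&q->tail, dummy, memory_order_relaxed);
}

void enqueue(queue *q, void *value)
{
    node *n = malloc(sizeof *n);
    n->value = value;
    atomic_store_explicit(&n->next, NULL, memory_order_relaxed);

    for (;;) {
        node *tail = atomic_load_explicit(&q->tail, memory_order_acquire);    /* E5 */
        node *next = atomic_load_explicit(&tail->next, memory_order_acquire); /* E6 */
        if (tail != atomic_load_explicit(&q->tail, memory_order_acquire))     /* E7 */
            continue;
        /* If this thread is suspended here and a dequeuer frees *tail,
         * the CAS in E9 below touches memory that no longer exists. */
        if (next == NULL) {                                                   /* E8 */
            if (atomic_compare_exchange_weak_explicit(&tail->next, &next, n,  /* E9 */
                    memory_order_release, memory_order_relaxed)) {
                atomic_compare_exchange_strong_explicit(&q->tail, &tail, n,
                        memory_order_release, memory_order_relaxed);
                return;
            }
        } else {
            /* help a lagging tail along */
            atomic_compare_exchange_strong_explicit(&q->tail, &tail, next,
                    memory_order_release, memory_order_relaxed);
        }
    }
}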

Please tell me whether I have an error in my thinking here. Can it be ....? πŸ€”

Meanwhile, after fixing and improving lots of things, I checked the alternative implementation using #mutexes again, and surprise: although it's still a bit slower, the difference is now very, very small. And it has the clear advantage that it never crashes. πŸ™ˆ I'm seriously considering dropping all the lock-free #atomics stuff again and just going with mutexes.

[1] dl.acm.org/doi/10.1145/248052.

[Screenshot: pseudo-code of a lock-free enqueue operation]

2025-06-07

C++OnSea 2025 SESSION ANNOUNCEMENT: Beyond Sequential Consistency by Christopher Fretz

cpponsea.uk/2025/session/beyon

Register now at cpponsea.uk/tickets/

#atomics #cplusplus #cpp #threading

Felix Palmen :freebsd: :c64: @zirias@bsd.cafe
2025-06-06

I recently took a dive into #C11 #atomics to come up with alternative queue implementations that don't require locking a #mutex.

TBH, I have a hard time understanding the #memory #ordering constraints defined by C11. I mean, I code #assembler on a #mos6502 (for the #c64), so caches, pipelines and all that modern crap is kind of alien rocket science to me anyway πŸ˜†.

But seriously, they try to abstract away what the hardware provides (different kinds of memory barrier instructions, which are IMHO somewhat easier to understand), so the compiler can pick the appropriate one for the target CPU. But wrapping your head around their definitions really hurts the brain πŸ™ˆ.

Yesterday, I found a source telling me that #amd64 (or #x86 in general?) always has strong ordering for reads, so no matter which ordering constraint you put in your atomic_load and friends, the compiler will generate the same code and it will work. Oh boy, how am I ever supposed to verify that my code works on e.g. aarch64 without owning such hardware?
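
A tiny illustration (a made-up example, not the queue from the screenshot below) of why the distinction still matters even if amd64 hides it: publishing plain data with a release store and an acquire load. On amd64 the acquire load compiles to an ordinary MOV because loads are already strongly ordered there; on aarch64 it becomes an acquiring load (or needs a barrier), so an ordering that is too weak can appear to work on amd64 and still be wrong elsewhere.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static int payload;          /* plain, non-atomic data */
static atomic_bool ready;    /* publication flag */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                               /* write data */
    atomic_store_explicit(&ready, true, memory_order_release);  /* publish    */
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                     /* spin until the release store is visible */
    printf("%d\n", payload);  /* guaranteed to print 42 */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}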

[Screenshot: memory ordering constraints in my lock-free variant of the queue used to schedule commands for a different service thread. Are they correct? πŸ€”]

Felix Palmen :freebsd: :c64: @zirias@bsd.cafe
2025-06-04

Hm, is #valgrind's #helgrind useless for code using #atomic operations? For example, it complains about this:

==9505== Possible data race during read of size 4 at 0xADD57F4 by thread #14
==9505== Locks held: none
==9505== at 0x23D0F1: PSC_ThreadPool_cancel (threadpool.c:761)
[....]
==9505== This conflicts with a previous write of size 4 by thread #6
==9505== Locks held: none
==9505== at 0x23CDDE: worker (threadpool.c:373)

so, here's threadpool.c:761:

if ((pthrno = atomic_load_explicit(
        &job->pthrno, memory_order_consume)) >= 0)

and here's threadpool.c:373:

atomic_store_explicit(&currentJob->pthrno, -1,
        memory_order_release);

Ok, I *think* this should be fine? Am I missing something?

(screenshots for readability ...)

#c #coding #c11 #atomics

[Screenshots: valgrind output, threadpool.c:761, threadpool.c:373]
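
A self-contained repro of the pattern in question (hypothetical names, not the real threadpool.c; acquire is used for the load, since current compilers generally promote consume to acquire) that can be fed to both helgrind and clang's TSan. As far as I can tell, the pattern is data-race-free under C11; helgrind apparently doesn't model C11 atomics, which would explain the report above, while TSan understands them.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static _Atomic int pthrno = 0;   /* stand-in for job->pthrno */

static void *worker(void *arg)
{
    (void)arg;
    atomic_store_explicit(&pthrno, -1, memory_order_release);
    return NULL;
}

static void *canceller(void *arg)
{
    (void)arg;
    int n;
    if ((n = atomic_load_explicit(&pthrno, memory_order_acquire)) >= 0)
        printf("job still running on pool thread %d\n", n);
    return NULL;
}

int main(void)
{
    pthread_t w, c;
    pthread_create(&w, NULL, worker, NULL);
    pthread_create(&c, NULL, canceller, NULL);
    pthread_join(w, NULL);
    pthread_join(c, NULL);
    return 0;
}
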
Felix Palmen :freebsd: :c64: @zirias@bsd.cafe
2025-06-04

I now experimented with different ideas for how to implement the #lockfree #queue for multiple producers and multiple consumers. Unsurprisingly, some ideas just didn't work. One deadlocked (okaaay ... so it wasn't lock-free) and I eventually gave up trying to understand why.

The "winner" so far is only "almost lock-free", but it at least improves performance slightly. Throughput is the same as with the simple locked variant, but average response times are 10 to 20% lower (although they vary more, for whatever reason). Well, that's committed for now:

github.com/Zirias/poser/commit

#C11 #atomics

N-gated Hacker News @ngate
2025-05-30

πŸ€“ Ah yes, the brave soul who attempts to explain 'Atomics and Concurrency' while sounding like they're deciphering the Rosetta Stone of tech. πŸ“œ Don’t worry, just enable the ⚠️ flag and enjoy the thrilling life of a data race spectator! 🎒
redixhumayun.github.io/systems

2025-05-20

C++OnSea 2025 SESSION ANNOUNCEMENT: Beyond Sequential Consistency by Christopher Fretz

cpponsea.uk/2025/session/beyon

Register now at cpponsea.uk/tickets/

#atomics #cplusplus #cpp #threading

Thomas Hunter II @tlhunter
2024-09-02

πŸš€ Excited to announce @tlhunter & @bengl as speakers! They'll dive into Node.js multithreading with Atomics, Workers & SharedArrayBuffers. Are the trade-offs worth it? Find out! πŸ“Š
Get tickets: ti.to/nearform/nodeconf-eu-24

CppCon @CppCon
2024-03-15

We have released a new CppCon 2023 Video!

C++ Memory Model: from C++11 to C++23 – Alex Dathskovsky – CppCon 2023
youtu.be/SVEYNEWZLo4

CppCon @CppCon
2024-02-22

We have released a new CppCon 2023 Video!

Single Producer Single Consumer Lock-free FIFO From the Ground Up – Charles Frasch – CppCon 2023
youtu.be/K3P_Lmq6pw0

CppCon @CppCon
2024-02-16

We have released a new CppCon 2023 Video!

Back to Basics: C++ Concurrency – David Olsen – CppCon 2023
youtu.be/8rEGu20Uw4g

CppCon @CppCon
2024-01-03

We have released a new CppCon 2023 Video!

Lock-free Atomic Shared Pointers Without a Split Reference Count? It Can Be Done! – Daniel Anderson
youtu.be/lNPZV9Iqo3U

2023-11-02

Did something change about += becoming more thread-safe between Python 3.7 and Python 3.10? I have some vague memory that it did, but I can't find any documentation, and running a few test programs seems to suggest so?

#Python #Threads #Atomics
