118 points by r3tr0 3 days ago | 54 comments
gpderetta 14 hours ago
Of course if the mutex array is doing a linear scan to find the insertion point that would explain the difference but: a) I can't see the code for the alternative and b) there is no reason why the mutex variant can't use a free list.
Remember:
- Lock free doesn't automatically means faster (still it has other properties that might be desirable even if slower)
- Never trust a benchmark you didn't falsify yourself.
[1] when uncontended; when contended cache coherence cost will dominate over everything else, lock-free or not.
bonzini 14 hours ago
gpderetta 14 hours ago
As the benchmark is very dependent on contention, it would give very different results if the the threads are scheduled serially as opposed to running truly concurrently (for example using a spin lock would be awful if running on a single core).
So again, you need to be very careful to understand what you are actually testing.
j_seigh 6 hours ago
A lock-free queue by itself isn't very useful. You need a polling strategy that doesn't involve a busy loop. If you use mutexes and condvars, you've basically turned it into a lock based queue. Eventcounts work much better.
If I run more threads than CPUs and enough work so I get time slice ends, I get about 1160 nsecs avg enq/deq for mutex version, and about 146 nsecs for eventcount version.
Timings will vary based on how man threads you use and cpu affinity that takes your hw thread/core/cache layout into consideration. I have gen 13 i5 that runs this slower than my gen 10 i5 because of the former's efficiency cores even though it is supposedly faster.
And yes, a queue is a poster child for cache contention problems, une enfant terrible. I tried a back off strategy at one point but it didn't help any.
convivialdingo 5 hours ago
I rewrote the entire queue with lock-free CAS to manage insertions/removals on the list and we finally got some better numbers. But not always! We found it worked best either as a single thread, or during massive contention. With a normal load it wasn't really much better.
michaelscott 14 hours ago
sennalen 5 hours ago
scripturial 17 minutes ago
MobiusHorizons 18 hours ago
lesser23 17 hours ago
Other than that I take the other side. I’ve read (and subsequently never finished) dozens of programming books because they are so god awfully boring. This writing style, perhaps dialed back a little, helps keep my interest. I like the feel of a zine where it’s as technical as a professional write up but far less formal.
I often find learning through analogy useful anyway and the humor helps a lot too.
bigstrat2003 18 hours ago
atoav 17 hours ago
The edgy tones sound like from an LLM to me..
Animats 16 hours ago
This has no unsafe code. It's all done with compare and swap. There is locking here, but it's down at the hardware level within the compare and swap instruction. This is cleaner and more portable than relying on cross-CPU ordering of operations.
[1] https://github.com/John-Nagle/rust-vulkan-bindless/blob/main...
IX-103 19 minutes ago
That's probably the most scary sentence in the whole article.