
Hacker Remix

Lock-Free Rust: How to Build a Rollercoaster While It's on Fire

118 points by r3tr0 3 days ago | 54 comments

gpderetta 14 hours ago

The claim that the lock-free array is faster than the locked variant is suspicious. The lock-free array performs a CAS for every operation, and this is going to dominate[1]. A plain mutex would do two CASes (or just one if it is a spin lock), so the order-of-magnitude difference is not explained by the lock-free property.

Of course, if the mutex array is doing a linear scan to find the insertion point, that would explain the difference, but: a) I can't see the code for the alternative, and b) there is no reason why the mutex variant can't use a free list.

Remember:

- Lock-free doesn't automatically mean faster (though it has other properties that might be desirable even if slower)

- Never trust a benchmark you didn't falsify yourself.

[1] when uncontended; when contended cache coherence cost will dominate over everything else, lock-free or not.
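The CAS-counting argument above can be made concrete with a sketch (hypothetical types, not the article's actual code): a lock-free "claim the next slot" is one atomic RMW, while the mutex variant pays one RMW to lock and one to unlock, so the uncontended costs are the same order of magnitude.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Lock-free slot claim: one atomic RMW (a CAS loop) per insert.
struct LockFreeSlots {
    next: AtomicUsize,
    cap: usize,
}

impl LockFreeSlots {
    fn claim(&self) -> Option<usize> {
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            if cur >= self.cap {
                return None; // array full
            }
            match self.next.compare_exchange_weak(
                cur, cur + 1, Ordering::AcqRel, Ordering::Relaxed,
            ) {
                Ok(_) => return Some(cur),
                Err(actual) => cur = actual, // lost a race; retry
            }
        }
    }
}

// Mutex variant: lock() and unlock() are themselves atomic RMWs,
// so uncontended it costs roughly two CASes instead of one.
struct LockedSlots {
    next: Mutex<usize>,
    cap: usize,
}

impl LockedSlots {
    fn claim(&self) -> Option<usize> {
        let mut next = self.next.lock().unwrap();
        if *next >= self.cap {
            return None;
        }
        let slot = *next;
        *next += 1;
        Some(slot)
    }
}
```

Roughly 2x, not 10x, is what this model predicts for the mutex variant when uncontended.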

bonzini 14 hours ago

Yes, the code for the alternative is awful. However, I tried rewriting it with a better alternative (basically the same as the lock-free code, but with a mutex around it) and it was still 40% slower. See FixedVec in https://play.rust-lang.org/?version=stable&mode=release&edit...

gpderetta 14 hours ago

Given twice the number of CASs, about twice as slow is what I would expect for the mutex variant when uncontended. I don't know enough rust to fix it myself, but could you try with a spin lock?

As the benchmark is very dependent on contention, it would give very different results if the threads are scheduled serially as opposed to running truly concurrently (for example, using a spin lock would be awful if running on a single core).

So again, you need to be very careful to understand what you are actually testing.
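For reference, the spin lock the parent suggests trying can be sketched in a few lines of safe Rust (a minimal test-and-test-and-set lock; a real implementation would return an RAII guard rather than expose unlock directly):

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

// Test-and-test-and-set spin lock: one CAS to acquire when
// uncontended, one plain store to release.
struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    fn lock(&self) {
        loop {
            // Spin on a plain load first so contended waiters don't
            // hammer the cache line with failed CASes.
            while self.locked.load(Ordering::Relaxed) {
                hint::spin_loop();
            }
            if self
                .locked
                .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                return;
            }
        }
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

As noted above, this is exactly the variant that degrades badly when threads are descheduled while holding the lock, so the benchmark's scheduling behavior matters.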

j_seigh 6 hours ago

I did a lock-free, ABA-free bounded queue in C++ as kind of an exercise. I work mostly with deferred reclamation schemes (e.g. refcounting, quiescent-state-based reclamation, and epoch-based reclamation). A queue requiring deferred reclamation, like the Michael-Scott lock-free queue, is going to perform terribly, so you go with an array-based ring buffer. It uses a double-wide CAS to do the insert for the enqueue and a regular CAS to update the tail. Dequeue is just a regular CAS to update the head. That runs about 57 nsecs on my 10th gen i5 for a single producer and consumer.
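The tag-plus-value trick behind that double-wide CAS can be sketched in Rust (this is an illustrative single-slot sketch, not the commenter's C++ queue). Since stable Rust has no 128-bit atomic, the sketch packs a 32-bit sequence tag and a 32-bit value into one AtomicU64; bumping the tag on dequeue is what defeats ABA:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// High 32 bits: sequence tag. Low 32 bits: value.
// Value 0 is reserved as the "empty" sentinel in this sketch.
const EMPTY: u32 = 0;

fn pack(tag: u32, val: u32) -> u64 {
    ((tag as u64) << 32) | val as u64
}
fn tag_of(word: u64) -> u32 {
    (word >> 32) as u32
}
fn val_of(word: u64) -> u32 {
    word as u32
}

struct Slot(AtomicU64);

impl Slot {
    fn new() -> Self {
        Slot(AtomicU64::new(pack(0, EMPTY)))
    }

    // Enqueue into this slot: CAS (tag, EMPTY) -> (tag, val) in one shot.
    fn try_put(&self, val: u32) -> bool {
        let cur = self.0.load(Ordering::Acquire);
        if val_of(cur) != EMPTY {
            return false; // slot occupied
        }
        self.0
            .compare_exchange(cur, pack(tag_of(cur), val), Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    // Dequeue: CAS (tag, val) -> (tag + 1, EMPTY). The bumped tag means
    // a stale CAS from a delayed thread can never succeed, even if the
    // slot cycles back through the same value.
    fn try_take(&self) -> Option<u32> {
        let cur = self.0.load(Ordering::Acquire);
        let v = val_of(cur);
        if v == EMPTY {
            return None;
        }
        self.0
            .compare_exchange(
                cur,
                pack(tag_of(cur).wrapping_add(1), EMPTY),
                Ordering::AcqRel,
                Ordering::Acquire,
            )
            .ok()
            .map(|_| v)
    }
}
```

A full ring buffer would hold an array of such slots plus head/tail counters updated with regular CAS, as the comment describes.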

A lock-free queue by itself isn't very useful. You need a polling strategy that doesn't involve a busy loop. If you use mutexes and condvars, you've basically turned it into a lock based queue. Eventcounts work much better.
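A minimal eventcount along the lines described can be sketched like this (a simplified illustration, assuming a Condvar-backed slow path; production versions typically use futexes and per-waiter nodes). The point is that the fast path, when data is available, touches no mutex at all:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};

// Eventcount: consumers capture the epoch, re-check their condition,
// and only then block; producers bump the epoch and notify.
struct EventCount {
    epoch: AtomicUsize,
    mutex: Mutex<()>,
    cond: Condvar,
}

impl EventCount {
    fn new() -> Self {
        EventCount {
            epoch: AtomicUsize::new(0),
            mutex: Mutex::new(()),
            cond: Condvar::new(),
        }
    }

    // Step 1 of a wait: capture the epoch *before* re-checking the
    // queue, so a concurrent signal cannot be lost.
    fn prepare_wait(&self) -> usize {
        self.epoch.load(Ordering::Acquire)
    }

    // Step 2: block only if no signal has arrived since prepare_wait.
    fn wait(&self, key: usize) {
        let mut guard = self.mutex.lock().unwrap();
        while self.epoch.load(Ordering::Acquire) == key {
            guard = self.cond.wait(guard).unwrap();
        }
    }

    // Producer side: bump the epoch, then wake sleepers. Taking the
    // mutex before notifying closes the race with a consumer that is
    // between its epoch check and its cond.wait().
    fn signal(&self) {
        self.epoch.fetch_add(1, Ordering::AcqRel);
        let _guard = self.mutex.lock().unwrap();
        self.cond.notify_all();
    }
}
```

Typical consumer usage: `let k = ec.prepare_wait(); if queue_is_empty() { ec.wait(k); }` — if an item arrived between the emptiness check and the wait, the epoch has moved and wait() returns immediately.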

If I run more threads than CPUs and enough work so I get time slice ends, I get about 1160 nsecs avg enq/deq for mutex version, and about 146 nsecs for eventcount version.

Timings will vary based on how many threads you use and CPU affinity that takes your hw thread/core/cache layout into consideration. I have a gen 13 i5 that runs this slower than my gen 10 i5 because of the former's efficiency cores, even though it is supposedly faster.

And yes, a queue is a poster child for cache contention problems, un enfant terrible. I tried a back-off strategy at one point but it didn't help any.

convivialdingo 5 hours ago

I tried replacing a DMA queue lock with lock-free CAS and it wasn't faster than a mutex or a standard rwlock.

I rewrote the entire queue with lock-free CAS to manage insertions/removals on the list and we finally got some better numbers. But not always! We found it worked best either as a single thread, or during massive contention. With a normal load it wasn't really much better.

michaelscott 14 hours ago

For applications doing extremely high rates of inserts and reads, lock free is definitely superior. In extreme latency sensitive applications like trading platforms (events processing sub 100ms) it's a requirement; locked structures cause bottlenecks at high throughput

sennalen 5 hours ago

The bottleneck is context switching

scripturial 17 minutes ago

So don’t bother to optimize anything?

MobiusHorizons 18 hours ago

I enjoyed the content, but could have done without the constant hyping up of the edginess of lock free data structures. I mean yes, like almost any heavily optimized structure there are trade offs that prevent this optimization from being globally applicable. But also being borderline aroused at the “danger” and rule breaking is tiresome and strikes me as juvenile.

lesser23 17 hours ago

The bullet points and some of the edge definitely smell like LLM assistance.

Other than that I take the other side. I’ve read (and subsequently never finished) dozens of programming books because they are so god awfully boring. This writing style, perhaps dialed back a little, helps keep my interest. I like the feel of a zine where it’s as technical as a professional write up but far less formal.

I often find learning through analogy useful anyway and the humor helps a lot too.

bigstrat2003 18 hours ago

To each their own. I thought it was hilarious and kept the article entertaining throughout, with what would otherwise be a fairly dry subject.

atoav 17 hours ago

It is juvenile, but what do we know? Real Men use after free, so they wouldn't even use Rust to begin with.

The edgy tone sounds like it's from an LLM to me.

Animats 16 hours ago

I've done a little bit of "lock-free" programming in Rust, but it's for very specialized situations.[1] This allocates and releases bits in a bitmap. The bitmap represents the slots in use in the Vulkan bindless texture index, which resides in the GPU. The CPU side can't read those slots to see if an entry is in use, so in-use slots in that table have to be tracked with an external bitmap.

This has no unsafe code. It's all done with compare and swap. There is locking here, but it's down at the hardware level within the compare and swap instruction. This is cleaner and more portable than relying on cross-CPU ordering of operations.

[1] https://github.com/John-Nagle/rust-vulkan-bindless/blob/main...
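The CAS-based bitmap allocation described above looks roughly like this (a hypothetical sketch for illustration, not the linked crate's actual code): each bit tracks one slot, allocation finds a zero bit and claims it with compare_exchange, and retries if another thread raced us.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Atomic bitmap slot allocator, safe Rust only.
struct AtomicBitmap {
    words: Vec<AtomicU64>,
}

impl AtomicBitmap {
    fn new(bits: usize) -> Self {
        AtomicBitmap {
            words: (0..(bits + 63) / 64).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    // Claim the first free bit, or None if the bitmap is full.
    fn alloc(&self) -> Option<usize> {
        for (i, word) in self.words.iter().enumerate() {
            let mut cur = word.load(Ordering::Relaxed);
            while cur != u64::MAX {
                let bit = cur.trailing_ones() as usize; // index of first zero bit
                match word.compare_exchange_weak(
                    cur,
                    cur | (1u64 << bit),
                    Ordering::AcqRel,
                    Ordering::Relaxed,
                ) {
                    Ok(_) => return Some(i * 64 + bit),
                    Err(actual) => cur = actual, // lost a race; retry this word
                }
            }
        }
        None
    }

    // Release a slot. fetch_and clears the bit atomically, so frees of
    // different slots in the same word cannot corrupt each other.
    fn free(&self, slot: usize) {
        self.words[slot / 64].fetch_and(!(1u64 << (slot % 64)), Ordering::Release);
    }
}
```

As the comment says, the "locking" here lives entirely inside the compare-and-swap instruction; there is no mutex and no unsafe code.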

IX-103 19 minutes ago

> We don’t need strict ordering here; we’re just reading a number.

That's probably the most scary sentence in the whole article.
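The distinction that makes the quoted sentence scary or not can be shown in a few lines (an illustrative sketch with hypothetical names): a Relaxed load is fine for a standalone statistic, but not when the number guards other data, because nothing then orders the data access relative to the flag.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Case 1: a standalone counter. Relaxed is fine here; we only need
// the count itself, not ordering against any other memory.
static HITS: AtomicUsize = AtomicUsize::new(0);

fn record_hit() {
    HITS.fetch_add(1, Ordering::Relaxed);
}
fn hits() -> usize {
    HITS.load(Ordering::Relaxed)
}

// Case 2: the number publishes other data. Now the flag needs
// Release/Acquire, or a reader can see READY == true while still
// observing a stale DATA.
static DATA: AtomicUsize = AtomicUsize::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn publish(v: usize) {
    DATA.store(v, Ordering::Relaxed);
    READY.store(true, Ordering::Release); // orders the DATA store before it
}

fn consume() -> Option<usize> {
    // Relaxed on this load would be the bug the comment is warning about.
    if READY.load(Ordering::Acquire) {
        Some(DATA.load(Ordering::Relaxed))
    } else {
        None
    }
}
```

Whether "we're just reading a number" is safe depends entirely on which of these two cases the number is in.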