Here's a technique I came up with for busy-waiting in a "nice" way. I'm probably not the first person to think of this, but this was a new idea to me.

Suppose you have a multicore machine and you're running a multithreaded program. You have small critical sections, and you find that busy-waiting on mutexes gives you a speedup over blocking. In pseudocode, your busy-wait loop might look like this.

void busy_wait(mutex& mu) { while (true) { if (mu.try_lock()) { return; // Acquired the lock! } } }

But naive busy-waiting is only beneficial when the load on your system is low. If anything running on the machine (be it your program or a different one) has "useful" work to do, you don't want to waste cores spinning.

You could mitigate this somewhat by inserting a call to sched_yield() inside the busy-wait loop, like so.

void busy_wait_with_yield(mutex& mu) { while (true) { if (mu.try_lock()) { return; } sched_yield(); // Give other threads a chance to run. } }

The sched_yled() tells the operating system to switch to another thread, if one is runnable. It also moves the yielding thread to the end of the run queue, so all other threads will have a chance to run before it runs next. And it's relatively cheap in the case that there's no other thread to switch to.

But this still isn't ideal, for a few reasons.

The approach I was familiar with for solving these problems is to busy-wait for some amount of time. If we haven't acquired the mutex by the end of this period of time, we block.

void busy_wait_then_block(mutex& mu) { let start = now(); while (now - start < THRESHOLD) { if (mu.try_lock()) { return; } } mu.lock(); // If all else fails, block. }

But this still has some problems.

So now comes my idea. Instead of busy-waiting for a set period of time, we busy-wait until sched_yield() is "slow". This indicates that the OS had another thread to run. If so, we immediately block. It looks like this.

void block_if_yield_is_slow(mutex& mu) { while (true) { if (mu.try_lock()) { return; } let start = now(); sched_yield(); if (now() - start > SLOW_YIELD_THRESHOLD) { break; } } mu.lock(); // If all else fails, block. }

This approach worked well for me on a workload of mine with ~64 threads. When I ran on a 64-core machine, it was almost as fast as pure busy-waits with no sched_yield()'s. But whereas the pure busy-wait code was very slow on my 12-core laptop, block_if_yield_is_slow was just as fast as if I had no busy-waits!

To address some possible objections: