This PR reimplements condition variables with two different variants:

- a futex-based shared condvar with an atomic counter for waiters
- a queue-based private condvar

Note that a thread-local queue node cannot be reliably accessed in the process-shared case, so we cannot use a unified implementation for both.

POSIX.1-2024 (Issue 8) added atomicity requirements to condition variables:

- The `pthread_cond_broadcast()` function shall, **as a single atomic operation**, determine which threads, if any, are blocked on the specified condition variable cond and unblock all of these threads.
- The `pthread_cond_signal()` function shall, as a **single atomic operation**, determine which threads, if any, are blocked on the specified condition variable cond and unblock at least one of these threads.

This means that threads parked after a signal event shall not steal signals issued before it. From my read, both implementations fulfill this requirement, but glibc claims that a single futex does not provide the stronger ordering needed by its users and hence switched to a rotating queue, to fulfill the requirements discussed in https://sourceware.org/bugzilla/show_bug.cgi?id=13165: with a single futex, wakeups do not happen in order, so when a spurious wakeup happens, later waiters may "steal" the signal. However, musl's shared implementation, all of bionic's implementation, and Rust's std condvar still stick to a single-futex-plus-accounting-data style.

Musl's private condvar and this implementation are even stronger, in the sense that not only is the set of threads to wake determined as a single atomic event, but mutex acquisition also happens in a baton-passing style. Our implementation differs from musl in that we make some spin attempts (with a new "requeued" state added) instead of always issuing a futex syscall to requeue threads. This gives the signaler/waiter a chance to stay on the fast path if the queue is really small.
Based on the microbenchmark, this implementation is generally more performant. We also abuse the `RawMutex`'s LOCKED state to mean "waiting for its turn". One caveat visible in the benchmark (https://github.com/SchrodingerZhu/condvar-benchmark) is that a strict-FIFO condvar is not a good fit when users maintain the ordering themselves, as in `turn_ring`: many threads may wake up first only to see that it is not their turn, while blocking the thread whose turn it actually is. A semaphore or another synchronization primitive would be more suitable in those cases. Even though benchmarks like `turn_ring`/`broadcast_stress` make this code perform badly (still similar to musl, anyway), they are not really correct usage. In the other cases, we are always close to the fastest while providing stronger FIFO semantics.

TODO in future commits:

- Support pthread cancellation. Ideally we should block cancellation once a signal is consumed and add a cancellation callback that checks eligibility.
- Think about stronger ordering semantics in the shared case (?)
```cpp
//===-- Linux implementation of the cnd_init function ---------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "src/threads/cnd_init.h"

#include "src/__support/CPP/new.h"
#include "src/__support/common.h"
#include "src/__support/macros/config.h"
#include "src/__support/threads/CndVar.h"

#include <threads.h> // cnd_t, thrd_error, thrd_success

namespace LIBC_NAMESPACE_DECL {

static_assert(sizeof(CndVar) == sizeof(cnd_t));

LLVM_LIBC_FUNCTION(int, cnd_init, (cnd_t * cond)) {
  new (cond) CndVar(false);
  return thrd_success;
}

} // namespace LIBC_NAMESPACE_DECL
```