Concurrency allows for increased throughput, the amount of work completed in a given amount of time, by utilizing additional available agents. It also allows for lower latency, the time between when a task is submitted and when it is completed. In this chapter we deal with concurrent computation, as opposed to concurrent IO, although the two ideas are closely related.
Interleaved execution may be scheduled in one of three ways. Preemptive scheduling interrupts a task, usually at a given time interval or when the task waits, and switches to another. Cooperative scheduling switches only at points where a task explicitly yields or awaits. Queued scheduling runs each task to completion before starting the next.
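As a concrete illustration of the queued model, the following is a minimal sketch of a run-to-completion scheduler; the name serial_queue and its interface are illustrative, not from any particular library:

#include <deque>
#include <functional>

class serial_queue {
    std::deque<std::function<void()>> tasks_m;

public:
    void push(std::function<void()> task) { tasks_m.push_back(std::move(task)); }

    // Each task runs to completion; a "context switch" happens only between
    // tasks, never within one.
    void run() {
        while (!tasks_m.empty()) {
            auto task = std::move(tasks_m.front());
            tasks_m.pop_front();
            task(); // a task may push() follow-up work
        }
    }
};

With these models in mind, consider the expression: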
auto r = f(x) * g(x);
In this expression, f() and g() can be executed concurrently, but both must complete before operator*() can be applied.
auto g_ = fork([&]{ return g(x); }); // execute concurrently
auto f_ = f(x);
auto r = f_ * join(g_);
Here, fork() and join() are hypothetical operations. They can be implemented directly with std::thread:
// Fork
decltype(g(x)) g_;
std::thread thread{[&]{
    g_ = g(x);
}};
auto f_ = f(x);
// Join
thread.join();
auto r = f_ * g_;
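One caveat with this code: if f(x) throws, thread is destroyed while still joinable and std::terminate is called. C++20's std::jthread joins in its destructor, which makes the fork/join pair exception-safe; a minimal variant:

#include <thread>

// Fork
decltype(g(x)) g_;
std::jthread thread{[&]{
    g_ = g(x);
}};

auto f_ = f(x); // if this throws, the jthread destructor still joins

// Join
thread.join();
auto r = f_ * g_;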
A thread consumes wired memory, which is not paged out by the VM system; if memory is under pressure, a thread, even an idle one, imposes a performance penalty on the system.
join() will allow a context switch. Thread context switches are expensive because modern processors have dedicated memory caches per core; for the results of a computation on one core to be visible to another core requires a memory fence. A memory fence establishes an ordering of memory load and store operations, and it must be understood by both the processor and the compiler. If the compiler is not aware of a memory fence, it could reorder operations so that two threads would see inconsistent results.
An evaluation that writes to a memory location while another evaluation reads or writes the same memory location is a data race, unless both are atomic operations or one of the conflicting operations happens-before the other, as established with a memory fence. The result of a data race is undefined behavior.
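For example, a release store paired with an acquire load on the same atomic establishes such an ordering, making the plain access to data below well defined; a minimal sketch:

#include <atomic>
#include <thread>

int data = 0;                   // plain, non-atomic shared variable
std::atomic<bool> ready{false};

void producer() {
    data = 42;                                    // plain write
    ready.store(true, std::memory_order_release); // release store
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {} // acquire load
    // The release store synchronizes-with the acquire load, so the write
    // to data happens-before this read; there is no data race.
    int r = data;
    (void)r;
}

int main() {
    std::thread t1{producer};
    std::thread t2{consumer};
    t1.join();
    t2.join();
}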
Atomic operations alone are not sufficient to prevent races. The following copy-on-write class increments and decrements its count atomically, yet it still contains a race:

template <typename T>
class bad_cow {
    struct object_t {
        explicit object_t(const T& x) : data_m(x) {}
        std::atomic<int> count_m{1};
        T data_m;
    };
    object_t* object_m;

public:
    explicit bad_cow(const T& x) : object_m(new object_t(x)) {}
    ~bad_cow() {
        if (0 == --object_m->count_m) delete object_m;
    }
    bad_cow(const bad_cow& x) : object_m(x.object_m) { ++object_m->count_m; }

    bad_cow& operator=(const T& x) {
        if (object_m->count_m == 1) object_m->data_m = x;
        else {
            object_t* tmp = new object_t(x);
            --object_m->count_m; // BUG: if another thread released its
                                 // reference after count_m was read above,
                                 // the count may drop to 0 here and the old
                                 // object is never deleted
            object_m = tmp;
        }
        return *this;
    }
};
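The race is in the else branch: count_m may have been read as, say, 2 while another thread was releasing its reference; the subsequent decrement then brings the count to 0 and nobody deletes the object. One repair is to act on the value returned by the atomic decrement, exactly as the destructor does; a sketch of the corrected assignment (the rest of the class is unchanged):

bad_cow& operator=(const T& x) {
    if (object_m->count_m == 1) object_m->data_m = x;
    else {
        object_t* tmp = new object_t(x);
        // The decrement and the test of its result are a single atomic
        // operation, so exactly one thread observes the drop to zero.
        if (0 == --object_m->count_m) delete object_m;
        object_m = tmp;
    }
    return *this;
}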
Because threads are expensive, a natural next step is to run g(x) on a shared thread pool and signal completion with an atomic:

std::atomic_flag done;
decltype(g(x)) g_;

// fork
system_thread_pool([&]{
    g_ = g(x);
    done.test_and_set();
    done.notify_one();
});

auto f_ = f(x);

// join (deadlock? if every pool thread is blocked in a wait like this one,
// the task for g(x) may never be scheduled)
done.wait(false);

auto r = f_ * g_;
A future packages the result together with the synchronization:

// fork
std::future<decltype(g(x))> g_ =
    async(system_thread_pool, [&]{ return g(x); }); // hypothetical pool-based async
auto f_ = f(x);
// join (deadlock? get() has the same pool-exhaustion hazard as the wait above)
return f_ * g_.get();
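The same shape can be written with the standard library alone: std::async with std::launch::async runs the task as if on a new thread, so this join cannot deadlock on an exhausted pool, at the cost of potentially creating a thread per call; a minimal sketch:

#include <future>

// fork
auto g_ = std::async(std::launch::async, [&]{ return g(x); });
auto f_ = f(x);
// join - no pool to exhaust, but threads are not reused
return f_ * g_.get();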
Libraries such as stlab provide futures with continuations, so the join need not block a thread:

return stlab::when_all(
    system_scheduler,
    [](const auto& a, const auto& b){ return a * b; },
    stlab::async(system_scheduler, [=]{ return g(x); }),
    stlab::async(system_scheduler, [=]{ return f(x); }));
Ideally, we would write simply:

return f(x) * g(x);

That f() and g() do not share mutable state is enough to execute these concurrently.