Building seastar from source:
$ git clone https://github.com/scylladb/seastar.git
$ cd seastar
$ vi doc/building-ubuntu.md # or the appropriate file
$ # install all prerequisites; skip boost if you already have it
$ ./configure.py --cflags "-I /opt/boost/include" --ldflags "-L /opt/boost/lib" # etc.
$ vi build.ninja # remove -Werror (e.g. :%s/-Werror //g) and save
$ ninja
$ ls -l build/release/libseastar.a
TL;DR The [C10K problem](www.kegel.com/c10k.html).
- Avoid the overheads of maintaining multiple processes/threads per CPU, namely creation and context-switch costs.
- Avoid the overheads of blocking I/O.
- Shared-nothing architecture: avoid the overheads of locking, and even of lock-free synchronization, by keeping shared data minimal.
- Easy API for async programming, based on futures and continuations.
- Seastar runs operations as micro-tasks - short-lived pieces of logic that process the result of an I/O operation and submit another I/O operation. Each core runs a scheduler for these micro-tasks.
- There is an alternative TCP stack built on the micro-task scheduler and shared-nothing architecture (--network-stack=native), as well as a DMA-based storage access API; both help scale I/O and enable zero-copy data transfer. Copy-less APIs exist for reading DMA'd data flowing into memory from devices, and for incorporating in-memory data into outgoing TCP streams.
Key problems.
- Memory consumption increases linearly with the number of threads.
- Locking shared, writable data across threads scales badly on NUMA architectures.
- Inter-processor synchronizations scale badly on NUMA architectures, thus sharing writable data across processors kills scale. For example, allocating memory on one processor, and deallocating on another.
- With too many threads, context switching becomes an overhead; with too few, threads can become overloaded with I/O and long queues of connections can form.
How seastar solves it.
- Each seastar "thread" is assigned the responsibility of a subset of connections.
- All related data for those connections also remains mostly with that thread, in its cache.
- If data from another processor is required, it is requested explicitly by that thread - and received - via message passing with the "thread" on the other processor.
- For the network stack, seastar utilizes the capability of modern NICs to support a per-core packet queue to which packets are automatically diverted. This helps partition the connections and their I/O.
- Users should apply partitioning schemes to data, using hashing or similar mechanisms to assign data to CPU. Alternatively, for moderately-sized, frequently-read, rarely-written data, replicating it to all cores is also an option.
- Command-line options:
- -c4 where 4 is the number of hardware threads to run.
- -m10M to assign 10 mebibytes of memory to the application.
Seastar provides many functions that return futures. For example:
#include <core/sleep.hh>
future<> sleep(std::chrono::duration<rep, period> dur); // returns a future without a value
A callback to execute when the future resolves is registered with .then():
sleep(10s).then([] {
std::cout << "Done waiting ...\n";
});
A function can return a future built from chained continuations. Each continuation itself returns a future and is invoked when the preceding future or continuation completes. The future type returned by such a function must match the return type of the last continuation in the chain.
In chained continuations, the lambda in each continuation is passed the value of the future it continues.
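A minimal sketch of such a chain; read_value() is a hypothetical function returning future<int>, and the includes follow the older header layout used above:

```cpp
#include <core/future.hh>
#include <core/future-util.hh>
#include <iostream>

// Hypothetical source of a value; assumed to return future<int>.
future<int> read_value();

future<> doubled() {
    return read_value().then([] (int v) {
        // The lambda receives the value of the preceding future.
        return make_ready_future<int>(v * 2);
    }).then([] (int v2) {
        std::cout << "got " << v2 << "\n";
        // Returning nothing here yields future<>, the function's return type.
    });
}
```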
There is also a .then_wrapped(...) continuation. In this case, the preceding future itself, rather than the value it holds, is passed to the lambda. If the future encapsulates an exception rather than a value, the continuation is still invoked rather than skipped.
A .finally() continuation is called irrespective of whether a preceding future throws an exception. The intervening continuations between the one that threw and the .finally() are skipped.
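A sketch combining the two; the report() wrapper is hypothetical, and get0() (which rethrows a stored exception) is assumed:

```cpp
#include <core/future.hh>
#include <iostream>

future<> report(future<int> f) {
    return f.then_wrapped([] (future<int> f) {
        // Invoked even if f holds an exception instead of a value.
        try {
            std::cout << "value: " << f.get0() << "\n";
        } catch (...) {
            std::cout << "failed\n";
        }
    }).finally([] {
        std::cout << "done either way\n"; // runs on success and on failure
    });
}
```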
do_with(T&& rval, F&& f): Retains the object that rval refers to until the future returned by f completes. The code that runs is passed an lvalue reference to this object as a function parameter.
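For example (consume() is hypothetical, an async operation taking a reference):

```cpp
#include <core/do_with.hh>
#include <core/future.hh>
#include <string>

// Hypothetical async operation taking an lvalue reference.
future<> consume(std::string& s);

future<> f() {
    return do_with(std::string("hello"), [] (std::string& s) {
        // s stays alive until the future returned here resolves.
        return consume(s);
    });
}
```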
do_with(T1&& rv1, T2&& rv2, T3_or_F&& rv3_or_f, T... args): Retains the objects that rv1, rv2, etc. refer to, and passes lvalue references to them as arguments to the function (the last parameter), which returns the future.
keep_doing(AsyncAction&& action): Invokes action repeatedly until it fails (by throwing an exception).
repeat(AsyncAction&& action): Invokes action repeatedly until it either fails (by throwing an exception) or returns stop_iteration::yes. The action parameter must be a callable of arity 0 that returns future<stop_iteration>.
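A sketch of a repeat loop; count_down() and the shared counter are illustrative, not part of the API:

```cpp
#include <core/future-util.hh> // repeat, stop_iteration
#include <memory>

future<> count_down(int n) {
    auto left = std::make_shared<int>(n);
    return repeat([left] {
        if (*left == 0) {
            return make_ready_future<stop_iteration>(stop_iteration::yes);
        }
        --*left;
        return make_ready_future<stop_iteration>(stop_iteration::no);
    });
}
```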
do_for_each(Iterator it, Iterator end, AsyncAction&& action): Calls action on each element in the iterator range sequentially, i.e. the next call starts when the future from the previous call resolves. The action must return a future.
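For example (handle() is hypothetical; note the caller must keep the container alive, e.g. via do_with):

```cpp
#include <core/future-util.hh>
#include <vector>

// Hypothetical: processes one element asynchronously.
future<> handle(int x);

future<> handle_all(std::vector<int>& v) {
    // Elements are handled strictly in order, one at a time.
    // v must outlive the returned future (e.g. wrap this in do_with).
    return do_for_each(v.begin(), v.end(), [] (int x) {
        return handle(x);
    });
}
```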
when_all(Future&&... fut): Returns a future which resolves only when all the individual futures passed as parameters resolve (either normally or with an exception). The result wraps a tuple of the resolved futures.
when_all(FutureIterator begin, FutureIterator end): Returns a future which resolves only when all the futures in the range resolve (either normally or with an exception). The result wraps a vector of the resolved futures.
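A sketch of the variadic form, reusing sleep() from above; wait_for_both() is illustrative:

```cpp
#include <core/future-util.hh>
#include <core/sleep.hh>
#include <tuple>

using namespace std::chrono_literals;

future<> wait_for_both() {
    return when_all(sleep(1s), sleep(2s)).then(
        [] (std::tuple<future<>, future<>> done) {
            // Both futures have resolved; each tuple element is a
            // ready future that can be inspected for success or failure.
        });
}
```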
parallel_for_each(Iterator it, Iterator end, Func&& func): Calls func on each element in the iterator range in parallel, and returns a future which resolves when all the calls complete.
map_reduce(Iterator it, Iterator end, Mapper&& map, Initial initial, Reduce&& reduce): Calls map on each element, then passes the result to reduce, which combines it with the previous result (or with initial on the first call).
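For example, summing an asynchronously computed value over a range; async_square() is hypothetical:

```cpp
#include <core/future-util.hh>
#include <functional>
#include <vector>

// Hypothetical: returns future<long> holding x*x.
future<long> async_square(int x);

future<long> sum_of_squares(std::vector<int>& v) {
    return map_reduce(v.begin(), v.end(),
                      [] (int x) { return async_square(x); }, // Mapper
                      0L,                                     // Initial
                      std::plus<long>());                     // Reduce
}
```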
semaphore sem(count); // starts with count available units
sem.signal(n); // releases n units back to the semaphore
sem.wait(n); // returns a future that resolves when n units are available
with_semaphore(sem, req_count, func): Waits for req_count units of the semaphore sem, then returns a future that executes the function func and returns its result. The units are returned to the semaphore when that future resolves.
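A sketch of limiting concurrency with with_semaphore; do_io() is hypothetical:

```cpp
#include <core/semaphore.hh>

semaphore limit(10); // allow at most 10 concurrent operations

// Hypothetical asynchronous operation.
future<> do_io();

future<> limited_io() {
    return with_semaphore(limit, 1, [] {
        return do_io();
    }); // the unit is returned when do_io()'s future resolves
}
```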
This construct tracks the tasks in flight, ensuring that all existing instances of the task finish and that no new instances start after the gate is closed.
seastar::gate g;
g.enter();
do_work().finally([&g] { g.leave(); });
Alternatively:
seastar::gate g;
seastar::with_gate(g, [] { return do_work(); }); // leaves the gate when the returned future resolves
g.close().then([] { std::cout << "All tasks done\n"; });
check_direct_io_support(sstring path)
future<bool> file_exists(sstring name)
future<int64_t> file_size(sstring name)
future<file> open_directory(sstring name)
future<file> open_file_dma(sstring name, open_flags)
future<file> open_file_dma(sstring name, open_flags, file_open_options)
make_directory(sstring name)
touch_directory(sstring name)
recursive_touch_directory(sstring name)
remove_file(sstring name)
rename_file(sstring name, sstring new_name)
link_file(sstring oldp, sstring newp)
close
disk_read/write_dma_alignment
memory_dma_alignment
future<size_t> dma_read(aligned_pos, aligned_buffer, aligned_len, io_priority_class = default)
future<temporary_buffer<chartype>> dma_read(pos, len, io_priority_class = default)
future<temporary_buffer<chartype>> dma_read_bulk
future<size_t> dma_write(pos, std::vector<iovec>)
future<size_t> dma_write(pos, const CharType* buffer, size_t len)
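Putting the file API together, a sketch that opens a file and reads its first block; read_first_block() and the 4096-byte length are illustrative choices:

```cpp
#include <core/file.hh>
#include <core/reactor.hh>

future<> read_first_block(sstring name) {
    return open_file_dma(name, open_flags::ro).then([] (file f) {
        // This dma_read overload allocates a suitably aligned buffer itself;
        // pos and len should still respect the file's dma alignment values.
        return f.dma_read<char>(0, 4096).then(
            [f] (temporary_buffer<char> buf) mutable {
                // buf.get() points at the bytes read; buf.size() may be
                // shorter than requested at end of file.
                return f.close();
            });
    });
}
```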