Skip to content

Instantly share code, notes, and snippets.

@michaelsbradleyjr
Last active March 9, 2021 01:45
Show Gist options
  • Save michaelsbradleyjr/1eaa9937b3fbb4ffff3fb814f0dd82a9 to your computer and use it in GitHub Desktop.
Save michaelsbradleyjr/1eaa9937b3fbb4ffff3fb814f0dd82a9 to your computer and use it in GitHub Desktop.

Sorry for the delayed reply, so much has been learned in recent days and weeks; even yesterday I was double-checking my own understanding, so I needed to take a little care to give answers that are (I hope!) as accurate as possible.

  1. As I understood, AsyncChannel is a type provided by nim-chronos as an alternative to Nim's built-in asyncdispatch. What would the usage of Nim's asyncdispatch look like and why aren't we using that instead?

There are important differences between Nim's asyncdispatch and status-im/nim-chronos. For an overview see:

https://github.com/status-im/nim-chronos/wiki/AsyncDispatch-comparison

We can use asyncdispatch, or things that are built with it, e.g. Nim's AsyncHttpClient:

https://nim-lang.org/docs/httpclient.html#AsyncHttpClient

^ an example at the very top of that page uses newAsyncHttpClient, which involves import asyncdispatch, httpclient.

There are some things to consider:

Projects such as status-im/nim-waku are built with nim-chronos, which means that to use waku we need to run it in a thread where a chronos event loop is running.

In any thread where you run an asyncdispatch event loop you can't also run a chronos event loop. Either kind of event loop needs to "manage the thread", in a sense, and one event loop will get in the way of the other. Similarly, you can't use the await macro of chronos with an instance of Future of asyncdispatch; and you can't use the await macro of asyncdispatch with an instance of Future of chronos.

Something else to note is that we can't run an asyncdispatch or chronos event loop in the main thread of status-desktop because that's where Qt's event loop runs. It's basically the same problem as above: Qt's event loop needs to manage the main thread, which precludes another event loop running alongside it.

So running an event loop of either asyncdispatch or chronos will always involve a {.thread.} proc plus createThread or a spawn, i.e. we can only run one or the other's event loop on a separate thread.

But! You can have multiple threads running different event loops, e.g. main thread runs Qt's event loop, there's another thread running an asyncdispatch event loop, and there's another thread running a chronos event loop. Note that where I wrote "another thread" it can be any number of threads because each thread gets its own independent event loop; that's true for both asyncdispatch and chronos.

So why aren't we using asyncdispatch (or nim-chronos for that matter) already?

I'm not 100% sure, but it's probably because we hadn't understood/realized a use case. For the most part spawn (via spawnAndSend in src/status/threads.nim of status-desktop) has served us pretty well, though we (inadvertently) haven't been "following the rules" 🤦 re: Nim memory safety and it's been getting us in trouble, eventually causing SIGSEGV crashes.

  1. Using standard Channel[T], Michael has mentioned that, due to the fact that .recv() is blocking, it's super hard to use them for cross thread communication. It's not entirely clear to me why that is the case, given that channels are a cross thread communication tool to begin with. Just a thought experiment here: as long as a thread is doing only a particular task and sticks to, say a certain type of signal, shouldn't it be fine to use for cross thread communication?

The problem arises when you want that one thread with standard Channel to be able to "do work" at the same time as it's waiting for the next message, which would concurrently kick off the next bit of work.

Imagine if you have a worker thread that when it receives a message (url string) on a standard Channel[string] named chanRecv it calls/discards an async proc (asyncdispatch) like this:

proc asyncProc(url: string, chanSend: Channel[string]): Future[void] {.async.} =
  var client = newAsyncHttpClient()
  chanSend.send(await client.getContent(url))

^ chanSend would be a standard Channel[string] used for communicating from the worker thread back to whatever thread setup the worker. It's actually a bit more complicated than that, because for concurrent asyncProc invocations we need to id them somehow, but that's not a fundamental problem.

After calling/discarding asyncProc, the worker would go back to the top of its while true loop and wait for another message on chanRecv, while presumably the async I/O work of asyncProc keeps running... 🛑 🤯

The showstopper is that the call to .recv() on the standard Channel completely blocks the thread from doing anything else (including async I/O) until the next message arrives. So the hoped for concurrency from combining standard Channel and asyncdispatch cannot be achieved. It's a bit sad, really.

But this is where nim-chronos and AsyncChannel are different. await [achannel].recv() does not block the thread, it just yields control back to the event loop of chronos running on that thread, so e.g. waiting on the next message and async I/O can happen concurrently.

Sadly, there's not yet a full-featured AsyncHttpClient that's compatible with nim-chronos but one is in the works:

status-im/nim-chronos#42

  1. Michael has mentioned that mutating data from threads isn't possible in Nim. There are use cases however, where that might be the desired thing to do. Other languages allow for that using things like Mutex that ensure only one thread can access data at a time. If mutexes don't fit the bill, sending messages through standard channels to initiate mutation sounds like a good alternative. Would that be a reasonable approach to mutate data through threads in Nim? Obviously, mutation would then only happen on the main thread.

With Nim's current memory model and GC, any data subject to GC on one thread cannot be safely read from or written to by another thread. There are no viable workarounds. Locks do not help. If you turn off the compiler's safety checks altogether or mark some offending code with {.gcsafe.} while not taking care to make sure that somehow it really is safe (but how???) then eventually your program will crash. An example in status-desktop is the cache for getSettings: it eventually/randomly causes a SIGSEGV crash. As explained by zahary:

The reason why it's not safe to access any ref object in multiple threads is that Nim may insert code that increases and decreases the ref counts even when you only read the object (it may be a temporary reference for example)

Since the ref count updates assume that there is only a single accessor thread, they are not atomic, so you might get a race in the updates that end up corrupting the ref count

He said "ref object", but it's true for any data subject to a thread's GC, e.g. string.

If you need to share data between threads safely there are a few options:

It is safe to reference const across threads.

You can deepCopy the data before starting the other thread. That's what spawn foo(bar, baz) does for you. Whatever bar and baz are, even if they're ref object, they will automatically be deeply copied to the heap of the spawned thread before foo runs on that new thread. When you call .send(stuff) on a standard Channel it also involves an automatic deepCopy.

You can also use allocShared then copyMem to copy data from one thread to a shared heap that is never subject to garbage collection. Management of that memory is fully manual and requires great care. The location of that data can be communicated by e.g. casting to pointer and sending that pointer to another thread. Neither spawn foo(p) nor .send(p) of standard Channel will copy the pointer p since it's an untraced reference (terminology of Nim's manual). The other thread could then copy/cast the data in its own heap. Or multiple threads could negotiate management of the shared heap memory (reads/writes/resizes) via locks, but good luck 🤪 doing that successfully except in the simplest of cases.

  1. Coming back to AsyncChannel: Did I get it right that ThreadSafeString is the only thread safe data we're able to send across async channels and that other types have to be serialized as such?

.send() of AsyncChannel doesn't automatically copy data before sending it. So we need to serialize data, allocate memory for it on the shared heap, copy the serialized data to that allocated memory, and then send a pointer. The safe proc of ThreadSafeString handles everything but the serialization and sending. The result of "foo".safe is essentially a pointer for data on the shared heap because cstring is essentially a pointer.

For serialization we'll be using status-im/nim-json-serialization. It conveniently allows for a "$type" property to be inserted into the encoded json string so that when deserializing you can figure out the type and then do a type-specific Json.decode().

When a ThreadSafeString is received over an AsyncChannel on another thread, it can be converted to a Nim string on that thread's heap with the $ proc for ThreadSafeString. The $ operation also deallocates its memory from the shared heap, but that can only be done safely once:

type
  ThreadSafeString* = distinct cstring

proc safe*(input: string): ThreadSafeString =
  var res = cast[cstring](allocShared(input.len + 1))
  copyMem(res, input.cstring, input.len)
  res[input.len] = '\0'
  res.ThreadSafeString

proc `$`*(input: ThreadSafeString): string =
  result = $(input.cstring)
  deallocShared input.cstring

var x = $($($"foo"))
# ^ pointless but okay since redundant use of `$` of type `string` is safe
# and returns type `string`

var y = x.safe
var z = $($($y))
# ^ pointless but okay since outer two `$` are of type `string` since `$y`
# returns type `string`

var w = $y
# ^ may crash with SIGSEGV or give unexpected result because shared heap mem
# was already deallocated by previous `$y`

There is absolutely no way to check if memory has already been deallocated, so it requires careful planning ahead re: what code is responsible for calling $ of ThreadSafeString.

  1. Generally on thread, is it possible to retrieve return value from threads once they are joined back into the main thread they've been forked from? For example, say you spin up a thread to read a file and return its contents back to the main thread, would channels be the only way to make that data available, or can we access it once the join handle is resolved?

It is possible to do something like that. See FlowVar in Nim's docs for threadpool:

https://nim-lang.org/docs/threadpool.html

It doesn't seem too useful in the context of status-desktop and we're already using another mechanism that's well suited to the job, i.e. NimQML's signal_handler, which makes use of DOtherSide's dos_signal.

How status-desktop works currently is that some {.slot.} procs in the views use spawnAndSend to do work on another thread; the data resulting from that work is sent back to the main thread via NimQML's signal_handler, and Qt later hands that data to a matching {.slot.} proc, which runs in the main thread and can then update data structures that are in the main thread's heap. An important aspect is that the spawned threads only send strings (data structures serialized into json) through signal_handler/dos_signal; those strings are copied by dos_signal before being put in a queue managed by Qt so the spawned Nim thread doesn't inopportunely GC them while they're being round-tripped to the main thread by Qt.

@emizzle and I are currently working on a slightly different setup that will use AsyncChannel instead of spawnAndSend to trigger long-running tasks and tasks that run in a threadpool (a custom pool implementation based on AsyncChannel). However, since we can't run a chronos event loop in the main thread, those tasks will also use signal_handler to communicate back to the main thread.

You might wonder how we can send through AsyncChannel from the main thread to the task manager since we can't run a chronos event loop in the main thread. The explanation is that AsyncChannel provides sendSync and recvSync procs that can be invoked in a thread that's not running a chronos event loop. Wrapping .sendSync() in a "proxy proc" is a natural stand-in for spawnAndSend, but .recvSync() would block the main thread (no good!) so we rely on the signal_handler/dos_signal wiring instead.

That's it so far, hope the questions make sense. Thanks again for doing all the research! @emizzle @brainproxy ^

Thanks for taking the time to watch the presentation and come back to us with such great questions!

All of the above is an avalanche of concepts and information, and some of it may be unclear, needs more/better explanations, etc. Please feel free to ask more questions. The more people on the desktop team that understand Nim's memory safety rules and the options we have for safely doing multi-threaded work, it's all the better for the status-desktop project. 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment