Sorry for the delayed reply; so much has been learned in recent days and weeks that even yesterday I was double-checking my own understanding, so I needed to take a little care to give answers that are (I hope!) as accurate as possible.
- As I understood, `AsyncChannel` is a type provided by `nim-chronos` as an alternative to Nim's built-in `asyncdispatch`. What would the usage of Nim's `asyncdispatch` look like and why aren't we using that instead?
There are important differences between Nim's `asyncdispatch` and status-im/nim-chronos. For an overview see:

https://github.com/status-im/nim-chronos/wiki/AsyncDispatch-comparison

We can use `asyncdispatch`, or things that are built with it, e.g. Nim's `AsyncHttpClient`:

https://nim-lang.org/docs/httpclient.html#AsyncHttpClient

^ an example at the very top of that page uses `newAsyncHttpClient`, which involves `import asyncdispatch, httpclient`.
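To give a concrete feel for plain `asyncdispatch`, here's a minimal, self-contained sketch (the proc name `tick` is mine, purely for illustration; an `AsyncHttpClient` call would be driven by the event loop the same way):

```nim
import asyncdispatch

# Two async procs running concurrently on this thread's event loop.
proc tick(name: string, delayMs: int): Future[string] {.async.} =
  await sleepAsync(delayMs)
  result = name

when isMainModule:
  # `all` combines the futures; `waitFor` drives the event loop to completion.
  # The result order matches the argument order, not completion order.
  let res = waitFor all(tick("a", 20), tick("b", 10))
  echo res
```

Even though `tick("b", ...)` finishes first, both futures make progress concurrently on the single event loop, and `all` reports the results in argument order.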
There are some things to consider:

- Projects such as status-im/nim-waku are built with `nim-chronos`, which means that to use `waku` we need to run it in a thread where a `chronos` event loop is running.
- In any thread where you run an `asyncdispatch` event loop you can't also run a `chronos` event loop. Either kind of event loop needs to "manage the thread", in a sense, and one event loop will get in the way of the other. Similarly, you can't use the `await` macro of `chronos` with an instance of `Future` of `asyncdispatch`, and you can't use the `await` macro of `asyncdispatch` with an instance of `Future` of `chronos`.
- Something else to note is that we can't run an `asyncdispatch` or `chronos` event loop in the main thread of status-desktop because that's where Qt's event loop runs. It's basically the same problem as above: Qt's event loop needs to manage the main thread, which precludes another event loop running alongside it.
- So running an event loop of either `asyncdispatch` or `chronos` will always involve a `{.thread.}` proc plus `createThread` or a `spawn`, i.e. we can only run one or the other's event loop on a separate thread.
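To make that last point concrete, here's a minimal sketch (the names `job`, `workerMain`, and `worker` are mine, for illustration) of running an `asyncdispatch` event loop on a separate thread; it needs to be compiled with `--threads:on`:

```nim
import asyncdispatch

proc job(): Future[void] {.async.} =
  await sleepAsync(10)
  echo "worker done"

proc workerMain() {.thread.} =
  # each thread gets its own independent asyncdispatch dispatcher;
  # waitFor drives this thread's event loop until `job` completes
  waitFor job()

var worker: Thread[void]
createThread(worker, workerMain)
joinThread(worker)
```

The main thread here just joins the worker; in status-desktop the main thread would instead be busy running Qt's event loop.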
But! You can have multiple threads running different event loops, e.g. the main thread runs Qt's event loop, another thread runs an `asyncdispatch` event loop, and yet another thread runs a `chronos` event loop. Note that where I wrote "another thread" it can be any number of threads, because each thread gets its own independent event loop; that's true for both `asyncdispatch` and `chronos`.
So why aren't we using `asyncdispatch` (or `nim-chronos` for that matter) already? I'm not 100% sure, but it's probably because we hadn't understood/realized a use case. For the most part `spawn` (via `spawnAndSend` in `src/status/threads.nim` of status-desktop) has served us pretty well, though we (inadvertently) haven't been "following the rules" 🤦 re: Nim memory safety, and that has been getting us into trouble, eventually causing SIGSEGV crashes.
- Using standard `Channel[T]`, Michael has mentioned that, because `.recv()` is blocking, it's super hard to use them for cross-thread communication. It's not entirely clear to me why that is the case, given that channels are a cross-thread communication tool to begin with. Just a thought experiment here: as long as a thread is doing only a particular task and sticks to, say, a certain type of signal, shouldn't it be fine to use for cross-thread communication?
The problem arises when you want the thread with a standard `Channel` to be able to "do work" at the same time as it's waiting for the next message, which would concurrently kick off the next bit of work.

Imagine you have a worker thread that, when it receives a message (a url string) on a standard `Channel[string]` named `chanRecv`, calls/discards an async proc (`asyncdispatch`) like this:

```nim
import asyncdispatch, httpclient

proc asyncProc(url: string, chanSend: ptr Channel[string]): Future[void] {.async.} =
  var client = newAsyncHttpClient()
  # send back the fetched content; `send` on a standard Channel deep-copies
  chanSend[].send(await client.getContent(url))
```
^ `chanSend` points at a standard `Channel[string]` used for communicating from the worker thread back to whatever thread set up the worker. It's actually a bit more complicated than that, because for concurrent `asyncProc` invocations we need to `id` them somehow, but that's not a fundamental problem.

After calling/discarding `asyncProc`, the worker would go back to the top of its `while true` loop and wait for another message on `chanRecv`, while presumably the async I/O work of `asyncProc` keeps running... 🛑 🤯

The showstopper is that the call to `.recv()` on the standard `Channel` completely blocks the thread from doing anything else (including async I/O) until the next message arrives. So the hoped-for concurrency from combining standard `Channel` and `asyncdispatch` cannot be achieved. It's a bit sad, really.
But this is where `nim-chronos` and `AsyncChannel` are different. `await achannel.recv()` (where `achannel` is an `AsyncChannel`) does not block the thread; it just yields control back to the event loop of `chronos` running on that thread, so e.g. waiting on the next message and async I/O can happen concurrently.
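Assuming an `AsyncChannel` API with `send`/`recv` as described here (hedged sketch, not verified against the current `nim-chronos` sources; `workerLoop` and `handleMessage` are names I made up), the worker loop could look roughly like:

```nim
proc workerLoop(chanRecv: AsyncChannel[ThreadSafeString]) {.async.} =
  while true:
    # awaiting recv yields to this thread's chronos event loop, so
    # in-flight async I/O keeps making progress while we wait
    let msg = $(await chanRecv.recv())
    # kick off the next bit of work without blocking the loop
    asyncSpawn handleMessage(msg)
```

Compare this with the standard `Channel` worker above: the shape is the same, but the `recv` no longer freezes the whole thread between messages.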
Sadly, there's not yet a full-featured `AsyncHttpClient` that's compatible with `nim-chronos`, but one is in the works.
- Michael has mentioned that mutating data from threads isn't possible in Nim. There are use cases, however, where that might be the desired thing to do. Other languages allow for that using things like `Mutex` that ensure only one thread can access data at a time. If mutexes don't fit the bill, sending messages through standard channels to initiate mutation sounds like a good alternative. Would that be a reasonable approach to mutating data across threads in Nim? Obviously, mutation would then only happen on the main thread.
With Nim's current memory model and GC, any data subject to GC on one thread cannot be safely read from or written to by another thread. There are no viable workarounds. Locks do not help. If you turn off the compiler's safety checks altogether, or mark some offending code with `{.gcsafe.}` while not taking care to make sure that somehow it really is safe (but how???), then eventually your program will crash. An example in status-desktop is the cache for `getSettings`: it eventually/randomly causes a SIGSEGV crash. As explained by zahary:

> The reason why it's not safe to access any ref object in multiple threads is that Nim may insert code that increases and decreases the ref counts even when you only read the object (it may be a temporary reference for example). Since the ref count updates assume that there is only a single accessor thread, they are not atomic, so you might get a race in the updates that ends up corrupting the ref count.

He said "ref object", but it's true for any data subject to a thread's GC, e.g. `string`.
If you need to share data between threads safely there are a few options:

- It is safe to reference a `const` across threads.
- You can `deepCopy` the data before starting the other thread. That's what `spawn foo(bar, baz)` does for you. Whatever `bar` and `baz` are, even if they're `ref object`, they will automatically be deeply copied to the heap of the spawned thread before `foo` runs on that new thread. When you call `.send(stuff)` on a standard `Channel` it also involves an automatic `deepCopy`.
- You can also use `allocShared` then `copyMem` to copy data from one thread to a shared heap that is never subject to garbage collection. Management of that memory is fully manual and requires great care. The location of that data can be communicated by e.g. casting to `pointer` and sending that pointer to another thread. Neither `spawn foo(p)` nor `.send(p)` of a standard `Channel` will copy the pointer `p`, since it's an untraced reference (terminology of Nim's manual). The other thread could then copy/cast the data into its own heap. Or multiple threads could negotiate management of the shared heap memory (reads/writes/resizes) via locks, but good luck 🤪 doing that successfully except in the simplest of cases.
- Coming back to AsyncChannel: Did I get it right that `ThreadSafeString` is the only thread-safe data we're able to send across async channels and that other types have to be serialized as such?
`.send()` of `AsyncChannel` doesn't automatically copy data before sending it. So we need to serialize data, allocate memory for it on the shared heap, copy the serialized data to that allocated memory, and then send a pointer. The `safe` proc of `ThreadSafeString` handles everything but the serialization and sending. The result of `"foo".safe` is essentially a pointer to data on the shared heap because `cstring` is essentially a pointer.

For serialization we'll be using status-im/nim-json-serialization. It conveniently allows for a `"$type"` property to be inserted into the encoded json string so that when deserializing you can figure out the type and then do a type-specific `Json.decode()`.
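A round-trip with nim-json-serialization looks roughly like this (hedged sketch: the `Payload` type and its fields are made up, and I haven't double-checked the exact option for emitting the `"$type"` tag):

```nim
import json_serialization

type
  Payload = object
    kind: string
    value: int

# Json.encode/Json.decode are the nim-json-serialization entry points
let encoded = Json.encode(Payload(kind: "demo", value: 42))
# `encoded` is a json string; with the "$type" option enabled it would also
# carry a type tag the receiver can inspect before choosing a decode type
let decoded = Json.decode(encoded, Payload)
```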
When a `ThreadSafeString` is received over an `AsyncChannel` on another thread, it can be converted to a Nim `string` on that thread's heap with the `$` proc for `ThreadSafeString`. The `$` operation also deallocates its memory from the shared heap, but that can only be done safely once:
```nim
type
  ThreadSafeString* = distinct cstring

proc safe*(input: string): ThreadSafeString =
  var res = cast[cstring](allocShared(input.len + 1))
  copyMem(res, input.cstring, input.len)
  res[input.len] = '\0'
  res.ThreadSafeString

proc `$`*(input: ThreadSafeString): string =
  result = $(input.cstring)
  deallocShared input.cstring

var x = $($($"foo"))
# ^ pointless but okay since redundant use of `$` of type `string` is safe
# and returns type `string`

var y = x.safe

var z = $($($y))
# ^ pointless but okay since the outer two `$` are of type `string`, since `$y`
# returns type `string`

var w = $y
# ^ may crash with SIGSEGV or give an unexpected result because the shared heap
# memory was already deallocated by the previous `$y`
```
There is absolutely no way to check whether the memory has already been deallocated, so it requires careful planning ahead re: what code is responsible for calling `$` of `ThreadSafeString`.
- Generally on threads, is it possible to retrieve a return value from a thread once it is joined back into the main thread it was forked from? For example, say you spin up a thread to read a file and return its contents back to the main thread: would channels be the only way to make that data available, or can we access it once the join handle is resolved?
It is possible to do something like that. See `FlowVar` in Nim's docs for `threadpool`:

https://nim-lang.org/docs/threadpool.html

It doesn't seem too useful in the context of status-desktop, and we're already using another mechanism that's well suited to the job, i.e. NimQML's `signal_handler`, which makes use of DOtherSide's `dos_signal`.
How status-desktop currently works is that some `{.slot.}` procs in the views use `spawnAndSend` to do work on another thread; the data resulting from that work is sent back to the main thread via NimQML's `signal_handler`, and Qt later hands that data to a matching `{.slot.}` proc, which runs in the main thread and can then update data structures that are in the main thread's heap. An important aspect is that the spawned threads only send strings (data structures serialized into json) through `signal_handler`/`dos_signal`; those strings are copied by `dos_signal` before being put in a queue managed by Qt, so the spawned Nim thread doesn't inopportunely GC them while they're being round-tripped to the main thread by Qt.
@emizzle and I are currently working on a slightly different setup that will use `AsyncChannel` instead of `spawnAndSend` to trigger long-running tasks and tasks that run in a threadpool (a custom pool implementation based on `AsyncChannel`). However, since we can't run a `chronos` event loop in the main thread, those tasks will also use `signal_handler` to communicate back to the main thread.
You might wonder how we can send through `AsyncChannel` from the main thread to the task manager, since we can't run a `chronos` event loop in the main thread. The explanation is that `AsyncChannel` provides `sendSync` and `recvSync` procs that can be invoked in a thread that's not running a `chronos` event loop. Wrapping `.sendSync()` in a "proxy proc" is a natural stand-in for `spawnAndSend`, but `.recvSync()` would block the main thread (no good!), so we rely on the `signal_handler`/`dos_signal` wiring instead.
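A proxy proc along those lines might look something like this (hedged sketch: `taskChan`, `startFetchTask`, and the `"FetchTask"` message shape are all assumptions of mine; only `sendSync` and `safe` come from the discussion above):

```nim
import json  # for escapeJson

proc startFetchTask*(taskChan: AsyncChannel[ThreadSafeString], url: string) =
  # build a serialized task message; the "$type" tag lets the worker
  # figure out which kind of task to dispatch
  let msg = """{"$type": "FetchTask", "url": """ & escapeJson(url) & "}"
  # sendSync is callable from a thread with no chronos event loop,
  # e.g. the main thread where Qt's event loop runs
  taskChan.sendSync(msg.safe)
```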
That's it so far, hope the questions make sense. Thanks again for doing all the research! @emizzle @brainproxy ^
Thanks for taking the time to watch the presentation and come back to us with such great questions!
All of the above is an avalanche of concepts and information, and some of it may be unclear or need more/better explanation, etc. Please feel free to ask more questions. The more people on the desktop team who understand Nim's memory safety rules and the options we have for safely doing multi-threaded work, the better for the status-desktop project. 😃