@treeform
Created June 24, 2020 16:06
# nim c -r --threads:on --gc:orc
import cpuinfo, os, random, locks, deques

type
  WorkReq = ref object
    id: int

  WorkRes = ref object
    id: int
    data: seq[int]

var
  workThreads: array[32, Thread[int]]
  inputQ: Deque[WorkReq]
  inputLock: Lock
  outputQ: Deque[WorkRes]
  outputLock: Lock

template hold(lock: Lock, body: untyped) =
  ## Wraps withLock in a gcsafe block.
  {.gcsafe.}:
    withLock lock:
      body

proc workThread(threadNum: int) {.thread.} =
  ## Work thread waits for work to arrive then does it.
  ## N of them can be running at one time.
  while true:
    var
      ready = false
      workReq: WorkReq
    hold inputLock:
      ready = inputQ.len > 0
      if ready:
        workReq = inputQ.popFirst()
    if ready:
      var workRes = WorkRes()
      workRes.id = workReq.id
      workRes.data = newSeq[int](500)
      var z = workRes.id
      # Do the actual work.
      for n in 0 .. 10_000:
        for i in 0 ..< workRes.data.len:
          z = z mod 10 + z div 10
          workRes.data[i] = z
      hold outputLock:
        outputQ.addLast(workRes)

proc askForWork() =
  ## Asks for work to be done.
  while true:
    sleep(0)
    var inputLen, outputLen: int
    # It's best to never hold 2 locks at the same time.
    hold inputLock:
      inputLen = inputQ.len
    hold outputLock:
      outputLen = outputQ.len
    # echo "inputLen: ", inputLen, " outputLen: ", outputLen
    # Keep the work queue at 10 items always.
    for i in 0 ..< 10 - inputLen:
      var workReq = WorkReq()
      workReq.id = rand(0 .. 10_000)
      # echo "need ", workReq.id
      hold inputLock:
        inputQ.addLast(workReq)
    # Get results back, if any.
    while true:
      var
        ready = false
        workRes: WorkRes
      hold outputLock:
        ready = outputQ.len > 0
        if ready:
          workRes = outputQ.popFirst()
      if ready:
        echo "got ", workRes.id
      else:
        break

# Init the two locks.
inputLock.initLock()
outputLock.initLock()

# Start as many worker threads as we have CPUs.
# Leave 1 CPU for the main thread.
# Leave 1 CPU for all other programs.
for i in 0 ..< clamp(countProcessors() - 2, 1, 32):
  createThread(workThreads[i], workThread, i)
  # Don't pin to the 0th core as that's where most of the IO happens.
  pinToCpu(workThreads[i], i + 1)

askForWork()
@treeform (Author)
Nim has a new garbage collector called ORC (enabled with --gc:orc). It’s a reference-counting mechanism with cycle detection. The most important feature of --gc:orc is much better support for threads, because the heap is shared between them.

Now you can just pass deeply nested ref objects between threads and it all works. My threading needs are pretty pedestrian: I basically have a work queue with several worker threads, and I need work done. I need to pass large nested objects to the workers, and the workers produce large nested data back. The old way to do that is with channels, but channels copy their data. Copying data can actually be better and faster with “share nothing” concurrency, but it’s really bad for my use case of passing around large nested structures. Another way was to use pointers, but then I was basically writing C with manual allocations and deallocations, not Nim! This is why the new --gc:orc works so much better for me.
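A minimal sketch of what this enables (my own example, not from the gist above; the names Node, shared, and work are made up): a nested ref object built on the main thread is read directly by a worker thread, with no copy and no channel, assuming the program is compiled with --threads:on --gc:orc.

```nim
# nim c -r --threads:on --gc:orc
import locks

type
  Node = ref object       # a deeply nested ref object
    value: int
    child: Node

var
  shared: Node            # shared heap: both threads see the same object
  sharedLock: Lock
  worker: Thread[void]

proc work() {.thread.} =
  {.gcsafe.}:
    withLock sharedLock:
      # The worker walks the nested structure directly; nothing was copied.
      echo shared.child.child.value

sharedLock.initLock()
shared = Node(value: 1, child: Node(value: 2, child: Node(value: 3)))
createThread(worker, work)
worker.joinThread()
```

Under the old --gc:refc model this pattern was unsafe, because the object lived in the main thread’s private heap.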

You still need to use and understand locks, but it’s not that bad. I just use two locks: one for the input queue and one for the output queue. Each lock is acquired and released - held - for as little time as possible, and no thread ever holds more than one lock at a time.
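The lock discipline described above boils down to one pattern, sketched here in isolation (single-threaded for brevity; the queue and names are my own): pop the item while holding the lock, then do the slow work with no lock held.

```nim
import locks, deques

var
  q: Deque[int]
  qLock: Lock

qLock.initLock()
for i in 1 .. 3:
  q.addLast(i)

while true:
  var
    ready = false
    item: int
  withLock qLock:          # lock held only long enough to pop
    if q.len > 0:
      item = q.popFirst()
      ready = true
  if not ready:
    break
  echo "processing ", item # slow work happens with no lock held
```

Because no thread ever waits on a second lock while holding a first one, the classic lock-ordering deadlock cannot occur.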

Before, creating objects and passing them between threads was a big issue. The default garbage collector (--gc:refc) gives each thread its own heap, so objects allocated on one thread had to be deallocated on that same thread. This restriction is gone now!

Another big difference is that it’s more deterministic and supports destructors. The compiler can also infer where the frees will happen and optimize many allocations and deallocations away with move semantics (similar to Rust). Sadly it can’t optimize all of them away; that is why the reference counting exists. The cycle detector will also find garbage cycles and free them.
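A small sketch of what deterministic destruction looks like (my own example, assuming Nim 1.2+ with --gc:orc or --gc:arc; the Buffer type and use proc are made up): a hand-written `=destroy` hook runs at a predictable point, and an assignment whose source is never used again is turned into a move instead of a copy.

```nim
# nim c -r --gc:orc
type
  Buffer = object
    name: string

proc `=destroy`(b: var Buffer) =
  # Runs deterministically when the value goes out of scope.
  if b.name.len > 0:
    echo "freeing ", b.name
  `=destroy`(b.name)

proc use() =
  var a = Buffer(name: "a")
  var b = a          # last use of 'a': the compiler moves it, no copy
  echo "using ", b.name
  # 'b' is destroyed right here, at the end of the scope.

use()
```

With --gc:refc the free would instead happen at some later, collector-chosen time.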

This means I do not have to change the way I write code. I don’t have to mark my code in any special way and I don’t really have to worry about cycles. The new Orc GC is simply better.

This makes the new garbage collector, --gc:orc, a joy to use.
