@tclementdev
Last active September 28, 2024 12:10
Making efficient use of the libdispatch (GCD)

libdispatch efficiency tips

libdispatch is one of the most misused APIs, both because of the way it was presented to us when it was introduced (and for many years after that) and because of its confusing documentation and API. This page is a compilation of important things to know if you're going to use this library. Many references are available at the end of this document, pointing to comments from Apple's very own libdispatch maintainer (Pierre Habouzit).

My take-aways are:

  • You should create very few, long-lived, well-defined queues. These queues should be seen as execution contexts in your program (gui, background work, ...) that benefit from executing in parallel. An important thing to note is that if these queues are all active at once, you will get as many threads running. In most apps, you probably do not need to create more than 3 or 4 queues.

  • Go serial first, and as you find performance bottlenecks, measure why. If concurrency helps, apply it with care, always validating under system pressure. Reuse queues by default and add more only if there's a measurable benefit. Do not attempt to go wide by default.

  • Queues that target other (non-global) queues are fine and you can have many of those (the main point is having different labels). You can create such queues with DispatchQueue(label:target:).

  • Don't use DispatchQueue.global(). Global queues easily lead to thread explosion: threads blocking on sleeps/waits/locks are considered inactive by libdispatch, which in turn spawns new threads when other parts of your program dispatch. Note that it is impossible to guarantee that your threads are never going to block, as merely using the system libraries will cause it to happen. Global queues also do not play nice with QoS/priorities. The libdispatch maintainer at Apple declared it "the worst thing that the dispatch API provides". Run your code on one of your custom queues instead (one of your well-defined execution contexts).

  • Concurrent queues are not as optimized as serial queues. Use them if you measure a performance improvement, otherwise it's likely premature optimization.

  • queue.async() is wasteful if the dispatched block is small (< 1ms), as it will most likely require a new thread due to libdispatch's overcommit behavior. Prefer locking to protect shared state (rather than switching the execution context).

  • Some classes/libraries are better designed as synchronous APIs, reusing the execution context from their callers/clients (instead of creating their own private queues which can lead to terrible performance). That means using traditional locking for thread-safety.

  • Locks are not as bad as people think they are. They still work extremely well to protect shared state, they are fast, and they keep the code synchronous, which avoids reentrancy problems altogether. See OSAllocatedUnfairLock and the brand new Mutex types.

  • Do not block the current thread waiting on a semaphore or dispatch group after dispatching work. This is inefficient, as the kernel can't know which thread will ultimately unblock the waiting thread. Rather, continue work asynchronously in a completion handler that is executed once the asynchronous work ends, or just do the work synchronously. If you are doing XPC, synchronous message sending is fine and can be achieved with synchronousRemoteObjectProxyWithErrorHandler().

  • Do not use DispatchQueue.main in non-GUI programs and frameworks. Per the <dispatch/queue.h> header: "Because the main queue doesn't behave entirely like a regular serial queue, it may have unwanted side-effects when used in processes that are not UI apps (daemons). For such processes, the main queue should be avoided".

  • If running concurrently, your work items must not contend, or your performance sinks dramatically. Contention takes many forms. Locks are the obvious one, but it really means any use of shared resources that can become a bottleneck: IPC/daemons, malloc (lock), shared memory, I/O, ...

  • You don't need to be async all the way to avoid thread explosion. Using a small number of dispatch queues and not using DispatchQueue.global() is a better fix.

  • The complexity (and bugs) of heavy async/callback designs also cannot be ignored. Synchronous code remains much easier to read, write and maintain.

  • Utilizing more than 3-4 cores isn't easy. Most people who try do not actually scale, and waste energy for a modicum of performance gain. It doesn't help that CPUs have thermal issues if you ramp up, e.g. Intel will turn off turbo-boost if you use enough cores.

  • Measure the real-world performance of your product to make sure you are actually making it faster and not slower. Be very careful with micro benchmarks (they hide cache effects and keep thread pools hot). You should always have a macro benchmark to validate what you're doing.

  • libdispatch is efficient but not magic. Resources are not infinite. You cannot ignore the reality of the underlying operating system and hardware you're running on. Not all code is prone to parallelization.
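
The advice above about protecting shared state with a lock, instead of hopping to a private queue, can be sketched roughly as follows. This is only an illustration under stated assumptions: `HitCounter` and its methods are hypothetical names, and NSLock is used here for portability. OSAllocatedUnfairLock or Mutex, mentioned above, would fill the same role where they are available.

```swift
import Foundation

/// Hypothetical example type: a thread-safe counter with a synchronous API.
/// It reuses the caller's execution context instead of owning a private queue.
final class HitCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var counts: [String: Int] = [:]   // shared state, guarded by `lock`

    /// Increments the counter for `key` on the caller's thread and returns
    /// the new value. No queue hop and no block allocation is involved.
    @discardableResult
    func increment(_ key: String) -> Int {
        lock.lock()
        defer { lock.unlock() }
        let newValue = (counts[key] ?? 0) + 1
        counts[key] = newValue
        return newValue
    }

    /// Reads the current value for `key` under the same lock.
    func count(for key: String) -> Int {
        lock.lock()
        defer { lock.unlock() }
        return counts[key] ?? 0
    }
}
```

Callers on any queue or thread can use `increment(_:)` directly. Because the critical section is tiny and synchronous, there is no completion handler to thread through the call site.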

@tclementdev
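
As a rough sketch of the semaphore advice above: rather than dispatching work and then blocking on a semaphore, either hand the caller a completion handler or expose the work synchronously. All names here (`workQueue`, `sumAsync`, `sumSync`) are hypothetical, and the summation is just a stand-in for real work.

```swift
import Dispatch

// Hypothetical long-lived serial queue acting as one execution context.
let workQueue = DispatchQueue(label: "com.example.work")

/// Synchronous variant: the work simply runs on the caller's thread.
func sumSync() -> Int {
    return (1...10).reduce(0, +)   // stand-in for real work
}

/// Asynchronous variant: the caller continues in `completion` once the
/// work ends, so no thread sits blocked on a semaphore or dispatch group.
func sumAsync(completion: @escaping @Sendable (Int) -> Void) {
    workQueue.async {
        completion(sumSync())
    }
}
```

Which variant fits depends on the caller: if the result is needed immediately and the work is short, the synchronous form is usually the simpler choice.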

References

This long discussion on the swift-evolution mailing-list started it all (look for Pierre Habouzit).

Use very few queues

Go serial first

Don't use global queues

Avoid concurrent queues in almost all circumstances

Don't use async to protect shared state

Don't use async for small tasks

Some classes/libraries should just be synchronous

Contention is a performance killer for concurrency

To avoid deadlocks, use locks to protect shared state

Don't use semaphores to wait for asynchronous work

Synchronous IPC is not bad

The NSOperation API has some serious performance pitfalls

Locks are not as bad as people think they are (from libdispatch's original designer)

Avoid micro-benchmarking

Resources are not infinite

Background QOS work is paused when low-power mode is enabled

About dispatch_async_and_wait()

Utilizing more than 3-4 cores isn't something that is easy

A lot of iOS 12 perf wins were from daemons going single-threaded

This page is the real deal

@snej

snej commented Apr 27, 2018

dispatch_async() is wasteful if the dispatched block is small

Yes. Another reason, besides the one you gave, is that it always has to copy the block to the heap. That involves calling malloc (and incurring a future call to free), which is way, way more expensive than taking a lock or calling dispatch_sync.

The concurrency pattern you're outlining here — a few queues with well-defined purposes that are invoked through dispatch_async — is very much like the Actor model. I've had very good results in my current project by building some base (C++) classes that implement this model on top of libdispatch. Each Actor object owns a dispatch queue. The methods that do the real work are all private, but each one has a corresponding public method that simply calls dispatch_async to delegate to the private method.

@MadCoder

Each Actor object owns a dispatch queue. The methods that do the real work are all private, but each one has a corresponding public method that simply calls dispatch_async to delegate to the private method.

If you do that, you need to either have very few such actors or target their internal queues to a shared one; otherwise you create too much concurrency, which goes exactly against what @tclementdev explains above ;)
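
A minimal Swift sketch of that arrangement, assuming hypothetical names (`actorRootQueue`, `EventLog`): each actor-style object keeps its own labeled queue, but every such queue targets one shared serial queue, so creating more actors does not create more concurrency.

```swift
import Dispatch

// Hypothetical shared queue that bounds the concurrency of all actors.
let actorRootQueue = DispatchQueue(label: "com.example.actors")

/// Hypothetical actor-style class. Public methods only dispatch;
/// the private method does the real work on the actor's queue.
final class EventLog: @unchecked Sendable {
    private let queue: DispatchQueue
    private var lines: [String] = []   // only touched on `queue`

    init(name: String) {
        // Own label for debugging, shared target for execution.
        queue = DispatchQueue(label: "com.example.actors.\(name)",
                              target: actorRootQueue)
    }

    /// Public entry point: delegates to the private method asynchronously.
    func record(_ line: String) {
        queue.async { self.append(line) }
    }

    private func append(_ line: String) {
        lines.append(line)
    }

    /// Synchronous read, serialized with the pending writes on `queue`.
    func snapshot() -> [String] {
        return queue.sync { lines }
    }
}
```

Note that `snapshot()` blocks the caller until earlier `record(_:)` calls on the same object have run, which keeps reads consistent at the cost of a synchronous hop.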

@cleanbit

There is an interesting WWDC talk about this stuff.

@orospakr

Very handy writeup, thank you!

A question, though: what is a "bottom queue"? Googling the term brings you back to this very gist.

@tclementdev
Author

@orospakr Bottom queues are queues that do not target another one of your queues, i.e. DispatchQueue(label: "...") is a bottom queue, DispatchQueue(label: "...", target: anotherQueue) is not.
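
To make the distinction concrete, here is a small sketch with hypothetical queue names. `networkQueue` is a bottom queue; `parsingQueue` and `cachingQueue` are not, because they target it and therefore execute their work in its serial context.

```swift
import Dispatch

// Bottom queue: created without targeting another queue of yours.
let networkQueue = DispatchQueue(label: "com.example.network")

// Not bottom queues: distinct labels, but both funnel into `networkQueue`,
// so blocks submitted to either one never run at the same time.
let parsingQueue = DispatchQueue(label: "com.example.network.parsing",
                                 target: networkQueue)
let cachingQueue = DispatchQueue(label: "com.example.network.caching",
                                 target: networkQueue)
```

Only the bottom queue counts toward the "3 or 4 queues" budget discussed above; the targeted queues mainly provide separate labels.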

@Enricoza

Hi @tclementdev, thanks a lot for this list!
I was reading the documentation of Dispatch Queue and it states:

Instead of creating private concurrent queues, submit tasks to one of the global concurrent dispatch queues. For serial tasks, set the target of your serial queue to one of the global concurrent queues. That way, you can maintain the serialized behavior of the queue while minimizing the number of separate queues creating threads.

Now, I could maybe see why it advocates for using global instead of private concurrent queues (the fewer queues the better), but I wanted to ask what you think about the "for serial tasks, set the target of your serial queue to one of the global concurrent queues" part?

Does it really make any difference if I'm using 3-4 "bottom queues" with no target or with global target?
Or (I'm speculating) is that maybe even worse, as we could lose some of the optimizations of the serial queues by using the global target?

@tclementdev
Author

tclementdev commented Jan 11, 2021

Hi @Enricoza, targeting the global queues might make a difference (my understanding is that it may reduce the libdispatch behavior of overcommitting the number of available cores with additional threads), but this is really poorly documented territory and also a widely unknown practice, so I would be careful about relying on that. The general recommendation is to never dispatch directly to the global queues and instead use a small set of well-defined, long-lived dispatch queues. If you do this, then you can't ever cause thread explosion, because your set of dispatch queues bounds your program's concurrency and the number of threads that can possibly exist simultaneously in your program.

Also one important thing to note is that you cannot guarantee that your code will never block: even if you take extra care not to, the system libraries and frameworks that you rely on will inevitably do it. This makes using the global queues impossible in practice.

@muukii

muukii commented Jan 19, 2021

Don't use DispatchQueue.global()

How about specifying qos? Is that the same?

@tclementdev
Author

@muukii, this is not related. You can attach a QoS to your own queues at creation time and you can attach a QoS to dispatch work items as well (although it's probably better to do the former). No need to deal with global queues for that.
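
A short sketch of both options, using hypothetical names (`importQueue`, `urgentItem`): the first attaches a QoS to a custom queue at creation time, the second attaches one to an individual work item.

```swift
import Dispatch

// QoS attached to your own queue at creation time (the preferred option).
let importQueue = DispatchQueue(label: "com.example.import", qos: .utility)

// QoS attached to a single work item submitted to that queue.
let urgentItem = DispatchWorkItem(qos: .userInitiated) {
    // stand-in for real work
}
importQueue.async(execute: urgentItem)
```

Neither form involves DispatchQueue.global().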

@muukii

muukii commented Jan 19, 2021

@muukii, this is not related. You can attach a QoS to your own queues at creation time and you can attach a QoS to dispatch work items as well (although it's probably better to do the former). No need to deal with global queues for that.

got it. thank you!

@scott-vsi

scott-vsi commented Mar 18, 2021

Since everyone seems to be piling on: Grand Central Dispatch is a douchebag by Pierre Lebeaupin via Michael Tsai

@Kentzo

Kentzo commented May 21, 2021

That's by Pierre Lebeaupin, not Michael Tsai.
