@roostr
Last active March 19, 2024 14:40
A watchdog timer for iOS to detect when the main run loop is stalled
//
// Watchdog.swift
//
// The MIT License (MIT)
// Copyright © 2023 Front Pocket Software LLC
//
// Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
// documentation files (the “Software”), to deal in the Software without restriction, including without limitation the
// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
// permit persons to whom the Software is furnished to do so, subject to the following conditions:
// The above copyright notice and this permission notice shall be included in all copies or substantial portions of the
// Software.
// THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
// WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS
// OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
// OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
//
// This class monitors your app's main run loop. It detects when a run loop pass starts and then fails to complete
// within TIMEOUT seconds. If this threshold is exceeded, it calls timeoutHandler().
//
// You probably want to set your own timeoutHandler so that you can do something about this. Consider calling
// abort(), as that will send a crash report to Apple on devices that have diagnostic reporting turned on
// (which is typically about 20% of devices). Then, you can look at the stacktrace of the main thread to see
// where the main thread was around the time the TIMEOUT threshold was exceeded.
//
// To ensure that access to runLoopStartTime from concurrent threads behaves in a safe way, we use
// a serial DispatchQueue for all reads and writes to runLoopStartTime.
//
// USAGE
// 1. Add this file to your project
// 2. If desired, tweak the TIMEOUT value to be different (default is 4 seconds)
// 3. Set your own timeoutHandler to override the default one
// 4. Call Watchdog.shared.start()
// 5. (optionally) call stop() when you enter background, start() when you come back to foreground
// (I don't know whether or not step 5 is necessary)
//
// TO VERIFY
// Put a `sleep(50)` into one of your button handlers, fire up your app, and tap that button.
//
// KNOWN ISSUES
// 1. You are not guaranteed to get your callback. We know that if the main run loop is stuck, something funky is going
// on. It is not known whether the serialQueue.asyncAfter task will fire, because it does not have a dedicated
// thread to run on, as it is sharing a thread pool. In my testing, this code worked reliably, but it's not
// guaranteed to work 100% of the time.
// 2. I haven't tested this in production yet. I will probably only add logging in my initial rollout of it to ensure
// it isn't hitting false-positives. So long as it's not, I plan to add the call to abort() in a future release.
// I'm going to call abort() so I can get a callstack of the main thread reported back in Xcode's crash reporting.
// Even then, I plan to only call abort() once per device (by remembering whether it was called with a UserDefaults
// flag), just as an extra protection against potential false positives and bad user experiences.
import Foundation
import UIKit
@objc final public class Watchdog: NSObject {
    @objc public static let shared = Watchdog()

    fileprivate static let TIMEOUT: TimeInterval = 4
    // Sentinel stored in runLoopStartTime while the main run loop is idle.
    fileprivate static let NOT_IN_RUN_LOOP: TimeInterval = 0

    fileprivate let serialQueue = DispatchQueue(label: "watchdog.queue")

    @objc public var timeoutHandler: () -> Void = { print("Listen: Main Run Loop has come unstuck in time.") }

    // Only read and written on serialQueue.
    fileprivate var runLoopStartTime: TimeInterval = NOT_IN_RUN_LOOP
    fileprivate var isRunning = false

    fileprivate lazy var runLoop: CFRunLoop = {
        CFRunLoopGetMain()
    }()

    fileprivate lazy var runLoopObserver: CFRunLoopObserver = {
        CFRunLoopObserverCreateWithHandler(kCFAllocatorDefault, CFRunLoopActivity.allActivities.rawValue, true, .min) { _, activity in
            switch activity {
            case .entry, .beforeTimers, .afterWaiting, .beforeSources:
                // The run loop is doing work: record when this pass began, unless already recorded.
                self.serialQueue.async {
                    if self.runLoopStartTime == Self.NOT_IN_RUN_LOOP {
                        self.runLoopStartTime = Date().timeIntervalSince1970
                    }
                }
            case .beforeWaiting, .exit:
                // The run loop is about to sleep or exit: the pass completed.
                self.serialQueue.async {
                    self.runLoopStartTime = Self.NOT_IN_RUN_LOOP
                }
            default:
                break
            }
        }
    }()

    deinit {
        stop()
        CFRunLoopObserverInvalidate(self.runLoopObserver)
    }

    @objc func start() {
        guard !isRunning else { return }
        isRunning = true
        // Discard any stale start time left over from before the last stop().
        serialQueue.async {
            self.runLoopStartTime = Self.NOT_IN_RUN_LOOP
        }
        CFRunLoopAddObserver(self.runLoop, self.runLoopObserver, CFRunLoopMode.commonModes)
        enqueueBackgroundCheck()
    }

    @objc func stop() {
        guard isRunning else { return }
        isRunning = false
        serialQueue.async {
            self.runLoopStartTime = Self.NOT_IN_RUN_LOOP
        }
        CFRunLoopRemoveObserver(self.runLoop, self.runLoopObserver, CFRunLoopMode.commonModes)
    }

    fileprivate func enqueueBackgroundCheck(after: TimeInterval = TIMEOUT) {
        guard isRunning else { return }
        serialQueue.asyncAfter(deadline: .now() + after) {
            // Run loop is idle: nothing to measure, so check again after a full TIMEOUT.
            guard self.runLoopStartTime != Self.NOT_IN_RUN_LOOP else {
                self.enqueueBackgroundCheck()
                return
            }
            let now = Date().timeIntervalSince1970
            let elapsedTime = now - self.runLoopStartTime
            if elapsedTime >= Self.TIMEOUT {
                self.timeoutHandler()
                self.stop()
            } else {
                // How far into the current run loop are we? If the current run loop has already run
                // for 1.5 seconds, then we should check back in after "4 - 1.5 = 2.5" seconds.
                self.enqueueBackgroundCheck(after: Self.TIMEOUT - elapsedTime)
            }
        }
    }
}
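
The rollout plan described under KNOWN ISSUES (log by default, call abort() at most once per device, remembered with a UserDefaults flag) could be wired up roughly as below. This is only a sketch: `makeOncePerDeviceHandler`, its parameter names, and the `watchdog.didAbortOnce` key are hypothetical and not part of the gist. It also assumes the UserDefaults write survives an immediate abort(), which is worth verifying on a device before relying on it.

```swift
import Foundation

// Hypothetical helper (not part of the gist): returns a handler that runs
// `firstTime` on the first call ever made on this device and `afterwards`
// on every later call. The flag is persisted in UserDefaults under an
// assumed key name.
func makeOncePerDeviceHandler(
    defaults: UserDefaults = .standard,
    key: String = "watchdog.didAbortOnce",
    firstTime: @escaping () -> Void,
    afterwards: @escaping () -> Void
) -> () -> Void {
    return {
        if defaults.bool(forKey: key) {
            afterwards()
        } else {
            // Set the flag before running the action, because an action
            // such as abort() never returns.
            defaults.set(true, forKey: key)
            firstTime()
        }
    }
}

// Possible wiring, assuming Watchdog.swift is in the same module:
//
// Watchdog.shared.timeoutHandler = makeOncePerDeviceHandler(
//     firstTime: { abort() },
//     afterwards: { print("Main run loop stalled") }
// )
// Watchdog.shared.start()
```

Passing the two actions in as closures keeps the once-per-device logic testable without actually crashing the process.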

roostr commented Jan 25, 2023

Updated to add this to start:

        serialQueue.async {
            self.runLoopStartTime = Self.NOT_IN_RUN_LOOP
        }

Because it can't hurt, and it could address a possible race condition: stop() is called (which clears out runLoopStartTime), but then a final run loop observer callback sneaks in after that and sets runLoopStartTime to a valid value. Then, if we call start() while the run loop isn't executing (for example, in the background), enqueueBackgroundCheck may think we are stuck because it's looking at a stale runLoopStartTime value.

So to avoid all that, we just reset the runLoopStartTime to be a sane value every time we call start().
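
To make the stale-value scenario concrete, here is a minimal model of the elapsed-time decision with made-up timestamps. `elapsedIfStuck`, `notInRunLoop`, and `timeout` are hypothetical stand-ins for the check inside enqueueBackgroundCheck, NOT_IN_RUN_LOOP, and TIMEOUT. It is a simplified sketch of the logic, not the actual Watchdog code.

```swift
import Foundation

let notInRunLoop: TimeInterval = 0   // same sentinel idea as NOT_IN_RUN_LOOP
let timeout: TimeInterval = 4        // same idea as TIMEOUT

// Mirrors the decision made in the background check:
// returns the elapsed time if the run loop looks stuck, otherwise nil.
func elapsedIfStuck(startTime: TimeInterval, now: TimeInterval) -> TimeInterval? {
    guard startTime != notInRunLoop else { return nil }
    let elapsed = now - startTime
    return elapsed >= timeout ? elapsed : nil
}

// stop() resets the value at t = 100, but a late observer callback
// writes t = 100.1 afterwards, leaving a stale start time behind.
var startTime: TimeInterval = 100.1

// start() is called at t = 160 without a reset. The first check runs at
// t = 164 and mistakes the stale value for a long stall.
let withoutReset = elapsedIfStuck(startTime: startTime, now: 164)

// With the reset that start() now performs, the stale value is discarded
// before the first check runs.
startTime = notInRunLoop
let withReset = elapsedIfStuck(startTime: startTime, now: 164)

print(withoutReset != nil)   // a false positive would be reported
print(withReset == nil)      // no stall is reported
```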
