@aeschylus - "I think as someone writing a more complex renderer, I would want to provide my own timer/ticker/clock/game loop and use eventedCanvas as a container for the information that comes from that process" - That makes sense to me, as far as my understanding of implementation goes. Whether it's Manifesto or eventedCanvas or a combination of both, something needs to hold the "scene graph" and support the renderer in the right place at the right time, where time is relevant both to the rendering of a static scene (Presentation 2.1, where timing is about marshaling requests and responses) and to the rendering of a scene with a temporal extent (Presentation 3/AV, where timing is also about when a resource is annotated onto a canvas, as well as where). In Presentation 2.1, state changes come from the initial setup's requests and responses, and then from user actions (switching between items in an oa:Choice, for example). In Presentation 3, all that still applies, but state changes can also come from timed events - "this video appears at t=45", "this piece of text appears over the image at t=100".
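To make the "container, not clock" idea concrete, here is a minimal sketch of how I imagine it. Everything in it is an assumption for illustration: `TimedItem`, `TimedCanvasState` and `update` are hypothetical names, not the eventedCanvas or Manifesto API, and the end times are invented. The container owns no timer; the renderer's own ticker calls `update(t)` and gets back the state changes for that moment.

```typescript
// Hypothetical shape: one resource annotated onto the canvas for a time range.
interface TimedItem {
  id: string;
  start: number; // seconds on the canvas timeline (inclusive)
  end: number;   // seconds on the canvas timeline (exclusive)
}

interface StateChange {
  entered: string[]; // ids that became active at this update
  exited: string[];  // ids that stopped being active at this update
}

// A passive container: the caller's ticker/game loop drives it.
class TimedCanvasState {
  private items: TimedItem[];
  private active: Set<string> = new Set();

  constructor(items: TimedItem[]) {
    this.items = items;
  }

  // Report which items appear or disappear when the clock reaches time t.
  update(t: number): StateChange {
    const entered: string[] = [];
    const exited: string[] = [];
    for (const item of this.items) {
      const shouldBeActive = item.start <= t && t < item.end;
      const isActive = this.active.has(item.id);
      if (shouldBeActive && !isActive) {
        this.active.add(item.id);
        entered.push(item.id);
      } else if (!shouldBeActive && isActive) {
        this.active.delete(item.id);
        exited.push(item.id);
      }
    }
    return { entered, exited };
  }
}
```

A renderer with its own loop would call `update` from whatever clock it trusts (`requestAnimationFrame`, a media element's `timeupdate` event, a test harness) and react only to the `entered` and `exited` lists.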
The first attempt at translating the AV use cases to JSON-LD (i.e., Presentation API 3) results in a set of fixtures that to me feel natural and fairly simple. But the implementation of some of them will require timing management that could get difficult.
The 99% use case today is one image annotating one canvas for its full extent. In fact, that's way more than 99% (I don't know how many nines).
For AV, the most common scenario will be only slightly more complicated. As well as single audio or video files, it will be common to see a Choice (aka oa:Choice) annotating the entire canvas for its entire duration, to allow for a choice between formats - e.g., https://github.com/IIIF/iiif-av/blob/master/source/api/av/examples/13.json. With this, there are no complicated timing or synchronisation concerns, and the annotations on the canvas map to an HTML5 video/audio tag with one or more sources (or the "sources"-style parameters of common JS libraries for AV).
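As a rough sketch of that mapping, assuming each option in the Choice carries an `id` (the media URL), a `type` and a `format` (MIME type): the `ChoiceItem` shape and the function names below are my own illustration and are not taken from the linked fixture.

```typescript
// Hypothetical, simplified shape of one option inside a Choice body.
interface ChoiceItem {
  id: string;               // URL of the media file
  type: "Video" | "Audio";  // assumed type labels
  format: string;           // MIME type, e.g. "video/mp4"
}

// The src/type pair that an HTML5 <source> element (or a JS player) expects.
interface MediaSource {
  src: string;
  type: string;
}

// Each option in the Choice becomes one candidate source, in document order.
function choiceToSources(items: ChoiceItem[]): MediaSource[] {
  return items.map(item => ({ src: item.id, type: item.format }));
}

// Use a <video> element if any option is video, otherwise <audio>.
function mediaElementFor(items: ChoiceItem[]): "video" | "audio" {
  return items.some(item => item.type === "Video") ? "video" : "audio";
}
```

The browser then picks the first source it can play, so no timeline logic is needed on the client for this case.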
But as soon as there is more than one AV resource annotating the canvas, timing and synchronisation enter the picture, and the notion of the canvas's own timeline is required. "Play this, then that" or "play these two things together" or "show these images at these times while this audio plays". There's a discussion of various approaches to timing on the web on the #av thread at the moment.
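To illustrate what "the canvas's own timeline" means in practice, here is a small sketch under my own assumptions: `Clip`, `Placed`, `playInSequence` and `locate` are hypothetical names, and real content would get its start and end times from the annotation targets rather than computing them. "Play this, then that" amounts to laying clips end to end on the canvas timeline, and synchronisation amounts to translating a canvas time into each resource's local time.

```typescript
interface Clip {
  id: string;
  duration: number; // seconds of media
}

interface Placed {
  id: string;
  start: number; // canvas time at which the clip begins
  end: number;   // canvas time at which the clip ends (exclusive)
}

// "Play this, then that": place clips back to back on the canvas timeline.
function playInSequence(clips: Clip[]): Placed[] {
  let cursor = 0;
  return clips.map(clip => {
    const placed: Placed = { id: clip.id, start: cursor, end: cursor + clip.duration };
    cursor = placed.end;
    return placed;
  });
}

// For a canvas time t, find every resource that should be playing and the
// position within that resource (its local time) to seek to.
function locate(placed: Placed[], t: number): { id: string; localTime: number }[] {
  return placed
    .filter(p => p.start <= t && t < p.end)
    .map(p => ({ id: p.id, localTime: t - p.start }));
}
```

"Play these two things together" is the same data with overlapping ranges; `locate` then returns more than one entry, and the client has to keep those players in step.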
A IIIF Canvas renderer might really be a renderer-manager that has to decide, based on the kinds of content annotating the canvas, which of several renderers to use for that canvas. "It's an image with an Image API Service, use my OSD renderer" or "It's a video, use my time-based media renderer". The UV makes this kind of decision at the sequence level; no IIIF viewers have so far made it at the canvas level because it was all about images. If the content annotating the canvas is a mixture of images that have Image API endpoints (and therefore could be tileSources) and AV resources, the decision for the client is not so obvious.
All sorts of funky clients could be built to render these models, but I suspect that for a general purpose client attempting to do best by the content, the presence of any video material on a canvas would be enough to avoid firing up a tileSource-based renderer even if there are images on the canvas that could be tiled. Audio plus deep zoom would be OK to manage (e.g., audio of a curator talking about a painting).
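That canvas-level decision could be sketched roughly like this. The content kinds, the renderer labels and the `chooseRenderer` function are all hypothetical, and the rules are just the heuristic I describe above, not anything an existing viewer does.

```typescript
// Hypothetical summary of one resource annotating the canvas.
interface CanvasContent {
  kind: "image" | "video" | "audio";
  hasImageService: boolean; // true if the image has an Image API endpoint
}

// Hypothetical labels for the renderers a client might have available.
type RendererChoice =
  | "deep-zoom"            // tileSource-based, e.g. OSD
  | "deep-zoom-with-audio" // tiles plus an audio player
  | "time-based-media"     // video/audio player with a canvas timeline
  | "static-image";        // plain images, no tiling

function chooseRenderer(content: CanvasContent[]): RendererChoice {
  const hasVideo = content.some(c => c.kind === "video");
  const hasAudio = content.some(c => c.kind === "audio");
  const hasTiledImage = content.some(c => c.kind === "image" && c.hasImageService);

  // Any video on the canvas rules out the tile-based renderer.
  if (hasVideo) return "time-based-media";
  // Audio alongside a tiled image is manageable: deep zoom plus a player.
  if (hasTiledImage) return hasAudio ? "deep-zoom-with-audio" : "deep-zoom";
  // Audio with no tiled image goes to the time-based renderer.
  if (hasAudio) return "time-based-media";
  // Only untiled images (or nothing at all) remain.
  return "static-image";
}
```

A more adventurous client could of course return something richer for the mixed case; the point is only that the decision is made per canvas, from the content found on it.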