@devarsht
Last active March 5, 2024 15:45
vpu_poll_range_diff_v2
1: 9dff0110f466 ! 1: 10eda94f7fba media: chips-media: wave5: Add hrtimer based polling support
@@ Metadata
## Commit message ##
media: chips-media: wave5: Add hrtimer based polling support
- Add support for starting a polling timer in case interrupt is not
- available. This helps keep the VPU functional in SoC's such as AM62A, where
- the hardware interrupt hookup may not be present due to an SoC errata [1].
+ Add support for starting a polling timer in case an interrupt is not
+ available. This helps to keep the VPU functional in SoCs such as AM62A,
+ where the hardware interrupt hookup may not be present due to an SoC errata
+ [1].
- The timer is shared across all instances of encoder and decoder and is
- started when first instance of encoder or decoder is opened and stopped
- when last instance is closed, thus avoiding per instance polling and saving
- CPU bandwidth.
+ The timer is shared across all instances of encoders and decoders and is
+ started when the first instance of an encoder or decoder is opened and
+ stopped when the last instance is closed, thus avoiding per instance
+ polling and saving CPU bandwidth. As VPU driver manages this instance
+ related tracking and synchronization, the aforementioned shared timer
+ related polling logic is implemented within the VPU driver itself. This
+ scheme may also be useful in general too (even if irq is present) for
+ non-realtime multi-instance VPU use-cases (for e.g 32 instances of VPU
+ being run together) where system is running already under high interrupt
+ load and switching to polling may help mitigate this as the polling thread
+ is shared across all the VPU instances.
- hrtimer callback is called with 5ms polling interval while any of the
- encoder/decoder instances are running to check the interrupt status as
- being done in irq handler.
+ Hrtimer is chosen for polling here as it provides precise timing and
+ scheduling and the API seems better suited for periodic polling task such
+ as this. As a general rule of thumb,
- Based on above interrupt status, use a worker thread to iterate over the
- interrupt status for each instance and send completion event as being done
- in irq thread function.
+ Worst case latency with hrtimer = Actual latency (achievable with irq)
+ + Polling interval
+
+ NOTE (the meaning of terms used above is as follows):
+ - Latency: Time taken to process one frame
+ - Actual Latency : Time taken by hardware to process one frame and signal
+ it to OS (i.e. if latency that was possible to achieve if irq line was
+ present)
+
+ There is a trade-off between latency and CPU usage when deciding the value
+ for polling interval. With aggressive polling intervals (i.e. going with
+ even lesser values) the CPU usage increases although worst case latencies
+ get better. On the contrary, with greater polling intervals worst case
+ latencies will increase although the CPU usage will decrease.
+
+ The 5ms offered a good balance between the two as we were able to reach
+ close to actual latencies (as achievable with irq) without incurring too
+ much of CPU as seen in below experiments and thus 5ms is chosen as default
+ polling interval.
- Parse for irq number before v4l2 device registration and if not available
- only then, initialize hrtimer and worker thread.
+ - 1x 640x480@25 Encoding using different hrtimer polling intervals [2]
+ - 4x 1080p30 Transcode (File->decode->encode->file) irq vs polling
+ comparison [3]
+ - 1x 1080p Transcode (File->decode->encode->file) irq vs polling comparison
+ [4]
+ - 1080p60 Streaming use-case irq vs polling comparison [5]
+ - 1x 1080p30 sanity decode and encode tests [6]
- Move the core functionality of irq thread function to a separate function
- wave5_vpu_handle_irq so that it can be used by both the worker thread when
- using polling mode and irq thread when using interrupt mode.
+ The polling interval can also be changed using vpu_poll_interval module
+ param in case user want to change it as per their use-case requirement
+ keeping in mind above trade-off.
- Protect hrtimer access and instance list with device specific mutex locks
- to avoid race conditions while different instances of encoder and decoder
- are started together.
+ Based on interrupt status, we use a worker thread to iterate over the
+ interrupt status for each instance and send completion event as being done
+ in irq thread function.
+
+ Move the core functionality of the irq thread function to a separate
+ function wave5_vpu_handle_irq so that it can be used by both the worker
+ thread when using polling mode and irq thread when using interrupt mode.
- Add module param to change polling interval for debug purpose.
+ Protect the hrtimer access and instance list with device specific mutex
+ locks to avoid race conditions while different instances of encoder and
+ decoder are started together.
[1] https://www.ti.com/lit/pdf/spruj16
(Ref: Section 4.2.3.3 Resets, Interrupts, and Clocks)
+ [2] https://gist.github.com/devarsht/ee9664d3403d1212ef477a027b71896c
+ [3] https://gist.github.com/devarsht/3a58b4f201430dfc61697c7e224e74c2
+ [4] https://gist.github.com/devarsht/a6480f1f2cbdf8dd694d698309d81fb0
+ [5] https://gist.github.com/devarsht/44aaa4322454e85e01a8d65ac47c5edb
+ [6] https://gist.github.com/devarsht/2f956bcc6152dba728ce08cebdcebe1d
Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
Tested-by: Jackson Lee <jackson.lee@chipsnmedia.com>
+ ---
+ V2:
+ - Update commit message as suggested in review to give more context
+ on design being chosen and analysis that was done to decide on same
+ - Add Tested-By
+
+ Range diff w.r.t v1 :
+ https://gist.github.com/devarsht/cd6bbb4ba90b0229be4718b7140ef924
## drivers/media/platform/chips-media/wave5/wave5-helper.c ##
@@ drivers/media/platform/chips-media/wave5/wave5-helper.c: int wave5_vpu_release_device(struct file *filp,
@@ drivers/media/platform/chips-media/wave5/wave5-helper.c: int wave5_vpu_release_d
{
struct vpu_instance *inst = wave5_to_vpu_inst(filp->private_data);
+ struct vpu_device *dev = inst->dev;
-+ int ret = 0;
++ int ret;
v4l2_m2m_ctx_release(inst->v4l2_fh.m2m_ctx);
if (inst->state != VPU_INST_STATE_NONE) {
+ u32 fail_res;
+- int ret;
+
+ ret = close_func(inst, &fail_res);
+ if (fail_res == WAVE5_SYSERR_VPU_STILL_RUNNING) {
@@ drivers/media/platform/chips-media/wave5/wave5-helper.c: int wave5_vpu_release_device(struct file *filp,
}
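
The latency rule of thumb in the updated commit message (worst case latency with hrtimer = actual latency + polling interval) and the CPU trade-off it describes can be illustrated with a small user-space sketch. This is not driver code: the helper names `worst_case_latency_us` and `polls_per_second` are hypothetical, and the wakeup rate is used here only as a rough proxy for CPU cost.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not part of the wave5 driver. They only restate
 * the arithmetic from the commit message in user-space C.
 */

/*
 * Worst-case time until a finished frame is noticed when polling:
 * the hardware completes just after a poll, so the completion is seen
 * one full polling interval later.
 */
static inline uint64_t worst_case_latency_us(uint64_t actual_latency_us,
					      uint64_t poll_interval_us)
{
	return actual_latency_us + poll_interval_us;
}

/*
 * Number of timer expirations per second for a given interval.
 * Because the timer is shared across all instances, this rate does not
 * grow with the number of open encoders/decoders.
 * Returns 0 for a zero interval (treated here as "polling disabled").
 */
static inline uint64_t polls_per_second(uint64_t poll_interval_us)
{
	if (poll_interval_us == 0)
		return 0;
	return 1000000ULL / poll_interval_us;
}
```

For example, assuming a frame that the hardware finishes in 20 ms (an illustrative figure, not a measurement from the linked experiments), the default 5 ms interval gives `worst_case_latency_us(20000, 5000)` = 25000 us at 200 timer expirations per second, while a 1 ms interval would tighten the bound to 21000 us at the cost of 1000 expirations per second.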