Last active
March 5, 2024 15:45
vpu_poll_range_diff_v2
1: 9dff0110f466 ! 1: 10eda94f7fba media: chips-media: wave5: Add hrtimer based polling support | |
@@ Metadata | |
## Commit message ## | |
media: chips-media: wave5: Add hrtimer based polling support | |
- Add support for starting a polling timer in case interrupt is not | |
- available. This helps keep the VPU functional in SoC's such as AM62A, where | |
- the hardware interrupt hookup may not be present due to an SoC errata [1]. | |
+ Add support for starting a polling timer in case an interrupt is not | |
+ available. This helps to keep the VPU functional in SoCs such as AM62A, | |
+ where the hardware interrupt hookup may not be present due to an SoC errata | |
+ [1]. | |
- The timer is shared across all instances of encoder and decoder and is | |
- started when first instance of encoder or decoder is opened and stopped | |
- when last instance is closed, thus avoiding per instance polling and saving | |
- CPU bandwidth. | |
+ The timer is shared across all instances of encoders and decoders and is | |
+ started when the first instance of an encoder or decoder is opened and | |
+ stopped when the last instance is closed, thus avoiding per instance | |
+ polling and saving CPU bandwidth. As the VPU driver manages this | |
+ instance-related tracking and synchronization, the shared-timer polling | |
+ logic is implemented within the VPU driver itself. This scheme may also | |
+ be useful in general (even if an irq is present) for non-realtime | |
+ multi-instance VPU use-cases (e.g. 32 instances of the VPU running | |
+ together) where the system is already under high interrupt load; | |
+ switching to polling may help mitigate this, as the polling thread is | |
+ shared across all the VPU instances. | |
- hrtimer callback is called with 5ms polling interval while any of the | |
- encoder/decoder instances are running to check the interrupt status as | |
- being done in irq handler. | |
+ An hrtimer is chosen for polling here as it provides precise timing and | |
+ scheduling, and its API seems better suited to a periodic polling task | |
+ such as this. As a general rule of thumb, | |
- Based on above interrupt status, use a worker thread to iterate over the | |
- interrupt status for each instance and send completion event as being done | |
- in irq thread function. | |
+ Worst case latency with hrtimer = Actual latency (achievable with irq) | |
+ + Polling interval | |
+ | |
+ NOTE (the meaning of terms used above is as follows): | |
+ - Latency: Time taken to process one frame | |
+ - Actual latency: Time taken by hardware to process one frame and signal | |
+ it to the OS (i.e. the latency that would be achievable if the irq line | |
+ were present) | |
+ | |
+ There is a trade-off between latency and CPU usage when deciding the value | |
+ of the polling interval. With aggressive polling intervals (i.e. even | |
+ smaller values) the CPU usage increases although worst case latencies | |
+ get better. Conversely, with greater polling intervals worst case | |
+ latencies will increase although the CPU usage will decrease. | |
+ | |
+ A 5ms interval offered a good balance between the two, as we were able to | |
+ reach close to actual latencies (as achievable with irq) without incurring | |
+ too much CPU usage, as seen in the experiments below, and thus 5ms is | |
+ chosen as the default polling interval. | |
- Parse for irq number before v4l2 device registration and if not available | |
- only then, initialize hrtimer and worker thread. | |
+ - 1x 640x480@25 Encoding using different hrtimer polling intervals [2] | |
+ - 4x 1080p30 Transcode (File->decode->encode->file) irq vs polling | |
+ comparison [3] | |
+ - 1x 1080p Transcode (File->decode->encode->file) irq vs polling comparison | |
+ [4] | |
+ - 1080p60 Streaming use-case irq vs polling comparison [5] | |
+ - 1x 1080p30 sanity decode and encode tests [6] | |
- Move the core functionality of irq thread function to a separate function | |
- wave5_vpu_handle_irq so that it can be used by both the worker thread when | |
- using polling mode and irq thread when using interrupt mode. | |
+ The polling interval can also be changed using the vpu_poll_interval | |
+ module param in case users want to tune it for their use-case | |
+ requirements, keeping in mind the above trade-off. | |
- Protect hrtimer access and instance list with device specific mutex locks | |
- to avoid race conditions while different instances of encoder and decoder | |
- are started together. | |
+ Based on the interrupt status, we use a worker thread to iterate over the | |
+ interrupt status for each instance and send a completion event, as is done | |
+ in the irq thread function. | |
+ | |
+ Move the core functionality of the irq thread function to a separate | |
+ function wave5_vpu_handle_irq so that it can be used by both the worker | |
+ thread when using polling mode and irq thread when using interrupt mode. | |
- Add module param to change polling interval for debug purpose. | |
+ Protect the hrtimer access and instance list with device specific mutex | |
+ locks to avoid race conditions while different instances of encoder and | |
+ decoder are started together. | |
[1] https://www.ti.com/lit/pdf/spruj16 | |
(Ref: Section 4.2.3.3 Resets, Interrupts, and Clocks) | |
+ [2] https://gist.github.com/devarsht/ee9664d3403d1212ef477a027b71896c | |
+ [3] https://gist.github.com/devarsht/3a58b4f201430dfc61697c7e224e74c2 | |
+ [4] https://gist.github.com/devarsht/a6480f1f2cbdf8dd694d698309d81fb0 | |
+ [5] https://gist.github.com/devarsht/44aaa4322454e85e01a8d65ac47c5edb | |
+ [6] https://gist.github.com/devarsht/2f956bcc6152dba728ce08cebdcebe1d | |
Signed-off-by: Devarsh Thakkar <devarsht@ti.com> | |
Tested-by: Jackson Lee <jackson.lee@chipsnmedia.com> | |
+ --- | |
+ V2: | |
+ - Update commit message as suggested in review to give more context | |
+ on design being chosen and analysis that was done to decide on same | |
+ - Add Tested-By | |
+ | |
+ Range diff w.r.t v1 : | |
+ https://gist.github.com/devarsht/cd6bbb4ba90b0229be4718b7140ef924 | |
## drivers/media/platform/chips-media/wave5/wave5-helper.c ## | |
@@ drivers/media/platform/chips-media/wave5/wave5-helper.c: int wave5_vpu_release_device(struct file *filp, | |
@@ drivers/media/platform/chips-media/wave5/wave5-helper.c: int wave5_vpu_release_d | |
{ | |
struct vpu_instance *inst = wave5_to_vpu_inst(filp->private_data); | |
+ struct vpu_device *dev = inst->dev; | |
-+ int ret = 0; | |
++ int ret; | |
v4l2_m2m_ctx_release(inst->v4l2_fh.m2m_ctx); | |
if (inst->state != VPU_INST_STATE_NONE) { | |
+ u32 fail_res; | |
+- int ret; | |
+ | |
+ ret = close_func(inst, &fail_res); | |
+ if (fail_res == WAVE5_SYSERR_VPU_STILL_RUNNING) { | |
@@ drivers/media/platform/chips-media/wave5/wave5-helper.c: int wave5_vpu_release_device(struct file *filp, | |
} | |
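
The latency rule of thumb in the commit message (worst case latency with hrtimer = actual latency + polling interval) can be illustrated with a small userspace sketch. This is not driver code: the helper names `worst_case_latency_us` and `polls_per_second` are hypothetical, and the callback rate is only a rough proxy for the CPU cost that the commit message says rises with shorter intervals.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case frame latency in polling mode, following the rule of thumb
 * from the commit message: actual latency (as achievable with an irq)
 * plus one full polling interval.
 */
static uint64_t worst_case_latency_us(uint64_t actual_latency_us,
                                      uint64_t poll_interval_us)
{
    return actual_latency_us + poll_interval_us;
}

/*
 * Number of timer callbacks per second for a given polling interval.
 * Used here only as a rough stand-in for CPU overhead (an assumption of
 * this sketch, not a measurement from the linked experiments).
 */
static uint64_t polls_per_second(uint64_t poll_interval_us)
{
    assert(poll_interval_us > 0);
    return 1000000ULL / poll_interval_us;
}
```

For a hypothetical frame that the hardware finishes in 20 ms, the default 5 ms interval gives a worst case of 25 ms at 200 callbacks per second, while a 1 ms interval gives 21 ms at 1000 callbacks per second, which is the trade-off described above.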