On "Understanding Sources of Inefficiency in General-Purpose Chips"
My problems with the paper:
- There is no comparison of resulting video quality. The encode time (and power
expended) needed to produce an H.264 bit stream depends *dramatically* on the desired quality
level; e.g. for x264 (a state-of-the-art SW encoder, already in 2010 when the paper was written),
the difference between the fastest and best-quality settings is close to 2 orders of magnitude
in both speed and power use. This is not negligible!
[NOTE: This excludes quality presets like "placebo", which are more demanding still.
Even when comparing only the settings usable for real-time encoding, there is still
at least an order of magnitude difference.]
- They have their own encoder, which is apparently based on JM 8.6 (*not* a good encoder!); for
the SW implementation they use an H.264 encoder by Intel that I do not know (running
on a 2.8GHz P4); and for the ASIC they have an ASIC from 2006. These are three different
implementations, at three different quality targets, and the paper accounts for none of this.
- You can be fairly certain that the ASIC targets reasonable quality and uses more or
less current algorithms. The same cannot be said for their solution; as a result, we do
know how much perf/W improved from their changes, but we do not actually know
  1. how the resulting perf/W actually compares against the ASIC
     (the resulting quality may be way worse, or better; we have no idea).
  2. whether the perf/W gains were actually relevant; an efficient HW implementation of a
     sub-par algorithm will beat the corresponding SW version, but how big would the gains be
     had the SW version (without the added instructions etc.) been better to begin with?

I do agree that this kind of HW/SW codesign is interesting. I just wanted to point out
that, for the application they've chosen, their perf metrics indicate that they are using
sub-par algorithms (which are inefficient, but also amenable to a HW implementation
that has better perf/W due to lower overhead). This exaggerates the gains they get from
specialized instructions in this case. Furthermore, because they do not evaluate the
quality of the resulting video (and because both encode time and power scale with the
quality of encoding!), their comparisons with the ASIC/SW implementations are essentially
meaningless.
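
To make the point about quality-matched comparisons concrete, here is a minimal sketch of what a fair energy comparison would need to check. Everything in it is an assumption for illustration: the `EncodeRun` type, the `efficiency_ratio` helper, the choice of a dB quality score (e.g. PSNR at a fixed bitrate), the 0.25 dB tolerance, and all of the numbers, which are made up and are not measurements from the paper or from any real encoder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EncodeRun:
    """One encoder measurement. All fields are hypothetical inputs."""
    name: str
    frames: int
    seconds: float      # wall-clock encode time
    avg_watts: float    # average power draw during the encode
    quality_db: float   # a quality score in dB (e.g. PSNR) at a fixed bitrate

    @property
    def joules_per_frame(self) -> float:
        # Energy is power times time; normalize by the work done.
        return self.avg_watts * self.seconds / self.frames


def efficiency_ratio(a: EncodeRun, b: EncodeRun,
                     tol_db: float = 0.25) -> Optional[float]:
    """Return how many times less energy per frame `a` uses than `b`,
    or None when the two runs are not at comparable quality."""
    if abs(a.quality_db - b.quality_db) > tol_db:
        return None  # different quality targets: the ratio says nothing
    return b.joules_per_frame / a.joules_per_frame


# Made-up numbers, purely to exercise the function.
fast = EncodeRun("sw-fast", frames=300, seconds=10.0, avg_watts=60.0, quality_db=34.0)
slow = EncodeRun("sw-slow", frames=300, seconds=100.0, avg_watts=60.0, quality_db=37.0)
hw = EncodeRun("hw", frames=300, seconds=10.0, avg_watts=6.0, quality_db=37.0)

print(efficiency_ratio(hw, slow))  # comparable quality, so a ratio is returned
print(efficiency_ratio(hw, fast))  # quality differs by 3 dB, so None
```

The design choice is simply that the comparison refuses to produce a number when the quality scores differ, which is the check the paper's comparison against the ASIC and the SW encoder is missing.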

In short, while I like the idea, I'm very doubtful about the execution and about all the
conclusions drawn from it.