On "Understanding Sources of Inefficiency in General-Purpose Chips"
My problems with the paper:
- There is no comparison of the resulting video quality. The encode time (and power
expended) needed to produce an H.264 bitstream depends *dramatically* on the desired quality level;
e.g. for x264 (already the state-of-the-art SW encoder in 2010, when the paper was written), the
difference between the fastest and best-quality settings is close to 2 orders of magnitude
in both speed and power use. This is not negligible!
[NOTE: This excludes quality presets like "placebo", which are more demanding still.
Even when comparing only settings usable for real-time encoding, there is still
at least an order of magnitude difference.]
- They compare three different encoders: their own, which is apparently based on JM 8.6
(*not* a good encoder!); for the SW implementation, an H.264 encoder by Intel that I am not
familiar with (running on a 2.8 GHz P4); and for the ASIC, an ASIC from 2006. These are three
different impls at three different quality targets, and the paper does not account for this.
- You can be fairly certain that the ASIC targets reasonable quality and uses more or
less current algorithms. The same cannot be said for their solution; as a result, we do
know how much perf/W improved from their changes, but we do not actually know
1. how the resulting perf/W actually compares against the ASIC
(the resulting quality may be much worse, or better; we have no idea).
2. whether the perf/W gains were actually relevant; an efficient HW impl of a sub-par
algorithm will beat the corresponding SW version, but how big would the gains be
had the SW version (without the added instrs etc.) been better to begin with?
I do agree that this kind of HW/SW codesign is interesting. I just wanted to point out
that, for the application they've chosen, their perf metrics indicate that they are using
subpar algorithms (which are inefficient, but also amenable to a HW implementation
that has better perf/W due to lower overhead). This exaggerates the gains they get from
specialized instructions in this case. Furthermore, because they do not evaluate the
quality of the resulting video (and because both encode time and power scale with the
quality of encoding!), their comparisons with the ASIC/SW implementations are essentially
meaningless.
In short, while I like the idea, I'm very doubtful about the execution, and all the
conclusions drawn from it.