
H.264 & VP8 Quality Comparison And Some Words on Future Video Formats


VP8 vs H.264 - Which One is Better?

So I was reading Hacker News and decided to read the comments in the thread about H.265 being approved. Pretty close to the top was this comment about VP9, Google's future video format. I have some words of my own about it and other future formats at the bottom of this post, but what jumped out from the comment to me was this part:

Many have already implemented VP8 (which is also slightly better than h.264 at this point)

The comparison linked to back up that statement is faulty for several reasons, such as not providing the source material used (hell, he doesn't even name the source material), exact encoding settings used (no, some random profiles are not enough), not providing the resulting encodes, only providing a single image for the whole comparison, not providing encoder logs... I think you get the idea. So I decided to do a quick comparison of my own, and to do it right. Here are the results.

Important Notice

Testing with just a single clip has its downsides - most test clips favor encoders that have features X and/or Y. The clip currently used in this test (Park Joy) seems to favor encoders with large block sizes, and VP8 as a format fares worse in this regard compared to H.264. My original reason for choosing Park Joy was that x264 developer Dark_Shikari said that "it shouldn't bias too heavily towards any one encoder like many of the other standard test clips will", but apparently this isn't quite correct (see comments at bottom). If you're interested in the technical properties of VP8 compared to H.264, you might be interested in this analysis. It is quite likely that multiple test encodes would not affect the conclusion much, though, seeing as VP8 as a format is weaker than H.264. The situation could be different if H.264 didn't have such a top-notch encoder as x264 or if we were only comparing to H.264 Baseline, but this comparison is about the maximum[1] quality obtainable with each format (using the best encoders available).

Test Details

  • Test Clip: park_joy_1080p50.y4m - Download here
  • Encode Target: "Best quality" 2-pass 13600 kbps encode at 1080p50.
  • Test Platform: i7-2600k @ ~4.4 GHz, Windows 7 64-bit (other details don't really matter)
  • H.264 Encoder: x264 r2245, 64-bit - Download here
  • VP8 Encoder: vpxenc, 64-bit (libvpx 1.1.0) - Download here

Encoding Parameters

vpxenc -w 1920 -h 1080 --fps=50/1 --best -p 2 --fpf=vp8.stats --target-bitrate=13600 --end-usage=vbr --auto-alt-ref=1 --minsection-pct=5 --maxsection-pct=800 --lag-in-frames=16 --kf-min-dist=0 --kf-max-dist=250 --static-thresh=0 --drop-frame=0 --min-q=0 --max-q=60 -t 7 -o park_joy_vp8.webm park_joy_1080p50.y4m

x264 --input-res 1920x1080 --fps 50 --preset veryslow --tune film --pass 1 --stats h264.stats --bitrate 13600 -i 1 -I 250 -o NUL park_joy_1080p50.y4m
x264 --input-res 1920x1080 --fps 50 --preset veryslow --tune film --pass 2 --stats h264.stats --bitrate 13600 -i 1 -I 250 -o park_joy_h264.mkv park_joy_1080p50.y4m
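For anyone who wants to script the same runs, here is a minimal sketch that builds the command lines above as argument lists. The function names are made up for illustration, and `os.devnull` is used as a stand-in for `NUL` (it resolves to `nul` on Windows); every flag is copied from the commands above, nothing is added.

```python
import os

SOURCE = "park_joy_1080p50.y4m"
BITRATE_KBPS = 13600


def x264_pass(pass_number: int, output: str) -> list[str]:
    """Build one x264 pass using the exact flags from the post."""
    return [
        "x264", "--input-res", "1920x1080", "--fps", "50",
        "--preset", "veryslow", "--tune", "film",
        "--pass", str(pass_number), "--stats", "h264.stats",
        "--bitrate", str(BITRATE_KBPS), "-i", "1", "-I", "250",
        "-o", output, SOURCE,
    ]


def x264_two_pass_commands(final_output: str = "park_joy_h264.mkv") -> list[list[str]]:
    """First pass writes only the stats file; its video output is discarded."""
    return [x264_pass(1, os.devnull), x264_pass(2, final_output)]


def vpxenc_command(threads: int = 7, output: str = "park_joy_vp8.webm") -> list[str]:
    """vpxenc runs both passes in a single invocation when given -p 2."""
    return [
        "vpxenc", "-w", "1920", "-h", "1080", "--fps=50/1", "--best",
        "-p", "2", "--fpf=vp8.stats", f"--target-bitrate={BITRATE_KBPS}",
        "--end-usage=vbr", "--auto-alt-ref=1", "--minsection-pct=5",
        "--maxsection-pct=800", "--lag-in-frames=16", "--kf-min-dist=0",
        "--kf-max-dist=250", "--static-thresh=0", "--drop-frame=0",
        "--min-q=0", "--max-q=60", "-t", str(threads), "-o", output, SOURCE,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) to actually encode.
    for cmd in [vpxenc_command(), *x264_two_pass_commands()]:
        print(" ".join(cmd))
```

Keeping the commands as lists avoids shell quoting issues and makes it easy to swap in another source clip or bitrate.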

Some Notes

  • A little background on me: I have about five years of experience in processing and encoding digital video.
  • The vpxenc settings are based on the "2-Pass Best Quality VBR Encoding" settings found on the WebM homepage here. x264 settings are based on my own experience (though they're not really anything special in this case).
  • I decided to use --preset veryslow instead of --preset placebo for x264 because placebo is really damn slow and so that both encoders would do a fast first pass (--preset placebo disables this).
  • As you might deduce from the command line parameters above, x264's preset and tune system makes things quite convenient. When I do actual encoding work, I usually start with --preset veryslow --tune [something] and only tweak a few settings beyond that.
  • VP8 encoding used about 80% of the CPU at the recommended threads setting (cores - 1) on the second pass.
  • H.264 encoding used about 95% of the CPU (with threads automatically set to 12 by x264) on the second pass.
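The two thread counts mentioned above can be reproduced with a small sketch. It assumes the i7-2600k exposes 8 logical CPUs (4 cores with Hyper-Threading) and that x264's automatic frame-threading default is 1.5x the logical CPU count; that heuristic is my understanding of x264's behavior, not something stated in this post, and the function names are hypothetical.

```python
def vpxenc_recommended_threads(logical_cpus: int) -> int:
    """The "cores - 1" rule used for the vpxenc -t setting in this test."""
    return max(1, logical_cpus - 1)


def x264_auto_threads(logical_cpus: int) -> int:
    """Assumed x264 default for frame threading: 1.5x the logical CPUs."""
    return logical_cpus * 3 // 2


if __name__ == "__main__":
    cpus = 8  # assumption: logical CPUs on the i7-2600k test machine
    print("vpxenc -t", vpxenc_recommended_threads(cpus))
    print("x264 auto threads", x264_auto_threads(cpus))
```

With 8 logical CPUs this gives the 7 threads passed to vpxenc and the 12 threads x264 picked on its own.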

The Results

  • VP8 [Download] - 13144 kbps (16436KB), first pass: 25.94 fps, second pass: 2.40 fps
  • H.264 [Download] - 13498 kbps (16483KB), first pass: 39.12 fps, second pass: 4.93 fps (full command line output below)
  • Screenshot Comparison - As the clip is quite similar throughout, I only took one pair of shots - if you want to see more, download and watch the encoded videos yourself.

Conclusion

H.264 encoded with the latest x264 offers notably higher quality while encoding almost twice as fast as VP8 encoded with the latest libvpx offering. If you see a test claiming that VP8 is better than H.264 quality-wise, it is very likely that the comparison was done poorly, either by mistake or intentionally. I very much recommend reading this article by x264 developer Jason Garrett-Glaser on the subject.
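The "almost twice as fast" figure can be checked against the timings in output.log at the end of this post. The sketch below is only arithmetic on those logged numbers; note that x264 reports its encoding duration rounded to whole seconds, so the ratio is approximate.

```python
# Timings copied from output.log.
VPXENC_PASS_MS = (19272, 207996)   # pass 1 and pass 2, in milliseconds
X264_PASS_S = (13, 102)            # 0:00:13 and 0:01:42, in seconds
SECOND_PASS_FPS = {"vp8": 2.40, "h264": 4.93}


def total_time_ratio() -> float:
    """How many times longer the whole VP8 encode took than the H.264 one."""
    vp8_seconds = sum(VPXENC_PASS_MS) / 1000
    h264_seconds = sum(X264_PASS_S)
    return vp8_seconds / h264_seconds


def second_pass_speed_ratio() -> float:
    """x264 second-pass speed relative to vpxenc second-pass speed."""
    return SECOND_PASS_FPS["h264"] / SECOND_PASS_FPS["vp8"]


if __name__ == "__main__":
    print(f"total time ratio:  {total_time_ratio():.2f}x")
    print(f"second pass ratio: {second_pass_speed_ratio():.2f}x")
```

Over both passes x264 comes out a little under 2x faster, and a little over 2x on the second pass alone.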

On H.265, VP9 and Other Future Formats

I am very much looking forward to what the future brings us in the field of video formats. The keyword here is future - even though H.265 is "approved" now, it'll be a while before we get actually usable encoding and decoding implementations (reference encoders are hardly great examples of what the format will be truly capable of - implementation matters a lot[2]). Other formats that have been drummed up recently, namely On2's (Google's) VP9 and Xiph's Daala, are certainly interesting as well, but remember to take any wild claims with a large grain of salt: On2 is pretty infamous when it comes to overmarketing their products (in the case of VP9, a while back they said VP9 is "only ~7% behind H.265 (when compared to HEVC JM)" - the JM here means the reference encoder) and Daala doesn't have much beyond its big words going for it at the moment.

Seeing as both also aim to be royalty-free (so essentially "patent-free"), beating H.265 will be no easy task, considering how much of a patent minefield the field of video encoding is. While I'd love a patent- and royalty-free video format to offer the highest quality compression you can find, I wouldn't hold my breath for one.

Any non-H.265 format will also find it much harder to get hardware support, considering that H.265 is an "industry standard" - so even if they offered better quality than H.265, they could end up struggling when it comes to widespread adoption. If they can't offer better quality, then even more so. The most likely scenario will end up being very similar to the situation with H.264, VP8 and Theora today.

[1] Technically we could go even higher with H.264 than in this test by using something like the High 10 Profile (and placebo preset in x264), but most widespread "high quality" H.264 usage is limited to High Profile, so that's what we're using here.

[2] For those wondering why I didn't link to a test including the reference encoder, I couldn't really find any. Most likely because the reference encoder is very, very slow - x264 is approx. 50 times faster than it at the same quality level!

Discuss this post on Hacker News here.

output.log
R:\Work\enctest>vpxenc -w 1920 -h 1080 --fps=50/1 --best -p 2 --fpf=vp8.stats --target-bitrate=13600 --end-usage=vbr --auto-alt-ref=1 --minsection-pct=5 --maxsection-pct=800 --lag-in-frames=16 --kf-min-dist=0 --kf-max-dist=250 --static-thresh=0 --drop-frame=0 --min-q=0 --max-q=60 -t 7 -o park_joy_vp8.webm park_joy_1080p50.y4m
Pass 1/2 frame 500/501 72144B 1154b/f 57715b/s 19272 ms (25.94 fps)
Pass 2/2 frame 500/530 16678501B 10904F 10678F 7585F 5863F 254F
Pass 2/2 frame 500/545 16824599B 269193b/f 13459679b/s 207996 ms (2.40 fps)
 
R:\Work\enctest>x264 --input-res 1920x1080 --fps 50 --preset veryslow --tune film --pass 1 --stats h264.stats --bitrate 13600 -i 1 -I 250 -o NUL park_joy_1080p50.y4m
y4m [info]: 1920x1080p 1:1 @ 50/1 fps (cfr)
y4m [info]: color matrix: undef
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 FastShuffle SSE4.2
x264 [info]: profile Main, level 4.2
x264 [info]: started at Sun Jan 27 04:24:57 2013
x264 [info]: frame I:2 Avg QP:26.25 size:341482
x264 [info]: frame P:128 Avg QP:34.36 size: 81953
x264 [info]: frame B:370 Avg QP:38.21 size: 14194
x264 [info]: consecutive B-frames: 0.4% 0.4% 12.6% 78.4% 7.0% 1.2% 0.0% 0.0% 0.0%
x264 [info]: mb I I16..4: 23.2% 0.0% 76.8%
x264 [info]: mb P I16..4: 13.5% 0.0% 0.0% P16..4: 62.8% 0.0% 0.0% 0.0% 0.0% skip:23.7%
x264 [info]: mb B I16..4: 1.4% 0.0% 0.0% B16..8: 15.1% 0.0% 0.0% direct:10.1% skip:73.4% L0:24.7% L1:41.0% BI:34.3%
x264 [info]: final ratefactor: 26.85
x264 [info]: direct mvs spatial:98.1% temporal:1.9%
x264 [info]: coded y,uvDC,uvAC intra: 79.2% 66.0% 43.9% inter: 15.8% 8.3% 1.4%
x264 [info]: i16 v,h,dc,p: 22% 14% 49% 15%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 13% 16% 16% 9% 9% 7% 10% 8% 12%
x264 [info]: i8c dc,h,v,p: 62% 15% 18% 5%
x264 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x264 [info]: kb/s:13139.92
 
encoded 500 frames, 39.12 fps, 13139.92 kb/s
x264 [info]: ended at Sun Jan 27 04:25:10 2013
x264 [info]: encoding duration 0:00:13
 
R:\Work\enctest>x264 --input-res 1920x1080 --fps 50 --preset veryslow --tune film --pass 2 --stats h264.stats --bitrate 13600 -i 1 -I 250 -o park_joy_h264.mkv park_joy_1080p50.y4m
y4m [info]: 1920x1080p 1:1 @ 50/1 fps (cfr)
y4m [info]: color matrix: undef
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 FastShuffle SSE4.2
x264 [info]: profile High, level 5.1
x264 [info]: started at Sun Jan 27 04:25:10 2013
x264 [info]: frame I:2 Avg QP:31.40 size:220974
x264 [info]: frame P:128 Avg QP:35.35 size: 84726
x264 [info]: frame B:370 Avg QP:39.72 size: 15097
x264 [info]: consecutive B-frames: 0.4% 0.4% 12.6% 78.4% 7.0% 1.2% 0.0% 0.0% 0.0%
x264 [info]: mb I I16..4: 17.5% 73.4% 9.1%
x264 [info]: mb P I16..4: 0.4% 3.7% 0.4% P16..4: 35.0% 20.4% 16.9% 1.5% 0.3% skip:21.4%
x264 [info]: mb B I16..4: 0.0% 0.1% 0.0% B16..8: 40.9% 4.6% 1.8% direct: 5.1% skip:47.5% L0:36.2% L1:53.3% BI:10.4%
x264 [info]: 8x8 transform intra:80.4% inter:43.9%
x264 [info]: direct mvs spatial:96.8% temporal:3.2%
x264 [info]: coded y,uvDC,uvAC intra: 80.1% 79.9% 62.3% inter: 16.9% 10.3% 3.0%
x264 [info]: i16 v,h,dc,p: 19% 18% 10% 53%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 9% 6% 5% 11% 16% 13% 15% 11% 15%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 7% 5% 3% 11% 15% 12% 16% 12% 19%
x264 [info]: i8c dc,h,v,p: 46% 20% 16% 18%
x264 [info]: Weighted P-Frames: Y:98.4% UV:0.0%
x264 [info]: ref P L0: 66.3% 8.2% 12.3% 2.5% 2.3% 1.6% 1.6% 0.8% 0.8% 0.6% 0.7% 0.5% 0.5% 0.5% 0.5% 0.3%
x264 [info]: ref B L0: 95.1% 2.8% 0.8% 0.3% 0.2% 0.2% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.0% 0.0% 0.0%
x264 [info]: ref B L1: 98.8% 1.2%
x264 [info]: kb/s:13498.29
 
encoded 500 frames, 4.93 fps, 13498.93 kb/s
x264 [info]: ended at Sun Jan 27 04:26:52 2013
x264 [info]: encoding duration 0:01:42
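As a sanity check on the log above, the reported bitrate can be rebuilt from the per-frame-type statistics of the second pass. This is a rough sketch: it assumes the `size:` values are average frame sizes in bytes and that x264's kb/s means 1000 bits per second. The regular expression and function name are mine, not part of any x264 tooling.

```python
import re

# The three frame summary lines from the second x264 pass in output.log.
LOG = """\
x264 [info]: frame I:2 Avg QP:31.40 size:220974
x264 [info]: frame P:128 Avg QP:35.35 size: 84726
x264 [info]: frame B:370 Avg QP:39.72 size: 15097
"""

FRAME_RE = re.compile(r"frame ([IPB]):\s*(\d+)\s+Avg QP:\s*[\d.]+\s+size:\s*(\d+)")


def bitrate_kbps(log: str, fps: float) -> float:
    """Estimate the stream bitrate from frame counts and average frame sizes."""
    frames = 0
    total_bytes = 0
    for _frame_type, count, avg_size in FRAME_RE.findall(log):
        frames += int(count)
        total_bytes += int(count) * int(avg_size)
    duration_s = frames / fps
    return total_bytes * 8 / duration_s / 1000


if __name__ == "__main__":
    print(f"{bitrate_kbps(LOG, fps=50):.2f} kb/s")
```

The estimate lands within a fraction of a kb/s of the logged 13498.29 kb/s; the small gap comes from the average sizes being rounded to whole bytes.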

Daala doesn't have much beyond its big words going for it at the moment.

What 'big words' are you referring to?

(reasons for choosing this clip can be found here - my test here is largely modeled after the linked test)

The provided rationale is not very good. Many other lossless clips are easily available from the same site you got this one from, but none of them are especially representative alone. In fact these clips are interesting for codec development in part because they exercise different things. (And I feel I can say that with some authority, considering that I'm one of the maintainers of that video repository.)

Testing on a single clip is generally pretty poor technique. I haven't been tracking the state of the VP8 encoder closely but I would be surprised if a VP8 encoder ever outperformed High Profile h264 (as opposed to baseline) for Park Joy. The texture covering the whole image is basically the use case for having large transforms: It does not intra-predict well, and large transforms make it easier to preserve some amount of the original energy.

Basically, if someone had asked me yesterday which clip I'd use to show any codec with larger than 4x4 blocks outperforming a codec with only 4x4 blocks, I would have recommended parkjoy (or potentially parkrun or in_to_tree). So using it as your only point of comparison is pretty limiting... and even if you'd get the same results on other clips, it's unfortunate to see you wasted your time on a test which had a foregone conclusion.

(This can be demonstrated by using Theora as an intraframe coder and coding one frame of parkjoy: you'll likely get a result that looks better for a comparable rate than VP8... but this doesn't say much about Theora vs VP8 overall.)

What 'big words' are you referring to?

These ones found here:

The goal of the project is to provide a free to implement, use and distribute digital media format and reference implementation with technical performance superior to h.265.

I've seen less-informed people on the internet claim that "Daala is better than H.265!" as if it was a fact based on pretty much these words alone.

The provided rationale is not very good.

I'm no expert on the tech behind video encoding, but I'd expect that if Dark_Shikari says that "[Park Joy] shouldn't bias too heavily towards any one encoder like many of the other standard test clips will" then it generally shouldn't be that far from the truth.

Testing on a single clip is generally pretty poor technique.

Yes, yes, it's not representative of the whole spectrum of all video material ever, but this comparison was done on a whim at ~2AM and I wanted to be done in a couple of hours. The main problem in that regard was that downloading huge source files takes time - getting the source file for this comparison took much longer than the entire encoding process.

If you can suggest some other test clips (preferably as "general" as possible, and HD), I could probably add a couple more test encodes to this comparison. I highly doubt that they would change the conclusion in any notable manner, though!

I've seen less-informed people on the internet claim that "Daala is better than H.265!"

Can you point me to some of these claims? I'd like to go correct them as— while I admire the enthusiasm— they're patently nonsense at this time. (You couldn't even compare it, as Daala is not a codec yet)

then it generally shouldn't be that far from the truth.

It actually is— everyone makes mistakes. I'm not sure that a short bias-free clip can exist, but even if one could— parkjoy wouldn't be in the running.

The main problem in that regard was that downloading huge source files takes time - getting the source file for this comparison took much longer than the entire encoding process.

That doing something well is hard does not really justify doing it poorly. Yes, it's hard, and that's one reason you don't see more reasonable comparisons out there... but bad information for a comparison can be worse than no information. This kind of thing just provides more fodder for "less-informed people" to continue to be less informed. You can do better.

Obviously I'm not going to fault you for just doing what you find interesting— but you're promoting this on hacker news as though it resolves some question, and I don't think that it does— or at least not any better than the cruddy comparison it was responding to (http://pacoup.com/2012/12/20/vp8-webm-vs-h-264-mp4-december-2012/) does.

If you can suggest some other test clips (preferably as "general" as possible, and HD), I could probably add a couple more test encodes to this comparison

As mentioned, there is a whole collection of lossless clips from the site you got parkjoy. I'd generally suggest using all of them. The HD clips there generally under-represent low noise synthetic content, unless you include sintel and/or big buck bunny (and if so, take care to not over-represent them). On the plus side— once you have downloaded a bunch of lossless clips, you have them— and you can test over and over again. Perhaps contribute to the development of VP9 or Daala? People to try things out and report interesting findings are always in short supply. :)

Can you point me to some of these claims?

The discussions I'm referring to are gone by now due to fast expiration and pruning. I've done my best to correct people in them, though.

It actually is— everyone makes mistakes.

So in what way exactly? You mentioned larger than 4x4 blocks and large transforms - based on Dark_Shikari's analysis of VP8, VP8 has both former and latter (though it says VP8 is technically worse in both categories compared to H.264, so I guess there's that - but well, doesn't that also tell its own tale in regards to H.264 being better than VP8?).

As mentioned, there is a whole collection of lossless clips from the site you got parkjoy. I'd generally suggest using all of them.

That's not very helpful, you know, especially since I'm doing a purely visual comparison here with no PSNR/SSIM - 28 visual comparisons would be very exhausting for viewers (even more so if downloading and watching the actual videos).

you're promoting this on hacker news as though it resolves some question, and I don't think that it does— or at least not any better than the cruddy comparison it was responding to does.

Even if I tested with a single clip, that's just rude.

VP8 has both former and latter

It does not.

That's not very helpful, you know, especially since I'm doing a purely visual comparison

So look for parts where they do better or worse— and point them out. It would at least let you discover things like this clip or that clip is really good with one encoder or the other.

But if you're going to purport to report a winner then there really isn't a replacement for a lot of work. :(

Even if I tested with a single clip, that's just rude.

I apologize if I've insulted you— it wasn't my intention. I'm expressing my earnest opinion. One single clip comparison says X is better, one single clip comparison says Y is better. The elephant in the lab is "maybe the clip determined the outcome?" and nothing has been done to answer that question in either case.

The elephant in the lab is "maybe the clip determined the outcome?"

There's still the quite large difference that you can actually verify the results of my test, whereas you can't with the other. Not to mention that it raises many warning flags for an unreliable comparison (as listed in the post).

It's the year 2013. The first TCP/IP packet was sent more than 43 years ago. Microchips and color displays have been around for nearly 40 years now. The first HTTP request was answered around 25 years ago. The first web page was served almost precisely 20 years ago.

Computing has become nearly free. The price you pay to execute one instruction on a microchip has fallen through the floor. TCP/IP is free to use. Nobody would dare ship a computer without TCP/IP and nobody gets a dime for it. HTTP is free to use, same as TCP/IP. The technology to render a web page is free to use. All of these things have become commoditized in a short period of time. And we, the society, the people, are much, much richer for it.

We still cannot encode, decode and play moving pictures as a matter of course, free of cost, built into everything we want however we want it. We will still not be able to do so in the year 2050, or the year 2100 or any other year hereafter.

Why? Because corruption, that's why. A small agglomerate of criminals have banded together to deprive the public of huge economic value by preventing video from falling prey to commoditization, by means of exploiting every conceivable IP legislation and copious lobbying.

If everybody behaved as criminally egomaniacal as the MPEG does, this conversation here would not be possible. We would not have the internet. We would not have the web. We would perhaps not even have personal computers. We would probably still be using cardboard punchcards to operate mainframes, if that.

The MPEG, founded as a "not for profit" standards body with a couple dozen patents under its belt, has grown to be a pawn of big industry interests with a vast patent portfolio spanning thousands (if not tens of thousands) of patents and the wealth and power to utterly obliterate anybody who is not playing by their rules. It has become a fiefdom designed for the sole purpose of delaying any progress anybody might make on making video commonplace.

Any non-H.265 format will also find it much harder to get hardware support, considering that H.265 is an "industry standard" - so even if they offered better quality than H.265, they could end up struggling when it comes to widespread adoption.

http://techcrunch.com/2014/01/02/googles-vp9-video-codec-gets-backing-from-arm-nvidia-sony-and-others-gives-4k-video-streaming-a-fighting-chance/

:^)
