Created
July 16, 2014 03:16
BBB's IRC lesson
19:39 <BBB> intra prediction is taking edge pixels to predict a block’s content | |
19:39 <Timothy_Gu_> what is a block? | |
19:39 <Timothy_Gu_> a part of a frame? | |
19:39 <Timothy_Gu_> what is a macroblock? a group of blocks? | |
19:40 <BBB> the base subdivision unit of a frame in which the actual video content is coded | |
19:40 <BBB> for h264, 16x16 is the base block unit size | |
19:40 <Timothy_Gu_> ok | |
19:40 <BBB> vp8 same | |
19:40 <BBB> vp9/hevc it’s 64x64 -> 8x8 (with 4x4 support through some hacks) | |
19:40 <Timothy_Gu_> huh? | |
19:40 <BBB> so to predict a block, you take edge pixels, apply a function over it, and that’s the block’s predicted content | |
19:40 <Timothy_Gu_> where did the 64 come from? | |
19:41 <BBB> vp9/hevc have bigger blocks | |
19:41 <BBB> basically gives better compression for some hd content | |
19:41 <Timothy_Gu_> then how about 8? | |
19:41 <BBB> for complex content | |
19:41 <BBB> i.e. adaptive to content of frame | |
19:41 <BBB> a 64x64 block can divide to 4 32x32 blocks, or be coded as-is | |
19:41 <BBB> 32x32 -> 16x16 -> 8x8 -> (hacky) 4x4 | |
19:42 <Timothy_Gu_> ok, so HEVC allows any of those (64, 32, 16, 8, 4)? | |
19:42 <BBB> yes | |
19:42 <BBB> and the encoder will select what makes most sense given the content | |
19:42 <BBB> for static motion (or no motion), you’ll likely get bigger blocks for hd content | |
19:43 <BBB> for highly complex motion patterns, maybe smaller blocks | |
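The 64x64 -> 32x32 -> 16x16 -> 8x8 subdivision described above can be sketched as a recursive quadtree split. This is only an illustration: the variance test, the `threshold` value, and the names `variance` and `split_block` are assumptions made up for this sketch. Real encoders usually decide by comparing rate-distortion cost, not raw variance.

```python
def variance(frame, x, y, size):
    """Pixel variance of the size x size block whose top-left corner is (x, y)."""
    pixels = [frame[y + j][x + i] for j in range(size) for i in range(size)]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def split_block(frame, x, y, size, min_size=8, threshold=100.0):
    """Return the list of (x, y, size) leaf blocks covering one square region."""
    # Hypothetical rule: keep flat blocks whole, split busy ones into 4 quadrants.
    if size <= min_size or variance(frame, x, y, size) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_block(frame, x + dx, y + dy, half, min_size, threshold)
    return leaves


# A 64x64 frame that is flat except for a checkerboard in the top-left quadrant.
frame = [[0] * 64 for _ in range(64)]
for j in range(32):
    for i in range(32):
        frame[j][i] = 255 * ((i + j) % 2)

leaves = split_block(frame, 0, 0, 64)
print(len(leaves))  # 19: three flat 32x32 blocks plus sixteen 8x8 blocks
```

The flat quadrants stay as single 32x32 blocks, while the busy quadrant is split all the way down to the minimum size.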
19:43 <Timothy_Gu_> and then the block's predicted content is compared with the actual content? | |
19:43 <Timothy_Gu_> and a "diff" is coded? | |
19:43 <BBB> right, the difference is coded as transformed coefficients | |
19:43 <BBB> diff = quantize(transform(src[]-pred[])) | |
19:44 <BBB> and that’s your basic video encoder right there | |
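As a rough sketch of "take edge pixels, apply a function over it", here is the simplest such function: DC prediction, which fills the block with the average of the neighbouring edge pixels. The names `dc_predict` and `residual` and the sample pixel values are made up for illustration. Real codecs offer several prediction modes (directional ones too) and handle missing edges, which this ignores.

```python
def dc_predict(top, left):
    """Predict an NxN block as the rounded mean of its top and left edge pixels."""
    n = len(top)
    edge = list(top) + list(left)
    dc = (sum(edge) + len(edge) // 2) // len(edge)  # integer mean, rounded
    return [[dc] * n for _ in range(n)]


def residual(src, pred):
    """src[] - pred[]: the difference that goes on to transform and quantization."""
    return [[s - p for s, p in zip(srow, prow)] for srow, prow in zip(src, pred)]


top = [100, 102, 104, 106]   # row of already-decoded pixels above the block
left = [98, 100, 102, 104]   # column of already-decoded pixels to its left
src = [
    [100, 102, 103, 105],
    [101, 102, 104, 105],
    [101, 103, 104, 106],
    [102, 103, 105, 106],
]

pred = dc_predict(top, left)
diff = residual(src, pred)
print(pred[0])  # [102, 102, 102, 102]
print(diff[0])  # [-2, 0, 1, 3]
```

The residual values are small compared with the pixel values themselves, which is the whole point of predicting first.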
19:44 <Timothy_Gu_> how does transform() work? | |
19:44 <BBB> typically just a forward dct | |
19:45 <Timothy_Gu_> how does DCT/Fourier transform work? | |
19:46 <BBB> I’d just read the wikipedia page :D | |
19:46 <Timothy_Gu_> BTW I asked my homeroom teacher (who teaches AP Stats) once on Fourier transform for a "for dummies" definition, and he was like "oh I remember back in college I did a project on it. I'm pretty sure it is covered in multi-variable calculus" | |
19:47 <BBB> it’s basically a function that finds frequency patterns in the set of difference values | |
19:47 <BBB> so think of the coefficients as weights of sine wave functions | |
19:47 <BBB> (ignore the 2d aspect for now, think just 1d) | |
19:47 <BBB> so if I have an array of 4 diff values double x[4] | |
19:47 <Timothy_Gu_> Then he pulled up the Wikipedia page (¡qué coincidencia!) that has a chart I absolutely don't understand | |
19:48 <Timothy_Gu_> ok continue | |
19:48 <BBB> and their values are like 0, 1, 1, 0 | |
19:48 <BBB> you can see this as a wave function that goes up towards the middle of the line, and down at the edges | |
19:48 <BBB> now imagine that I predefined 4 sine wave functions with different frequencies | |
19:49 <BBB> first DC (i.e. infinite wavelength), and the others something like pi/n wavelength (or maybe 2pi/n, I forgot), where n is 1, 2, 3 | |
19:50 <BBB> then the values (0,1,1,0 in this case, but really any set of values) can also be represented as a vector of 4 multipliers times each of these 3 wave functions (plus the one with infinite wavelength - the one I called dc) | |
19:50 <BBB> in this case, the 0,1,1,0 have an average value of 0.5, so the weight of dc would be 0.5 | |
19:51 <BBB> then after I subtract that, I’m left with -0.5, 0.5, 0.5, -0.5 | |
19:51 <BBB> and the 2pi/2 describes that quite well (if you assume the function is centered at the middle), again with a multiplier of 0.5 (if I assume the peak to be 1) | |
19:51 <BBB> so then the coefficients are 0.5, 0, 0.5 and 0 | |
19:52 <BBB> for typical natural patterns, which are common in video, transforms decrease the amount of information that needs to be coded | |
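The 0, 1, 1, 0 example can be checked with a small 1-D DCT. This sketch assumes the orthonormal DCT-II, which is one common convention; the function names `dct` and `idct` are just illustrative. With this scaling the coefficients come out as roughly 1, 0, -1, 0 and not the 0.5, 0, 0.5, 0 in the chat: the magnitude and sign depend on how the wave functions are normalized, but the pattern (only DC and the third coefficient are non-zero) is the same.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II: project x onto n cosine waves of rising frequency."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)))
    return out


def idct(c):
    """Inverse: multiply each cosine wave by its coefficient and add them all up."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


coeffs = dct([0, 1, 1, 0])
restored = idct(coeffs)
print([round(c, 3) for c in coeffs])    # approximately [1, 0, -1, 0]
print([round(v, 3) for v in restored])  # approximately [0, 1, 1, 0]
```

In floating point the round trip recovers the input up to rounding error; the loss in a real codec comes from integer arithmetic and, above all, from quantization.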
19:52 <Timothy_Gu_> is transform lossless? | |
19:52 <BBB> the inverse transform is just the opposite: basically multiply the wave functions by the coefficients (multipliers), add them all up, and you have your original diff back | |
19:53 <BBB> no | |
19:53 <BBB> it can be | |
19:53 <BBB> but dcts typically aren’t | |
19:53 <BBB> vp9’s dht (4x4) is lossless | |
19:53 <BBB> dwt, sorry | |
19:53 <BBB> h264’s dht is maybe lossless? | |
19:53 <BBB> but these are dct approximations that are changed to be less dct-like (thus worse compression) to become lossless | |
19:54 <Timothy_Gu_> ok | |
19:54 <BBB> also quantization is not lossless, so lossless dht/dwt only works for a quantizer of 1.0 (i.e. none) | |
19:55 <BBB> vp8 has no lossless transform | |
19:55 <BBB> I’m honestly not sure about hevc | |
19:56 <BBB> bedtime now, sorry... | |
19:56 <Timothy_Gu_> how did -.5, .5, .5, -.5 become .5, .0, .5, .0 again? | |
19:56 <Timothy_Gu_> last question | |
19:57 <BBB> -.5, .5, .5, -.5 became the third coefficient | |
19:58 <BBB> look at this image: http://en.wikipedia.org/wiki/Discrete_cosine_transform#mediaviewer/File:Dctjpeg.png | |
19:58 <BBB> imagine that each small square boxed in red edges is a 2d sine function | |
19:58 <BBB> just look at the top 8 (in fact, just the left 4 of the top row) | |
19:58 <BBB> imagine now that white is 1.0, and black is -1.0 | |
19:59 <BBB> the topleft one is a sine function where all values are 1.0 | |
20:00 <BBB> the second one is a sine function that goes from 1.0 to -1.0, like 1.0, 0.4, -0.4, -1.0 or so | |
20:00 <Timothy_Gu_> If all the values are the same, is it even a sine function? | |
20:00 <BBB> with infinite wavelength, sure | |
20:00 <Timothy_Gu_> ok | |
20:00 <BBB> (but I mean, it’s a special case, yes) | |
20:01 <BBB> the third one is something like 1.0, -0.7, -0.7, 1.0 | |
20:01 <BBB> and the fourth one is … 1.0, -1.0, 1.0, -1.0 | |
20:01 <BBB> makes sense? | |
20:01 <Timothy_Gu_> So... are these "coefficients"? | |
20:02 <BBB> no, these are the base functions | |
20:02 <BBB> the coefficients are the multipliers times each of these base functions to get the diff | |
20:02 <BBB> like a matrix multiplication (in fact, slow implementations of f/idct are like a matrix multiply) | |
20:03 <BBB> so if your diff was 1.0, 1.0, 1.0, 1.0 | |
20:03 <BBB> you only need the first base function to describe this (multiplier=1.0) | |
20:03 <BBB> the multipliers for the other base functions are 0 | |
20:03 <BBB> so your coefficients are then 1.0, 0.0, 0.0, 0.0 | |
20:03 <Timothy_Gu_> oh, now I get it | |
20:03 <BBB> and so for more complex diff patterns you’ll use all 4 base functions with various coefficients | |
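The base functions and multipliers can be written out as a small matrix computation. This sketch assumes unnormalized DCT-II cosines as the base functions, so the exact sample values differ a little from the rounded ones quoted in the chat (the fourth row, for instance, comes out near 0.38, -0.92, 0.92, -0.38). The names `BASIS`, `coefficients` and `rebuild` are made up for illustration.

```python
import math

N = 4
# Row k is the k-th base function sampled at 4 points (row 0 is the flat DC one).
BASIS = [[math.cos(math.pi * (2 * i + 1) * k / (2 * N)) for i in range(N)]
         for k in range(N)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def coefficients(diff):
    """Multiplier for each base function: project diff onto it (a matrix multiply)."""
    return [dot(diff, row) / dot(row, row) for row in BASIS]


def rebuild(coeffs):
    """Sum of coefficient * base function gives the diff back."""
    return [sum(c * row[i] for c, row in zip(coeffs, BASIS)) for i in range(N)]


flat = coefficients([1.0, 1.0, 1.0, 1.0])
print([round(c, 3) for c in flat])  # approximately [1, 0, 0, 0]

mixed = coefficients([3.0, 1.0, 0.0, -2.0])
print([round(v, 3) for v in rebuild(mixed)])  # approximately [3, 1, 0, -2]
```

A flat diff needs only the first base function, while a less regular diff uses all four with different multipliers.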
20:04 <BBB> the typical effect is that the earlier coefficients are bigger and the later are smaller, so that after quantization, they are zero | |
20:04 <BBB> which means you can code only nonzero ones and thus save space | |
20:04 <BBB> that’s why transforms help compression | |
20:05 <BBB> instead of coding 4x4 pixels for a 4x4 block, you code only… say, 13, or 7, or even just 2 | |
20:05 <Timothy_Gu_> So if I understood correctly, DCT = avg all the members out -> use a wave form (or some wave forms) to simulate it by "assembling" different basic waves together | |
20:06 <Timothy_Gu_> And then after DCT you quantize ~0 values as 0 | |
20:06 <Timothy_Gu_> ~ as in approximated | |
20:10 <BBB> well, they’re not floating point values | |
20:10 <BBB> they’re fixed point | |
20:11 <BBB> so 1.0 would be like 1000, and 0.1 would be 100 | |
20:11 <BBB> then you use a quantizer | |
20:11 <BBB> say my quantizer is 200 | |
20:11 <BBB> 1000/200=5 | |
20:11 <BBB> 100/200=0 | |
20:11 <BBB> so the 100 fell off the radar | |
20:11 <Timothy_Gu_> ok, in floating point concept mine is correct, right? | |
20:12 <BBB> yeah | |
20:12 <BBB> and that’s your basic fdct | |
20:12 <BBB> idct is the same, but inverse | |
20:12 <BBB> and dequant is just multiply coded coefficients by the quantizer value | |
20:12 <BBB> so 5*200=1000, 0*200=0, etc. | |
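The quantizer arithmetic above, as a minimal sketch on fixed-point integers. It assumes plain division that truncates toward zero; the names `quantize` and `dequantize` are illustrative, and real codecs add rounding offsets and per-frequency scaling that are left out here.

```python
def quantize(coeffs, q):
    """Divide each fixed-point coefficient by the quantizer, truncating toward zero."""
    return [c // q if c >= 0 else -((-c) // q) for c in coeffs]


def dequantize(levels, q):
    """Multiply the coded levels back by the quantizer value."""
    return [level * q for level in levels]


coeffs = [1000, 100, -450, 0]   # 1.0 and 0.1 from the chat, in units of 1/1000
levels = quantize(coeffs, 200)
restored = dequantize(levels, 200)
print(levels)    # [5, 0, -2, 0]
print(restored)  # [1000, 0, -400, 0]
```

The 100 "fell off the radar" and the -450 came back as -400: that rounding is where the loss happens, and the run of zeros is what makes the block cheap to code.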
20:13 <BBB> I guess now is bedtime :-p |