@TimothyGu
Created July 16, 2014 03:16
BBB's IRC lesson
19:39 <BBB> intra prediction is taking edge pixels to predict a block’s content
19:39 <Timothy_Gu_> what is a block?
19:39 <Timothy_Gu_> a part of a frame?
19:39 <Timothy_Gu_> what is a macroblock? a group of blocks?
19:40 <BBB> the base subdivision unit of a frame in which the actual video content is coded
19:40 <BBB> for h264, 16x16 is the base block unit size
19:40 <Timothy_Gu_> ok
19:40 <BBB> vp8 same
19:40 <BBB> vp9/hevc it’s 64x64 -> 8x8 (with 4x4 support through some hacks)
19:40 <Timothy_Gu_> huh?
19:40 <BBB> so to predict a block, you take edge pixels, apply a function over it, and that’s the block’s predicted content
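To make "take edge pixels, apply a function over it" concrete, here is a minimal sketch of three simple prediction functions for a 4x4 block. The function names, the pixel values, and the choice of DC/vertical/horizontal modes are illustrative assumptions; real codecs define many more modes with their own rounding and edge-availability rules.

```python
def predict_dc(top, left):
    """Fill the block with the rounded mean of all edge pixels."""
    n = len(top)
    edge = list(top) + list(left)
    dc = (sum(edge) + len(edge) // 2) // len(edge)
    return [[dc] * n for _ in range(n)]


def predict_vertical(top, left):
    """Copy the row of pixels above the block straight down."""
    return [list(top) for _ in range(len(top))]


def predict_horizontal(top, left):
    """Copy the column of pixels left of the block straight across."""
    n = len(left)
    return [[left[r]] * n for r in range(n)]


# Hypothetical already-decoded neighbours of a 4x4 block.
top = [100, 102, 104, 106]
left = [98, 98, 99, 99]

dc_block = predict_dc(top, left)
v_block = predict_vertical(top, left)
h_block = predict_horizontal(top, left)
```

An encoder would typically try several such functions and keep whichever leaves the smallest difference to code.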
19:40 <Timothy_Gu_> where did the 64 come from?
19:41 <BBB> vp9/hevc have bigger blocks
19:41 <BBB> basically gives better compression for some hd content
19:41 <Timothy_Gu_> then how about 8?
19:41 <BBB> for complex content
19:41 <BBB> i.e. adaptive to content of frame
19:41 <BBB> a 64x64 block can divide to 4 32x32 blocks, or be coded as-is
19:41 <BBB> 32x32 -> 16x16 -> 8x8 -> (hacky) 4x4
19:42 <Timothy_Gu_> ok, so HEVC allows any of those (64, 32, 16, 8, 4)?
19:42 <BBB> yes
19:42 <BBB> and the encoder will select what makes most sense given the content
19:42 <BBB> for static motion (or no motion), you’ll likely get bigger blocks for hd content
19:43 <BBB> for highly complex motion patterns, maybe smaller blocks
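The 64x64 -> 32x32 -> 16x16 -> 8x8 splitting described above is a quadtree. The sketch below assumes a toy split rule (pixel variance against a made-up threshold) purely for illustration; actual encoders usually decide by comparing the coding cost of the split and unsplit versions.

```python
def variance(frame, x, y, size):
    """Population variance of the pixels in a size x size block."""
    vals = [frame[y + r][x + c] for r in range(size) for c in range(size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def partition(frame, x, y, size, min_size=8, threshold=25.0):
    """Return the leaf blocks as (x, y, size) tuples.

    A block is kept whole when it is flat enough (or already at the
    minimum size); otherwise it is split into four quadrants.
    """
    if size <= min_size or variance(frame, x, y, size) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += partition(frame, x + dx, y + dy, half,
                                min_size, threshold)
    return leaves


# Hypothetical 64x64 area: flat grey, with a busy 16x16 top-left corner.
frame = [[128] * 64 for _ in range(64)]
for r in range(16):
    for c in range(16):
        frame[r][c] = 255 if (r + c) % 2 else 0

leaves = partition(frame, 0, 0, 64)
```

With this input the flat regions stay as large 32x32 and 16x16 blocks, and only the busy corner is split down to 8x8.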
19:43 <Timothy_Gu_> and then the block's predicted content is compared with the actual content?
19:43 <Timothy_Gu_> and a "diff" is coded?
19:43 <BBB> right, the difference is coded as transformed coefficients
19:43 <BBB> diff = quantize(transform(src[]-pred[]))
19:44 <BBB> and that’s your basic video encoder right there
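The formula diff = quantize(transform(src[]-pred[])) can be sketched end to end on a single row of 4 pixels. This assumes a floating-point orthonormal DCT-II as the transform and a plain divide-and-round quantizer; the pixel values and the quantizer of 4 are made up, and real codecs use integer transforms instead.

```python
import math


def fdct(x):
    """Forward orthonormal DCT-II of a 1-D list."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                  for i in range(n))
            for k in range(n)]


def idct(coefs):
    """Inverse of fdct: weighted sum of the same cosine functions."""
    n = len(coefs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * coefs[k]
                * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n))
            for i in range(n)]


def encode(src, pred, q):
    """diff = quantize(transform(src - pred))"""
    residual = [s - p for s, p in zip(src, pred)]
    return [round(c / q) for c in fdct(residual)]


def decode(levels, pred, q):
    """Reverse the steps: dequantize, inverse transform, add prediction."""
    residual = idct([level * q for level in levels])
    return [p + r for p, r in zip(pred, residual)]


src = [52, 55, 61, 66]    # hypothetical source pixels
pred = [54, 54, 54, 54]   # hypothetical prediction for the same pixels
levels = encode(src, pred, q=4)
recon = decode(levels, pred, q=4)
```

The decoder only ever sees `levels` and its own copy of `pred`, so `recon` is close to `src` but not identical: the rounding in `encode` is where information is lost.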
19:44 <Timothy_Gu_> how does transform() work?
19:44 <BBB> typically just a forward dct
19:45 <Timothy_Gu_> how does DCT/Fourier transform work?
19:46 <BBB> I’d just read the wikipedia page :D
19:46 <Timothy_Gu_> BTW I asked my homeroom teacher (who teaches AP Stats) once on Fourier transform for a "for dummies" definition, and he was like "oh I remember back in college I did a project on it. I'm pretty sure it is covered in multi-variable calculus"
19:47 <BBB> it’s basically a function that finds frequency patterns in the set of difference values
19:47 <BBB> so think of the coefficients as weights of sine wave functions
19:47 <BBB> (ignore the 2d aspect for now, think just 1d)
19:47 <BBB> so if I have an array of 4 diff values double x[4]
19:47 <Timothy_Gu_> Then he pulled up the Wikipedia page (¡qué coincidencia!) that has a chart I absolutely don't understand
19:48 <Timothy_Gu_> ok continue
19:48 <BBB> and their values are like 0, 1, 1, 0
19:48 <BBB> you can see this as a wave function that goes up towards the middle of the line, and down at the edges
19:48 <BBB> now imagine that I predefined 4 sine wave functions with different frequencies
19:49 <BBB> first DC (i.e. infinite wavelength), and the others something like pi/n wavelength (or maybe 2pi/n, I forgot), where n is 1, 2, 3
19:50 <BBB> then the values (0,1,1,0 in this case, but really any set of values) can also be represented as a vector of 4 multipliers times each of these 3 wave functions (plus the one with infinite wavelength - the one I called dc)
19:50 <BBB> in this case, the 0,1,1,0 have an average value of 0.5, so the weight of dc would be 0.5
19:51 <BBB> then after I subtract that, I’m left with -0.5, 0.5, 0.5, -0.5
19:51 <BBB> and the 2pi/2 describes that quite well (if you assume the function is centered at the middle), again with a multiplier of 0.5 (if I assume the peak to be 1)
19:51 <BBB> so then the coefficients are 0.5, 0, 0.5 and 0
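The two steps above (take out the average, then see which wave describes what is left) can be checked numerically. This sketch assumes the DCT-II cosine functions cos(pi*(i+0.5)*k/n) as the predefined waves; with that choice the pattern matches the chat (only the first and third coefficients are nonzero), while the sign and size of the third one depend on how the wave is phased and normalized, so it does not come out as exactly 0.5.

```python
import math


def cosine_wave(k, n):
    """Wave number k sampled at the centres of n pixels."""
    return [math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)]


x = [0.0, 1.0, 1.0, 0.0]
n = len(x)

# Step 1: the DC weight is the average; subtract it.
dc = sum(x) / n
remainder = [v - dc for v in x]

# Step 2: for each remaining wave, find the multiplier that best
# matches the remainder (projection onto that wave).
ac = []
for k in range(1, n):
    wave = cosine_wave(k, n)
    dot = sum(r * w for r, w in zip(remainder, wave))
    energy = sum(w * w for w in wave)
    ac.append(dot / energy)

coefficients = [dc] + ac
```

Here `remainder` is -0.5, 0.5, 0.5, -0.5 as in the chat, and only `coefficients[0]` and `coefficients[2]` end up nonzero.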
19:52 <BBB> for typical natural patterns, which are common in video, transforms decrease the amount of information that needs to be coded
19:52 <Timothy_Gu_> is transform lossless?
19:52 <BBB> the inverse transform is just the opposite, it’s basically multiply the wave functions by the coefficients (multipliers), add them all up, and you have your original diff back
19:53 <BBB> no
19:53 <BBB> it can be
19:53 <BBB> but dcts typically aren’t
19:53 <BBB> vp9’s dht (4x4) is lossless
19:53 <BBB> dwt, sorry
19:53 <BBB> h264’s dht is maybe lossless?
19:53 <BBB> but these are dct approximations that are changed to be less dct-like (thus worse compression) to become lossless
19:54 <Timothy_Gu_> ok
19:54 <BBB> also quantization is not lossless, so lossless dht/dwt only works for a quantizer of 1.0 (i.e. none)
19:55 <BBB> vp8 has no lossless transform
19:55 <BBB> I’m honestly not sure about hevc
19:56 <BBB> bedtime now, sorry...
19:56 <Timothy_Gu_> how did -.5, .5, .5, -.5 become .5, .0, .5, .0 again?
19:56 <Timothy_Gu_> last question
19:57 <BBB> -.5, .5, .5, -.5 became the third coefficient
19:58 <BBB> look at this image: http://en.wikipedia.org/wiki/Discrete_cosine_transform#mediaviewer/File:Dctjpeg.png
19:58 <BBB> imagine that each small square boxed in red edges is a 2d sine function
19:58 <BBB> just look at the top 8 (in fact, just the left 4 of the top row)
19:58 <BBB> imagine now that white is 1.0, and black is -1.0
19:59 <BBB> the topleft one is a sine function where all values are 1.0
20:00 <BBB> the second one is a sine function that goes from 1.0 to -1.0, like 1.0, 0.4, -0.4, -1.0 or so
20:00 <Timothy_Gu_> If all the values are the same, is it even a sine function?
20:00 <BBB> with infinite wavelength, sure
20:00 <Timothy_Gu_> ok
20:00 <BBB> (but I mean, it’s a special case, yes)
20:01 <BBB> the third one is something like 1.0, -0.7, -0.7, 1.0
20:01 <BBB> and the fourth one is … 1.0, -1.0, 1.0, -1.0
20:01 <BBB> makes sense?
20:01 <Timothy_Gu_> So... are these "coefficients"?
20:02 <BBB> no, these are the base functions
20:02 <BBB> the coefficients are the multipliers times each of these base functions to get the diff
20:02 <BBB> like a matrix multiplication (in fact, slow implementations of f/idct are like a matrix multiply)
20:03 <BBB> so if your diff was 1.0, 1.0, 1.0, 1.0
20:03 <BBB> you only need the first base function to describe this (multiplier=1.0)
20:03 <BBB> the multipliers for the other base functions are 0
20:03 <BBB> so your coefficients are then 1.0, 0.0, 0.0, 0.0
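The "slow implementation as a matrix multiply" can be written out directly. This sketch assumes the 4-point DCT-II cosines as the base functions and scales the forward matrix so that a coefficient is literally the multiplier of its base function, which is the convention the 1.0, 0.0, 0.0, 0.0 example uses. Sampled exactly, the rows come out near 1, 1, 1, 1 / 0.92, 0.38, -0.38, -0.92 / 0.71, -0.71, -0.71, 0.71 / 0.38, -0.92, 0.92, -0.38, so the values quoted in the chat are rough approximations.

```python
import math

N = 4

# Row k holds base function k sampled at the 4 pixel centres.
BASIS = [[math.cos(math.pi * (i + 0.5) * k / N) for i in range(N)]
         for k in range(N)]

# Forward matrix: each row is a base function divided by its own energy,
# so the result is the multiplier for that base function.
FORWARD = [[b / sum(c * c for c in row) for b in row] for row in BASIS]

# Inverse matrix: the transpose of BASIS (sum of multiplier * base function).
INVERSE = [list(col) for col in zip(*BASIS)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


flat = [1.0, 1.0, 1.0, 1.0]
coefs = matvec(FORWARD, flat)    # forward transform
back = matvec(INVERSE, coefs)    # inverse transform
```

Fast transforms compute the same result with fewer multiplications by exploiting the symmetry visible in the rows of `BASIS`.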
20:03 <Timothy_Gu_> oh, now I get it
20:03 <BBB> and so for more complex diff patterns you’ll use all 4 base functions with various coefficients
20:04 <BBB> the typical effect is that the earlier coefficients are bigger and the later are smaller, so that after quantization, they are zero
20:04 <BBB> which means you can code only nonzero ones and thus save space
20:04 <BBB> that’s why transforms help compression
20:05 <BBB> instead of coding 4x4 pixels for a 4x4 block, you code only… say, 13, or 7, or even just 2
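To see how few coefficients survive for a 4x4 block, the sketch below applies a 2-D transform (the 1-D DCT run over the rows, then over the columns), quantizes, and counts what is left. The two test blocks and the quantizer of 4 are made up, and the floating-point orthonormal DCT-II stands in for whatever integer transform a real codec would use.

```python
import math


def dct_1d(x):
    """Forward orthonormal DCT-II of a 1-D list."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                  for i in range(n))
            for k in range(n)]


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def count_nonzero(block, q):
    """Number of coefficients that are still nonzero after quantizing."""
    return sum(1 for row in dct_2d(block) for c in row if round(c / q) != 0)


# Hypothetical diff blocks: a smooth left-to-right ramp and a checkerboard.
smooth = [[-6, -2, 2, 6] for _ in range(4)]
busy = [[8 if (r + c) % 2 == 0 else -8 for c in range(4)] for r in range(4)]

nonzero_smooth = count_nonzero(smooth, q=4)
nonzero_busy = count_nonzero(busy, q=4)
```

Both counts are well below the 16 values in the block, and the smooth ramp needs fewer coefficients than the checkerboard.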
20:05 <Timothy_Gu_> So if I understood correctly, DCT = avg all the members out -> use a wave form (or some wave forms) to simulate it by "assembling" different basic waves together
20:06 <Timothy_Gu_> And then after DCT you quantize ~0 values as 0
20:06 <Timothy_Gu_> ~ as in approximated
20:10 <BBB> well, they’re not floating point values
20:10 <BBB> they’re fixed point
20:11 <BBB> so 1.0 would be like 1000, and 0.1 would be 100
20:11 <BBB> then you use a quantizer
20:11 <BBB> say my quantizer is 200
20:11 <BBB> 1000/200=5
20:11 <BBB> 100/200=0
20:11 <BBB> so the 100 fell off the radar
20:11 <Timothy_Gu_> ok, in floating point concept mine is correct, right?
20:12 <BBB> yeah
20:12 <BBB> and that’s your basic fdct
20:12 <BBB> idct is the same, but inverse
20:12 <BBB> and dequant is just multiply coded coefficients by the quantizer value
20:12 <BBB> so 5*200=1000, 0*200=0, etc.
20:13 <BBB> I guess now is bedtime :-p
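The fixed-point quantizer arithmetic above (1000/200=5, 100/200=0, 5*200=1000) can be reproduced with integer division. The scale of 1000 and the quantizer of 200 come from the chat; the helper names and the truncate-towards-zero rule are assumptions for this sketch, since real encoders often add a rounding offset.

```python
SCALE = 1000  # 1.0 is stored as 1000, 0.1 as 100


def to_fixed(value):
    """Convert a floating-point coefficient to fixed point."""
    return round(value * SCALE)


def quantize(coef, q):
    """Integer division that truncates towards zero."""
    sign = -1 if coef < 0 else 1
    return sign * (abs(coef) // q)


def dequantize(level, q):
    """Multiply the coded level by the quantizer value."""
    return level * q


coefs = [to_fixed(1.0), to_fixed(0.1), to_fixed(-0.1)]
levels = [quantize(c, 200) for c in coefs]
restored = [dequantize(level, 200) for level in levels]
```

The 1000 survives the round trip exactly, while both small coefficients come back as 0.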