yohhoy/av1-codec.md

## av1-codec.md

      
    Raw
  

              av1-codec.md
            
          
    AV1 coding scheme


Arithmetic coding

multi-symbols (up to 16 values)
Coefficients coding: lv_map
CDF: Cumulative distribution function

CDF update (at the end of frame)
CDF update (adaptive per symbol)


Image blocking

Tile(Row x Col)
Superblock(64x64/128x128)
Quadtree, 1:2/2:1 Rectangular, 1:4/4:1 Rectangular (down to 4x4)
Segmentation map (up to 8 segs)


Intra prediction

DC predicator
Directional mode=V(90),H(180),45,135,113,157,203,67

Angle delta=3-deg * [-3, +3] (total=8x7=56 angles)


Smooth-DC, Smooth-V, Smooth-H
Recursive intra predication
Paeth predicator
CfL: Chroma from Luma
Intra BC(block copy) as screen content tools

force int-mv, delay four 64x64 superblock


Palette Mode as screen content tools (up to 8 colors)


Inter prediction

Motion vector precision(quater-pel or eighth-pel)

Interger-mv as screen content tools


Global motion
Motion vector predicaton (4 mvs candidate)
Warped motion models(translation/rotation+symmetric zoom+translation/general affine)
OBMC: Overlapped Block Motion Compensation
Compound prediction (has 2-ref&mvs)

Intra-Inter compound
Wedge mask(16 directions)
jnt(?) use distance weight


Motion field estimation


RefFrame management

7 reference frames + Curr frame

Intra frame(ref_frame=0)
Last frame(ref_frame=1/2/3)
Golden frame(ref_frame=4)
Backward reference(BWD) frame(ref_frame=5)
Alt-ref frame(ref_frame=6/7)


Residual Transform

DCT: Discrete Cosine Transform
ADST: Asymmetric Discrete Sine Transform
WHT: Walsh Hadamard Transform
19 Tx sizes=4x4/8x8/16x16/32x32/64x64(1:1), 4x8/8x4/8x16/16x8/16x32/32x16/32x64/64x32(1:2), 4x16/16x4/8x32/32x8/16x64/64x16(1:4)
Quantize/Dequantize
Quantize matrix (32x32/32x16/16x32, subsample for other size)
Lossless coding (per segment)


In-loop filtering

CDEF: Constrained Directional Enhancement Filter

cdef_params: damping, bits of idx, 1st/2nd filter strength (per frame)
cdef_idx (per 64x64 block)
8 direction


Loop Restortion

lr_params: type, unit size=64x64/128x128/256x256 (per frame)
Wiener filter: 7-tap for luma, 5-tap for chroma
Self guided restortion(SGR) projection


CurrFrame = loop_filter(CurrFrame)
CdefFrame = CDEF(CurrFrame)
UpscaledCdefFrame = upscaling(CdefFrame)
UpscaledCurrFrame = upscaling(CurrFrame)
LrFrame = loop_restortion(UpscaledCurrFrame, UpscaledCdefFrame)


Post filtering

OutY/U/V = LrFrame
Film grain synthesis
Render size (informative hint)


OBU(Open Bitstream Unit) types


sequence header

profile
bitdepth(8/10/12)
chroma format(YUV400/420/422/444)
operating point
frame size
superblock size(64x64/128x128)
enable coding tools


frame header

invoke decode_frame() if show_existing_frame


tile group

invoke decode_frame() after last tile


frame

frame header OBU + tile group OBU


temporal delimiter
redundant frame header
metadata

private data(any)
HDR content light level
HDR mastering display color volume
scalability structure


padding