yohhoy (gist last active August 14, 2019 09:50)
AV1 video codec memorandum

AV1 coding scheme

  • Arithmetic coding
    • Multi-symbol alphabets (up to 16 values per symbol)
    • Coefficient coding: lv_map
    • CDF: Cumulative distribution function
      • CDF update (at the end of each frame)
      • CDF update (adaptive, per symbol)
  • Image blocking
    • Tiles (rows x columns)
    • Superblock (64x64/128x128)
    • Quadtree, 1:2/2:1 rectangular and 1:4/4:1 rectangular partitions (down to 4x4)
    • Segmentation map (up to 8 segments)
  • Intra prediction
    • DC predictor
    • Directional modes: V (90), H (180), 45, 135, 113, 157, 203, 67 degrees
      • Angle delta = 3 degrees x [-3, +3] (8 x 7 = 56 angles in total)
    • Smooth, Smooth-V, Smooth-H
    • Recursive intra prediction (filter intra)
    • Paeth predictor
    • CfL: Chroma from Luma
    • Intra BC (block copy) as a screen content tool
      • Integer MVs are forced; the referenced area must lag the current block by four 64x64 superblocks (256 pixels)
    • Palette mode as a screen content tool (up to 8 colors)
  • Inter prediction
    • Motion vector precision (quarter-pel or eighth-pel)
      • Integer MV as a screen content tool
    • Global motion
    • Motion vector prediction (4 MV candidates)
    • Warped motion models (translation / rotation + symmetric zoom + translation / general affine)
    • OBMC: Overlapped Block Motion Compensation
    • Compound prediction (two references, two MVs)
      • Intra-inter compound
      • Wedge mask (16 directions)
      • jnt_comp (joint compound): weights derived from the distances to the two reference frames
    • Motion field estimation
  • RefFrame management
    • 7 reference frames + current frame
      • Intra frame (ref_frame=0)
      • Last frames: LAST/LAST2/LAST3 (ref_frame=1/2/3)
      • Golden frame (ref_frame=4)
      • Backward reference (BWDREF) frame (ref_frame=5)
      • Alt-ref frames: ALTREF2/ALTREF (ref_frame=6/7)
  • Residual Transform
    • DCT: Discrete Cosine Transform
    • ADST: Asymmetric Discrete Sine Transform
    • WHT: Walsh Hadamard Transform
    • 19 Tx sizes: 4x4/8x8/16x16/32x32/64x64 (1:1); 4x8/8x4/8x16/16x8/16x32/32x16/32x64/64x32 (1:2); 4x16/16x4/8x32/32x8/16x64/64x16 (1:4)
    • Quantization/dequantization
    • Quantizer matrices (32x32/32x16/16x32, subsampled for other sizes)
    • Lossless coding (per segment)
  • In-loop filtering
    • CDEF: Constrained Directional Enhancement Filter
      • cdef_params: damping, number of cdef_idx bits, primary/secondary filter strengths (per frame)
      • cdef_idx (per 64x64 block)
      • 8 directions
    • Loop restoration
      • lr_params: type, unit size = 64x64/128x128/256x256 (per frame)
      • Wiener filter: 7-tap for luma, 5-tap for chroma
      • Self-guided restoration (SGR) projection
    • CurrFrame = loop_filter(CurrFrame)
    • CdefFrame = CDEF(CurrFrame)
    • UpscaledCdefFrame = upscaling(CdefFrame)
    • UpscaledCurrFrame = upscaling(CurrFrame)
    • LrFrame = loop_restoration(UpscaledCurrFrame, UpscaledCdefFrame)
  • Post filtering
    • OutY/U/V = LrFrame
    • Film grain synthesis
    • Render size (informative hint)
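The "CDF update (adaptive, per symbol)" item works roughly as sketched below. This is a minimal sketch modeled on the AV1 specification's CDF update process as I understand it. It assumes `cdf` holds N cumulative probabilities P(X <= i) scaled to 32768, with the last one fixed at 32768, followed by a single adaptation counter. The function name `update_cdf` and this list layout are illustrative assumptions.

```python
def floor_log2(n: int) -> int:
    """Index of the highest set bit of a positive integer."""
    return n.bit_length() - 1


def update_cdf(cdf: list[int], symbol: int) -> None:
    """Adapt a CDF in place after coding `symbol` (hypothetical layout).

    cdf[0..N-1]: P(X <= i) scaled to 32768, cdf[N-1] == 32768 (never changes)
    cdf[N]     : number of updates so far, saturating at 32
    """
    n = len(cdf) - 1                      # alphabet size N
    count = cdf[n]
    # Adaptation is fast (small shift) at first and slows down as count grows;
    # larger alphabets also adapt more slowly.
    rate = 3 + (count > 15) + (count > 31) + min(floor_log2(n), 2)
    target = 0
    for i in range(n - 1):
        if i == symbol:
            target = 1 << 15              # P(X <= i) moves toward 1 for i >= symbol
        if target < cdf[i]:
            cdf[i] -= (cdf[i] - target) >> rate
        else:
            cdf[i] += (target - cdf[i]) >> rate
    cdf[n] += (count < 32)                # saturating update counter


cdf = [8192, 16384, 24576, 32768, 0]      # uniform 4-symbol CDF, counter 0
update_cdf(cdf, 1)
print(cdf)                                # [7936, 16896, 24832, 32768, 1]
```

The shift-based update avoids multiplications and divisions, so it runs once per decoded symbol at negligible cost.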
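The partition shapes behind the "Image blocking" item can be enumerated as below. The sketch assumes AV1's ten partition types:

- NONE, HORZ, VERT and SPLIT.
- The three-way HORZ_A, HORZ_B, VERT_A and VERT_B.
- The 4:1 HORZ_4 and VERT_4.

`partition_sizes` is a hypothetical helper. It only computes sub-block dimensions and does not check which block sizes permit which partition.

```python
def partition_sizes(w: int, h: int, kind: str) -> list[tuple[int, int]]:
    """(width, height) of each sub-block when a w x h block is partitioned."""
    hw, hh = w // 2, h // 2
    table = {
        "NONE":   [(w, h)],
        "HORZ":   [(w, hh)] * 2,                  # two horizontal halves
        "VERT":   [(hw, h)] * 2,                  # two vertical halves
        "SPLIT":  [(hw, hh)] * 4,                 # quadtree split
        "HORZ_A": [(hw, hh), (hw, hh), (w, hh)],  # top half split in two
        "HORZ_B": [(w, hh), (hw, hh), (hw, hh)],  # bottom half split in two
        "VERT_A": [(hw, hh), (hw, hh), (hw, h)],  # left half split in two
        "VERT_B": [(hw, h), (hw, hh), (hw, hh)],  # right half split in two
        "HORZ_4": [(w, h // 4)] * 4,              # four 4:1 stripes
        "VERT_4": [(w // 4, h)] * 4,              # four 1:4 stripes
    }
    return table[kind]


for kind in ("SPLIT", "HORZ_A", "VERT_4"):
    print(kind, partition_sizes(64, 64, kind))
```

Applying these shapes recursively, down to 4x4, gives the quadtree plus rectangular partitioning listed above.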
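The 56 directional intra angles come from 8 nominal modes times 7 angle deltas of 3 degrees each. The small sketch below enumerates them. The mode names follow the AV1 spec's V_PRED/H_PRED/D45_PRED... convention, and `prediction_angle` is a hypothetical helper.

```python
# Nominal angles (degrees) of the 8 directional intra modes.
BASE_ANGLES = {
    "V_PRED": 90, "H_PRED": 180, "D45_PRED": 45, "D135_PRED": 135,
    "D113_PRED": 113, "D157_PRED": 157, "D203_PRED": 203, "D67_PRED": 67,
}
ANGLE_STEP = 3  # degrees per unit of angle_delta


def prediction_angle(mode: str, angle_delta: int) -> int:
    """Final prediction angle for a directional mode and a delta in [-3, 3]."""
    if not -3 <= angle_delta <= 3:
        raise ValueError("angle_delta must be in [-3, 3]")
    return BASE_ANGLES[mode] + ANGLE_STEP * angle_delta


all_angles = sorted(
    prediction_angle(m, d) for m in BASE_ANGLES for d in range(-3, 4)
)
print(len(all_angles), len(set(all_angles)))  # 56 56
print(all_angles[0], all_angles[-1])          # 36 212
```

The deltas span at most +/-9 degrees, so the ranges around neighbouring nominal angles never overlap and all 56 angles are distinct.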
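The Paeth predictor listed under intra prediction picks, per pixel, whichever of the left, top and top-left neighbours is closest to the gradient estimate top + left - top_left. This is a minimal sketch. It assumes ties are broken in the order left, then top, then top-left, which is my understanding of the AV1 spec, and the function names are hypothetical.

```python
def paeth_pixel(left: int, top: int, top_left: int) -> int:
    """Return the neighbour closest to the gradient estimate."""
    base = top + left - top_left
    p_left = abs(base - left)          # equals abs(top - top_left)
    p_top = abs(base - top)            # equals abs(left - top_left)
    p_top_left = abs(base - top_left)
    if p_left <= p_top and p_left <= p_top_left:
        return left
    if p_top <= p_top_left:
        return top
    return top_left


def paeth_predict(above: list[int], left_col: list[int], top_left: int) -> list[list[int]]:
    """Predict a block with len(left_col) rows and len(above) columns."""
    return [[paeth_pixel(l, a, top_left) for a in above] for l in left_col]


print(paeth_predict([10, 20], [30, 40], 10))  # [[30, 30], [40, 40]]
```

Because top_left equals the first top sample here, the estimate follows the left column, so each row copies its left neighbour.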
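The reference frame list can be written down as an enum of ref_frame values. The sketch also assumes, following the AV1 spec as I understand it, two further points:

- The decoder keeps 8 reference buffer slots (NUM_REF_FRAMES).
- Each inter frame maps its 7 named references onto those slots through ref_frame_idx.

`resolve_refs` is a hypothetical helper.

```python
from enum import IntEnum


class RefFrame(IntEnum):
    """ref_frame values, named as in the AV1 spec."""
    INTRA_FRAME = 0
    LAST_FRAME = 1
    LAST2_FRAME = 2
    LAST3_FRAME = 3
    GOLDEN_FRAME = 4
    BWDREF_FRAME = 5
    ALTREF2_FRAME = 6
    ALTREF_FRAME = 7


NUM_REF_FRAMES = 8   # reference buffer slots kept by the decoder
REFS_PER_FRAME = 7   # named references an inter frame may use

# Every ref_frame except INTRA_FRAME is an inter reference (LAST..ALTREF).
INTER_REFS = [r for r in RefFrame if r != RefFrame.INTRA_FRAME]


def resolve_refs(ref_frame_idx: list[int]) -> dict[RefFrame, int]:
    """Map each named inter reference to the buffer slot it points at."""
    if len(ref_frame_idx) != REFS_PER_FRAME:
        raise ValueError("expected 7 ref_frame_idx entries")
    if any(not 0 <= s < NUM_REF_FRAMES for s in ref_frame_idx):
        raise ValueError("slot index out of range")
    return dict(zip(INTER_REFS, ref_frame_idx))


slots = resolve_refs([3, 0, 1, 2, 4, 5, 6])
print(slots[RefFrame.LAST_FRAME], slots[RefFrame.ALTREF_FRAME])  # 3 6
```

Keeping names separate from slots lets an encoder reorder which stored frame acts as LAST or GOLDEN without copying pixel data.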
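The count of 19 transform sizes follows from allowing width:height ratios of 1:1, 1:2/2:1 and 1:4/4:1 over side lengths 4 to 64. A small sketch that enumerates them (the helper name `tx_sizes` is only for illustration):

```python
SIDES = [4, 8, 16, 32, 64]  # allowed transform side lengths


def tx_sizes() -> list[tuple[int, int]]:
    """All (width, height) pairs whose aspect ratio is 1:1, 1:2/2:1 or 1:4/4:1."""
    sizes = []
    for w in SIDES:
        for h in SIDES:
            if max(w, h) // min(w, h) in (1, 2, 4):
                sizes.append((w, h))
    return sizes


sizes = tx_sizes()
print(len(sizes))  # 19
```

The count breaks down as 5 square sizes, 8 sizes at 1:2/2:1 and 6 sizes at 1:4/4:1. Pairs such as 4x32 (1:8) or 4x64 (1:16) are excluded.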
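The "Wiener filter: 7-tap for luma, 5-tap for chroma" item can be illustrated as follows. As far as I know, AV1 signals three coefficients per filter direction and derives a symmetric 7-tap kernel whose taps sum to 128. For chroma the outermost coefficient is fixed at 0, which makes the filter effectively 5-tap. The sketch assumes that structure. `filter_row` is a simplified 1-D illustration using edge replication and a single rounding step, not the decoder's exact horizontal-then-vertical arithmetic. Both function names are hypothetical.

```python
def wiener_taps(c0: int, c1: int, c2: int) -> list[int]:
    """Expand 3 coded coefficients into a symmetric 7-tap kernel summing to 128."""
    center = 128 - 2 * (c0 + c1 + c2)
    return [c0, c1, c2, center, c2, c1, c0]


def filter_row(row: list[int], taps: list[int]) -> list[int]:
    """Apply a 7-tap kernel along one row (simplified 1-D sketch)."""
    n = len(row)
    out = []
    for i in range(n):
        acc = 0
        for k, t in enumerate(taps):
            j = min(max(i + k - 3, 0), n - 1)  # replicate edge samples
            acc += t * row[j]
        out.append((acc + 64) >> 7)            # round back from 1/128 units
    return out


luma_taps = wiener_taps(3, -7, 15)    # example coefficients, chosen arbitrarily
chroma_taps = wiener_taps(0, -7, 15)  # outer tap 0: effectively 5 taps
print(luma_taps, chroma_taps)
print(filter_row([100] * 6, luma_taps))  # a flat row passes through unchanged
```

Because the taps always sum to 128, flat areas are preserved and the encoder only has to signal how the filter shapes edges and texture.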

OBU (Open Bitstream Unit) types

  • sequence header
    • profile
    • bit depth (8/10/12)
    • chroma format (YUV400/420/422/444)
    • operating point
    • frame size
    • superblock size (64x64/128x128)
    • coding tool enable flags
  • frame header
    • invokes decode_frame() if show_existing_frame=1
  • tile group
    • invokes decode_frame() after the last tile
  • frame
    • frame header OBU + tile group OBU
  • temporal delimiter
  • redundant frame header
  • metadata
    • private data (any)
    • HDR content light level
    • HDR mastering display color volume
    • scalability structure
  • padding
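The OBU types above are each carried behind a one-byte OBU header. That header may be followed by an optional extension byte and a LEB128-coded payload size. The sketch below parses it, assuming the low-overhead bitstream format where obu_has_size_field is normally 1. The obu_type values and the bit layout are taken from the AV1 specification as I understand it. `ObuHeader`, `read_leb128` and `parse_obu_header` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# obu_type values (types not listed are reserved).
OBU_TYPES = {
    1: "OBU_SEQUENCE_HEADER",
    2: "OBU_TEMPORAL_DELIMITER",
    3: "OBU_FRAME_HEADER",
    4: "OBU_TILE_GROUP",
    5: "OBU_METADATA",
    6: "OBU_FRAME",
    7: "OBU_REDUNDANT_FRAME_HEADER",
    8: "OBU_TILE_LIST",
    15: "OBU_PADDING",
}


def read_leb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value (at most 8 bytes); return (value, next_pos)."""
    value = 0
    for i in range(8):
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)   # little-endian 7-bit groups
        if not byte & 0x80:                 # high bit clear: last byte
            return value, pos + i + 1
    raise ValueError("leb128 value longer than 8 bytes")


@dataclass
class ObuHeader:
    obu_type: int
    temporal_id: int
    spatial_id: int
    payload_size: Optional[int]  # None when obu_has_size_field == 0
    header_len: int              # bytes used by header, extension and size


def parse_obu_header(data: bytes, pos: int = 0) -> ObuHeader:
    b = data[pos]
    if b & 0x80:
        raise ValueError("obu_forbidden_bit is set")
    obu_type = (b >> 3) & 0x0F
    extension_flag = (b >> 2) & 1
    has_size_field = (b >> 1) & 1
    p = pos + 1
    temporal_id = spatial_id = 0
    if extension_flag:
        ext = data[p]
        temporal_id = ext >> 5               # 3 bits
        spatial_id = (ext >> 3) & 0x03       # 2 bits, then 3 reserved bits
        p += 1
    size = None
    if has_size_field:
        size, p = read_leb128(data, p)
    return ObuHeader(obu_type, temporal_id, spatial_id, size, p - pos)


td = parse_obu_header(bytes([0x12, 0x00]))   # a temporal delimiter OBU
print(OBU_TYPES[td.obu_type], td.payload_size)  # OBU_TEMPORAL_DELIMITER 0
```

A demuxer would skip header_len + payload_size bytes to reach the next OBU, dispatching on obu_type as in the list above.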