Commentary

Video is messy

I talk about stuff I noticed; if you have questions, message me directly on Discord: Master Of Zen#7693

Video encoding is messy, and a lot of it is poorly explained online, which leaves people with a bad framework for understanding and dealing with video.

Base

  • Coding standards and video encoders are different things. aomenc, rav1e, and SVT-AV1 are all video encoders for AV1.

  • Coding standards define what encoded video should look like as binary (a file) and how it is decoded (made back into pixels, usually to play it or to feed it into another program as pixels).

  • What video encoders try to do is exploit correlations (similarities) within a frame and between consecutive frames, i.e. find a way to use less data (bits).

  • Coding standards specify which tools are allowed in the binary to reconstruct the video and how the binary should be constructed. More modern standards employ better tools.

  • Presets. A set of tools that the encoder will use to find optimal solutions (use less data, be efficient). The general rule is that the more tools an encoder has and tries, the better the efficiency. Under presets I also want to include software and hardware encoders, because for the output video file they represent just different sets of tools (presets), composed with different goals and resources in mind. Ideally, if a software and a hardware encoder had exactly the same set of tools and the same settings, they would produce exactly the same output.

  • The preset determines the "effort" the encoder will spend trying to find an optimal solution before moving on to the next task.

  • Encoders analyze frames to find optimal ways to break them into blocks, and ways to move those blocks in the following frames to compensate for motion. Making new blocks usually takes more data than reusing blocks from previous frames.

  • By transforming and quantizing blocks, we can change the amount of data required to make those blocks. Visually insignificant data is discarded first; the more we discard, the lighter the block (in bits), and the more deviation from the original is introduced.

  • The idea of transforming is that we can represent an arbitrary NxN block of pixels as NxN coefficients (a rough sketch follows after this list), and I think this video covers it perfectly: https://www.youtube.com/watch?v=Q2aEzeMDHMA.

  • Bigger blocks are generally better for less detailed parts of the frame, while the smallest blocks are used to catch details. As we reduce quality / discard more information, the block partitioning decisions can change and bigger blocks get used instead.
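
As a rough illustration of the transform + quantization idea above: the classic 2D DCT-II, written below, turns an NxN block of pixels f(x,y) into NxN coefficients F(u,v), and quantization then divides each coefficient by a step size Q and rounds. This is only a sketch of the general principle; AV1 itself uses several transform types and its own quantization scheme.

$$F(u,v) = \alpha(u)\,\alpha(v)\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,\cos\!\left[\frac{(2x+1)u\pi}{2N}\right]\cos\!\left[\frac{(2y+1)v\pi}{2N}\right],\qquad \hat F(u,v) = \operatorname{round}\!\left(\frac{F(u,v)}{Q}\right)$$

with $\alpha(0)=\sqrt{1/N}$ and $\alpha(u)=\sqrt{2/N}$ for $u>0$. The bigger Q gets (lower quality), the more of the high-frequency (fine detail) coefficients round to zero, which is exactly the "discard visually insignificant data first" step.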

Blocks, it's all blocks

Here is an example of how the encoder breaks a frame into blocks when we try to encode it at high and low quality.

High quality (aomenc cq 20)

Low quality (aomenc cq 63)
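
For reference, the two encodes above can be produced with commands along these lines (a sketch, not the exact invocation used for the screenshots; aomenc takes y4m or raw input, and the file names here are placeholders):

# constant-quality mode, low cq-level = high quality, high cq-level = low quality
aomenc --end-usage=q --cq-level=20 --cpu-used=4 -o high_quality.ivf input.y4m
aomenc --end-usage=q --cq-level=63 --cpu-used=4 -o low_quality.ivf input.y4m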

Notice that with the quality change, most of the small blocks with fine detail disappeared and were replaced with bigger blocks, which are also more heavily quantized (which makes parts of the video look like a checkerboard).

So when the quality of a video goes down, what we are used to seeing is big, highly quantized blocks.

Also, you can check all of this in an interactive environment with these links:

ONE TWO

Resolution and quality

The increase in quality is not proportional to bitrate: the initial increase in bitrate gives a big gain in perceived quality, after which further increases in bitrate give marginal gains.

For each video segment and for each encoder preset, that point is different.

Bitrate is "a budget" of data that is spend, without context it's usually useless to determine efficiency or quality of encoding.

It only makes sense to downscale the video at very low bitrates, as encoder tools such as transforms are already exploiting spatial redundancy and correlations.

When we want to serve video with the least amount of bitrate, we use a convex hull, where downscaling is used as a last resort to lower the bitrate while preserving as much appeal as possible, once the encoder is no longer able to manage that resolution.

This produces a convex hull of resolutions and bitrates at which it is optimal to encode; a rough sketch of the idea follows below.
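
A minimal sketch of building that hull, assuming an ffmpeg build with libaom-av1 (file names and the quality level are placeholders): encode the same clip at several resolutions, then measure quality at each resulting bitrate (e.g. with VMAF) and keep, for every bitrate, the resolution that scores best.

# encode the same source at several heights; -2 keeps the width divisible by 2
for h in 1080 720 540 360; do
  ffmpeg -i source.mkv -vf scale=-2:$h -c:v libaom-av1 -crf 32 -b:v 0 out_${h}p.mkv
done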

Live Video upscale

As of right now, mpv supports applying GLSL shaders while watching video.

Currently some of the best options for live upscaling are AMD's FSR and FSRCNNX (FSRCNNX_x2_8-0-4-1.glsl). There are many more, but (in my opinion) they are specialized to certain types of content or don't perform as well.

Example of an mpv config that uses it: https://github.com/hl2guide/better-mpv-config, which also has other neat things that improve video quality.
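
For a single playback, a shader can also be enabled straight from the command line (the shader path is a placeholder; point it at wherever the .glsl file is saved):

mpv --glsl-shaders=/path/to/FSRCNNX_x2_8-0-4-1.glsl video.mkv

In mpv.conf the equivalent is a glsl-shaders=... line (or glsl-shaders-append=... to stack several shaders), which is roughly what the linked config does.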

An additional cool feature is that it's possible to watch YouTube through mpv (when configured), with a command like this:

mpv https://www.youtube.com/watch?v=4zWoz6Qctpg

*Note that the video will be upscaled only if the window/display resolution is bigger than the video.

Example of playing a YouTube video with mpv, with playback info shown

You can see many user shaders listed; the first portion of those are the FSR ones.

Comparison of mpv playback quality and watching the same video on YouTube

Performance varies greatly depending on the quality of the video. Usually it performs amazingly on any text, and on videos that were downscaled.

Here is how the whole frame looks

Guess which is the YouTube window screenshot and which is the mpv screenshot.

Stable diffusion web ui upscale

I use this project: https://github.com/AUTOMATIC1111/stable-diffusion-webui

With the LDSR or ESRGAN upscalers (LDSR is really slow, but produces amazing results)

Many more can be found on the Stable Diffusion subreddits.

Here is my last one)
