Created December 19, 2015 15:00
== Codebase ==
Daala is structured as a shared or static library with an API. The API is container agnostic and similar to the Opus API. The codebase is built into three libraries for encoder, decoder, and common code, so applications only need to link in what they need. There are also example executables that use the API to encode files and play them back. These are encoder_example, dump_video and player_example as described in the Daala Quickstart guide. There is also a graphical stream analyzer that will display information such as block sizes, PVQ modes, bit allocation, and more.
Daala can be built on any Unix-like platform or Microsoft Windows. Windows builds can be performed either with mingw64 or with Microsoft Visual Studio. Guides on building the code and encoding video can be found here:
https://wiki.xiph.org/Daala_Quickstart
There is also a second, independent decoder implementation in FFmpeg being developed by a community member:
https://github.com/atomnuker/FFmpeg
== Encoder options and parameters ==
The encoder is designed to have few parameters - the defaults are configured to produce the best results. The full list of options is available via encoder_example --help. An abbreviated list is shown below:
  -k --keyframe-rate <n>    Frequency of keyframes in output.
  -b --b-frames <n>         Number of B-frames between two
                            reference frames. Default 0
                            (i.e. P frames only). Max 4.
  -v --video-quality <n>    Daala quality selector from 0 to 511.
                            511 yields the smallest files, but
                            lowest video quality; 1 yields the
                            highest quality, but large files;
                            0 is lossless.
  -z --complexity <n>       Computational complexity: 0...10
                            Fastest: 0, slowest: 10, default: 7
     --[no-]fpr             Disable (default) or enable full
                            precision references.
The best coding efficiency is achieved with -z 10 --fpr -b 2. Reducing the -z level or turning off FPR will reduce runtime at the cost of coding efficiency. For real-time encoding, the number of B-frames must be set to zero.
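As a rough illustration of how these options combine, the sketch below assembles an encoder_example command line in Python. The helper name build_encoder_command, the file names, and the -o output flag are assumptions made for this example (the output flag does not appear in the abbreviated list above); encoder_example --help remains the authoritative reference.

```python
def build_encoder_command(input_path, output_path, quality=20,
                          complexity=10, b_frames=2, fpr=True):
    """Return an argv list for encoder_example (hypothetical helper)."""
    # Range checks mirror the limits stated in the option list.
    if not 0 <= quality <= 511:
        raise ValueError("quality must be in 0..511 (0 is lossless)")
    if not 0 <= complexity <= 10:
        raise ValueError("complexity must be in 0..10")
    if not 0 <= b_frames <= 4:
        raise ValueError("b_frames must be in 0..4")
    return [
        "encoder_example",
        "-v", str(quality),
        "-z", str(complexity),
        "-b", str(b_frames),
        "--fpr" if fpr else "--no-fpr",
        input_path,
        "-o", output_path,  # assumed output flag, not in the list above
    ]

# Highest-efficiency settings named in the text: -z 10 --fpr -b 2
best = build_encoder_command("input.y4m", "output.ogv")

# A faster configuration suitable for real-time use: no B-frames
realtime = build_encoder_command("input.y4m", "output.ogv",
                                 complexity=0, b_frames=0, fpr=False)
```

The resulting list can be passed to subprocess.run once the binary is on the PATH.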
== Tools ==
- Entropy Coder
Daala's entropy coder is based on the range coder we successfully deployed in the Opus audio codec, with optimizations to avoid multiplications and divisions. The entropy coder supports alphabet sizes up to 16 to increase throughput per symbol and has several adaptation models that allow per-symbol probability adaptation. It is documented in the following IETF draft:
https://tools.ietf.org/html/draft-terriberry-netvc-codingtools
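To give a feel for per-symbol probability adaptation over a small alphabet, here is a minimal frequency-count model in Python. It is a generic sketch, not one of Daala's adaptation models: the class name AdaptiveModel, the increment, and the rescaling limit are all assumptions chosen for illustration, and no actual range coding is performed (only the ideal code length is computed).

```python
import math

class AdaptiveModel:
    """Frequency-count model over a small alphabet (illustrative only)."""

    def __init__(self, nsyms, increment=32, limit=1024):
        if not 2 <= nsyms <= 16:
            raise ValueError("alphabet size must be in 2..16")
        self.freq = [1] * nsyms      # start from a flat distribution
        self.increment = increment   # assumed adaptation step
        self.limit = limit           # assumed total before rescaling

    def cost_bits(self, sym):
        """Ideal cost in bits of coding sym with the current probabilities."""
        return -math.log2(self.freq[sym] / sum(self.freq))

    def update(self, sym):
        """Adapt after coding one symbol; halve counts to forget old data."""
        self.freq[sym] += self.increment
        if sum(self.freq) > self.limit:
            self.freq = [max(1, f // 2) for f in self.freq]

model = AdaptiveModel(nsyms=16)
cost_before = model.cost_bits(3)   # 4 bits for a flat 16-symbol alphabet
for _ in range(50):
    model.update(3)                # the source keeps emitting symbol 3
cost_after = model.cost_bits(3)    # cheaper once the model has adapted
```

Coding one 16-ary symbol replaces up to four binary decisions, which is where the throughput gain per symbol comes from.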
- Transforms
Daala uses lapped transforms with variable block sizes. These are made of two separable components - perfectly reversible integer approximations of the DCT (up to 64x64), and time-domain reversible filters.
There is a demo page that describes with many pictures and diagrams how these filters are constructed and how they behave:
https://people.xiph.org/~xiphmont/demo/daala/demo1.shtml
The design of the filters is documented as an IETF draft:
https://datatracker.ietf.org/doc/draft-egge-netvc-tdlt/
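The reason an integer transform can be perfectly reversible is easiest to see on a single lifting step. The sketch below is a two-point integer butterfly (the classic S-transform), shown only as a minimal example of the principle; it is not Daala's DCT or lapping filter, and the function names are made up for this example.

```python
def forward_pair(a, b):
    """Integer lifting butterfly: returns (floor-average, difference)."""
    d = a - b
    s = b + (d >> 1)      # equals floor((a + b) / 2)
    return s, d

def inverse_pair(s, d):
    """Undo forward_pair exactly by reversing each step in turn."""
    b = s - (d >> 1)      # the same rounded term is subtracted back out
    a = d + b
    return a, b

s, d = forward_pair(7, 4)          # s == 5, d == 3
restored = inverse_pair(s, d)      # (7, 4)
```

Each step adds a rounded function of one value to the other, so the inverse can recompute the identical rounded term and subtract it. Rounding never loses information, which is what allows the same transform to serve both lossy and lossless coding.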
- PVQ
Daala uses Perceptual Vector Quantization to extract perceptually meaningful parameters from the transform coefficients. The idea is extended from Opus's Pyramid Vector Quantization. It preserves contrast and applies activity masking without any extra signaling.
PVQ is visually documented at a Daala demo page:
https://people.xiph.org/~jm/daala/pvq_demo/
It has been presented in various places, including Linux.conf.au 2015: https://www.youtube.co.uk/watch?v=Dmho4gcRvQ4
It is also documented as an IETF draft:
https://tools.ietf.org/html/draft-valin-videocodec-pvq-02
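The gain-shape idea behind PVQ can be sketched in a few lines. The example below quantizes the direction of a coefficient vector to an integer vector whose absolute values sum to K, and keeps the gain (the vector's energy) separate. This is a simplified sketch under stated assumptions: the function name and the rounding-based search are illustrative, and the prediction step and rate-distortion search described in the PVQ draft are not modeled.

```python
import math

def pvq_quantize_shape(x, k):
    """Quantize the shape of x to integers y with sum(|y_i|) == k."""
    l1 = sum(abs(v) for v in x)
    if l1 == 0:
        raise ValueError("shape is undefined for an all-zero vector")
    # Project onto the pyramid sum(|y_i|) = k, then round down.
    scaled = [k * abs(v) / l1 for v in x]
    y = [int(math.floor(s)) for s in scaled]
    # Give the leftover pulses to the largest rounding remainders.
    remaining = k - sum(y)
    order = sorted(range(len(x)), key=lambda i: scaled[i] - y[i],
                   reverse=True)
    for i in order[:remaining]:
        y[i] += 1
    # Restore the signs of the input.
    return [yi if v >= 0 else -yi for yi, v in zip(y, x)]

x = [3.0, -1.0, 0.5, 0.0]
k = 5
gain = math.sqrt(sum(v * v for v in x))   # coded separately from the shape
y = pvq_quantize_shape(x, k)              # integer codeword on the pyramid
y_norm = math.sqrt(sum(v * v for v in y))
reconstructed = [gain * v / y_norm for v in y]
```

Because the gain is coded on its own, the reconstructed vector has the same energy as the input regardless of how coarsely the shape is quantized, which is how contrast is preserved.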
- OBMC
Daala uses Overlapped Block Motion Compensation (OBMC) for inter-frame prediction. The implementation in Daala supports variable block sizes with full blending windows. This is achieved by placing motion vectors on block corners and using a 4-8 subdivision mesh rather than the traditional quadtree.
The prediction method produces no block edges, which when combined with lapped transforms, removes the need for an adaptive deblocking filter. It also allows the size of transform and prediction blocks to be decoupled. Large skipped regions also look better due to the lack of blocking.
A dynamic programming search implemented in the Daala encoder substantially improves the gains achievable with this method.
Tim Terriberry has written a paper on this technique:
https://people.xiph.org/~tterribe/daala/vbsobmc.pdf
You can watch him present about it here:
http://people.xiph.org/~tdaede/video/SPIE_Tim.webm
The feature is documented as an IETF draft:
https://datatracker.ietf.org/doc/draft-terriberry-netvc-obmc/
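The effect of a blending window is easiest to see in one dimension. The sketch below blends two predictions of the same region, each obtained from a different motion vector, with linear ramp weights that sum to one at every sample. It is only a 1-D illustration with made-up function names; Daala's actual windows are two-dimensional and follow the 4-8 mesh described in the draft.

```python
def blend_weights(n):
    """Linear ramp across n samples; w and (1 - w) always sum to 1."""
    return [(i + 0.5) / n for i in range(n)]

def obmc_blend(pred_left, pred_right):
    """Blend two predictions of the same region (1-D sketch)."""
    if len(pred_left) != len(pred_right):
        raise ValueError("predictions must cover the same samples")
    w = blend_weights(len(pred_left))
    return [(1.0 - wi) * a + wi * b
            for wi, a, b in zip(w, pred_left, pred_right)]

# Two motion vectors disagree: one predicts 100, the other 120.
blended = obmc_blend([100.0] * 8, [120.0] * 8)
# The result ramps smoothly from near 100 to near 120 with no step edge.
```

A conventional block-based predictor would switch abruptly from 100 to 120 at the block boundary; the ramp is what removes that edge.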
- Multiple reference frames and B-frames
Daala supports three reference frames, only two of which can be referenced from a single picture.
Currently this is used to implement periodic "golden frames" (at a fixed interval, with each P-frame referencing the previous frame and the most recent golden frame) as well as an MPEG2-style B-frame reference structure (with one forward and one backward reference, not themselves used as reference frames).
Each motion vector selects which of the two references to use.
There is currently no bi-prediction mode to blend predictions from multiple reference frames, nor a DIRECT mode equivalent.
This is still an area of active development and there is not yet an IETF draft for this feature.
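The golden-frame structure described above can be written down as a small function. This is a sketch under assumptions that the text does not state: frame 0 is taken to be a golden frame, golden frames recur at exact multiples of the interval, and there are no B-frames, so coding order equals display order. The function name and the interval value are hypothetical.

```python
def p_frame_references(index, golden_interval):
    """Return (previous_frame, most_recent_golden_frame) for a P-frame."""
    if index <= 0:
        raise ValueError("frame 0 has no earlier frame to reference")
    if golden_interval <= 0:
        raise ValueError("golden_interval must be positive")
    previous = index - 1
    # Most recent golden frame strictly before this frame.
    golden = ((index - 1) // golden_interval) * golden_interval
    return previous, golden

# With a golden frame every 4 frames (0, 4, 8, ...):
refs = [p_frame_references(i, 4) for i in range(1, 9)]
# Frame 5 references frame 4 twice over: it is both previous and golden.
```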
- CfL
Chroma-from-Luma is a tool that predicts the chroma planes from the luma plane. The technique operates in the frequency domain and is simple to implement using features already available in PVQ.
The feature is described with pictures of its operation on a Daala demo page:
https://people.xiph.org/~xiphmont/demo/daala/demo4.shtml
It is also documented as an IETF draft:
https://datatracker.ietf.org/doc/draft-egge-netvc-cfl/
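The underlying idea, that chroma AC coefficients tend to be a scaled copy of the co-located luma AC coefficients, can be shown with a least-squares fit. Note that the explicit scale factor here is a stand-in chosen for clarity; Daala's formulation reuses the PVQ machinery rather than fitting and signaling a factor like this, and the function name is hypothetical.

```python
def cfl_predict(luma_ac, chroma_ac):
    """Fit chroma ~= alpha * luma; return (alpha, residual)."""
    energy = sum(l * l for l in luma_ac)
    if energy == 0:
        # Flat luma carries no direction to predict from.
        return 0.0, list(chroma_ac)
    alpha = sum(l * c for l, c in zip(luma_ac, chroma_ac)) / energy
    residual = [c - alpha * l for l, c in zip(luma_ac, chroma_ac)]
    return alpha, residual

# Chroma is an inverted, half-strength copy of the luma detail.
luma = [8.0, -4.0, 2.0, 0.0]
chroma = [-4.0, 2.0, -1.0, 0.0]
alpha, residual = cfl_predict(luma, chroma)
# alpha is -0.5 and nothing is left over for the residual.
```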
- Haar DC
In intra frames, DC coefficients are transformed using a Haar transform within each 64x64 superblock to gain finer quantization resolution for large-scale features.
Gradient prediction from neighboring superblocks further reduces the cost of DC coding.
There is not yet an IETF draft for this feature.
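A single 2x2 Haar step on a group of four DC values looks like the sketch below. It is an unnormalized, integer-only illustration with made-up function names; the number of levels, the scaling, and the gradient prediction used inside a 64x64 superblock in Daala are not reproduced here.

```python
def haar2x2(a, b, c, d):
    """Unnormalized 2x2 Haar on the block [[a, b], [c, d]].

    Returns (sum, horizontal, vertical, diagonal) detail terms.
    """
    return (a + b + c + d,
            a - b + c - d,
            a + b - c - d,
            a - b - c + d)

def inverse_haar2x2(s, h, v, g):
    """Exact inverse; each numerator is four times one input value."""
    return ((s + h + v + g) // 4,
            (s - h + v - g) // 4,
            (s + h - v - g) // 4,
            (s - h - v + g) // 4)

flat = haar2x2(10, 10, 10, 10)       # (40, 0, 0, 0): all energy in the sum
coeffs = haar2x2(12, 7, -3, 5)
restored = inverse_haar2x2(*coeffs)  # (12, 7, -3, 5)
```

In smooth regions the three detail terms are near zero, so most of the information ends up in the single sum term, which can then be quantized more finely.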
- Deringing filter
Daala has a sophisticated directional deringing filter. The current design of the filter is documented as an IETF draft:
https://datatracker.ietf.org/doc/draft-valin-netvc-deringing/
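The "directional" part of such a filter relies on first estimating the dominant direction of each block. The sketch below shows one way to do that: for each candidate direction, group the pixels into lines and measure how well the block is explained by a pattern that is constant along those lines. It is simplified to four directions and uses hypothetical names; the draft specifies its own set of directions and its own cost, and the filtering step itself is not shown.

```python
from collections import defaultdict

# Each function maps (row, col) to an identifier of the line through it.
DIRECTIONS = {
    "horizontal": lambda r, c: r,
    "vertical": lambda r, c: c,
    "diagonal_down": lambda r, c: r - c,   # lines running down-right
    "diagonal_up": lambda r, c: r + c,     # lines running up-right
}

def direction_scores(block):
    """Score each direction; higher means the block follows it better."""
    scores = {}
    for name, line_id in DIRECTIONS.items():
        sums = defaultdict(int)
        counts = defaultdict(int)
        for r, row in enumerate(block):
            for c, pixel in enumerate(row):
                key = line_id(r, c)
                sums[key] += pixel
                counts[key] += 1
        # Energy captured by replacing every line with its mean.
        scores[name] = sum(sums[k] ** 2 / counts[k] for k in sums)
    return scores

def dominant_direction(block):
    scores = direction_scores(block)
    return max(scores, key=scores.get)

# Alternating dark and bright rows: edges run horizontally.
stripes = [[255 if r % 2 else 0] * 8 for r in range(8)]
direction = dominant_direction(stripes)   # "horizontal"
```

Filtering along the detected direction smooths ringing without blurring across the edge.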
- Other tools
Daala has a screencasting transform. It currently must be activated manually. It is documented here:
https://datatracker.ietf.org/doc/draft-valin-netvc-l1tw/
Daala can also operate with a traditional adaptive deblocking filter (taken from Thor) while still using the same transforms and other tools.
Daala supports full precision 12-bit reference frames (full precision references). These can be used regardless of the input or output depth and always provide a coding efficiency improvement, at the cost of speed. The same binary can run with either 8 or 12 bit reference frames.
Daala supports a bilinear loop filter designed to improve the quality of large gradients.
Unlike a traditional deblocking filter, it does not operate across block boundaries. There is no IETF draft yet, but it was described in this talk: https://www.youtube.com/watch?v=g7fVwIZBW8Q
== Engineering Resources ==
There are currently 6 full-time engineers dedicated to the project. There are also many community developers who actively contribute to the project.
== Progress & Future Plans ==
Historically, Daala has seen between 10% and 15% quarterly improvement on metrics.
There are several important features that we plan to focus on next quarter: More sophisticated B-frames (including a bi-prediction mode and pyramid B-frames), rate-control (including content adaptive frame type decisions and quantizer selection) and supporting more intra-frame features in inter frames (including CfL and Haar DC). A detailed list of future engineering tasks is available here:
https://app.smartsheet.com/b/publish?EQBCT=0ada8c0a0ff447e9a586db05bfaaaada
== Other ==
Much more information and documentation can be found on the Xiph wiki page for Daala: https://wiki.xiph.org/Daala
This includes videos of talks, slides, and academic papers about Daala and the individual tools within it.