@shreevari
Last active August 17, 2021 11:22
Tuning activity masking

Extracting the activity mask from rav1e.

Even though the activity mask is decoupled from the encoding process, writing out a y4m of the activity mask for each frame is currently difficult. The easiest way to get a quick activity mask from the encoder is to write the activity values to a CSV file as they are computed. An example patch that does this is present below as "extract-mask.diff".

The activity map

The current approach builds the activity map by computing the cube root of the standard deviation (equivalently, the 6th root of the variance) of each 8x8 block. Keep track of the activity map implementation in this pull request.
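As a rough illustration of the measure described above, here is a minimal sketch of the sixth-root-of-variance computation for a single block. The function name `block_activity` and the choice of a row-major array of 8-bit samples are assumptions made for the example; this is not the code from the pull request.

```rust
/// Activity of one 8x8 luma block: the sixth root of the pixel variance,
/// which is the same as the cube root of the standard deviation.
/// `block` is a hypothetical row-major 8x8 array of 8-bit samples.
fn block_activity(block: &[u8; 64]) -> f32 {
    let n = block.len() as f32;

    // Mean of the 64 samples.
    let mean = block.iter().map(|&p| p as f32).sum::<f32>() / n;

    // Population variance of the samples around that mean.
    let variance = block
        .iter()
        .map(|&p| {
            let d = p as f32 - mean;
            d * d
        })
        .sum::<f32>()
        / n;

    // variance^(1/6) == (sqrt(variance))^(1/3)
    variance.powf(1.0 / 6.0)
}
```

A perfectly flat block has zero variance and therefore zero activity, while a block with a variance of 64 has an activity of 2.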

Previous attempts

Earlier attempts at quantifying the activity of an 8x8 block went through several trials. One method was to perform a forward transform (DCT) on the image, apply the Contrast Sensitivity Function (CSF) with the DC coefficient zeroed, and accumulate the result to get a measure of activity. The distribution of activity was too polarized this way. Taking the square root of the activity map distributed activity more evenly across the map. Edges were still recognized as areas of high activity and lost too much quality during activity masking. Ignoring the low frequencies at the top-left corner of the DCT made hard edges contribute less to activity.

| Description | IENA | Sking |
|---|---|---|
| Original mask - Variance | IENA original | Sking original |
| Square root of activity | IENA sqrt | Sking sqrt |
| Square root of activity with low frequencies filtered | IENA de-edged | Sking de-edged |

The AWCY run comparing this activity mask implementation with the reference shows that encoding time increased significantly, with no increase in quality from activity masking.

A simple variance-based mask looks promising in terms of encoding time cost, based on this AWCY run.

Analysis of the map

Looking at a few sample test masks, the activity values range between 0 and 2.5. A Python script to filter the map is present below as "mask-filter.ipynb".

The sample masks selected for analysis are "IENA_-Avenches-_6.y4m", "Sking.y4m", and "US_Open_Tennis_2010_1st_Round_046.y4m" from the "subset1-y4m" video set, which is publicly available in derf's media test collection.

With this mask, we can identify ranges of absolute values and the kind of activity they correspond to. Blocks with activity 0.0-1.0 are plain areas, where perceptual quality improves little with the number of bits spent. Blocks with activity 1.75-2.5 are mostly hard edges, which do not tolerate quality loss well. The range of activity expected to benefit perceptually from activity masking lies between 1.0 and 1.75.

| A. Image | B. Edges (1.75 - 2.50) | C. Plain areas (0.00 - 1.00) | (A - B - C) Activity mask considered |
|---|---|---|---|
| IENA source | A1. IENA 1.75 - 2.50 | A2. IENA 0.00 - 1.00 | A3. IENA 1.00 - 1.75 |
| Sking source | B1. Sking 1.75 - 2.50 | B2. Sking 0.00 - 1.00 | B3. Sking 1.00 - 1.75 |
| US_Open source | C1. US_Open 1.75 - 2.50 | C2. US_Open 0.00 - 1.00 | C3. US_Open 1.00 - 1.75 |

Blocks with activity in the range 1.00 to 1.75 participate in activity masking in proportion to their activity, while blocks with lower or higher activity are clamped to these boundaries.
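The clamping rule can be sketched as follows. The names `ACTIVITY_MIN`, `ACTIVITY_MAX`, `Region`, `classify` and `clamp_activity` are hypothetical, and which region a value lying exactly on a boundary belongs to is an assumption, since the ranges above share their end points.

```rust
// Boundaries of the range that takes part in activity masking (from the text).
const ACTIVITY_MIN: f32 = 1.00;
const ACTIVITY_MAX: f32 = 1.75;

/// The three kinds of block identified in the analysis of the map.
#[derive(Debug, PartialEq)]
enum Region {
    Plain,  // activity below 1.00
    Masked, // activity in 1.00..=1.75 (boundary handling is an assumption)
    Edge,   // activity above 1.75
}

/// Label a raw activity value with the region it falls in.
fn classify(activity: f32) -> Region {
    if activity < ACTIVITY_MIN {
        Region::Plain
    } else if activity <= ACTIVITY_MAX {
        Region::Masked
    } else {
        Region::Edge
    }
}

/// Clamp a raw activity value to the masking range, so that plain areas
/// and hard edges sit at the lower and upper boundary respectively.
fn clamp_activity(activity: f32) -> f32 {
    activity.clamp(ACTIVITY_MIN, ACTIVITY_MAX)
}
```

With these definitions a plain block at 0.4 is raised to 1.00, a hard edge at 2.2 is lowered to 1.75, and a block at 1.3 keeps its value.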

Activity masking in RDO

Distortion in RDO is substituted with perceived distortion based on the activity of the block. The higher the activity, the lower the perceived distortion compared to the actual distortion. For blocks with activity outside the range described above (plain areas and hard edges), activity masking is disabled. Keep track of the implementation details in this pull request.

Comparison of scale and effects on activity masking

| IENA | Blocks losing distortion | Blocks gaining distortion |
|---|---|---|
| Scale: 0.75 | IENA 0.75 losing | IENA 0.75 gaining |
| Scale: 0.70 | IENA 0.70 losing | IENA 0.70 gaining |

| Sking | Blocks losing distortion | Blocks gaining distortion |
|---|---|---|
| Scale: 0.75 | Sking 0.75 losing | Sking 0.75 gaining |
| Scale: 0.70 | Sking 0.70 losing | Sking 0.70 gaining |

| US_Open | Blocks losing distortion | Blocks gaining distortion |
|---|---|---|
| Scale: 0.75 | US_Open 0.75 losing | US_Open 0.75 gaining |
| Scale: 0.70 | US_Open 0.70 losing | US_Open 0.70 gaining |

All blocks with a resulting activity below 1.0 gain distortion (their distortion is weighted up in RDO), and hence gain quality. Blocks with a resulting activity above 1.0 lose distortion, and hence lose quality. From the images above, it is evident that the blocks losing and gaining distortion are more balanced at scale 0.70 than at 0.75.
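To make the role of the scale concrete, here is a small sketch. It assumes that the resulting activity is the clamped activity multiplied by the scale, and that perceived distortion is the actual distortion divided by the resulting activity. The division is only one plausible model that matches the behaviour described above and is an assumption, as are the names `resulting_activity` and `perceived_distortion`; the actual formula is in the pull request.

```rust
// Masking range from the analysis of the map.
const MASK_MIN: f32 = 1.00;
const MASK_MAX: f32 = 1.75;

/// Clamped activity multiplied by the chosen scale (for example 0.70 or 0.75).
fn resulting_activity(activity: f32, scale: f32) -> f32 {
    activity.clamp(MASK_MIN, MASK_MAX) * scale
}

/// ASSUMED model: perceived distortion is the actual distortion divided by
/// the resulting activity. A resulting activity below 1.0 raises the
/// distortion seen by RDO and one above 1.0 lowers it.
fn perceived_distortion(distortion: f32, activity: f32, scale: f32) -> f32 {
    distortion / resulting_activity(activity, scale)
}
```

Under this model the crossover between gaining and losing distortion sits at a clamped activity of 1 / scale, which is roughly 1.43 for a scale of 0.70 and roughly 1.33 for a scale of 0.75.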

Activity masking with activity clamped to range 1.00 - 1.75 and scaled by 0.7

The relevant AWCY run shows no improvement in the quality metrics. All the images below are encoded at a base quantizer of 128.

| Reference | Activity masked |
|---|---|
| Fruits Reference | Fruits Activity Masked |
| IENA Reference | IENA Activity Masked |
| Sking Reference | Sking Activity Masked |
| Swallowtail Reference | Swallowtail Activity Masked |

Comparisons

This AWCY run shows no increase in quality when CDEF is disabled and activity masking is performed.

Disabling CDEF completely is not the right way to test activity masking. Since rav1e weights distortion when it is tuning for Psychovisual (which is the default), we need to tune it for Psnr instead. This AWCY run shows very little variation in quality compared to the reference.

A comparison of pure activity bias in RDO against a reference without any other kind of bias shows that the comparison needs to be made at the same rate.

Activity masking at all block sizes

This AWCY run shows that activity masking at all block sizes does not affect the quality metrics or the rate much. All images below are encoded at a base quantizer of 128.

| Reference | Activity masked |
|---|---|
| Crepuscular Reference | Crepuscular Activity masked |
| Fruits Reference | Fruits Activity Masked |
| Sking Reference | Sking Activity Masked |
| Swallowtail Reference | Swallowtail Activity Masked |

Rate matching using ab_compare from daala tools

The rates of the encodes from the above runs differ, so those runs may not be the best comparison. Justin Nickelsberg added rav1e support to ab_compare in daala tools at my request. With the help of ab_compare, rate-matched encodes are compared to assess the variation in perceptual quality.

Sub-block distortion biasing

Since the biasing scheme is computed for 8x8 blocks, when masking larger blocks the distortion of each 8x8 sub-block is biased with the chosen scale and the results are summed to obtain the total distortion of the block. This masks activity more accurately than using the mean activity of the large block. Note that transform-domain distortion should be disabled in order to measure the effects of activity masking more accurately.
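The difference between the two approaches can be sketched like this. The per-sub-block division by the scaled, clamped activity is an assumed bias model, and both function names are hypothetical; only the idea of biasing each 8x8 sub-block and summing comes from the text.

```rust
/// ASSUMED bias for one 8x8 sub-block: activity clamped to the masking
/// range of 1.00 to 1.75, then multiplied by the chosen scale.
fn bias(activity: f32, scale: f32) -> f32 {
    activity.clamp(1.00, 1.75) * scale
}

/// Bias each 8x8 sub-block's distortion by its own activity, then sum.
/// `distortions` and `activities` hold one entry per 8x8 sub-block.
fn distortion_per_sub_block(distortions: &[f32], activities: &[f32], scale: f32) -> f32 {
    assert_eq!(distortions.len(), activities.len());
    distortions
        .iter()
        .zip(activities)
        .map(|(&d, &a)| d / bias(a, scale))
        .sum()
}

/// Alternative for comparison: bias the total distortion of the large
/// block by the mean activity of its sub-blocks.
fn distortion_mean_activity(distortions: &[f32], activities: &[f32], scale: f32) -> f32 {
    assert_eq!(distortions.len(), activities.len());
    let total: f32 = distortions.iter().sum();
    let mean = activities.iter().sum::<f32>() / activities.len() as f32;
    total / bias(mean, scale)
}
```

For a 16x16 block whose distortion is concentrated in one low-activity sub-block, the two functions give clearly different totals, because the mean activity hides where the distortion actually sits.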

| Reference | Activity masked |
|---|---|
| Crepuscular Reference | Crepuscular Activity masked |
| Fruits Reference | Fruits Activity Masked |
| Nymphe Reference | Nymphe Activity Masked |
| IENA Reference | IENA Activity Masked |
| Swallowtail Reference | Swallowtail Activity Masked |

Further plan

Adaptive quantization

The current implementation of adaptive quantization works well with temporal block importance biasing and is based on the RDO loop. However, activity masking could be trialled independently of RDO and block importance biases. This requires tuning activity-based adaptive quantization separately and finding a way for it to coexist with the other factors that influence the quantizer offsets. An absolute biasing model based on the computed activity mask would yield higher perceptual quality.

Optimizing activity mask storage and access

The current implementation of the activity mask uses two-dimensional vectors, which make storing and accessing activity values slow. This has led to a constant increase in encoding time due to activity masking, as seen in this AWCY run. Once the implementation of activity masking is finalized, this computation could be optimized by using arrays. It could also be moved to a pre-processing stage and run in parallel before the actual encoding of the frames starts.
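One possible shape for such an optimization is a single contiguous buffer indexed by block coordinates instead of a `Vec<Vec<f32>>`. The sketch below is only an illustration of that idea: the type `FlatActivityMask` and its methods are hypothetical and not part of rav1e.

```rust
/// Activity mask stored as one contiguous, row-major buffer with one
/// value per 8x8 block. Hypothetical type, for illustration only.
pub struct FlatActivityMask {
    values: Vec<f32>,
    cols: usize, // number of 8x8 blocks per row
    rows: usize, // number of 8x8 block rows
}

impl FlatActivityMask {
    /// Allocate a zeroed mask for a frame of `width` x `height` pixels,
    /// rounding up so that partial blocks at the edges get an entry.
    pub fn new(width: usize, height: usize) -> Self {
        let cols = (width + 7) / 8;
        let rows = (height + 7) / 8;
        FlatActivityMask { values: vec![0.0; cols * rows], cols, rows }
    }

    pub fn cols(&self) -> usize {
        self.cols
    }

    pub fn rows(&self) -> usize {
        self.rows
    }

    /// Activity of the block at block coordinates (`bx`, `by`).
    pub fn get(&self, bx: usize, by: usize) -> f32 {
        assert!(bx < self.cols && by < self.rows);
        self.values[by * self.cols + bx]
    }

    /// Store the activity of the block at block coordinates (`bx`, `by`).
    pub fn set(&mut self, bx: usize, by: usize, activity: f32) {
        assert!(bx < self.cols && by < self.rows);
        self.values[by * self.cols + bx] = activity;
    }
}
```

A single allocation avoids one heap allocation per row and keeps neighbouring rows close together in memory. Whether that removes the overhead seen in the AWCY run would need to be measured.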

More control over activity masking

Providing a way to control the amount of activity masking based on the speed settings and tune preferences through the CLI and the API would ensure good usage of activity masking.

extract-mask.diff

diff --git a/Cargo.toml b/Cargo.toml
index 753f920..a95c876 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -35,7 +35,8 @@ quote = "^0.6.10" # hack for proc-macro-hack
num-traits = "0.2"
num-derive = "0.2"
paste = "0.1"
-serde = "1.0"
+serde = { version = "1.0", features = ["derive"] }
+csv = "1.1.1"
serde_derive = "1.0"
serde_json = { version = "1.0", optional = true }
dav1d-sys = { version = "0.2", optional = true }
diff --git a/src/activity.rs b/src/activity.rs
index a10b900..22b73f4 100644
--- a/src/activity.rs
+++ b/src/activity.rs
@@ -7,13 +7,15 @@
// Media Patent License 1.0 was not distributed with this source code in the
// PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+use serde::Serialize;
+use csv::Writer;
use crate::transform::*;
use crate::tiling::*;
use crate::frame::*;
use crate::util::*;
use crate::rate::QSCALE;
-#[derive(Debug, Default)]
+#[derive(Debug, Default, Serialize)]
pub struct ActivityMask {
mask: Vec<Vec<f32>>,
// Width and height of the original frame that is masked
@@ -35,6 +37,10 @@ impl ActivityMask {
}
pub fn new_from_plane<T: Pixel>(luma_plane: &Plane<T>, bit_depth: usize) -> ActivityMask {
+
+
+ let mut wtr = Writer::from_path("mask.csv").unwrap();
+
// Contrast Sensitivity Function with DC coeff = 0 to subtract mean
let csf: [[f32; 8]; 8] = [
[0.0/*1.608_443*/, 2.339_554, 2.573_509, 1.608_443, 1.072_295, 0.643_377, 0.504_610, 0.421_887],
@@ -86,9 +92,13 @@ impl ActivityMask {
// Applying CSF to the deviation on tx_domain and accumulating
let act: f32 = tx_deviation.iter().zip(csf.iter().flatten()).map(|(d, &csf)| (*d as f32 * csf / (1 << QSCALE) as f32).powf(2.0).floor() / 64.0f32).sum();
+ wtr.write_field(act.to_string()).unwrap();
+
row.push(act);
}
+ wtr.write_record(None::<&[u8]>).unwrap();
+
mask.push(row)
}
ActivityMask {