sile/README.md

## README.md

      
Overview


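Notes from trying Optuna, a hyperparameter optimization tool, to optimize video encoding parameters for FFmpeg.
Specifically, the goal is to automatically find a parameter set that maximizes image quality (SSIM) under fixed constraints (described below).
Results:

- In terms of quality, Optuna found a parameter set slightly better than slower, the second-heaviest of the FFmpeg presets compared here
- The parameter set found by Optuna also put less load on the CPU than slower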


Approach


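- Assume that time and compute resources are reasonably plentiful, and that the goal is to optimize the encoding of each individual video
  - The use case is deciding the best encoding parameter set for every video, one at a time
  - This extends the idea of using separate parameters per video category (e.g., sports, anime, news, live commentary, 3D)
- To make the problem harder, impose the following constraints:
  - The output bitrate and resolution are fixed (100 kbps, 320x240)
  - The keyframe interval is 12
  - Note: in real use, these would be replaced by constraints that match the user's requirements
- The quality metric is SSIM (the aim is to maximize it)
  - In practice, "high SSIM" does not necessarily mean "looks good to a person", but a mechanically computable metric is used for simplicity
- The input video is Big Buck Bunny (about 10 minutes)
- Optuna configuration and usage:
  - Run the optimization for 8 hours with 4 parallel jobs
  - Prune with SuccessiveHalvingPruner:
    - Encode the first 100, 300, and 900 seconds of the input video with the parameter set Optuna proposes, and measure the resulting quality at each point (the video is only about 600 seconds long, so 900 seconds means the whole video)
    - Parameter sets whose quality is clearly poor at the 100-second or 300-second point are abandoned there, to save time
  - See optimize-ffmpeg.py for the parameters being optimized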
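
The 100/300/900-second checkpoints match the pruner settings used in optimize-ffmpeg.py (`min_resource=100`, `reduction_factor=3`). The following is a minimal sketch of that arithmetic only, not Optuna's implementation: `rung_schedule` and `survivors_per_rung` are hypothetical helpers, and the "roughly 1/reduction_factor of trials survive each rung" rule is the textbook form of successive halving, which Optuna's asynchronous pruner only approximates.

```python
def rung_schedule(min_resource, reduction_factor, max_resource):
    """Return the resource levels (here: seconds of video) at which trials are compared."""
    rungs = []
    resource = min_resource
    while resource <= max_resource:
        rungs.append(resource)
        resource *= reduction_factor
    return rungs


def survivors_per_rung(n_trials, reduction_factor, n_rungs):
    """Textbook successive halving: about 1/reduction_factor of trials pass each rung."""
    counts = []
    remaining = n_trials
    for _ in range(n_rungs):
        counts.append(remaining)
        remaining = max(1, remaining // reduction_factor)
    return counts


rungs = rung_schedule(min_resource=100, reduction_factor=3, max_resource=900)
print(rungs)  # [100, 300, 900]

# With 27 hypothetical trials, the number evaluated at each rung:
print(survivors_per_rung(n_trials=27, reduction_factor=3, n_rungs=len(rungs)))  # [27, 9, 3]
```

Under this idealized schedule, most of the encoding time goes to the few parameter sets that still look promising after the short 100-second and 300-second encodes.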


Procedure and results

Everything was run on a GCP virtual machine with 4 vCPUs:
// Setup
$ apt install ffmpeg
$ pip install ffmpeg-python optuna
$ wget https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_480p_h264.mov

$ optuna --version
optuna 0.6.0

$ ffmpeg -version | head -1
ffmpeg version 3.2.12-1~deb9u1 Copyright (c) 2000-2018 the FFmpeg developers

// Run the optimization
$ python optimize-ffmpeg.py
...log omitted...

Study statistics:
  Number of finished trials:  761
Best trial:
  Value: 0.0669073
  Params:  // the optimized parameter set follows
    me: esa
    partitions-p4x4: 1
    psy-rd.0: 0.5660359330969369
    no-deblock: 0
    scenecut: 33
    mixed-refs: 0
    deadzone-intra: 0
    subme: 9
    b-adapt: 1
    rc-lookahead: 13
    partitions-i4x4: 1
    partitions-b8x8: 0
    psy-rd.1: 0.22124595698077815
    no-cabac: 0
    nr: 1
    qcomp: 0.5578100145277352
    qdiff: 10
    no-dct-decimate: 0
    direct: temporal
    partitions-i8x8: 1
    deblockalpha: -1
    no-chroma-me: 0
    partitions-p8x8: 1
    deblockbeta: -3
    deadzone-inter: 0
    b-pyramid: 0
    8x8dct: 1
    min-keyint: 2
    trellis: 2
    no-fast-pskip: 1
    me_range: 11
    refs: 1

// Encoding with the parameter set found by Optuna
$ time -p ffmpeg -y -i big_buck_bunny_480p_h264.mov -tune ssim -ssim 1 -s 320x240 -vb 100k -an -refs 16 -qcomp 0.5413518659359666 -qdiff 24 -me_range 8 -x264opts "keyint=12:partitions=p4x4,i4x4,i8x8:min-keyint=4:no-fast-pskip:deblock=-2,-1:subme=9:nr:8x8dct:psy-rd=0.9040981359096917,0.4768265276771205:direct=temporal:rc-lookahead=15:trellis=2:me=hex:scenecut=35:b-adapt=0" -vcodec libx264 -f null -
[libx264 @ 0x56469ad0f900] SSIM Mean Y:0.9329060 (11.733db)
real 43.12
user 140.94
sys 1.67

// Encoding with the presets provided by FFmpeg (quality and load increase further down)

// preset=medium (default)
$ time -p ffmpeg -y -i big_buck_bunny_480p_h264.mov -tune ssim -ssim 1 -preset medium -s 320x240 -vb 100k -an -x264opts "keyint=12" -vcodec libx264 -f null -
[libx264 @ 0x55e567b621a0] SSIM Mean Y:0.9296671 (11.528db)
real 31.90
user 107.31
sys 1.39

// preset=slow
$ time -p ffmpeg -y -i big_buck_bunny_480p_h264.mov -tune ssim -ssim 1 -preset slow -s 320x240 -vb 100k -an -x264opts "keyint=12" -vcodec libx264 -f null -
[libx264 @ 0x560c5e6221a0] SSIM Mean Y:0.9319332 (11.671db)
real 42.77
user 135.48
sys 1.70

// preset=slower
$ time -p ffmpeg -y -i big_buck_bunny_480p_h264.mov -tune ssim -ssim 1 -preset slower -s 320x240 -vb 100k -an -x264opts "keyint=12" -vcodec libx264 -f null -
[libx264 @ 0x55e11eb5a1a0] SSIM Mean Y:0.9328051 (11.727db)
real 52.60
user 176.97
sys 1.66

// preset=veryslow
$ time -p ffmpeg -y -i big_buck_bunny_480p_h264.mov -tune ssim -ssim 1 -preset veryslow -s 320x240 -vb 100k -an -x264opts "keyint=12" -vcodec libx264 -f null -
[libx264 @ 0x5574b77581a0] SSIM Mean Y:0.9347978 (11.857db)
real 67.37
user 220.57
sys 1.83
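
Summary of quality and conversion time (CPU load) for each parameter set:

| Parameter set | Quality (SSIM) | Conversion time (s, real) |
|---|---|---|
| preset=medium | 0.9296671 | 31.90 |
| preset=slow | 0.9319332 | 42.77 |
| preset=slower | 0.9328051 | 52.60 |
| optuna | 0.9329060 | 43.12 |
| preset=veryslow | 0.9347978 | 67.37 |
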

The parameter set found by Optuna came out at roughly "slower-level quality" with "slow-level conversion time".
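
The comparison can be reproduced from the log lines above. This is a small sketch, not part of the original scripts: `parse_ssim` and `ssim_to_db` are hypothetical helpers, and the dB conversion `-10 * log10(1 - SSIM)` is an assumption that happens to agree with the dB figures printed in the logs to three decimals.

```python
import math
import re

# SSIM log lines and `real` times copied from the runs above.
RUNS = {
    "preset=medium": ("SSIM Mean Y:0.9296671 (11.528db)", 31.90),
    "preset=slow": ("SSIM Mean Y:0.9319332 (11.671db)", 42.77),
    "preset=slower": ("SSIM Mean Y:0.9328051 (11.727db)", 52.60),
    "optuna": ("SSIM Mean Y:0.9329060 (11.733db)", 43.12),
    "preset=veryslow": ("SSIM Mean Y:0.9347978 (11.857db)", 67.37),
}

SSIM_RE = re.compile(r"SSIM Mean Y:([0-9.]+)")


def parse_ssim(log_line):
    """Extract the mean luma SSIM from a libx264 log line, or None if absent."""
    match = SSIM_RE.search(log_line)
    return float(match.group(1)) if match else None


def ssim_to_db(ssim):
    """Convert SSIM to decibels (assumed formula, see the note above)."""
    return -10.0 * math.log10(1.0 - ssim)


results = {name: (parse_ssim(line), secs) for name, (line, secs) in RUNS.items()}

# List the runs from lowest to highest SSIM.
for name, (ssim, secs) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print("{:16s} SSIM={:.7f} ({:.3f} dB) real={:.2f}s".format(
        name, ssim, ssim_to_db(ssim), secs))

optuna_ssim, optuna_secs = results["optuna"]
slower_ssim, slower_secs = results["preset=slower"]
ssim_gain = optuna_ssim - slower_ssim
time_ratio = optuna_secs / slower_secs
print("SSIM gain over slower: {:.7f}".format(ssim_gain))
print("wall-clock time relative to slower: {:.0%}".format(time_ratio))
```

The SSIM gain over slower is about 0.0001, while the wall-clock time is roughly 82% of slower's, which is the basis for the "slower-level quality, slow-level time" summary.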

Thoughts


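- For now, handing Optuna the parameters listed on a reference site (the x264/FFmpeg mapping page cited in optimize-ffmpeg.py) almost mechanically was enough to get a reasonable result
- Why it loses to veryslow is worth looking into:
  - Not every parameter that FFmpeg (libx264) accepts is currently an optimization target, so including the rest might change the result
    - Changing the input video or the constraints would probably change the result as well
- From watching the optimization progress, extending the run time with the current setup did not look likely to improve the result much
- It might also be interesting to optimize for metrics other than video quality:
  - CPU usage, output file size, decoding latency, etc.
    - Or combinations of these (e.g., choosing the best quality among encodes that finish within N seconds)
  - For quality itself there are many options besides SSIM (PSNR, VMAF, etc.)
  - The more unusual the encoding constraints and evaluation metrics become, the more Optuna's strengths should show
- If this were applied in practice, choosing the evaluation metric would be the hardest part
  - Video quality is usually assessed not only with objective metrics such as SSIM but also by subjective human evaluation, and the latter does not fit well with a mechanism like Optuna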
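
The "best quality within N seconds" idea mentioned above could be expressed as a penalized score. The following is only a sketch under stated assumptions: `constrained_score`, `best_within_budget`, and `time_budget_s` are hypothetical names, and the function is applied to the measured numbers from the summary table rather than to a live Optuna study.

```python
def constrained_score(ssim, elapsed_s, time_budget_s):
    """Return SSIM if the encode met the time budget, else the worst possible score."""
    if elapsed_s > time_budget_s:
        return 0.0  # infeasible: treated as worse than any measured SSIM
    return ssim


# (SSIM, real seconds) taken from the summary table.
MEASURED = {
    "preset=medium": (0.9296671, 31.90),
    "preset=slow": (0.9319332, 42.77),
    "preset=slower": (0.9328051, 52.60),
    "optuna": (0.9329060, 43.12),
    "preset=veryslow": (0.9347978, 67.37),
}


def best_within_budget(measured, time_budget_s):
    """Pick the parameter set with the highest SSIM among those within the budget."""
    scored = {
        name: constrained_score(ssim, secs, time_budget_s)
        for name, (ssim, secs) in measured.items()
    }
    best = max(scored, key=scored.get)
    return best if scored[best] > 0.0 else None


print(best_within_budget(MEASURED, time_budget_s=45.0))  # optuna
print(best_within_budget(MEASURED, time_budget_s=70.0))  # preset=veryslow
print(best_within_budget(MEASURED, time_budget_s=30.0))  # None
```

In an Optuna objective, the same `constrained_score` value could be returned in place of the raw SSIM. Wall-clock time measured while several trials run in parallel is noisy, so a real setup would probably need a more stable cost measure.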


## optimize-ffmpeg.py
import ffmpeg
import optuna
from optuna.pruners import SuccessiveHalvingPruner
from optuna.structs import TrialPruned
import re

# https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_480p_h264.mov
input_file='big_buck_bunny_480p_h264.mov'

def objective(trial):
    # Choose the parameters to pass to ffmpeg (libx264).
    #
    # Reference: https://sites.google.com/site/linuxencoding/x264-ffmpeg-mapping
    refs = trial.suggest_int('refs', 1, 16)
    qcomp = trial.suggest_uniform('qcomp', 0.0, 1.0)
    b_adapt = trial.suggest_int('b-adapt', 0, 2)
    subme = trial.suggest_int('subme', 1, 9)
    trellis = trial.suggest_int('trellis', 0, 2)
    if trellis > 0:
        psy_rd=':psy-rd={},{}'.format(trial.suggest_uniform('psy-rd.0', 0.0, 1.0),
                                      trial.suggest_uniform('psy-rd.1', 0.0, 1.0))
    else:
        psy_rd=''

    partitions = []
    for p in ['p8x8', 'p4x4', 'b8x8', 'i8x8', 'i4x4']:
        if bool(trial.suggest_categorical('partitions-{}'.format(p), [0, 1])):
            partitions.append(p)
    if len(partitions) == 0:
        partitions = 'none'
    else:
        partitions = ','.join(partitions)

    me = trial.suggest_categorical('me', ['dia', 'hex', 'umh', 'esa'])
    me_range = trial.suggest_int('me_range', 4, 16)
    direct = trial.suggest_categorical('direct', ['none', 'spatial', 'temporal', 'auto'])
    rc_lookahead = trial.suggest_int('rc-lookahead', 10, 100)
    min_keyint = trial.suggest_int('min-keyint', 1, 12)
    scenecut = trial.suggest_int('scenecut', 10, 100)
    qdiff = trial.suggest_int('qdiff', 1, 51)

    no_deblock = bool(trial.suggest_categorical('no-deblock', [0, 1]))
    if no_deblock:
        deblock = 'no-deblock'
    else:
        deblock = 'deblock={},{}'.format(trial.suggest_int('deblockalpha', -6, 6),
                                         trial.suggest_int('deblockbeta', -6, 6))

    flags = ''
    for f in ['b-pyramid', 'no-cabac', 'mixed-refs', 'no-chroma-me', '8x8dct', 'no-fast-pskip', 'no-dct-decimate',
              'deadzone-inter', 'deadzone-intra', 'nr']:
        if bool(trial.suggest_categorical(f, [0, 1])):
            flags += ':{}'.format(f)

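    # Encoding (evaluating) the whole video with a clearly unpromising parameter set
    # is wasteful, so allow pruning at the 100-second and 300-second marks.
    # (The video itself is about 600 seconds long.)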
    for duration in [100, 300, 900]:
        enc = ffmpeg.input(input_file, t=duration)

        x264opts = 'keyint=12:b-adapt={}:subme={}:trellis={}:partitions={}:direct={}:rc-lookahead={}:min-keyint={}:scenecut={}:me={}:{}{}{}'.format(
            b_adapt, subme, trellis, partitions, direct, rc_lookahead, min_keyint, scenecut, me, deblock, flags, psy_rd)

        enc = enc.output('-', f='null', tune='ssim', ssim=1, an=None, vcodec='libx264', s='320x240', vb='100k',
                         refs=refs, qcomp=qcomp, x264opts=x264opts, qdiff=qdiff, me_range=me_range)

        # Run the encode
        _, stderr = enc.run(capture_stderr=True)

        # Use SSIM as the objective value
        ssim = float(re.search('SSIM Mean Y:([0-9.]+)', stderr.decode('utf-8')).group(1))

        trial.report(ssim, duration)
        if trial.should_prune():
            raise TrialPruned()

    return ssim

if __name__ == '__main__':
    study = optuna.create_study(
        study_name = "ffmpeg",
        storage = "sqlite:///optuna-ffmpeg.db",
        load_if_exists = True,
        direction = 'maximize',
        pruner=SuccessiveHalvingPruner(min_resource=100, reduction_factor=3))

    study.optimize(objective, timeout=8 * 60 * 60, n_jobs=4)
    print('Study statistics: ')
    print('  Number of finished trials: ', len(study.trials))

    print('Best trial:')
    trial = study.best_trial

    print('  Value: {}'.format(trial.value))

    print('  Params: ')
    for key, value in trial.params.items():
        print('    {}: {}'.format(key, value))