nanake/ffmpeg-hevc-encode-nvenc.md

## ffmpeg-hevc-encode-nvenc.md

      
Encoding high-quality HEVC content in two passes with the FFmpeg-based NVENC encoder on supported hardware:

If you've built FFmpeg as instructed here on Linux and the ffmpeg binary is in your PATH, you can do fast HEVC encodes as shown below, using NVIDIA's NPP libraries to vastly speed up the process.
For a simple NVENC encode in 1080p (one that will even work on the Maxwell Gen 2 (GM200x) series), start with:
ffmpeg -i <inputfile> -pass 1 \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=nv12:interp_algo=lanczos,hwdownload,format=nv12 \
-c:v hevc_nvenc -profile:v main -preset slow -rc vbr_2pass \
-qmin 15 -qmax 20 -2pass 1 -c:a copy -f null /dev/null && \
ffmpeg -i <inputfile> -pass 2 \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=nv12:interp_algo=lanczos,hwdownload,format=nv12 \
-c:v hevc_nvenc -profile:v main -preset slow -rc vbr_2pass \
-qmin 15 -qmax 20 -2pass 1 -c:a copy <outputfile>

Note that this encode method is 8-bit only (no 10-bit support) and uses 4:2:0 chroma subsampling.
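One way to confirm what an encode actually produced is to ask ffprobe for the video stream's pixel format. The sketch below assumes ffprobe is installed alongside ffmpeg; the helper name `pix_fmt_bit_depth` is hypothetical, and it only recognises a handful of common format names rather than querying FFmpeg's pixel format table.

```shell
# Hypothetical helper: map a few common FFmpeg pixel format names to a bit depth.
# Name-based lookup only; anything not listed is reported as "unknown".
pix_fmt_bit_depth() {
    case "$1" in
        p010*|*p10|*p10le|*p10be) echo 10 ;;
        nv12|nv21|yuv420p|yuv422p|yuv444p|yuvj420p) echo 8 ;;
        *) echo unknown ;;
    esac
}

# Typical use (needs ffprobe; <outputfile> is the encode to inspect):
#   fmt=$(ffprobe -v error -select_streams v:0 \
#         -show_entries stream=pix_fmt \
#         -of default=noprint_wrappers=1:nokey=1 <outputfile>)
#   pix_fmt_bit_depth "$fmt"
```

For the 8-bit command above, the reported pixel format would be expected to be an 8-bit 4:2:0 one such as yuv420p.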
Extra notes: For fully hardware-accelerated transcodes, you may also want to use one of the NVIDIA CUVID-based decoders available in your FFmpeg build. See the list available on your system as shown here.
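As a rough sketch of how that list can be obtained, `ffmpeg -decoders` prints every decoder in the build, and the CUVID ones can be filtered out by name. The function name `cuvid_decoder_names` is hypothetical, and the filter assumes the usual listing layout in which the decoder name is the second whitespace-separated column.

```shell
# Hypothetical helper: reduce an `ffmpeg -decoders` listing (on stdin)
# to the names of the CUVID decoders it contains.
# Assumes the decoder name is the second column of each listing line.
cuvid_decoder_names() {
    awk '$2 ~ /_cuvid$/ { print $2 }'
}

# Typical use (needs an FFmpeg build with CUVID enabled):
#   ffmpeg -hide_banner -decoders | cuvid_decoder_names
```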
Add the appropriate CUVID decoder to the command line based on the source media file:


To transcode 8-bit H.264/AVC content to H.264 or to 8-bit HEVC, add -hwaccel cuvid -c:v h264_cuvid to the ffmpeg arguments before the -i option.


To transcode 8-bit HEVC content to HEVC or to 8-bit H.264, add -hwaccel cuvid -c:v hevc_cuvid to the ffmpeg arguments before the -i option.


Follow the same pattern for any other 8-bit content that a CUVID decoder supports, choosing the decoder that matches the input format (see the previous gist linked above).


For 10-bit encodes, take care to omit the -hwaccel cuvid option (all textures have to be copied to system memory) and add only -c:v {hwaccel_type}, where {hwaccel_type} is the decoder matching the source content's codec, one of:


(a). h263_cuvid: Nvidia CUVID H263 decoder (codec h263)
(b). h264_cuvid: Nvidia CUVID H264 decoder (codec h264)
(c). hevc_cuvid: Nvidia CUVID HEVC decoder (codec hevc)
(d). mjpeg_cuvid: Nvidia CUVID MJPEG decoder (codec mjpeg)
(e). mpeg1_cuvid: Nvidia CUVID MPEG1VIDEO decoder (codec mpeg1video)
(f). mpeg2_cuvid: Nvidia CUVID MPEG2VIDEO decoder (codec mpeg2video)
(g). mpeg4_cuvid: Nvidia CUVID MPEG4 decoder (codec mpeg4)
(h). vc1_cuvid: Nvidia CUVID VC1 decoder (codec vc1)
(i). vp8_cuvid: Nvidia CUVID VP8 decoder (codec vp8)
(j). vp9_cuvid: Nvidia CUVID VP9 decoder (codec vp9)
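The selection rules above can be sketched as a small lookup. This is only an illustration: the function name `cuvid_input_args` and its two parameters (source codec, source bit depth) are hypothetical, and the mapping simply mirrors the decoder list in this gist.

```shell
# Hypothetical helper: print the decoder arguments that go before -i.
#   $1: source codec as named in the list (h264, hevc, mpeg2video, vp9, ...)
#   $2: source bit depth (8 or 10)
cuvid_input_args() {
    case "$1" in
        h263|h264|hevc|mjpeg|mpeg4|vc1|vp8|vp9) _dec="${1}_cuvid" ;;
        mpeg1video) _dec="mpeg1_cuvid" ;;
        mpeg2video) _dec="mpeg2_cuvid" ;;
        *) echo "no CUVID decoder listed for codec: $1" >&2; return 1 ;;
    esac
    if [ "$2" = "10" ]; then
        # 10-bit: omit -hwaccel cuvid and name only the decoder
        printf '%s\n' "-c:v $_dec"
    else
        # 8-bit: keep frames on the GPU with -hwaccel cuvid
        printf '%s\n' "-hwaccel cuvid -c:v $_dec"
    fi
}
```

Used unquoted so the result splits into separate arguments, for example `ffmpeg $(cuvid_input_args h264 8) -i <inputfile> ...`.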
Note that decode support varies by platform:

Maxwell Generation 1 SKUs (GM107) are limited to H.264, MJPEG, and MPEG (1 through 4) decode support.
Second-generation Maxwell (GM204) has the same decode capability as Maxwell's first generation.
Newer Maxwell GPUs (GM206 and GM200) add fixed-function hardware-accelerated HEVC decoding.
All Pascal GPUs (GP104, GP100, etc.) support all of the CUVID-based decoders listed above.

An attempt to use a CUVID-based decoder that is not supported by your hardware will result in a CUDA-related error like this:
[vp9_cuvid @ 0x30bf700] ctx->cvdl->cuvidCreateDecoder(&cudec, &cuinfo) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Stream mapping:
  Stream #0:0 -> #0:0 (vp9 (vp9_cuvid) -> h264 (h264_nvenc))
Error while opening decoder for input stream #0:0 : Generic error in an external library
[AVIOContext @ 0x30c14a0] Statistics: 0 seeks, 0 writeouts
[AVIOContext @ 0x30c16e0] Statistics: 882605 bytes read, 0 seeks

Here, I tried using the vp9_cuvid decoder on an unsupported platform (to be specific, a first-generation Maxwell card), and it failed spectacularly.
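A script that may run on mixed hardware could check for this failure and fall back to software decoding. The sketch below is an assumption-laden illustration: the function name `cuvid_decoder_failed` is hypothetical, and the pattern is taken from the error text quoted above, which other FFmpeg versions may word differently.

```shell
# Hypothetical helper: succeed when the ffmpeg log on stdin shows a
# CUVID decoder failing to open (pattern taken from the error quoted above).
cuvid_decoder_failed() {
    grep -q 'cuvidCreateDecoder.*failed'
}

# Possible use: retry without the CUVID decoder if it could not be opened.
#   ffmpeg -c:v vp9_cuvid -i "$in" -c:v h264_nvenc "$out" 2>ffmpeg.log ||
#       { cuvid_decoder_failed <ffmpeg.log &&
#         ffmpeg -i "$in" -c:v h264_nvenc "$out"; }
```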
Everything after this point requires a Pascal-based card (10xx).
Adding 10-bit:
ffmpeg -i <inputfile> -pass 1 \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=nv12:interp_algo=lanczos,hwdownload,format=nv12 \
-c:v hevc_nvenc -profile:v main10 -preset slow \
-rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -c:a:0 copy -f null /dev/null && \
ffmpeg -i <inputfile> -pass 2 \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=nv12:interp_algo=lanczos,hwdownload,format=nv12 \
-c:v hevc_nvenc -profile:v main10 -preset slow \
-rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -c:a:0 copy <outputfile>

Adding 10-bit with 4:4:4 conversion:
ffmpeg -i <inputfile> -pass 1 \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=nv12:interp_algo=lanczos,hwdownload,format=nv12 \
-c:v hevc_nvenc -pix_fmt yuv444p16 -profile:v main10 -preset slow -rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -c:a:0 copy -f null /dev/null && \
ffmpeg -i <inputfile> -pass 2 \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=nv12:interp_algo=lanczos,hwdownload,format=nv12 \
-c:v hevc_nvenc -pix_fmt yuv444p16 -profile:v main10 -preset slow -rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -c:a:0 copy <outputfile>

And finally, 10-bit, 4:4:4 with the maximum look-ahead value Pascal supports, which helps with motion-heavy scenes:
ffmpeg -i <inputfile> -pass 1 \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=nv12:interp_algo=lanczos,hwdownload,format=nv12 \
-c:v hevc_nvenc -pix_fmt yuv444p16 -profile:v main10 -preset slow -rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -rc-lookahead 32 -c:a:0 copy -f null /dev/null && \
ffmpeg -i <inputfile> -pass 2 \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=nv12:interp_algo=lanczos,hwdownload,format=nv12 \
-c:v hevc_nvenc -pix_fmt yuv444p16 -profile:v main10 -preset slow -rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -rc-lookahead 32 -c:a:0 copy <outputfile>

Note: Using NVIDIA's NPP to speed up the encode and decode process as illustrated above has been documented extensively; refer to this gist for more information.
Hint: To encode without specifying a target resolution (skipping the NVIDIA-provided scaler), repeat the snippets above with the -filter:v argument removed in full.
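To make the difference between the scaled and unscaled variants concrete, here is a minimal sketch that prints the two-pass command pair with or without the NPP scaler. The function name `nvenc_two_pass_cmd` and its arguments are hypothetical; the encoder options are copied from the basic 8-bit encode in this gist, and `-f null` on the first pass is an addition made here because /dev/null has no file extension from which ffmpeg could pick an output format.

```shell
# Hypothetical helper: print (not run) the two-pass command pair.
#   $1: input file   $2: output file
#   $3: optional target size such as 1920x1080; leave empty to skip scaling
nvenc_two_pass_cmd() {
    _vf=""
    if [ -n "${3:-}" ]; then
        _w=${3%x*}   # text before the "x"
        _h=${3#*x}   # text after the "x"
        _vf="-filter:v hwupload_cuda,scale_npp=w=${_w}:h=${_h}:format=nv12:interp_algo=lanczos,hwdownload,format=nv12 "
    fi
    _enc="-c:v hevc_nvenc -profile:v main -preset slow -rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -c:a copy"
    # First pass discards its output; second pass writes the real file.
    printf 'ffmpeg -i "%s" -pass 1 %s%s -f null /dev/null && \\\n' "$1" "$_vf" "$_enc"
    printf 'ffmpeg -i "%s" -pass 2 %s%s "%s"\n' "$1" "$_vf" "$_enc" "$2"
}
```

Calling `nvenc_two_pass_cmd in.mkv out.mkv` prints the unscaled pair, while `nvenc_two_pass_cmd in.mkv out.mkv 1920x1080` adds the same -filter:v chain to both passes.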
Basic encode:
ffmpeg -i <inputfile> -pass 1 \
-c:v hevc_nvenc -profile:v main -preset slow -rc vbr_2pass \
-qmin 15 -qmax 20 -2pass 1 -c:a:0 copy -f null /dev/null && \
ffmpeg -i <inputfile> -pass 2 \
-c:v hevc_nvenc -profile:v main -preset slow -rc vbr_2pass \
-qmin 15 -qmax 20 -2pass 1 -c:a:0 copy <outputfile>

Adding 10-bit:
ffmpeg -i <inputfile> -pass 1 \
-c:v hevc_nvenc -profile:v main10 -preset slow \
-rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -c:a:0 copy -f null /dev/null && \
ffmpeg -i <inputfile> -pass 2 \
-c:v hevc_nvenc -profile:v main10 -preset slow \
-rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -c:a:0 copy <outputfile>

Adding 10-bit with 4:4:4 conversion:
ffmpeg -i <inputfile> -pass 1 \
-c:v hevc_nvenc -pix_fmt yuv444p16 -profile:v main10 -preset slow -rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -c:a:0 copy -f null /dev/null && \
ffmpeg -i <inputfile> -pass 2 \
-c:v hevc_nvenc -pix_fmt yuv444p16 -profile:v main10 -preset slow -rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -c:a:0 copy <outputfile>

And finally, 10-bit, 4:4:4 with the maximum look-ahead value Pascal supports, to help with motion-heavy scenes:
ffmpeg -i <inputfile> -pass 1 \
-c:v hevc_nvenc -pix_fmt yuv444p16 -profile:v main10 -preset slow -rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -rc-lookahead 32 -c:a:0 copy -f null /dev/null && \
ffmpeg -i <inputfile> -pass 2 \
-c:v hevc_nvenc -pix_fmt yuv444p16 -profile:v main10 -preset slow -rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 -rc-lookahead 32 -c:a:0 copy <outputfile>

This gist will be updated as the NVENC SDK adds more HEVC encode features. Refer to this portion on speeding up ffmpeg with GNU parallel on a multi-node cluster and this portion on using xargs to spawn multiple ffmpeg sessions for NVENC as needed.
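As a rough idea of the xargs approach mentioned above, the sketch below runs one encode per input file with a cap on how many run at once. Everything named here is an assumption for illustration: the function `encode_many`, the `.hevc.mkv` output naming, and the `FFMPEG` override variable are hypothetical, and the encoder options are the basic ones from this gist. It relies on `xargs -0 -P`, which GNU and BSD xargs provide. Some GPUs and drivers limit the number of concurrent NVENC sessions, so the job count should be chosen accordingly.

```shell
# Hypothetical helper: encode several files, at most <jobs> at a time.
# Set FFMPEG=echo to preview the commands instead of running ffmpeg.
# Usage: encode_many <jobs> <outdir> file1 [file2 ...]
encode_many() {
    _jobs=$1
    _outdir=$2
    shift 2
    # NUL-separate the file names so spaces in paths survive xargs.
    printf '%s\0' "$@" |
        xargs -0 -P "$_jobs" -I{} sh -c '
            in=$1
            base=$(basename "$in")
            out=$2/${base%.*}.hevc.mkv
            "$3" -nostdin -i "$in" -c:v hevc_nvenc -profile:v main \
                -preset slow -rc vbr_2pass -qmin 15 -qmax 20 -2pass 1 \
                -c:a copy "$out"
        ' sh {} "$_outdir" "${FFMPEG:-ffmpeg}"
}
```

For example, `FFMPEG=echo encode_many 2 /tmp/out a.mkv b.mp4` prints the two argument lists without touching the GPU.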