@Koulil77
Created October 14, 2017 06:54
This gist shows you how to encode specifically to HEVC with ffmpeg's NVENC on supported hardware, with a two-pass profile and optional CUVID-based hardware-accelerated decoding.

Encoding high-quality HEVC content with FFmpeg's NVENC encoder on supported hardware:

If you've built ffmpeg as instructed here on Linux and the ffmpeg binary is in your path, you can do fast HEVC encodes as shown below, using NVIDIA's NPP libraries to vastly speed up the process.

To do a simple NVENC encode in 1080p (one that will even work on the Maxwell Gen 2 (GM200x) series), start with:

ffmpeg -i <inputfile> \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=nv12:interp_algo=lanczos,hwdownload \
-c:v hevc_nvenc -profile:v main -preset slow -rc vbr_hq \
-c:a copy <outputfile>

Note that this encode method produces 8-bit output (no 10-bit support) with 4:2:0 chroma subsampling.

Extra notes: For fully hardware-accelerated transcodes, you may also want to use one of the many NVIDIA CUVID-based accelerated decoders available in your FFmpeg build. See the list available on your system as shown here.
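One way to see which CUVID decoders your build contains is to filter the output of `ffmpeg -decoders`. This is only a sketch: the helper name `list_cuvid_decoders` and the `FFMPEG` override variable are assumptions made for illustration, and the result tells you what was compiled in, not what your GPU can actually decode.

```shell
# Hypothetical helper: print the names of the CUVID decoders in the local build.
# FFMPEG is an assumed override variable; it defaults to the ffmpeg in your path.
list_cuvid_decoders() {
    "${FFMPEG:-ffmpeg}" -hide_banner -decoders 2>/dev/null |
        # Decoder lines look like " V..... h264_cuvid   <description>",
        # so the decoder name is the second whitespace-separated field.
        awk '$2 ~ /_cuvid$/ { print $2 }'
}
```

Running `list_cuvid_decoders` on its own prints one decoder name per line, or nothing if the build has no CUVID support.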

Add the appropriate CUVID decoder to the command line based on the source media file:

  1. For transcoding 8-bit H.264/AVC content to the same codec or to 8-bit HEVC, add -hwaccel cuvid -c:v h264_cuvid to the ffmpeg arguments before the -i option.

  2. For transcoding 8-bit HEVC content to the same codec or to 8-bit H.264, add -hwaccel cuvid -c:v hevc_cuvid to the ffmpeg arguments before the -i option.

  3. For any other 8-bit content that a CUVID decoder supports, follow the same pattern as above (see the previous gist), choosing the decoder that matches the input format.

  4. For 10-bit encodes, you can also use the -hwaccel cuvid option (the latest NVENC SDK enables FFmpeg to do full HDR transcoding), combined with -c:v {hwaccel_type}, where {hwaccel_type} is one of the following decoders, chosen according to the source content codec:

     (a). h263_cuvid: Nvidia CUVID H263 decoder (codec h263)
     (b). h264_cuvid: Nvidia CUVID H264 decoder (codec h264)
     (c). hevc_cuvid: Nvidia CUVID HEVC decoder (codec hevc)
     (d). mjpeg_cuvid: Nvidia CUVID MJPEG decoder (codec mjpeg)
     (e). mpeg1_cuvid: Nvidia CUVID MPEG1VIDEO decoder (codec mpeg1video)
     (f). mpeg2_cuvid: Nvidia CUVID MPEG2VIDEO decoder (codec mpeg2video)
     (g). mpeg4_cuvid: Nvidia CUVID MPEG4 decoder (codec mpeg4)
     (h). vc1_cuvid: Nvidia CUVID VC1 decoder (codec vc1)
     (i). vp8_cuvid: Nvidia CUVID VP8 decoder (codec vp8)
     (j). vp9_cuvid: Nvidia CUVID VP9 decoder (codec vp9)
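If you'd rather not pick the decoder by hand, the mapping from the list above can be sketched as a small shell function. The name `cuvid_decoder_for` is hypothetical, and the function only encodes the codec-to-decoder pairs listed in this gist; it does not check that your hardware supports the decoder it returns.

```shell
# Hypothetical helper: map a source codec name (as reported by ffprobe)
# to the matching CUVID decoder from the list in this gist.
cuvid_decoder_for() {
    case "$1" in
        h263|h264|hevc|mjpeg|mpeg4|vc1|vp8|vp9) printf '%s_cuvid\n' "$1" ;;
        mpeg1video) printf 'mpeg1_cuvid\n' ;;
        mpeg2video) printf 'mpeg2_cuvid\n' ;;
        *) return 1 ;;  # no CUVID decoder listed for this codec
    esac
}

# Possible usage (input.mkv is a placeholder file name):
#   codec=$(ffprobe -v error -select_streams v:0 \
#       -show_entries stream=codec_name \
#       -of default=noprint_wrappers=1:nokey=1 input.mkv)
#   decoder=$(cuvid_decoder_for "$codec") &&
#       ffmpeg -hwaccel cuvid -c:v "$decoder" -i input.mkv ...
```

A non-zero return status means the codec has no entry in the list, in which case you'd leave the -hwaccel and decoder options out and let ffmpeg decode in software.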

Note that decode support varies with the GPU generation you're on:

  1. First-generation Maxwell SKUs (GM107) are limited to H.264, MJPEG, and MPEG (1 through 4) decode support only.
  2. Second-generation Maxwell (GM204) has the same decode capability as first-generation Maxwell.
  3. Newer Maxwell GPUs (GM206 and the GM200) add support for fixed-function hardware-accelerated HEVC decoding.
  4. All Pascal GPUs (GP104, GP100, etc.) support all of the CUVID-based decoders listed above.

An attempt to use a CUVID-based decoder that is not supported by your hardware will result in a CUDA-related error like this:

[vp9_cuvid @ 0x30bf700] ctx->cvdl->cuvidCreateDecoder(&cudec, &cuinfo) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Stream mapping:
  Stream #0:0 -> #0:0 (vp9 (vp9_cuvid) -> h264 (h264_nvenc))
Error while opening decoder for input stream #0:0 : Generic error in an external library
[AVIOContext @ 0x30c14a0] Statistics: 0 seeks, 0 writeouts
[AVIOContext @ 0x30c16e0] Statistics: 882605 bytes read, 0 seeks

Here, I tried using the vp9_cuvid decoder on an unsupported platform (to be specific, a first-generation Maxwell card) and it failed spectacularly.
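If you script your transcodes, one way to cope with a failure like the one above is to retry without the CUVID decoder so that ffmpeg falls back to its default software decoder. The sketch below assumes a hypothetical wrapper named `transcode_with_fallback` and an `FFMPEG` override variable; neither is part of ffmpeg itself.

```shell
# Hypothetical wrapper: try a CUVID decoder first, then retry with ffmpeg's
# default (software) decoder if the first attempt fails.
# Usage: transcode_with_fallback <cuvid_decoder> <input> <output args...>
transcode_with_fallback() (
    decoder=$1
    input=$2
    shift 2
    # -y overwrites any partial output left behind by a failed first attempt.
    if ! "${FFMPEG:-ffmpeg}" -y -hwaccel cuvid -c:v "$decoder" -i "$input" "$@"; then
        echo "CUVID decoder $decoder failed; retrying with software decoding" >&2
        "${FFMPEG:-ffmpeg}" -y -i "$input" "$@"
    fi
)
```

For the failing case shown above, a call could look like `transcode_with_fallback vp9_cuvid input.webm -c:v h264_nvenc output.mp4`, where the file names are placeholders.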

Everything after this point requires a Pascal-based card (10xx).

Adding 10-bit:

ffmpeg -i <inputfile> \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=nv12:interp_algo=lanczos,hwdownload \
-c:v hevc_nvenc -profile:v main10 -preset slow \
-rc vbr_hq -c:a:0 copy <outputfile>

Adding 10-bit with 4:4:4 conversion:

ffmpeg -i <inputfile> \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=yuv444p16:interp_algo=lanczos,hwdownload \
-c:v hevc_nvenc -profile:v main10 -preset slow -rc vbr_hq -c:a:0 copy <outputfile>

And finally, 10-bit 4:4:4 with the maximum look-ahead value Pascal supports, which helps with motion-heavy scenes:

ffmpeg -i <inputfile> \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:format=yuv444p16:interp_algo=lanczos,hwdownload,format=nv12 \
-c:v hevc_nvenc  -profile:v main10 -preset slow -rc vbr_hq -rc-lookahead 32 -c:a:0 copy <outputfile>

Note: Using NVIDIA's NPP to speed up the encode and decode process as illustrated above has been documented extensively; refer to this gist for more information.

Hint: If you want to encode without specifying a target resolution (skipping the NVIDIA-provided scaler), repeat the snippets above with the -filter:v argument removed in full.
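To make the hint concrete, here is a minimal sketch that wraps the first 8-bit command and only adds the scale_npp filter when a target size is given. The function name `nvenc_hevc`, its argument order, and the `FFMPEG` override variable are assumptions for illustration; the encoder options are the ones used earlier in this gist.

```shell
# Hypothetical helper: run the 8-bit HEVC encode, adding the NPP scaler only
# when both a width and a height are supplied.
# Usage: nvenc_hevc <input> <output> [width height]
nvenc_hevc() (
    input=$1
    output=$2
    width=${3:-}
    height=${4:-}
    set --  # start with no extra arguments (scaler skipped)
    if [ -n "$width" ] && [ -n "$height" ]; then
        set -- -filter:v "hwupload_cuda,scale_npp=w=$width:h=$height:format=nv12:interp_algo=lanczos,hwdownload"
    fi
    "${FFMPEG:-ffmpeg}" -i "$input" "$@" \
        -c:v hevc_nvenc -profile:v main -preset slow -rc vbr_hq \
        -c:a copy "$output"
)
```

Calling `nvenc_hevc input.mkv output.mkv` keeps the source resolution, while `nvenc_hevc input.mkv output.mkv 1920 1080` reproduces the scaled 1080p encode (file names are placeholders).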

This gist will be updated as the NVENC SDK adds more HEVC encode features. Refer to this portion on speeding up ffmpeg with GNU parallel on a multi-node cluster and this portion on using xargs to spawn multiple ffmpeg sessions for NVENC as needed.
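As a rough idea of what the xargs approach can look like, the sketch below encodes every .mkv file in a directory with a bounded number of concurrent ffmpeg processes. The function name `batch_encode`, the default of 2 jobs, the .mkv filter, and the `FFMPEG` override variable are all assumptions for illustration, not taken from the linked material; how many simultaneous NVENC sessions your card allows depends on your hardware and driver.

```shell
# Hypothetical batch helper: encode each .mkv in <src_dir> into <dst_dir>,
# running up to <jobs> ffmpeg processes at once via xargs -P.
# Usage: batch_encode <src_dir> <dst_dir> [jobs]
batch_encode() (
    src=$1
    dst=$2
    jobs=${3:-2}  # assumed default, adjust to your session limit
    mkdir -p "$dst" || exit 1
    find "$src" -maxdepth 1 -type f -name '*.mkv' -print0 |
        xargs -0 -P "$jobs" -n 1 sh -c '
            # $0 = ffmpeg binary, $1 = output directory, $2 = input file
            [ -n "$2" ] || exit 0
            out="$1/$(basename "$2")"
            "$0" -nostdin -y -i "$2" \
                -c:v hevc_nvenc -profile:v main -preset slow -rc vbr_hq \
                -c:a copy "$out"
        ' "${FFMPEG:-ffmpeg}" "$dst"
)
```

Writing the outputs to a separate directory keeps a second run from picking up already encoded files as new inputs.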
