This gist contains instructions on setting up FFmpeg and Libav to use VAAPI-based hardware accelerated encoding (on supported platforms) for H.264 (and H.265 on supported hardware) video formats.

Using VAAPI's hardware accelerated video encoding on Linux with Intel's hardware on FFmpeg and libav

Hello, brethren :-)

As it turns out, the current version of FFmpeg (version 3.1, released earlier today) and libav (master branch) support full H.264 and HEVC encoding via VAAPI on supported hardware, and it works reliably enough to be termed "production-ready".

Assumptions:

Before taking on this manual, the author assumes that:

  1. The end user can comfortably install and configure their Linux distribution of choice.
  2. The end user can install, upgrade and downgrade packages, and resolve package conflicts and dependency issues, with their distribution's package manager.
  3. The end user is comfortable with the Linux terminal, and can navigate through it.
  4. The end user has basic competence on the shell: reading man pages, using a text editor of choice, performing file operations, and so on.

And as an indemnity clause, I, the author, will not be liable for any damage, implied or otherwise, to your files, hardware or the stability of your machine as a consequence of using these instructions to achieve a similar feat as described in this gist.

Implications:

It means that when you're encoding content for use with your blogs or some fancy YouTube download, you can do it in hardware with lower processor utilization (so you can multi-task), less heat output and significantly higher speed (as tested on my end, ~8.7x for 1080p and ~4.2x for 4K encodes with reference media) compared to a pure software-based approach as offered by libx264 and similar implementations, albeit at an acceptable quality compromise.

Here goes:

First, you will need to build ffmpeg (and libav, as per your preferences) with the appropriate arguments. The --enable-vaapi switch should be enough, though.

Here are my build options (Note that I load ffmpeg and libav via the module system):

FFmpeg's module files are here, and as more versions are compiled, more modules will be added. Libav's module files are here, and as more versions are compiled, more modules will be added.

FFmpeg's configuration switches used:

./configure --enable-nonfree --enable-gpl --enable-version3 \
--enable-libass --enable-libbluray --enable-libmp3lame \
--enable-libopencv --enable-libopenjpeg --enable-libopus \
--enable-libfaac --enable-libfdk-aac --enable-libtheora \
--enable-libvpx --enable-libwebp --enable-opencl --enable-x11grab \
--enable-opengl --cpu=native --enable-nvenc --enable-vaapi \
--enable-vdpau --enable-ladspa --enable-libass --enable-libgsm \
--enable-libschroedinger --enable-libsmbclient --enable-libsoxr \
--enable-libspeex --enable-libssh --enable-libwavpack --enable-libxvid \
--enable-libx264 --enable-libx265 --enable-netcdf --enable-openal \
--enable-openssl --enable-cuda --prefix=/apps/ffmpeg/git --enable-omx

Libav's configuration switches used:

./configure --prefix=/apps/libav/11.7 --enable-gpl --enable-version3 \
--enable-nonfree --enable-runtime-cpudetect --enable-gray \
--enable-vaapi --enable-vdpau --enable-vda --enable-libmp3lame \
--enable-libopenjpeg --enable-libopus --enable-libfaac \
--enable-libfdk-aac --enable-libtheora --enable-libvpx \
--enable-libwebp --enable-x11grab --cpu=native --enable-vaapi \
--enable-vdpau --enable-libgsm --enable-libschroedinger \
--enable-libspeex --enable-libwavpack --enable-libxvid \
--enable-libx264 --enable-libx265 --enable-openssl --enable-nvenc \
--enable-cuda --enable-omx

Then run make and make install to build and install the toolkits respectively.

Warning: These options are for reference only. A useful FFmpeg build will require you to install the appropriate dependencies for some build options, as suited to your environment and platform. Modify as needed. Also see the indemnity clause at the top of this document.

Here are the dependencies I had to install on my end (without accounting for the OpenMAX IL bellagio back-end):

sudo apt-get install yasm ladspa-sdk ladspa-foo-plugins ladspalist libass5 libass-dev \
libbluray-bdj libbluray-bin libbluray-dev libbluray-doc libbluray1 libmp3lame-dev \
libmp3lame-ocaml libmp3lame-ocaml-dev libmp3lame0 libsox-fmt-mp3 libopencv-* opencv-* \
python-cv-bridge python-image-geometry python-opencv python-opencv-apps \
gstreamer1.0-vaapi gstreamer1.0-vaapi-doc libopenjp2-* libopenjp2-7-dev libopenjp2-7-dbg \
libopenjp3d7 libopenjpeg-dev libopenjpeg-java libopenjpeg5 libopenjpeg5-dbg libopenjpip7 \
openjpeg-tools libopus-dbg libopus-dev libopus-doc libopus0 libtag1-dev libtag1-doc \
libtag1v5 libtagc0 libtagc0-dev libopus-ocaml libopus-ocaml-dev libopusfile-dev \
libopusfile-doc libopusfile0 libvorbis-java opus-tools opus-tools-dbg libfaac-dev libfaac0 \
fdkaac libfdk-aac0 libfdk-aac0-dbg libfdk-aac-dev libtheora-dbg libtheora-dev \
libtheora-doc libtheora0 libtheora-bin libtheora-ocaml libtheora-ocaml-dev libvpx-dev \
libvpx-doc libvpx3 libvpx3-dbg libwebp-dev libwebp5 libwebpdemux1 libwebpmux1 \
opencl-headers mesa-vdpau-drivers libvdpau-va-gl1 vdpauinfo vdpau-va-driver libvdpau-doc \
libvdpau-dev libvdpau1 libvdpau1-dbg libgsm-tools libgsm0710-0 libgsm0710-dev \
libgsm0710mux3 libgsm1 libgsm1-dbg libgsm1-dev sox libsox-dev libsox-fmt-all \
libsox-fmt-alsa libsox-fmt-ao libsox-fmt-base libsox-fmt-mp3 libsox-fmt-oss \
libsox-fmt-pulse libsox2 libsoxr-dev libsoxr-lsr0 libschroedinger-dev libschroedinger-doc \
libschroedinger-ocaml libschroedinger-ocaml-dev libschroedinger-1.0-0 libsmbclient \
libsmbclient-dev smbclient libspeex-dev libspeex1 libspeexdsp-dev libspeexdsp1 \
libspeex-ocaml libspeex-ocaml-dev libspeex-dbg libssh-4 libssh-dev libssh-dbg libssh-doc \
libssh-gcrypt-4 libssh2-1 libssh2-1-dev libwavpack-dev libwavpack1 libxvidcore-dev \
libxvidcore4 libx265-dev libx265-79 libx265-doc libx264-148 libx264-dev libnetcdf-* \
netcdf-* libopenal-* openal-info openssl

When done, you may then create and load the appropriate environment modules for ffmpeg or libav, whichever you prefer. Don't load both at the same time, though :-) (Mark them as module conflicts so that, if this is set up on a cluster, library conflicts do not occur when users inadvertently load both in the same session).

Now, we get to the interesting bits:

Encoding with VAAPI

You'll notice that we pass several arguments to ffmpeg as indicated below:

ffmpeg -loglevel debug -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i "input file" \
-vf 'format=nv12,hwupload' -map 0:0 -map 0:1 -threads 8 -aspect 16:9 \
-y -f matroska -acodec copy -b:v 12500k -vcodec h264_vaapi "output file"

Let's break down these arguments to their meaning:

(a). -loglevel: Tells ffmpeg to log its events at the given level, here debug. This will be very verbose, and is completely optional. You can disregard this.

(b). -vaapi_device: This is important. You must select a valid VAAPI H/W context device to which you will upload frames via hwupload, formatted as NV12. This points to a /dev/dri/renderD* device node on your Linux system.

(c). -vf : This is an inbuilt ffmpeg option that specifies the video filter chain applied to the frames before they reach our encoder, in this case, h264_vaapi (Remember, we built this when we passed --enable-vaapi at the configuration stage). Here, we tell ffmpeg to convert all frames to one pixel format, NV12 (as it's the one accepted by Intel's QuickSync hardware encoder), and then to use hwupload, an ffmpeg filter that asynchronously copies the converted pixel data to VAAPI's surfaces.

(d). -threads : Specifies the number of threads that FFmpeg should use. A sensible value is the number of logical processors available on your system. On Intel processors that support Hyper-Threading, that is the number of cores multiplied by 2.

(e). -f : Specifies the output container format. This can be Matroska, WebM, MP4, etc. Take your pick (as per your container constraints).

(f). -acodec: Specifies the audio codec to use when transcoding the video's audio stream. In the example given above, we use copy, which passes the audio stream through to the muxer as is, untouched.

(g). -vcodec: Selects the video encoder to use. In this case, we selected h264_vaapi, our key point of interest here.

(h). -hwaccel vaapi: This instructs ffmpeg to use VAAPI-based hardware-accelerated decode (for supported codecs, see platform limits), and it can drastically lower the processor load during the process. Note that you should only use this option if your hardware supports hardware-accelerated decoding via VAAPI for the source format being decoded.

(i). Using the scale_vaapi filter in the video filters: It is possible to use Intel's QuickSync hardware via VAAPI for resizing and scaling (when up- or downscaling the input source to a higher or lower resolution), using a filter snippet such as the one shown below:

-vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080'

You may specify a different resolution by changing the w= and h= values to suit your needs.

See an example of this filter snippet used above in the two-pass example in FFmpeg below.

(j). -hwaccel_output_format : This option should be used every time you declare the -hwaccel method as vaapi, so that the decode stage takes place entirely in hardware. This option generates decode output directly on VAAPI hardware surfaces, speeding up decode performance significantly.
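Item (b) depends on picking the right render node, and machines with more than one GPU expose several. The following is a minimal sketch for listing the candidates, assuming the nodes follow the usual renderD<number> naming; the function name list_render_nodes and its optional directory argument are illustrative helpers, not part of FFmpeg:

```shell
# List DRM render nodes that could be passed to -vaapi_device.
# The optional directory argument is an assumption for illustration
# (it defaults to /dev/dri).
list_render_nodes() {
    local dir="${1:-/dev/dri}"
    local node
    for node in "$dir"/renderD*; do
        # Skip the literal pattern when nothing matches.
        [ -e "$node" ] || continue
        printf '%s\n' "$node"
    done
}

list_render_nodes
```

If more than one node is printed, vainfo (described below) can help tell which one belongs to the encoder you want.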

You may confirm the supported decode and encode formats on your setup by running vainfo:

vainfo

Sample output on a Haswell testbed:

libva info: VA-API version 0.39.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_0_39
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.39 (libva 1.7.0)
vainfo: Driver version: Intel i965 driver for Intel(R) Haswell Mobile - 1.7.0
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            :	VAEntrypointVLD
      VAProfileMPEG2Simple            :	VAEntrypointEncSlice
      VAProfileMPEG2Main              :	VAEntrypointVLD
      VAProfileMPEG2Main              :	VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline:	VAEntrypointVLD
      VAProfileH264ConstrainedBaseline:	VAEntrypointEncSlice
      VAProfileH264Main               :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointEncSlice
      VAProfileH264High               :	VAEntrypointVLD
      VAProfileH264High               :	VAEntrypointEncSlice
      VAProfileH264MultiviewHigh      :	VAEntrypointVLD
      VAProfileH264MultiviewHigh      :	VAEntrypointEncSlice
      VAProfileH264StereoHigh         :	VAEntrypointVLD
      VAProfileH264StereoHigh         :	VAEntrypointEncSlice
      VAProfileVC1Simple              :	VAEntrypointVLD
      VAProfileVC1Main                :	VAEntrypointVLD
      VAProfileVC1Advanced            :	VAEntrypointVLD
      VAProfileNone                   :	VAEntrypointVideoProc
      VAProfileJPEGBaseline           :	VAEntrypointVLD

Supported encode formats are listed with the VAEntrypointEncSlice entrypoint, and the decode formats for your SKU are listed with the VAEntrypointVLD entrypoint. VAEntrypointVideoProc indicates video processing support (such as scaling) rather than a codec.

To interpret the output above: the Haswell SKU supports VAAPI-based hardware-accelerated decode and encode for MPEG-2 (Simple and Main) and H.264 (Constrained Baseline, Main, High, Multiview High and Stereo High), plus decode only for VC-1 and baseline JPEG. (I'd assume that the Multiview High and Stereo High profiles refer to H.264's Multi-view coding (MVC) mode, useful for encoding 3D Blu-rays and similar media, implying feature parity with Windows-based implementations where MVC encodes and decodes are supported by Intel QuickSync. Need to test that sometime).
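Picking the encode-capable profiles out of a long vainfo listing by eye is error-prone. A small sketch that filters them, assuming the "profile : entrypoint" layout shown in the sample output; list_vaapi_encode_profiles is a hypothetical helper name:

```shell
# Print the VA-API profiles that advertise an encode entrypoint.
# Reads vainfo-style text on stdin.
list_vaapi_encode_profiles() {
    awk '/VAEntrypointEncSlice/ {
        sub(/:.*/, "")        # drop everything from the first colon onwards
        gsub(/[ \t]+/, "")    # strip the padding around the profile name
        print
    }' | sort -u
}
```

Typical use would be `vainfo 2>/dev/null | list_vaapi_encode_profiles`. If no H.264 profile appears in the result, h264_vaapi is unlikely to work on that device.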

The other arguments are pretty standard to FFmpeg and need no introduction :-)
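To make the moving parts easier to see, here is a sketch that only prints a minimal encode command assembled from the options described above, so it can be reviewed before running. The function name print_vaapi_encode_cmd and its argument order are assumptions for illustration; -c:v and -c:a are the short forms of -vcodec and -acodec, and the defaults mirror the values used in this gist:

```shell
# Print (do not run) a VAAPI H.264 encode command built from the options above.
# Arguments: input, output, then optional device node and video bitrate.
print_vaapi_encode_cmd() {
    local input="$1" output="$2"
    local device="${3:-/dev/dri/renderD128}"
    local bitrate="${4:-12500k}"
    printf '%s\n' "ffmpeg -vaapi_device $device -i \"$input\" -vf 'format=nv12,hwupload' -c:v h264_vaapi -b:v $bitrate -c:a copy \"$output\""
}

print_vaapi_encode_cmd "input file" "output file"
```

This variant decodes in software and only uploads frames for encoding. Add -hwaccel vaapi (and, where supported, -hwaccel_output_format vaapi) before -i to move decoding to the GPU as well.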

You may also use extra options such as QP mode (for constant-quality encoding) with this codec in ffmpeg as shown:

ffmpeg -loglevel debug -vaapi_device /dev/dri/renderD128 -i "input file" \
-vf 'format=nv12,hwupload' -map 0:0 -map 0:1 -threads 8 -aspect 16:9 \
-y -f matroska -acodec copy -vcodec h264_vaapi -qp 19 -bf 2 "output file"

Here, you'll notice that we've added a few extra options to the arguments passed to the selected video encoder, h264_vaapi, and they are as follows:

(a). -qp: This option sets a fixed QP for P-frames, and is ignored if a bit-rate is set instead. It is particularly useful for encodes where constant quality is required without bit-rate constraints, much like CRF-based encodes in libx264. As a rough reference, a QP value of ~18 gives visual quality close to lossless, and going higher (up to 51) gives progressively worse visual quality.

(b). -bf: This option sets the maximum number of B-frames (bi-directionally predicted) between P- (predicted) frames. You may pump this higher than the default (2) if your selected encoder profile is High or better. Recommended: Leave this at the default (2).
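Since the right QP depends on the source material, it can help to encode the same clip at a few values and compare file size and quality. A sketch that prints one command per QP value; print_qp_sweep and the qpNN.mkv output names are illustrative choices, and the device node is the one assumed throughout this gist:

```shell
# Print one encode command per QP value so the outputs can be compared.
# Arguments: input file, then any number of QP values.
print_qp_sweep() {
    local input="$1"
    shift
    local qp
    for qp in "$@"; do
        printf '%s\n' "ffmpeg -vaapi_device /dev/dri/renderD128 -i \"$input\" -vf 'format=nv12,hwupload' -c:v h264_vaapi -qp $qp -bf 2 -c:a copy \"qp${qp}.mkv\""
    done
}

print_qp_sweep input.mkv 19 23 27
```

Review the printed commands, then pipe them to sh to run them in sequence.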

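In my tests, ffmpeg also accepts two-pass options with this encoder (h264_vaapi), as illustrated in the example below. (Note, per my later reply in the comments: two-pass encoding is not actually implemented in the VAAPI encoders, so treat this example as illustrative only.)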

ffmpeg -loglevel debug -hwaccel vaapi -hwaccel_output_format vaapi -i "input-file" \
-vaapi_device /dev/dri/renderD129 -vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-pass 1 -qp:v 19 -b:v 10.5M -c:v h264_vaapi -bf 4 -threads 4 -aspect 16:9 \
-an -y -f mp4 "/dev/null" && \
ffmpeg -loglevel debug -hwaccel vaapi -hwaccel_output_format vaapi -i "input-file" \
-vaapi_device /dev/dri/renderD129 -vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-pass 2 -acodec copy -c:v h264_vaapi -bf 4 -qp:v 19 -b:v 10.5M -threads 4 -aspect 16:9 \
-y -f mp4 "output.mp4"

Let's break that down:

With ffmpeg (and libav also), you must specify both passes sequentially (-pass 1 and -pass 2) because ffmpeg does not reiterate over input files for multiple passes. Secondly, this allows the user to tune the two-pass encoding as he/she sees fit, for example, by skipping audio processing in the first pass (-an) and only copying/muxing the audio stream from the input file's container specification into the output file's container (-acodec copy), as illustrated in the examples above.

And now we move on to libav's options for a similar encode:

avconv -v 55 -y -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -hwaccel_output_format vaapi -i input.mkv \
-c:a copy -vf 'format=nv12|vaapi,hwupload' -c:v h264_vaapi -bf 2 -b 12500k output.mkv

Let's break down these arguments to their meaning:

(a). -v : This defines avconv's verbosity level. This one is completely optional, though it's regarded as good practice to leave it enabled and set to a reasonable verbosity level for troubleshooting and diagnostics purposes.

(b). -vaapi_device: This is important. You must select a valid VAAPI H/W context device to which you will upload frames via hwupload, formatted as NV12. This points to a /dev/dri/renderD* device node on your Linux system.

(c). -hwaccel: This option allows you to select the hardware - based accelerated decoding to use for the encode session. In our case above, we are picking vaapi as this has a positive impact on encoder performance. A nice freebie.

(d). -hwaccel_output_format : This option should be used every time you declare the -hwaccel method as vaapi , so that the decode stage takes place entirely in hardware. This option generates decode output directly on VAAPI hardware surfaces, speeding up decode performance significantly.

(e). -vf : This is an inbuilt libav option that specifies the video filter chain applied to the frames before they reach our encoder, in this case, h264_vaapi (Remember, we built this when we passed --enable-vaapi at the configuration stage). Here, we tell libav to convert all frames to one pixel format, NV12 (as it's the one accepted by Intel's QuickSync hardware encoder), and then to use hwupload, a libav filter that asynchronously copies the converted pixel data to VAAPI's surfaces. This argument also includes the hardware accelerated decode output format we requested earlier, raw VAAPI hardware surfaces.

(f). -bf : Specifies the B-frame setting to use. Sane values for Intel's Quick Sync encode hardware should be between 2 and 4. Test and report back.

(g). -c:a: Specifies the audio codec to use when transcoding the video's audio stream. In the example given above, we use libav's muxers to copy the audio stream as is, untouched.

(h). -c:v: Selects the video encoder to use. In this case, we selected h264_vaapi, our key point of interest here.

(i). -b: Selects the video stream's bitrate passed to the encoder, h264_vaapi.

You may see the original documentation on Libav's website here on build instructions, using the alternate hevc_vaapi on supported hardware, encoder limitations, caveats, etc.

If all goes according to plan, your video file should be encoded to H.264, muxed into the selected container and be done with.

See the screen-shot library here.

Extra information:

You can always view the build configuration of your FFmpeg pipeline at any time by running:

For FFmpeg:

lin@mjanja:~$ ffmpeg -buildconf
ffmpeg version N-80785-g0fd76d7 Copyright (c) 2000-2016 the FFmpeg developers
  built with gcc 5.3.1 (Ubuntu 5.3.1-14ubuntu2.1) 20160413
  configuration: --enable-nonfree --enable-gpl --enable-version3 --enable-libass --enable-libbluray --enable-libmp3lame --enable-libopencv --enable-libopenjpeg --enable-libopus --enable-libfaac --enable-libfdk-aac --enable-libtheora --enable-libvpx --enable-libwebp --enable-opencl --enable-x11grab --enable-opengl --cpu=native --enable-nvenc --enable-vaapi --enable-vdpau --enable-ladspa --enable-libass --enable-libgsm --enable-libschroedinger --enable-libsmbclient --enable-libsoxr --enable-libspeex --enable-libssh --enable-libwavpack --enable-libxvid --enable-libx264 --enable-libx265 --enable-netcdf --enable-openal --enable-openssl --prefix=/apps/ffmpeg/git --enable-omx
  libavutil      55. 27.100 / 55. 27.100
  libavcodec     57. 48.101 / 57. 48.101
  libavformat    57. 40.101 / 57. 40.101
  libavdevice    57.  0.102 / 57.  0.102
  libavfilter     6. 46.102 /  6. 46.102
  libswscale      4.  1.100 /  4.  1.100
  libswresample   2.  1.100 /  2.  1.100
  libpostproc    54.  0.100 / 54.  0.100

  configuration:
    --enable-nonfree
    --enable-gpl
    --enable-version3
    --enable-libass
    --enable-libbluray
    --enable-libmp3lame
    --enable-libopencv
    --enable-libopenjpeg
    --enable-libopus
    --enable-libfaac
    --enable-libfdk-aac
    --enable-libtheora
    --enable-libvpx
    --enable-libwebp
    --enable-opencl
    --enable-x11grab
    --enable-opengl
    --cpu=native
    --enable-nvenc
    --enable-vaapi
    --enable-vdpau
    --enable-ladspa
    --enable-libass
    --enable-libgsm
    --enable-libschroedinger
    --enable-libsmbclient
    --enable-libsoxr
    --enable-libspeex
    --enable-libssh
    --enable-libwavpack
    --enable-libxvid
    --enable-libx264
    --enable-libx265
    --enable-netcdf
    --enable-openal
    --enable-openssl
    --prefix=/apps/ffmpeg/git
    --enable-omx
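Before troubleshooting an encode, it is worth confirming that the binary on your PATH was built with the switch you expect. A minimal sketch, assuming the configuration text is piped in on stdin; has_configure_flag is a hypothetical helper name:

```shell
# Succeed when the configuration text on stdin contains the given switch
# as a whole token (so --enable-vaapi does not match --enable-vaapi-foo).
has_configure_flag() {
    tr -s ' \t' '\n\n' | grep -qxF -- "$1"
}
```

For example, `ffmpeg -hide_banner -buildconf | has_configure_flag --enable-vaapi && echo "VAAPI enabled"`.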

On help and documentation:

List all formats:

ffmpeg -formats

Display options specific to, and information about, a particular muxer:

ffmpeg -h muxer=matroska

Display options specific to, and information about, a particular demuxer:

ffmpeg -h demuxer=gif

Codecs (encoders and decoders):

List all codecs:

ffmpeg -codecs

List all encoders:

ffmpeg -encoders

List all decoders:

ffmpeg -decoders

Display options specific to, and information about, a particular encoder:

ffmpeg -h encoder=mpeg4

Display options specific to, and information about, a particular decoder:

ffmpeg -h decoder=aac

Reading the results

There is a key near the top of the output that describes each letter that precedes the name of the format, encoder, decoder, or codec:

$ ffmpeg -encoders
[…]
Encoders:
 V..... = Video
 A..... = Audio
 S..... = Subtitle
 .F.... = Frame-level multithreading
 ..S... = Slice-level multithreading
 ...X.. = Codec is experimental
 ....B. = Supports draw_horiz_band
 .....D = Supports direct rendering method 1
 ------
[…]
 V.S... mpeg4                MPEG-4 part 2

In this example V.S... indicates that the encoder mpeg4 is a Video encoder and supports Slice-level multithreading.
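The same listing can be filtered to see which VAAPI encoders a given build exposes. A sketch that assumes the two-column "flags name" layout shown above; list_vaapi_encoders is an illustrative helper name:

```shell
# Print the names of VAAPI video encoders found in `ffmpeg -encoders`
# style text read from stdin.
list_vaapi_encoders() {
    awk '$1 ~ /^V/ && $2 ~ /_vaapi$/ { print $2 }'
}
```

Typical use would be `ffmpeg -hide_banner -encoders | list_vaapi_encoders`. An empty result suggests the build lacks --enable-vaapi.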

Extra notes for AMD hardware supporting VCE:

If you have a supported GCN+ AMD GPU running on Linux with the mesa driver stack, you may be able to use the AMD VCE Block via VAAPI with an example such as the one shown below:

DRI_PRIME=1 LIBVA_DRIVER_NAME=radeonsi ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -hwaccel_output_format vaapi \
-framerate 30 -video_size 1920x1200 -f x11grab -i :0.0 -f pulse -ac 2 -i 1 \
-vf 'format=nv12,hwupload' -threads 8 \
-vcodec h264_vaapi -bf 0 -acodec pcm_s16le output.mkv

Where we capture from the screen via x11grab and the audio from a pulseaudio device.

You must set the LIBVA_DRIVER_NAME environment variable to radeonsi and DRI_PRIME to 1 prior to using VAAPI on VCE, and ensure that -vaapi_device points to the correct render node.

Note that with AMD hardware, we generally disable B-Frame support as newer SKUs such as the RX 460/470/480 and their rebrands (Polaris-based) do not support B-Frames in H.264 encoding. See this issue on Github for more details.
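If the same script has to run on both Intel and AMD machines, the B-frame setting can be derived from the driver name. A sketch under the assumptions stated in this gist (0 for radeonsi, and 2, the default used in the earlier examples, for anything else); vaapi_bframes_for_driver is a hypothetical helper:

```shell
# Pick a -bf value for h264_vaapi based on the VA-API driver name.
# radeonsi -> 0 (per the note above); anything else -> 2, the value
# used in the earlier Intel examples.
vaapi_bframes_for_driver() {
    case "$1" in
        radeonsi) echo 0 ;;
        *)        echo 2 ;;
    esac
}
```

It could then be used as `-bf "$(vaapi_bframes_for_driver "$LIBVA_DRIVER_NAME")"` in the commands above.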

@Brainiarc7

@mattold7 what driver are you using with VAAPI? Show me the output of vainfo.
You may also reach me directly via email, thanks.

@barolo

barolo commented Jul 10, 2019

I'm trying to transcode 10bit hevc into 8bit h264_vaapi, but the output is garbled [ half of the screen flashes green ] on AMD RAVEN APU.
Could you check if I'm doing it properly?

ffmpeg \
-init_hw_device vaapi=amd:/dev/dri/renderD128 -hwaccel vaapi -hwaccel_output_format vaapi \
-hwaccel_device amd -filter_hw_device amd \
-i nexp.mkv \
-vf "scale_vaapi=format=nv12,hwupload" -threads 6 \
-c:v h264_vaapi -profile:v 578 \
-c:a copy -bf 0 -c:s copy \
-f mpegts -y plop.mkv

vainfo:
libva info: VA-API version 1.6.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib64/va/drivers/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_6
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.6 (libva 2.6.0.pre1)
vainfo: Driver version: Mesa Gallium driver 19.2.0-devel for AMD RAVEN (DRM 3.32.0, 5.2.0-gentoo, LLVM 8.0.0)
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSlice
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileHEVCMain10 : VAEntrypointVLD
VAProfileJPEGBaseline : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileVP9Profile2 : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc

@mj43

mj43 commented Aug 12, 2019

I am trying to use vaapi to cut down on CPU load as I have fairly low-end systems: a z8350 (Cherrytrail) and an N4200 (Broxton). When I use libx264 on the z8350, CPU usage is +95% and the video is soft. The situation with the N4200 is a bit better but leaves a bit to be desired in terms of video quality.
Using vaapi encoding gives better results and cuts overall CPU usage to ~50% z8350 and <50% N4200. However, there appears to be a memory leak and I can watch RAM usage creep from 600MB up until after about 15 minutes my available 3.7GB are used up and the system locks up. This occurs on both machines, OS in both cases is Ubuntu 18.04.3 LTS.
Is memory leak with vaapi not an issue with your build of FFMPEG? I ask as I am using the standard Ubuntu build.

ffmpeg -y -vaapi_device /dev/dri/renderD128 -i overlays/VB12.png -thread_queue_size 32768 -f rawvideo -vcodec rawvideo -s 1280x720 -pix_fmt bgr24 -framerate 25 -i - -filter_complex [1:v][0:v]overlay=705:572, format=nv12,hwupload -thread_queue_size 65536 -f alsa -ac 1 -i hw:0 -af adelay=250, volume=1, atempo=1.0 -vcodec h264_vaapi -qp:v 23 -profile:v 100 -level:v 31 /media/mark/Test/VR100test_0828.mp4

ffmpeg version 4.1.4-0york3~18.04 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
configuration: --prefix=/usr --extra-version='0york3~18.04' --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-nonfree --enable-libfdk-aac --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 22.100 / 56. 22.100
libavcodec 58. 35.100 / 58. 35.100
libavformat 58. 20.100 / 58. 20.100
libavdevice 58. 5.100 / 58. 5.100
libavfilter 7. 40.101 / 7. 40.101
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 3.100 / 5. 3.100
libswresample 3. 3.100 / 3. 3.100
libpostproc 55. 3.100 / 55. 3.100
Input #0, png_pipe, from 'overlays/VB12.png':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: png, pal8(pc), 575x148 [SAR 11810:11810 DAR 575:148], 25 tbr, 25 tbn, 25 tbc
Input #1, rawvideo, from 'pipe:':
Duration: N/A, start: 0.000000, bitrate: 552960 kb/s
Stream #1:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1280x720, 552960 kb/s, 25 tbr, 25 tbn, 25 tbc
Guessed Channel Layout for Input Stream #2.0 : mono
Input #2, alsa, from 'hw:0':
Duration: N/A, start: 1565594935.407496, bitrate: 768 kb/s
Stream #2:0: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s
Stream mapping:
Stream #0:0 (png) -> overlay:overlay (graph 0)
Stream #1:0 (rawvideo) -> overlay:main (graph 0)
hwupload (graph 0) -> Stream #0:0 (h264_vaapi)
Stream #2:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Output #0, mp4, to '/media/mark/Test/VR100test_0828.mp4':
Metadata:
encoder : Lavf58.20.100
Stream #0:0: Video: h264 (h264_vaapi) (High) (avc1 / 0x31637661), vaapi_vld(progressive), 1280x720, q=-1--1, 25 fps, 12800 tbn, 25 tbc (default)
Metadata:
encoder : Lavc58.35.100 h264_vaapi
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 69 kb/s
Metadata:
encoder : Lavc58.35.100 aac

@Brainiarc7

Hello @mj43,

You'd be best sorted by building FFmpeg + Libva components from source.
The git tip on FFmpeg sees a lot of commits, often with bug fixes and this is why I cannot recommend release versions for any production workflow.

@barolo that sounds like a bug in the radeonsi VAAPI driver, perhaps you should report that upstream?
The dri-devel mailing list is a good place to start.
For Intel-related bugs (such as with their vaapi drivers, i965 and iHD respectively), report these to the respective projects.
You may also want to post the same bug report on libva.

@barolo

barolo commented Aug 16, 2019

@Brainiarc7
Reported it to both mesa and ffmpeg a month ago, no response so far.
Solved it by using command below, unsure if it's a proper solution [ changing 10->8bit pixel format with hwaccel ]:

ffmpeg -threads 4 -init_hw_device vaapi=amd:/dev/dri/renderD128 \
-hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device amd -filter_hw_device amd \
-i test.mkv \
-vf "hwdownload,format=p010,format=nv12,hwupload" \
-c:v h264_vaapi -profile:v 578 -c:a copy -c:s copy \
-f mpegts out.mkv

@ferdnyc

ferdnyc commented Sep 13, 2019

@Brainiarc7 Great resource, thanks.

Just a quick housekeeping request: Do you think you could line-wrap the sudo apt-get command line, and the two ffmpeg commands in your Haswell example, the same way you did the earlier ffmpeg examples? The three I mentioned are kind of #XXXtreemHorizontalScrolling right now. Thx!

(And as always, curse you GitHub for capping the page width, when you know you have to display unwrapped <pre> blocks.)

@Brainiarc7

Brainiarc7 commented Sep 13, 2019 via email

@martinpickett

@Brainiarc7 Thank you for this guide (and the many other covering similar topics), I have found them very helpful.

I have had a lot of success with single pass VBR encoding, but more recently I have been attempting two-pass encoding with FFmpeg and h264_vaapi. For the first pass I have been using a modified version of the command line you suggest in this guide, however the "ffmpeg2pass.log" file is empty. Can you tell me what I am doing wrong? The exact command line I am running is:

ffmpeg -loglevel error -stats -hwaccel vaapi -i "input.mkv" -vaapi_device /dev/dri/renderD128 -vf 'format=nv12,hwupload' -pass 1 -b:v 6M -c:v h264_vaapi -threads 4 -aspect 16:9 -an -y -f matroska "/dev/null"

Many thanks for your guides and any help you can provide,
Martin

@Brainiarc7

@martinpickett two pass encoding is not implemented in these encoders. That's speaking for both VAAPI and QSV.

To my knowledge, the only hardware-accelerated encoder that implements anything close to what you're looking for is NVENC, and even then, that option is not exposed to the FFmpeg CLI but activated internally by the use of the private options -2pass:v 1 or by using the slow preset in the NVENC encoders.

See ffmpeg -h encoder=h264_nvenc and ffmpeg -h encoder=hevc_nvenc for more details on the same.

Glad you found this useful.

@martinpickett

@Brainiarc7 Thank you for the quick response. I will have a look at NVENC.

@Anan5a

Anan5a commented Nov 29, 2019

I would like to request a bash script, as that would be easier for noobs!

Also, I think many libraries/encoders don't need to be compiled, as they come in the repository.

@koreanfan

Hello. I want to stream dota2 gameplay with amd rx560 video with proc fx-8300. I try obs on my debian buster and it works fine with x264. But i dont need many features from obs, i need simple stream. So i create 2 scripts. One of them start streaming and another stop streaming. But i have pixelate on my stream and video have micro freezes. With obs i dont have those issues. Below 2 scripts used by me:
1) start_streaming.sh

#!/bin/bash
ffmpeg -f x11grab -s 1920x1080 -r 30 -i :0.0 -f pulse -i 0 -f flv -ac 2 -ar 44100 \
-vcodec libx264 -g 60 -keyint_min 30 -b:v 6000k -minrate 6000k -maxrate 6000k -pix_fmt yuv420p \
-s 1920x1080 -preset ultrafast -tune film -acodec aac -threads 0 -strict normal \
-bufsize 6000k "rtmp://live.restream.io/live/mystreamkey"

2) stop_stream.sh

#!/bin/bash
pkill ffmpeg

I use 2 keybshortcut for those scripts to run stream and stop stream. Can you help me please to fix this issues and how i can use amd vaapi for streaming?

@shperrung

shperrung commented Mar 14, 2020

Hi!
I use ffmpeg with the libvidstab library (--enable-libvidstab) for de-shaking videos made by my mom. I found this 2-pass method that works perfectly with software codecs (one command that finds files by name mask, runs the shake analysis, then applies stabilization while encoding into the preferred format):

find * -type f -size +10M \( -iname "*mp4" ! -iname "*_resized*" \) -exec ffmpeg -threads 3 -i {} -vf vidstabdetect=result="transforms.trf" -f null NUL ";" -exec ffmpeg -i {} -vf vidstabtransform=smoothing=30:input="transforms.trf" -c:v libx264 -preset veryslow -crf 18 -c:a copy '{}_resized_stab.mp4' ";" -exec rm ./transforms.trf ";"

How can I use it with hardware transcoding?
#First pass:
ffmpeg -threads auto -i phone.MOV -vf vidstabdetect=result="transforms.trf" -f null NUL
#successfully completed
#Second pass:
ffmpeg -loglevel debug -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i phone.MOV -filter_complex 'hwupload,format=nv12, hwdownload,vidstabtransform=smoothing=30:input="transforms.trf",format=nv12,hwupload' -map 0:0 -map 0:1 -threads 2 -f mp4 -c:a aac -ab 128k -vcodec h264_vaapi -qp 19 -bf 2 phone_hw.mp4
#Is stopped with error

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55e5df12f080] stts: 303354 ctts: -20, ctts_index: 15168, ctts_count: 22990
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55e5d
[h264 @ 0x55e5df14d280] Param buffer (type 1, 240 bytes) is 0x8000002.
[h264 @ 0x55e5df14d280] Slice 0 param buffer (3128 bytes) is 0x8000001.
[h264 @ 0x55e5df14d280] Slice 0 data buffer (80523 bytes) is 0x8000000.
[h264 @ 0x55e5df14d280] Decode to surface 0x4000013.
[AVHWFramesContext @ 0x7f5e38078200] Unmap surface 0x4000018.
[Parsed_format_1 @ 0x55e5df3be8c0] Setting 'pix_fmts' to value 'nv12'
[Parsed_vidstabtransform_3 @ 0x55e5df3bfa80] Setting 'smoothing' to value '30'
[Parsed_vidstabtransform_3 @ 0x55e5df3bfa80] Setting 'input' to value '"transforms.trf"'
[Parsed_vidstabtransform_3 @ 0x55e5df3bfa80] vidstabtransform filter: init v1.1 (2015-05-16)
[Parsed_format_4 @ 0x55e5df3c0bc0] Setting 'pix_fmts' to value 'nv12'
[graph 0 input from stream 0:1 @ 0x55e5df3c2180] Setting 'video_size' to value '1920x1080'
[graph 0 input from stream 0:1 @ 0x55e5df3c2180] Setting 'pix_fmt' to value '23'
[graph 0 input from stream 0:1 @ 0x55e5df3c2180] Setting 'time_base' to value '1/600'
[graph 0 input from stream 0:1 @ 0x55e5df3c2180] Setting 'pixel_aspect' to value '0/1'
[graph 0 input from stream 0:1 @ 0x55e5df3c2180] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:1 @ 0x55e5df3c2180] Setting 'frame_rate' to value '30/1'
[graph 0 input from stream 0:1 @ 0x55e5df3c2180] w:1920 h:1080 pixfmt:nv12 tb:1/600 fr:30/1 sar:0/1 sws_param:flags=2
[format @ 0x55e5df3bf1c0] Setting 'pix_fmts' to value 'vaapi_vld'
[auto_scaler_0 @ 0x55e5df3c5380] w:iw h:ih flags:'bilinear' interl:0
[Parsed_format_1 @ 0x55e5df3be8c0] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_hwupload_0' and the filter 'Parsed_format_1'
Impossible to convert between the formats supported by the filter 'Parsed_hwupload_0' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:1
[AVIOContext @ 0x55e5df164b40] Statistics: 0 seeks, 0 writeouts
[aac @ 0x55e5df165580] Qavg: 4083.935
[aac @ 0x55e5df165580] 2 frames left in the queue on closing
[AVIOContext @ 0x55e5df137f00] Statistics: 1102747 bytes read, 11 seeks
Conversion failed!
Press any key to continue...

@ferdnyc

ferdnyc commented Mar 15, 2020

@shperrung I'm no expert, but this looks bad to me:


[graph 0 input from stream 0:1 @ 0x55e5df3c2180] Setting 'pixel_aspect' to value '0/1'
[graph 0 input from stream 0:1 @ 0x55e5df3c2180] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:1 @ 0x55e5df3c2180] Setting 'frame_rate' to value '30/1'
[graph 0 input from stream 0:1 @ 0x55e5df3c2180] w:1920 h:1080 pixfmt:nv12 tb:1/600 fr:30/1 sar:0/1 sws_param:flags=2
                                                                                           ^^^^^^^^^

Perhaps adding an -aspect argument, or adding scale=1920:1080,setsar=_something_ to your filter chain, would help?

@shperrung

shperrung commented Mar 19, 2020

Hi, ferdnyc!
Unfortunately, I couldn't find the right combination of these filters. It looks like the vidstabtransform filter is not compatible with hardware transcoding. I still get:

[format @ 0x5594a3faf780] Setting 'pix_fmts' to value 'vaapi_vld'
[auto_scaler_0 @ 0x5594a3fb2cc0] w:iw h:ih flags:'bilinear' interl:0
[Parsed_format_1 @ 0x5594a3fad8c0] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_hwupload_0' and the filter 'Parsed_format_1'
Impossible to convert between the formats supported by the filter 'Parsed_hwupload_0' and the filter 'auto_scaler_0'
Error reinitializing filters!
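One reading of the log above: vidstabtransform is a software filter that works on frames in system memory, while the failing chain starts with hwupload, so ffmpeg tries to auto-insert a software scaler (auto_scaler_0) directly after a filter that outputs VAAPI surfaces. A minimal sketch of the usual ordering follows: decode in software, stabilize, convert to nv12, and upload with hwupload as the very last step before h264_vaapi. The file names, the RUN dry-run switch and the helper name build_stab_chain are illustrative. The transforms file name is passed without the inner double quotes, because inside a single-quoted filter string they reach the filter literally (the log shows input set to '"transforms.trf"').

```shell
#!/bin/sh
# Sketch only: paths and file names are assumptions.
DEVICE="${DEVICE:-/dev/dri/renderD128}"
IN="${IN:-phone.MOV}"
OUT="${OUT:-phone_hw.mp4}"

# Software filters first, hwupload last. $1 is the transforms file from pass 1.
build_stab_chain() {
  printf '%s' "vidstabtransform=smoothing=30:input=$1,format=nv12,hwupload"
}

VF="$(build_stab_chain transforms.trf)"

set -- ffmpeg -vaapi_device "$DEVICE" -i "$IN" \
  -vf "$VF" -c:v h264_vaapi -qp 19 -bf 2 \
  -c:a aac -b:a 128k "$OUT"

if [ "${RUN:-0}" = 1 ]; then
  "$@"
else
  printf '%s\n' "$*"    # dry run: show the command only
fi
```

With this ordering only the encode runs on the GPU; the stabilization itself still runs on the CPU, so the speedup may be modest.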

@Odianosen25

Odianosen25 commented Jul 1, 2020

Hello @Brainiarc7, thanks for your invaluable information; it has been instrumental in helping me understand ffmpeg. I have been able to set up good recording with hardware acceleration based on your example, and I wanted to do the same for HLS. I use the following for my HLS:

ffmpeg -init_hw_device vaapi=intel:/dev/dri/renderD128 -hwaccel_device intel -hwaccel vaapi -hwaccel_output_format vaapi -i /dev/video0 -filter_hw_device intel -vf 'format=nv12|vaapi,hwupload' -vcodec h264_vaapi -qp 25 -acodec copy -hls_time 1 -hls_list_size 2 -hls_init_time 1 -hls_flags delete_segments -hls_segment_filename 'file%03d.mp4' output.m3u8

It works, but it is almost 30 seconds behind live, compared to under 5 seconds when using the CPU. Any ideas why this is the case, please? I have been on this for weeks now.

Kind regards

@ferdnyc

ferdnyc commented Jul 1, 2020

@Odianosen25:

Perhaps this comment on a StackOverflow question might shed some light?

It needs 3 segments before playback. Using hls-time is not enough though, you also need to specify a keyframe interval of 1s. You can use the -g (GOP size) option for that. Eg: -g 30 will insert an iframe each 30 frames and since your framerate is 30 fps that translates to 1 iframe / s

It seems likely that the more-efficient hardware encoder could be inserting keyframes less frequently than the CPU, by default.

@Odianosen25

@ferdnyc thanks and I will try as said. I use the -g option for the CPU version and yes it helps. But I didn’t see it for the h264_vaapi encoder, so will try it and get back.

Kind regards

@ferdnyc

ferdnyc commented Jul 1, 2020

@Odianosen25 Yeah, looking at the API docs gop_size is part of the VAAPIEncodeContext structure, so it looks like the same FFmpeg -g flag is respected whether the encoder is using the CPU or GPU.

The -b_depth and -idr_interval flags specific to h264_vaapi sound like they could also be meaningful here, except the default values are already a conservative 1 and 0 (respectively) so there isn't really any room for improvement there.

EDIT: As an alternative, hls_flags takes a split_by_time flag that forces splitting at hls_time intervals, even if the split point is not a keyframe. It sounds like that's fraught with peril, though, so setting -g is probably the better approach.
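The relationship between -g and -hls_time discussed above can be sketched as follows: the GOP size is the frame rate multiplied by the segment duration, so that every segment can start on a keyframe. The 30 fps capture rate, the device paths and the RUN dry-run switch are assumptions for illustration; the remaining options mirror the HLS command posted earlier in the thread.

```shell
#!/bin/sh
# Sketch only: frame rate and device paths are assumptions.
DEVICE="${DEVICE:-/dev/dri/renderD128}"
FPS=30                      # capture frame rate (assumed)
HLS_TIME=1                  # target segment duration in seconds
GOP=$((FPS * HLS_TIME))     # frames between keyframes: one keyframe per segment

set -- ffmpeg -init_hw_device "vaapi=intel:$DEVICE" -filter_hw_device intel \
  -f v4l2 -framerate "$FPS" -i /dev/video0 \
  -vf 'format=nv12,hwupload' -c:v h264_vaapi -qp 25 -g "$GOP" \
  -hls_time "$HLS_TIME" -hls_list_size 2 -hls_flags delete_segments \
  output.m3u8

if [ "${RUN:-0}" = 1 ]; then
  "$@"
else
  printf '%s\n' "$*"    # dry run: show the command only
fi
```

If the camera delivers a different frame rate, FPS has to be changed to match, otherwise the keyframe interval no longer lines up with the segment length.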

@Odianosen25

Odianosen25 commented Jul 1, 2020

Wow @ferdnyc, you just saved me weeks of head-scratching. I can't believe I spent so much time on this when all I needed was to add the -g flag. After spending weeks trying to figure it out, I now feel stupid. Thanks again. Now my CPU can relax and focus on other things ;). Even the compression is amazing: less than 60k per segment, where the CPU gave over 3 times that.

@ferdnyc

ferdnyc commented Jul 1, 2020

🎉 @Odianosen25 That's great, glad to hear it was so effective! SO user aergistal deserves the real credit, I just got lucky with the googles.

@Odianosen25

@Brainiarc7 or @ferdnyc, thanks for the help so far. I am trying to use hardware acceleration for decoding and then retrieve the frames, but I am having trouble doing it completely in hardware. I ran the following:

['ffmpeg', '-y', '-hide_banner', '-init_hw_device', 'vaapi=intel:/dev/dri/renderD128', '-hwaccel_device', 'intel', '-hwaccel', 'vaapi', '-hwaccel_output_format', 'vaapi', '-thread_queue_size', '16', '-f', 'v4l2', '-i', '/dev/video0', '-filter_hw_device', 'intel', '-vf', 'format=nv12|vaapi,hwupload,scale_vaapi=w=1080:h=1920:format=nv12,hwdownload', '-f','rawvideo', '-pix_fmt', 'bgr24', '-']

And I get

[hwdownload @ 0x5574560e5480] Invalid output format bgr24 for hwframe download.
[Parsed_hwdownload_3 @ 0x5574560e6100] Failed to configure output pad on Parsed_hwdownload_3
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
[video4linux2,v4l2 @ 0x5574560d7080] Some buffers are still owned by the caller on close

If I remove the filter, it all works. I also have an RTSP stream that doesn't work either; the strange thing there is that it only works if I remove '-hwaccel_output_format', 'vaapi'.

So if I run

['ffmpeg', '-y', '-hide_banner', '-init_hw_device', 'vaapi=intel:/dev/dri/renderD128', '-hwaccel_device', 'intel', '-hwaccel', 'vaapi', '-hwaccel_output_format', 'vaapi', '-thread_queue_size', '16', '-rtsp_transport', 'tcp', '-i', 'rtsp://admin:@192.168.0.100:554', '-f', 'rawvideo', '-pix_fmt', 'bgr24', '-']

It tells me

Impossible to convert between the formats supported by the filter 'Parsed_null_0' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0

But after removing '-hwaccel_output_format', 'vaapi', it all works. Any ideas, please?

Regards

@briantbutton

briantbutton commented Mar 24, 2021

@Odianosen25, me too.

I have been banging on this particular outcome for two months. Always on Ubuntu 20.04, with good old H.264 (MP4), but I cannot seem to get away from the "Impossible to convert" message. I have gone from an AMD processor to an Intel Core i3. I am now wondering, "can I find a config that WILL work?" (Kindly note that I have tried about 150 parameter combinations in my search for the holy grail.)

In the two lines below, note the first one. Where do these filters come from? Is it possible to understand that, not to mention control it?

[format @ 0x5641a8f95f40] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_null_0' and the filter 'format'
Impossible to convert between the formats supported by the filter 'Parsed_null_0' and the filter 'auto_scaler_0' E

I did a search on 'auto_scaler_0' and ALL I got were complaints about the error message "Impossible to convert". I cannot find any information about it, other than its proximity to failed video transcodes.
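On where these filters come from: auto_scaler_N is the software scale filter that ffmpeg inserts on its own whenever two neighbouring filters disagree on pixel format. It cannot read VAAPI surfaces, so with -hwaccel_output_format vaapi the frames have to be brought back to system memory explicitly before any software conversion. A minimal sketch, assuming an Intel render node and a hypothetical RTSP URL: hwdownload is followed by format=nv12 (a format the surface can be downloaded as), and the conversion to bgr24 then happens in software via -pix_fmt. The helper name build_download_chain and the RUN switch are illustrative.

```shell
#!/bin/sh
# Sketch only: device path and input URL are assumptions.
DEVICE="${DEVICE:-/dev/dri/renderD128}"
INPUT="${INPUT:-rtsp://camera.example/stream}"   # hypothetical URL

# Build a chain that ends in software nv12 frames.
# Optional arguments: width and height for a GPU-side scale before download.
build_download_chain() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    printf '%s' "scale_vaapi=w=$1:h=$2:format=nv12,hwdownload,format=nv12"
  else
    printf '%s' "hwdownload,format=nv12"
  fi
}

VF="$(build_download_chain)"

set -- ffmpeg -hide_banner \
  -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device "$DEVICE" \
  -rtsp_transport tcp -i "$INPUT" \
  -vf "$VF" -pix_fmt bgr24 -f rawvideo -

if [ "${RUN:-0}" = 1 ]; then
  "$@"
else
  printf '%s\n' "$*"    # dry run: show the command only
fi
```

Calling build_download_chain 1080 1920 yields the scaled variant, which corresponds to the v4l2 command above with format=nv12 appended after hwdownload.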

@sgjava

sgjava commented Jan 14, 2022

@Brainiarc7 Trying to get this working headless. And I get:

No VA display found for device /dev/dri/renderD128.
Device creation failed: -22.
Failed to set value '/dev/dri/renderD128' for option 'vaapi_device': Invalid argument

vainfo will work without X using the following, but ffmpeg????

vainfo --display drm 
libva info: VA-API version 1.14.0
libva info: Trying to open /usr/local/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_14
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.14 (libva 2.14.0.pre1)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 22.2.0 (35dc70f41)
vainfo: Supported profile and entrypoints
      VAProfileNone                   :	VAEntrypointVideoProc
      VAProfileNone                   :	VAEntrypointStats
      VAProfileMPEG2Simple            :	VAEntrypointVLD
      VAProfileMPEG2Main              :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointEncSlice
      VAProfileH264Main               :	VAEntrypointFEI
      VAProfileH264Main               :	VAEntrypointEncSliceLP
      VAProfileH264High               :	VAEntrypointVLD
      VAProfileH264High               :	VAEntrypointEncSlice
      VAProfileH264High               :	VAEntrypointFEI
      VAProfileH264High               :	VAEntrypointEncSliceLP
      VAProfileVC1Simple              :	VAEntrypointVLD
      VAProfileVC1Main                :	VAEntrypointVLD
      VAProfileVC1Advanced            :	VAEntrypointVLD
      VAProfileJPEGBaseline           :	VAEntrypointVLD
      VAProfileJPEGBaseline           :	VAEntrypointEncPicture
      VAProfileH264ConstrainedBaseline:	VAEntrypointVLD
      VAProfileH264ConstrainedBaseline:	VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline:	VAEntrypointFEI
      VAProfileH264ConstrainedBaseline:	VAEntrypointEncSliceLP
      VAProfileVP8Version0_3          :	VAEntrypointVLD
      VAProfileHEVCMain               :	VAEntrypointVLD
      VAProfileHEVCMain               :	VAEntrypointEncSlice
      VAProfileHEVCMain               :	VAEntrypointFEI
      VAProfileHEVCMain10             :	VAEntrypointVLD
      VAProfileVP9Profile0            :	VAEntrypointVLD

@sgjava

sgjava commented Jan 15, 2022

OK, going to answer my own question here for someone else if needed.

sudo usermod -a -G render username
sudo usermod -a -G video username

username is your non-root user.
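A quick way to check whether the group change took effect is to test read and write access to the render node as the non-root user. This is a small sketch; the node path and the function name can_use_render_node are assumptions for illustration. Group membership added with usermod only applies to new login sessions, so the check may keep failing until the user logs out and back in.

```shell
#!/bin/sh
# Sketch only: the render node path is an assumption.
NODE="${NODE:-/dev/dri/renderD128}"

# Succeeds only if the current user can both read and write the given node.
can_use_render_node() {
  [ -r "$1" ] && [ -w "$1" ]
}

if can_use_render_node "$NODE"; then
  echo "access to $NODE: ok"
else
  echo "no access to $NODE; current groups: $(id -nG)"
fi
```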

@TrueWodzu

Hi @Brainiarc7 and guys :-)

I don't seem to benefit at all when using parallel transcoding, is this something you also noticed? For example:

Transcoding single file

ffmpeg version N-110411-gaf8be7bf43 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.3.0-1ubuntu1~22.04)
  configuration: --pkg-config-flags=--static --enable-static --disable-shared --prefix=/home/wodzu/ffmpeg_build --bindir=/home/wodzu/bin --extra-cflags=-I/home/wodzu/ffmpeg_build/include --extra-ldflags=-L/home/wodzu/ffmpeg_build/lib --extra-cflags=-I/opt/intel/mediasdk/include --extra-ldflags=-L/opt/intel/mediasdk/lib --extra-ldflags=-L/opt/intel/mediasdk/plugins --enable-vaapi --enable-libmfx --enable-opencl --disable-debug --enable-libdrm --enable-gpl --enable-runtime-cpudetect --enable-libx264 --enable-libx265 --enable-openssl --enable-pic --extra-libs='-lpthread -lm -lz -ldl' --enable-nonfree
  libavutil      58.  6.100 / 58.  6.100
  libavcodec     60. 10.100 / 60. 10.100
  libavformat    60.  5.100 / 60.  5.100
  libavdevice    60.  2.100 / 60.  2.100
  libavfilter     9.  6.100 /  9.  6.100
  libswscale      7.  2.100 /  7.  2.100
  libswresample   4. 11.100 /  4. 11.100
  libpostproc    57.  2.100 / 57.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test_01.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:02:00.00, start: 0.004000, bitrate: 1238 kb/s
  Stream #0:0[0x1](eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(progressive), 1920x1080, 1237 kb/s, 30 fps, 30 tbr, 1000k tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (h264_vaapi))
Press [q] to stop, [?] for help
Output #0, mp4, to 'no_parallel_vaapi.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.5.100
  Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), vaapi(progressive), 1920x1080, q=2-31, 30 fps, 15360 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc60.10.100 h264_vaapi
[out#0/mp4 @ 0x556472291900] video:8261kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.413080%
frame= 3600 fps=427 q=-0.0 Lsize=    8295kB time=00:01:59.93 bitrate= 566.6kbits/s speed=14.2x    

real	0m8,463s
user	0m1,441s
sys	0m0,622s

Transcoding two files in parallel

ffmpeg version N-110411-gaf8be7bf43 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.3.0-1ubuntu1~22.04)
  configuration: --pkg-config-flags=--static --enable-static --disable-shared --prefix=/home/wodzu/ffmpeg_build --bindir=/home/wodzu/bin --extra-cflags=-I/home/wodzu/ffmpeg_build/include --extra-ldflags=-L/home/wodzu/ffmpeg_build/lib --extra-cflags=-I/opt/intel/mediasdk/include --extra-ldflags=-L/opt/intel/mediasdk/lib --extra-ldflags=-L/opt/intel/mediasdk/plugins --enable-vaapi --enable-libmfx --enable-opencl --disable-debug --enable-libdrm --enable-gpl --enable-runtime-cpudetect --enable-libx264 --enable-libx265 --enable-openssl --enable-pic --extra-libs='-lpthread -lm -lz -ldl' --enable-nonfree
  libavutil      58.  6.100 / 58.  6.100
  libavcodec     60. 10.100 / 60. 10.100
  libavformat    60.  5.100 / 60.  5.100
  libavdevice    60.  2.100 / 60.  2.100
  libavfilter     9.  6.100 /  9.  6.100
  libswscale      7.  2.100 /  7.  2.100
  libswresample   4. 11.100 /  4. 11.100
  libpostproc    57.  2.100 / 57.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test_02.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:02:00.00, start: 0.004000, bitrate: 1238 kb/s
  Stream #0:0[0x1](eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(progressive), 1920x1080, 1237 kb/s, 30 fps, 30 tbr, 1000k tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (h264_vaapi))
Press [q] to stop, [?] for help
Output #0, mp4, to 'parallel_test_02_vaapi.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.5.100
  Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), vaapi(progressive), 1920x1080, q=2-31, 30 fps, 15360 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc60.10.100 h264_vaapi
[out#0/mp4 @ 0x55f5119dd900] video:8261kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.413080%
frame= 3600 fps=214 q=-0.0 Lsize=    8295kB time=00:01:59.93 bitrate= 566.6kbits/s speed=7.15x    
ffmpeg version N-110411-gaf8be7bf43 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.3.0-1ubuntu1~22.04)
  configuration: --pkg-config-flags=--static --enable-static --disable-shared --prefix=/home/wodzu/ffmpeg_build --bindir=/home/wodzu/bin --extra-cflags=-I/home/wodzu/ffmpeg_build/include --extra-ldflags=-L/home/wodzu/ffmpeg_build/lib --extra-cflags=-I/opt/intel/mediasdk/include --extra-ldflags=-L/opt/intel/mediasdk/lib --extra-ldflags=-L/opt/intel/mediasdk/plugins --enable-vaapi --enable-libmfx --enable-opencl --disable-debug --enable-libdrm --enable-gpl --enable-runtime-cpudetect --enable-libx264 --enable-libx265 --enable-openssl --enable-pic --extra-libs='-lpthread -lm -lz -ldl' --enable-nonfree
  libavutil      58.  6.100 / 58.  6.100
  libavcodec     60. 10.100 / 60. 10.100
  libavformat    60.  5.100 / 60.  5.100
  libavdevice    60.  2.100 / 60.  2.100
  libavfilter     9.  6.100 /  9.  6.100
  libswscale      7.  2.100 /  7.  2.100
  libswresample   4. 11.100 /  4. 11.100
  libpostproc    57.  2.100 / 57.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test_01.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:02:00.00, start: 0.004000, bitrate: 1238 kb/s
  Stream #0:0[0x1](eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(progressive), 1920x1080, 1237 kb/s, 30 fps, 30 tbr, 1000k tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (h264_vaapi))
Press [q] to stop, [?] for help
Output #0, mp4, to 'parallel_test_01_vaapi.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.5.100
  Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), vaapi(progressive), 1920x1080, q=2-31, 30 fps, 15360 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc60.10.100 h264_vaapi
[out#0/mp4 @ 0x55af719ce900] video:8261kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.413080%
frame= 3600 fps=214 q=-0.0 Lsize=    8295kB time=00:01:59.93 bitrate= 566.6kbits/s speed=7.15x    

real	0m17,423s
user	0m2,958s
sys	0m1,131s

As you can see, transcoding two files in parallel takes about twice as long as transcoding a single file.

And the stats for my CPU show that the CPU is basically waiting for the graphics card:

08:11:47     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
08:11:49     all    1,85    0,00    0,55   15,44    0,00    0,13    0,00    0,00    0,00   82,04
08:11:49       0    1,99    0,00    0,50    0,00    0,00    0,00    0,00    0,00    0,00   97,51
08:11:49       1    4,62    0,00    1,03   94,36    0,00    0,00    0,00    0,00    0,00    0,00
08:11:49       2    1,00    0,00    1,00    0,50    0,00    0,00    0,00    0,00    0,00   97,50
08:11:49       3    0,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00  100,00
08:11:49       4    0,50    0,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00   99,50
08:11:49       5    1,01    0,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00   98,99
08:11:49       6    4,48    0,00    0,50    0,00    0,00    0,00    0,00    0,00    0,00   95,02
08:11:49       7    0,50    0,00    0,50    0,00    0,00    0,00    0,00    0,00    0,00   98,99
08:11:49       8    1,02    0,00    1,02    0,00    0,00    1,53    0,00    0,00    0,00   96,43
08:11:49       9    5,10    0,00    1,02   93,37    0,00    0,00    0,00    0,00    0,00    0,51
08:11:49      10    1,01    0,00    0,50    0,00    0,00    0,00    0,00    0,00    0,00   98,49
08:11:49      11    1,00    0,00    0,50    0,00    0,00    0,00    0,00    0,00    0,00   98,50

I have an Intel 10th generation CPU, so it shouldn't be that bad, should it? I have tried almost everything, including changing all the codec parameters, and that gave me nothing. Any ideas? I just want to reiterate that this iowait is not for the hard drive; it is for the graphics card.

@Brainiarc7
Author

@TrueWodzu could you show us your ffmpeg command-lines for troubleshooting?

@TrueWodzu

Hi @Brainiarc7 thank you for your interest, here they are:

time ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device /dev/dri/renderD129 -i test_01.mp4 -c:v h264_vaapi -profile: high -qp: 42 no_parallel_vaapi.mp4

time parallel -j 2 'ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device /dev/dri/renderD129 -i {} -c:v h264_vaapi -profile: high -qp: 42 parallel_{.}_vaapi.mp4' ::: test_01.mp4 test_02.mp4

test_01.mp4 test_02.mp4 are identical.

@ferdnyc

ferdnyc commented May 7, 2023

@TrueWodzu Something you may find interesting/helpful: Nvidia has published a doc on using GPU-accelerated FFmpeg for encoding, decoding, transcoding, etc.

It's written for their hardware, of course, so all of the commands use -hwaccel cuda, but adapting them to -hwaccel vaapi shouldn't be that difficult, and they cover some pretty complex scenarios, including all sorts of split pipelines. The specific section I linked to is on parallel 1:N (so, really 2:N) transcoding & scaling, since that's closest to your use case, but the whole doc is pretty informative. Good source of inspiration / sanity-checking, if nothing else.
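As one possible adaptation of the 1:N pattern to VAAPI, here is a minimal sketch: a single ffmpeg process decodes the input once on the GPU and produces two outputs, each with its own scale_vaapi step and h264_vaapi encode. The output names, resolutions, qp value, the helper name scale_chain and the RUN switch are hypothetical; the device path is the one from the command lines posted above. Whether this avoids the slowdown reported in this thread is not established.

```shell
#!/bin/sh
# Sketch only: resolutions, qp and output names are assumptions.
DEVICE="${DEVICE:-/dev/dri/renderD129}"
IN="${IN:-test_01.mp4}"

# GPU-side scaling; frames stay in VAAPI surfaces between decode and encode.
scale_chain() {
  printf '%s' "scale_vaapi=w=$1:h=$2"
}

VF_A="$(scale_chain 1280 720)"
VF_B="$(scale_chain 640 360)"

# One input, two outputs: options before each output name apply to that output.
set -- ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device "$DEVICE" \
  -i "$IN" \
  -vf "$VF_A" -c:v h264_vaapi -qp 28 out_720p.mp4 \
  -vf "$VF_B" -c:v h264_vaapi -qp 28 out_360p.mp4

if [ "${RUN:-0}" = 1 ]; then
  "$@"
else
  printf '%s\n' "$*"    # dry run: show the command only
fi
```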

@TrueWodzu

@ferdnyc many thanks for that! I will definitely read it. However, in my case it might be some strange issue related to Ubuntu or the hardware, because on a different machine with a very old E3845 processor, the same command line that I provided above runs nicely. What I mean by that is that running two ffmpeg processes in parallel causes no decrease in performance. It only starts to decrease at three ffmpeg processes in parallel.

On my i7-10750H, which can run circles around e3845, when I run two ffmpegs in parallel, performance decreases two times.
