Skip to content

Instantly share code, notes, and snippets.

@Sub-7
Created January 20, 2018 22:24
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Sub-7/bdfa5d1c3dfc9b1efbec35241c58a488 to your computer and use it in GitHub Desktop.
Save Sub-7/bdfa5d1c3dfc9b1efbec35241c58a488 to your computer and use it in GitHub Desktop.
Building a VAAPI enabled FFmpeg for use bound to the !/bin prefix

Build FFmpeg and libva with decode and encode hardware acceleration on an Intel-based validation testbed:

Build platform: Ubuntu

Install baseline dependencies first

sudo apt-get -y install autoconf automake build-essential libass-dev libtool pkg-config texinfo zlib1g-dev

We present two options, building all toolkits from source or installing the libva and mesa stacks from a PPA.

Option 1: Build the latest libva and all drivers from source:

First things first, build the dependency chain:

  1. cmrt:

This is the C for Media Runtime GPU Kernel Manager for Intel G45 & HD Graphics family. it's a prerequisite for building the intel-hybrid-driver package.

git clone https://github.com/01org/cmrt
cd cmrt
./autogen.sh
./configure
time make -j$(nproc) VERBOSE=1
sudo make -j$(nproc) install
sudo ldconfig -vvvv
  1. intel-hybrid-driver:

This package provides support for WebM project VPx codecs. GPU acceleration is provided via media kernels executed on Intel GEN GPUs. The hybrid driver provides the CPU bound entropy (e.g., CPBAC) decoding and manages the GEN GPU media kernel parameters and buffers.

It's a prerequisite to building libva with the desired configuration so we can access VPX-series hybrid decode capabilities on supported hardware configurations.

git clone https://github.com/01org/intel-hybrid-driver
cd intel-hybrid-driver
./autogen.sh
./configure
time make -j$(nproc) VERBOSE=1
sudo make -j$(nproc) install
sudo ldconfig -vvv
  1. intel-vaapi-driver:

This package provides the VA-API (Video Acceleration API) user mode driver for Intel GEN Graphics family SKUs. The current video driver back-end provides a bridge to the GEN GPUs through the packaging of buffers and commands to be sent to the i915 driver for exercising both hardware and shader functionality for video decode, encode, and processing. it also provides a wrapper to the intel-hybrid-driver when called up to handle VP8/9 hybrid decode tasks on supported hardware (Must be configured with the --enable-hybrid-codec option).

git clone https://github.com/01org/intel-vaapi-driver
cd intel-vaapi-driver
./autogen.sh
./configure --enable-hybrid-codec
time make -j$(nproc) VERBOSE=1
sudo make -j$(nproc) install
sudo ldconfig -vvvv
  1. libva:

Libva is an implementation for VA-API (Video Acceleration API)

VA-API is an open-source library and API specification, which provides access to graphics hardware acceleration capabilities for video processing. It consists of a main library and driver-specific acceleration backends for each supported hardware vendor.

git clone https://github.com/01org/libva
cd libva
./autogen.sh
./configure
time make -j$(nproc) VERBOSE=1
sudo make -j$(nproc) install

At this stage, reboot:

sudo systemctl reboot 

When done, test the VAAPI supported featureset by running vainfo:

vainfo

The output on my current testbed is:

libva info: VA-API version 0.40.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/local/lib/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_0_40
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.40 (libva 1.7.3)
vainfo: Driver version: Intel i965 driver for Intel(R) Skylake - 1.8.3.pre1 (glk-alpha-58-g5a984ae)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            :	VAEntrypointVLD
      VAProfileMPEG2Simple            :	VAEntrypointEncSlice
      VAProfileMPEG2Main              :	VAEntrypointVLD
      VAProfileMPEG2Main              :	VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline:	VAEntrypointVLD
      VAProfileH264ConstrainedBaseline:	VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline:	VAEntrypointEncSliceLP
      VAProfileH264Main               :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointEncSlice
      VAProfileH264Main               :	VAEntrypointEncSliceLP
      VAProfileH264High               :	VAEntrypointVLD
      VAProfileH264High               :	VAEntrypointEncSlice
      VAProfileH264High               :	VAEntrypointEncSliceLP
      VAProfileH264MultiviewHigh      :	VAEntrypointVLD
      VAProfileH264MultiviewHigh      :	VAEntrypointEncSlice
      VAProfileH264StereoHigh         :	VAEntrypointVLD
      VAProfileH264StereoHigh         :	VAEntrypointEncSlice
      VAProfileVC1Simple              :	VAEntrypointVLD
      VAProfileVC1Main                :	VAEntrypointVLD
      VAProfileVC1Advanced            :	VAEntrypointVLD
      VAProfileNone                   :	VAEntrypointVideoProc
      VAProfileJPEGBaseline           :	VAEntrypointVLD
      VAProfileJPEGBaseline           :	VAEntrypointEncPicture
      VAProfileVP8Version0_3          :	VAEntrypointVLD
      VAProfileVP8Version0_3          :	VAEntrypointEncSlice
      VAProfileHEVCMain               :	VAEntrypointVLD
      VAProfileHEVCMain               :	VAEntrypointEncSlice
      VAProfileVP9Profile0            :	VAEntrypointVLD

Option 2: Skip building the dependencies manually and install them from a PPA:

This can be accomplished by adding the Oibaf PPA and the xorg-edgers-PPA to the system, and rebooting when done:

sudo add-apt-repository  ppa:xorg-edgers/ppa
sudo add-apt-repository ppa:oibaf/graphics-drivers
sudo apt-get update && sudo apt-get -y upgrade && sudo apt-get -y dist-upgrade && sudo systemctl reboot

On reboot, test the VAAPI supported featureset by running vainfo:

vainfo

Making a usable FFmpeg build to test the encoders:

Now, we will build an FFmpeg binary that can take advantage of VAAPI to test the encode and decode capabilities on Skylake, using a custom prefix because we load FFmpeg via the environment-modules system on the testbed.

Prepare the target directories first:

mkdir -p $HOME/bin
chown -Rc $USER:$USER $HOME/bin
mkdir -p ~/ffmpeg_sources

Include extra components as needed:

(a). Build and deploy nasm: Nasm is an assembler for x86 optimizations used by x264 and FFmpeg. Highly recommended or your resulting build may be very slow.

Note that we've now switched away from Yasm to nasm, as this is the current assembler that x265,x264, among others, are adopting.

cd ~/ffmpeg_sources
wget wget http://www.nasm.us/pub/nasm/releasebuilds/2.14rc0/nasm-2.14rc0.tar.gz
tar xzvf nasm-2.14rc0.tar.gz
cd nasm-2.14rc0
./configure --prefix="$HOME/bin" --bindir="$HOME/bin"
make -j$(nproc) VERBOSE=1
make -j$(nproc) install
make -j$(nproc) distclean

(b). Build and deploy libx264 statically: This library provides a H.264 video encoder. See the H.264 Encoding Guide for more information and usage examples. This requires ffmpeg to be configured with --enable-gpl --enable-libx264.

cd ~/ffmpeg_sources
wget http://download.videolan.org/pub/x264/snapshots/last_x264.tar.bz2
tar xjvf last_x264.tar.bz2
cd x264-snapshot*
PATH="$HOME/bin:$PATH" ./configure --prefix="$HOME/bin" --bindir="$HOME/bin" --enable-static --disable-opencl
PATH="$HOME/bin:$PATH" make -j$(nproc) VERBOSE=1
make -j$(nproc) install VERBOSE=1
make -j$(nproc) distclean

(c). Build and configure libx265: This library provides a H.265/HEVC video encoder. See the H.265 Encoding Guide for more information and usage examples.

sudo apt-get install cmake mercurial
cd ~/ffmpeg_sources
hg clone https://bitbucket.org/multicoreware/x265
cd ~/ffmpeg_sources/x265/build/linux
PATH="$HOME/bin:$PATH" cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX="$HOME/bin" -DENABLE_SHARED:bool=off ../../source
make -j$(nproc) VERBOSE=1
make -j$(nproc) install VERBOSE=1
make -j$(nproc) clean VERBOSE=1

(d). Build and deploy the libfdk-aac library: This provides an AAC audio encoder. See the AAC Audio Encoding Guide for more information and usage examples. This requires ffmpeg to be configured with --enable-libfdk-aac (and --enable-nonfree if you also included --enable-gpl).

cd ~/ffmpeg_sources
wget -O fdk-aac.tar.gz https://github.com/mstorsjo/fdk-aac/tarball/master
tar xzvf fdk-aac.tar.gz
cd mstorsjo-fdk-aac*
autoreconf -fiv
./configure --prefix="$HOME/bin" --disable-shared
make -j$(nproc)
make -j$(nproc) install
make -j$(nproc) distclean

(e). Build and configure libvpx

   cd ~/ffmpeg_sources
   git clone https://github.com/webmproject/libvpx/
   cd libvpx
   ./configure --prefix="$HOME/bin" --enable-runtime-cpu-detect --enable-vp9 --enable-vp8 \
   --enable-postproc --enable-vp9-postproc --enable-multi-res-encoding --enable-webm-io --enable-vp9-highbitdepth --enable-onthefly-bitpacking --enable-realtime-only \
   --cpu=native --as=yasm
   time make -j$(nproc)
   time make -j$(nproc) install
   time make clean -j$(nproc)
   time make distclean

(f). Build LibVorbis

   cd ~/ffmpeg_sources
   wget -c -v http://downloads.xiph.org/releases/vorbis/libvorbis-1.3.5.tar.xz
   tar -xvf libvorbis-1.3.5.tar.xz
   cd libvorbis-1.3.5
   ./configure --enable-static --prefix="$HOME/bin"
   time make -j$(nproc)
   time make -j$(nproc) install
   time make clean -j$(nproc)
   time make distclean

(g). Build FFmpeg:

cd ~/ffmpeg_sources
git clone https://github.com/FFmpeg/FFmpeg -b master
cd FFmpeg
PATH="$HOME/bin:$PATH" PKG_CONFIG_PATH="$HOME/bin/lib/pkgconfig" ./configure \
  --pkg-config-flags="--static" \
  --prefix="$HOME/bin" \
  --extra-cflags="-I$HOME/bin/include" \
  --extra-ldflags="-L$HOME/bin/lib" \
  --bindir="$HOME/bin" \
  --enable-debug=3 \
  --enable-vaapi \
  --enable-libvorbis \
  --enable-libvpx \
  --enable-gpl \
  --cpu=native \
  --enable-opengl \
  --enable-libfdk-aac \
  --enable-libx264 \
  --enable-libx265 \
  --extra-libs=-lpthread \
  --enable-nonfree 
PATH="$HOME/bin" make -j$(nproc) 
make -j$(nproc) install 
make -j$(nproc) distclean 
hash -r

Note: To get debug builds, omit the distclean step and you'll find the ffmpeg_g binary under the sources subdirectory.

We often want debug builds when an issue crops up and a gdb trace may be required for debugging purposes.

Sample snippets to test the new encoders:

Confirm that the VAAPI encoders have been built successfully:

ffmpeg  -hide_banner -encoders | grep vaapi 

 V..... h264_vaapi           H.264/AVC (VAAPI) (codec h264)
 V..... hevc_vaapi           H.265/HEVC (VAAPI) (codec hevc)
 V..... mjpeg_vaapi          MJPEG (VAAPI) (codec mjpeg)
 V..... mpeg2_vaapi          MPEG-2 (VAAPI) (codec mpeg2video)
 V..... vp8_vaapi            VP8 (VAAPI) (codec vp8)

See the help documentation for each encoder in question:

ffmpeg -hide_banner -h encoder='encoder name'

Test the encoders;

Using GNU parallel, we will encode some mp4 files (4k H.264 test samples, 40 minutes each, AAC 6-channel audio) on the ~/src path on the system to VP8 and HEVC respectively using the examples below. Note that I've tuned the encoders to suit my use-cases, and re-scaling to 1080p is enabled. Adjust as necessary.

To VP8, launching 10 encode jobs simultaneously:

parallel -j 10 --verbose '/apps/ffmpeg/dyn/bin/ffmpeg -loglevel debug -threads 4 -hwaccel vaapi -i "{}"  -vaapi_device /dev/dri/renderD129 -c:v vp8_vaapi -loop_filter_level:v 63 -loop_filter_sharpness:v 15 -b:v 4500k -maxrate:v 7500k -vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' -c:a libvorbis -b:a 384k -ac 6 -f webm "{.}.webm"' ::: $(find . -type f -name '*.mp4')

To HEVC with GNU Parallel:

To HEVC Main Profile, launching 10 encode jobs simultaneously:

parallel -j 4 --verbose '/apps/ffmpeg/dyn/bin/ffmpeg -loglevel debug -threads 4 -hwaccel vaapi -i "{}"  -vaapi_device /dev/dri/renderD129 -c:v hevc_vaapi -qp:v 19 -b:v 2100k -maxrate:v 3500k -vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' -c:a libvorbis -b:a 384k -ac 6 -f matroska "{.}.mkv"' ::: $(find . -type f -name '*.mp4')

Some notes:

  1. Intel's QuickSync is very efficient. See the power utilization traces and average system loads with 10 encodes running simultaneously here.
  2. Skylake's HEVC encoder is very slow, and I suspect that on my hardware, may be slower than the software-based x265 encoder and kvazaar's HEVC encoders. However, its' quality, when well tuned, is significantly superior to other hardware-based encoders such as the Nvidia NVENC HEVC encoder on Maxwell GM200-series SKUs. The NVENC encoder on Pascal is however faster and superior to the one on Intel's Skylake HEVC encoder implementation.
  3. Unlike Nvidia's NVENC, there are no simultaneous encodes limitations on the consumer SKUs. I was able to run 10 encode sessions siumultaneously with VAAPI, whereas with NVENC, I'd have been limited to two maximum simultaneous encodes on the GeForce GTX series GPUs on the testbeds. Good work, Intel.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment