ffmpeg livestreaming to youtube via Nvidia's NVENC and Intel's VAAPI on supported hardware

Streaming your Linux desktop to Youtube and Twitch via Nvidia's NVENC and VAAPI:

Considerations to take when live streaming:

The following best practice observations apply when using a hardware-based encoder for live streaming to any platform:

  1. Set the buffer size (-bufsize:v) equal to the target bitrate (-b:v). You want to ensure that you're encoding in CBR mode.

  2. Set up the encoders as shown:

(a). Omit the presets as they'll override the desired rate control mode.

(b). Use a rate control mode that enforces a constant bitrate. That way, we can provide a stream with a fixed bitrate (per variant, where applicable, with multiple outputs as shown in some examples below).

(c). For multiple outputs (where enabled), call up the versatile tee muxer, outputting to multiple streaming services (or to a file, if so desired). Substitute all variables (such as stream keys) with your own.

(d). The thread count is also lowered to ensure that VBV underflows do not occur. Note that higher thread counts for hardware-accelerated encoders and decoders have rapidly diminishing returns.

(e). On consumer SKUs, NVENC is limited to two encode sessions only. This is by design. You can override this artificial limit by using this project. This does not apply to the likes of AMD's VCE (via VAAPI) or Intel's VAAPI and QSV implementations.
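Rule 1 above (buffer size equal to target bitrate) is easy to break when four options are edited by hand. A minimal sketch of one way to keep them in sync follows; cbr_flags is a hypothetical helper name, not something ffmpeg provides.

```shell
#!/bin/sh
# cbr_flags is a hypothetical helper, not part of ffmpeg.
# It prints the four rate-control options with one shared value, so the
# buffer size always equals the target bitrate.
cbr_flags() {
    rate="$1"
    printf '%s\n' "-b:v $rate -minrate:v $rate -maxrate:v $rate -bufsize:v $rate"
}

cbr_flags 6000k
# Possible use inside a command (left unquoted on purpose so it splits into words):
#   ffmpeg ... $(cbr_flags 6000k) -c:v h264_nvenc -rc:v cbr_ld_hq ...
```

Changing the bitrate then means changing one argument rather than four.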

  1. With Nvidia's NVENC:

Without scaling the output:

If you want the output video frame size to be the same as the input for Twitch:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-b:v 2400k -minrate:v 2400k -maxrate:v 2400k -bufsize:v 2400k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

To do the same for Youtube, do:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2  -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Note: Ensure that the size specified above (-s) matches your screen's resolution. Also, ensure that the stream key used is correct, otherwise your live broadcast will fail.
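One way to avoid a mismatched -s value is to read the resolution from the X server rather than typing it. The sketch below assumes xdpyinfo is installed; screen_size is a hypothetical helper that only parses its output, and the sample line uses illustrative values.

```shell
#!/bin/sh
# screen_size is a hypothetical helper: it reads `xdpyinfo` output on stdin
# and prints the WIDTHxHEIGHT field from the "dimensions:" line.
screen_size() {
    awk '/dimensions:/ { print $2; exit }'
}

# Sample line in the format xdpyinfo prints (values are illustrative):
printf '  dimensions:    1920x1080 pixels (508x285 millimeters)\n' | screen_size

# On a live X session, something like:
#   ffmpeg ... -f x11grab -s "$(xdpyinfo | screen_size)" -framerate 60 -i :0.0 ...
```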

You can optionally capture from the first KMS device (as long as you're a member of the video group) as shown:

For Twitch:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-device /dev/dri/card0 -f kmsgrab -i - \
-thread_queue_size 1024 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

To do the same for Youtube, do:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-device /dev/dri/card0 -f kmsgrab -i - \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Scaling the output:

If you want the output video frame size to be smaller than the input then you can insert the appropriate scale video filter for NVENC:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter:v "hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos" \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for Youtube:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter:v "hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos" \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Note that the examples above will use the CUDA-based NPP acceleration in the GPU to re-scale the output to a perfect 720p (HD-ready) video stream on broadcast. This will require you to build ffmpeg with support for both NVENC and the CUDA SDK.
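Before running the scaling examples, it may help to confirm that the local ffmpeg build actually includes the pieces they rely on. This is a sketch only; has_entry is a hypothetical helper that scans a listing for an exact name, and the sample line merely imitates the layout of `ffmpeg -filters`.

```shell
#!/bin/sh
# has_entry is a hypothetical helper: it reads a listing such as
# `ffmpeg -filters` or `ffmpeg -encoders` on stdin and succeeds if some
# line contains the given name as a whole whitespace-separated field.
has_entry() {
    awk -v name="$1" '{ for (i = 1; i <= NF; i++) if ($i == name) found = 1 }
                      END { exit !found }'
}

# Illustrative sample imitating one line of `ffmpeg -filters`:
sample=' ... scale_npp         V->V       NVIDIA Performance Primitives video scaler'
if printf '%s\n' "$sample" | has_entry scale_npp; then
    echo "scale_npp available"
else
    echo "scale_npp missing"
fi

# Against a real build:
#   ffmpeg -hide_banner -filters  | has_entry scale_npp
#   ffmpeg -hide_banner -encoders | has_entry h264_nvenc
```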

With webcam overlay:

This will place your webcam overlay in the top right:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter:v "hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos" \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for Youtube:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter:v "hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos" \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r 60 -g 120 -bf:v 3 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

You can see additional details about your webcam with something like:

ffmpeg -f v4l2 -list_formats all -i /dev/video0 

Or with:

v4l2-ctl --list-formats-ext

See the documentation on the video4linux2 (v4l2) input device for more info.
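The frame size passed to -video_size has to be one the webcam really offers. As a rough sketch, the discrete sizes can be pulled out of the v4l2-ctl listing; webcam_sizes is a hypothetical helper, and the sample text only imitates the "Size: Discrete" lines that v4l2-ctl prints.

```shell
#!/bin/sh
# webcam_sizes is a hypothetical helper: it reads the output of
# `v4l2-ctl --list-formats-ext` on stdin and prints each discrete
# frame size once.
webcam_sizes() {
    awk '$1 == "Size:" && $2 == "Discrete" { print $3 }' | sort -u
}

# Illustrative sample imitating part of the v4l2-ctl listing:
sample="[0]: 'YUYV' (YUYV 4:2:2)
    Size: Discrete 640x480
    Size: Discrete 320x240
[1]: 'MJPG' (Motion-JPEG, compressed)
    Size: Discrete 640x480"
printf '%s\n' "$sample" | webcam_sizes

# On real hardware:
#   v4l2-ctl --list-formats-ext | webcam_sizes
```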

With webcam overlay and logo:

This will place your webcam overlay in the top right, and a logo in the bottom left:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-i logo.png -filter_complex \
"[0:v]hwupload_cuda,scale_npp=1024:-1,setpts=PTS-STARTPTS[bg]; \
[1:v]hwupload_cuda,scale_npp=120:-1,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10,format=yuv420p[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for Youtube:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-i logo.png -filter_complex \
"[0:v]hwupload_cuda,scale_npp=1024:-1,setpts=PTS-STARTPTS[bg]; \
[1:v]hwupload_cuda,scale_npp=120:-1,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Streaming a file to Youtube:

ffmpeg -re -threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-i input.mkv \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Outputting to multiple streaming services & local file:

You can use the tee muxer to efficiently stream to multiple sites and save a local copy if desired. Using tee will allow you to encode only once and send the same data to multiple outputs. Using the onfail option will allow the other streams to continue if one fails.

ffmpeg -threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-i input -map 0 -bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-flags +global_header -f tee \
"[f=flv:onfail=ignore]rtmp://live.twitch.tv/app/<stream key>|[f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>|local_file.mkv"

The example above will stream to both YouTube and Twitch and, at the same time, store a copy of the video stream on the local filesystem. Modify paths as needed.
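The tee output string gets long quickly, so it can be assembled from a list of targets. The following is a sketch under stated assumptions: tee_spec is a hypothetical helper, KEY1 and KEY2 are placeholder stream keys, and every rtmp:// target is assumed to want the same f=flv:onfail=ignore options used above.

```shell
#!/bin/sh
# tee_spec is a hypothetical helper: every rtmp:// argument becomes an FLV
# slave output with onfail=ignore; anything else is passed through
# unchanged (for example a local file name).
tee_spec() {
    spec=""
    for target in "$@"; do
        case "$target" in
            rtmp://*) entry="[f=flv:onfail=ignore]$target" ;;
            *)        entry="$target" ;;
        esac
        # Join entries with the '|' separator the tee muxer expects.
        spec="${spec:+$spec|}$entry"
    done
    printf '%s\n' "$spec"
}

# KEY1 and KEY2 are placeholders for real stream keys.
tee_spec rtmp://live.twitch.tv/app/KEY1 rtmp://a.rtmp.youtube.com/live2/KEY2 local_file.mkv

# Possible use:
#   ffmpeg ... -f tee "$(tee_spec rtmp://... rtmp://... local_file.mkv)"
```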

Notes:

You can use xwininfo | grep geometry to select the target window and get placement coordinates. For example, an output of -geometry 800x600+284+175 would result in using -video_size 800x600 -i :0.0+284,175. You can also use it to automatically enter the input screen size: -video_size $(xwininfo -root | awk '/-geo/{print $2}').
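The geometry-to-options conversion described above can be sketched as a small function. geometry_to_x11grab is a hypothetical name, and the sketch assumes the WIDTHxHEIGHT+X+Y form with non-negative offsets on display :0.0; geometries with negative offsets are not handled.

```shell
#!/bin/sh
# geometry_to_x11grab is a hypothetical helper. It takes a geometry string
# of the form WIDTHxHEIGHT+X+Y and prints the matching x11grab options.
geometry_to_x11grab() {
    size=${1%%+*}       # text before the first '+', e.g. 800x600
    offsets=${1#*+}     # text after the first '+',  e.g. 284+175
    x=${offsets%%+*}    # e.g. 284
    y=${offsets#*+}     # e.g. 175
    printf '%s\n' "-video_size $size -i :0.0+$x,$y"
}

geometry_to_x11grab 800x600+284+175

# With a live selection (click the target window when prompted):
#   geometry_to_x11grab "$(xwininfo | awk '/-geometry/ { print $2 }')"
```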

If you prefer, the pulse input device (requires --enable-libpulse) can be used instead of the ALSA input device, as in:

-f pulse -i default

Part 2: Using Intel's VAAPI to achieve the same:

Using x11grab as shown below:

Without scaling the output:

If you want the output video frame size to be the same as the input for Twitch:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

To do the same for Youtube, do:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

With scaling the output:

If you want the output video frame size to be scaled for Twitch:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload,scale_vaapi=w=1920:h=1080:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

To do the same for Youtube, do:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload,scale_vaapi=w=1920:h=1080:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Note: If you intend to capture from KMS devices, ensure that your FFmpeg binary has the CAP_SYS_ADMIN capability set:

sudo setcap cap_sys_admin+ep /path/to/ffmpeg

The KMS examples below depend on this. Note that KMS capture is highly prone to failure on multi-GPU systems when the VAAPI encoder context is derived directly from the capture device, as those examples do. Where possible, stick to x11grab instead.
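Both prerequisites for KMS capture (the capability on the binary and membership of the video group) can be checked before starting a stream. This is a sketch: the two helpers are hypothetical names, the sample lines are illustrative, and the exact layout of getcap output differs between libcap versions, which is why the check only looks for the capability name.

```shell
#!/bin/sh
# Hypothetical helpers; both read text on stdin and only set an exit status.
has_sys_admin_cap() { grep -q 'cap_sys_admin'; }          # input: `getcap FILE`
in_video_group()    { tr ' ' '\n' | grep -qx 'video'; }   # input: `id -nG`

# Illustrative samples of what the two commands may print:
printf '/usr/bin/ffmpeg cap_sys_admin=ep\n' | has_sys_admin_cap && echo "capability set"
printf 'alice wheel video audio\n' | in_video_group && echo "in video group"

# On a real system:
#   getcap /path/to/ffmpeg | has_sys_admin_cap || echo "run setcap first"
#   id -nG | in_video_group || echo "add yourself to the video group"
```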

Tips on using KMS:

KMS surfaces can be mapped directly to VAAPI, as shown in the example below, with scaling enabled:

ffmpeg -threads:v 2 -threads:a 8 -filter_threads 2 \
-framerate 60 -f kmsgrab -i - \
-vf 'hwmap=derive_device=vaapi,scale_vaapi=w=1920:h=1080:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16  \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Also, ensure that the stream key used is correct, otherwise your live broadcast will fail.

Scaling the output:

If you want the output video frame size to be smaller than the input then you can insert the appropriate scale video filter for VAAPI:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-f kmsgrab -i - \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-vf 'hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for Youtube:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-f kmsgrab -framerate 60 -i - \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-vf 'hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Note that the examples above will use Intel's VAAPI-based scaling engines (available since Sandy Bridge) in the GPU to re-scale the output to a 720p (HD-ready) video stream on broadcast.
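Whether a given GPU and driver expose H.264 encoding and the video-processing (scaling) entrypoint can be read from vainfo. The sketch below assumes vainfo is installed; vaapi_has is a hypothetical helper, and the sample lines imitate the "profile : entrypoint" layout of its output.

```shell
#!/bin/sh
# vaapi_has is a hypothetical helper. It reads `vainfo` output on stdin and
# succeeds if some line starts with the given profile prefix ($1) and
# mentions the given entrypoint ($2).
vaapi_has() {
    awk -v p="$1" -v e="$2" \
        'index($1, p) == 1 && index($0, e) { ok = 1 } END { exit !ok }'
}

# Illustrative sample imitating two lines of vainfo output:
sample='      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileNone                   : VAEntrypointVideoProc'
printf '%s\n' "$sample" | vaapi_has VAProfileH264 VAEntrypointEncSlice && echo "H.264 encode: yes"
printf '%s\n' "$sample" | vaapi_has VAProfileNone VAEntrypointVideoProc && echo "scaling (VideoProc): yes"

# On real hardware:
#   vainfo 2>/dev/null | vaapi_has VAProfileH264 VAEntrypointEncSlice
```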

With webcam overlay:

This will place your webcam overlay in the top right:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 1024 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for Youtube:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 60 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

You can see additional details about your webcam with something like:

 ffmpeg -f v4l2 -list_formats all -i /dev/video0 

Or with:

v4l2-ctl --list-formats-ext

See the documentation on the video4linux2 (v4l2) input device for more info.

Note: Your webcam may natively support whatever frame size you want to overlay onto the main video, so scaling the webcam video as shown in this example can be omitted (just set the appropriate v4l2 -video_size and remove the scale=120:-1,). See this answer on superuser for more details on the same.

With webcam overlay and logo:

This will place your webcam overlay in the top right, and a logo in the bottom left:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-i logo.png -filter_complex \
"[0:v]format=nv12,hwupload,scale_vaapi=1024:-1,setpts=PTS-STARTPTS[bg]; \
[1:v]format=nv12,hwupload,scale_vaapi=120:-1,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10,format=yuv420p[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for Youtube:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-i logo.png -filter_complex \
"[0:v]format=nv12,hwupload,scale_vaapi=1024:-1,setpts=PTS-STARTPTS[bg]; \
[1:v]format=nv12,hwupload,scale_vaapi=120:-1,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10,format=yuv420p[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Streaming a file to Youtube:

ffmpeg -threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-re -i input.mkv -init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-c:a aac -b:a 160k -ac 2 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Outputting to multiple streaming services & local file:

You can use the tee muxer to efficiently stream to multiple sites and save a local copy if desired. Using tee will allow you to encode only once and send the same data to multiple outputs. Using the onfail option will allow the other streams to continue if one fails.

ffmpeg -threads:v 2 -threads:a 8 -filter_threads 2 \
-re -i input.mkv -init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-c:a aac -b:a 160k -ac 2 -map 0 \
-flags +global_header -f tee \
"[f=flv:onfail=ignore]rtmp://live.twitch.tv/app/<stream key>|[f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>|local_file.mkv"

The example above will stream to both YouTube and Twitch and, at the same time, store a copy of the video stream on the local filesystem. Modify paths as needed.

Note: For VAAPI, use the scaler on local file encodes if, and only if, you need to use it, and for best results, stick to the original video stream dimensions, or downscale if needed (scale_vaapi=w=x:h=y) as up-scaling can introduce artifacting.

To determine the video stream's properties, you can run:

ffprobe -i path-to-video.ext

Where path-to-video.ext refers to the absolute path and name of the file.
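Since up-scaling is the case to avoid, the source dimensions reported by ffprobe can be compared with the intended scale_vaapi target before encoding. A minimal sketch follows; is_upscale is a hypothetical helper and the numbers in the demo are illustrative.

```shell
#!/bin/sh
# is_upscale is a hypothetical helper.
# Usage: is_upscale SRC_W SRC_H DST_W DST_H
# Succeeds when the target is larger than the source in either dimension.
is_upscale() {
    [ "$3" -gt "$1" ] || [ "$4" -gt "$2" ]
}

if is_upscale 1280 720 1920 1080; then
    echo "1280x720 -> 1920x1080 is an upscale: consider keeping the source size"
fi
if ! is_upscale 1920 1080 1280 720; then
    echo "1920x1080 -> 1280x720 is a downscale: fine"
fi

# The source size can be read in WIDTHxHEIGHT form with:
#   ffprobe -v error -select_streams v:0 \
#       -show_entries stream=width,height -of csv=p=0:s=x path-to-video.ext
```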

Extras:

Outputting to multiple outputs can be achieved as shown, using a sample for both NVENC and VAAPI: (As requested by @semeion below, substitute the scale values with the resolutions you want).

(a). NVENC:

ffmpeg -loglevel $LOGLEVEL \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-f x11grab -thread_queue_size 512 -s "$INRES" -framerate "$FPS" -i :0.0 \
-f alsa -thread_queue_size 512 -ac 2 -i hw:0,0 \
-filter_complex "split=2[a][b]; \
[a]hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos[c]; \
[b]hwupload_cuda,scale_npp=w=1024:h=768:interp_algo=lanczos[d]" \
-b:v:0 2400k -minrate:v:0 2400k -maxrate:v:0 2400k -bufsize:v:0 2400k -c:v:0 h264_nvenc -qp:v:0 19  \
-profile:v:0 high -rc:v:0 cbr_ld_hq -level:v:0 4.1 -r:v 60 -g:v 120 -bf:v:0 3 -refs:v 16 \
-b:v:1 1500k -minrate:v:1 1500k -maxrate:v:1 1500k -bufsize:v:1 1500k -c:v:1 h264_nvenc -qp:v:1 19  \
-profile:v:1 high -rc:v:1 cbr_ld_hq -level:v:1 4.1 -r:v 60 -g:v 120 -bf:v:1 3 -refs:v 16 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-map "[c]" -map "[d]" -map "1:a" \
-flags +global_header -f tee \
"[select=\'v:0,a\':f=flv:onfail=ignore]rtmp://$SERVER.twitch.tv/app/$STREAM_KEY|\
[select=\'v:1,a\':f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>"

(b). VAAPI:

ffmpeg -loglevel $LOGLEVEL \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-f x11grab -thread_queue_size 512 -s "$INRES" -framerate "$FPS" -i :0.0 \
-f alsa -thread_queue_size 512 -ac 2 -i hw:0,0 \
-filter_complex "split=2[a][b]; \
[a]format=nv12,hwupload,scale_vaapi=w=1280:h=720[c]; \
[b]format=nv12,hwupload,scale_vaapi=w=1024:h=768[d]" \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v:0 2400k -minrate:v:0 2400k -maxrate:v:0 2400k -bufsize:v:0 2400k -c:v:0 h264_vaapi -qp:v:0 19  \
-profile:v:0 high -level:v:0 4.1 -r:v 60 -g:v 120 -bf:v:0 3 -refs:v 16 \
-b:v:1 1500k -minrate:v:1 1500k -maxrate:v:1 1500k -bufsize:v:1 1500k -c:v:1 h264_vaapi -qp:v:1 19  \
-profile:v:1 high -level:v:1 4.1 -r:v 60 -g:v 120 -bf:v:1 3 -refs:v 16 \
-map "[c]" -map "[d]" -map "1:0" \
-flags +global_header -f tee \
"[select=\'v:0,a\':f=flv:onfail=ignore]rtmp://$SERVER.twitch.tv/app/$STREAM_KEY|\
[select=\'v:1,a\':f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>"

Substitute the used variables with your own values.
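The variables used in the two commands above ($LOGLEVEL, $INRES, $FPS, $SERVER, $STREAM_KEY) have to be set before the command runs. The sketch below shows one possible way to give them defaults and to catch an unset stream key early; the default values are assumptions chosen to match the earlier examples, and missing_vars is a hypothetical helper.

```shell
#!/bin/sh
# Assumed defaults, matching values used elsewhere in this guide.
LOGLEVEL=${LOGLEVEL:-info}
INRES=${INRES:-1920x1080}
FPS=${FPS:-60}
SERVER=${SERVER:-live}      # gives rtmp://live.twitch.tv/app/...

# missing_vars is a hypothetical helper: it prints the name of every
# listed variable that is unset or empty, one per line.
missing_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}

missing=$(missing_vars LOGLEVEL INRES FPS SERVER STREAM_KEY)
if [ -n "$missing" ]; then
    echo "set these before streaming: $missing"
else
    echo "all variables set"
fi
```

STREAM_KEY is deliberately given no default, so a forgotten key is reported instead of producing a failed broadcast.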

Todo: Document the use of QuickSync (QSV) encoders in live-streaming.

@nrdxp

nrdxp commented Nov 8, 2019

Any way to have a webcam overlay with kmsgrab?

@kokoko3k

I tried to use kmsgrab with Nvidia, but it failed like this:

    koko@slimer# sudo cat /sys/module/nvidia_drm/parameters/modeset
    Y

    koko@slimer# sudo setcap cap_sys_admin+ep /usr/bin/ffmpeg 

    koko@slimer# ffmpeg  -device /dev/dri/card0 -f kmsgrab -i  - -f image2pipe -
    ffmpeg version n4.2.1 Copyright (c) 2000-2019 the FFmpeg developers
      built with gcc 9.2.0 (GCC)
      configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-nvdec --enable-nvenc --enable-omx --enable-shared --enable-version3
      libavutil      56. 31.100 / 56. 31.100
      libavcodec     58. 54.100 / 58. 54.100
      libavformat    58. 29.100 / 58. 29.100
      libavdevice    58.  8.100 / 58.  8.100
      libavfilter     7. 57.100 /  7. 57.100
      libswscale      5.  5.100 /  5.  5.100
      libswresample   3.  5.100 /  3.  5.100
      libpostproc    55.  5.100 / 55.  5.100
    [kmsgrab @ 0x56355affb7c0] Failed to open DRM device.
    pipe:: Invalid argument

I've posted the request to the official forum, but the developers are silent.
Is it supposed to fail because of a limitation of the Nvidia driver, maybe, or am I doing something wrong?

Thanks!

@koreanfan

Hello. I tried your guide with modifications for an AMD card with VAAPI. If I use VAAPI together with libx264, I can stream to YouTube and the stream status indicator is green. If I use libx264 alone, I can also stream and the indicator is green. But when I stream with VAAPI only, the indicator turns orange and then red, and I get a notification along the lines of "video buffering or low speed". The video of this broken stream shows up in my videos only after 8-10, or not at all. When the indicator turns red with VAAPI only, I immediately stop the stream and start one with libx264, or libx264 with VAAPI, and it works well. So it is not a problem with my internet connection. Something is wrong with the VAAPI driver, I guess, or something else.

I have tried many setups for streaming with VAAPI only and got no good result; all the streams are broken. But when I use VAAPI to record a screen capture to a file, everything works fine. If I use OBS on Linux to stream with VAAPI, it works only at a low bitrate (<4000) with 30 fps for 1920x1080 or 60 fps for 720p. When I use OBS on Windows I can set a higher frame rate, resolution and bitrate, because on Windows the AMD AMF encoder is used. Why can everyone else stream with ffmpeg VAAPI from the Linux terminal when I can't? Video card: AMD RX 560; CPU: FX-8300; 8 GB RAM. So it is not a very rare PC, and VAAPI should work. Please help me.

@Brainiarc7
Author

Brainiarc7 commented Feb 3, 2020 via email

@koreanfan

The AMD Pro drivers on Linux showed very low fps in games with OpenGL. To use Vulkan with the AMD drivers you need a high-speed drive such as an SSD, because Vulkan uses a disk buffer for pre-caching resources. So I can't use Vulkan, because I get low fps and freezes with that API. I installed Debian bullseye and the latest OBS, and I can now stream with ffmpeg VAAPI.

@koreanfan

I tried the ffmpeg command line again. How do I stream a specific application, rather than the entire desktop, with ffmpeg on Linux? On the internet I only found that you can grab the whole desktop screen or a region of it, but not a specific application window.

@koreanfan

I tried your code from the section
With webcam overlay and logo:

This will place your webcam overlay in the top right, and a logo in the bottom left:

ffmpeg -loglevel debug
-threads:v 2 -threads:a 8 -filter_threads 2
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va
-i logo.png -filter_complex
"[0:v]format=nv12,hwupload,scale_vaapi=1024:-1,setpts=PTS-STARTPTS[bg];
[1:v]format=nv12,hwupload,scale_vaapi=120:-1,setpts=PTS-STARTPTS[fg];
[bg][fg]overlay=W-w-10:10[bg2];
[bg2][3:v]overlay=W-w-10:H-h-10,format=yuv420p[out]"
-map "[out]" -map 2:a
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16
-f flv rtmp://live.twitch.tv/app/

But it doesn't work. I can't figure out the proper command for moving from software encoding to VAAPI hardware acceleration for:
ffmpeg -y -f x11grab -video_size 1920x1080 -framerate 30 -i $DISPLAY
-f pulse -ac 2 -i default -i logo.png -i screenlogo.png -filter_complex
"[0:v]scale=1280:-1,setpts=PTS-STARTPTS[bg];[2:v]scale=162:-1,setpts=PTS-STARTPTS[bg2];[3:v]scale=120:-1,setpts=PTS-STARTPTS[bg3];
[bg][bg2]overlay=0:H-h[bg4];[bg4][bg3]overlay=W-w:0,format=yuv420p[v]"
-map "[v]" -map 1:a -c:v libx264 -g 60 -preset ultrafast
-b:v 3M -maxrate 3M -c:a aac -b:a 160k -ar 44100 -b:a 128k
-f flv x264.flv

@Brainiarc7
Author

Brainiarc7 commented Jun 15, 2020 via email

@koreanfan

Ok

@koreanfan

I also have a mouse acceleration issue when I use the VAAPI encoder with OBS or ffmpeg to stream or record Dota 2 with the OpenGL renderer. I created these topics:
daniel-schuermann/mesa#184
ValveSoftware/Dota-2#1770
http://ffmpeg.org/pipermail/ffmpeg-user/2020-June/048983.html
I think this is a video driver or X11 bug, but others said it is a Dota 2 bug.

@simpletvlc

ffmpeg -re -i https://radio.radyotvonline.net/radio/playlist.m3u8 -stream_loop -1 -i /home/media/video.mov -c:v libx264 -pix_fmt yuv420p -preset ultrafast -c:a mp3 -ab 128k -flvflags no_duration_filesize -f flv rtmp://127.0.0.1:25462/live/stream

I add a MOV video to the radio URL and stream it. How can I do this entirely on the GPU of the server?