Brainiarc7/ffmpeg-livestream-to-streaming-sites-vaapi-nvenc.md

## ffmpeg-livestream-to-streaming-sites-vaapi-nvenc.md

      
Streaming your Linux desktop to YouTube and Twitch via Nvidia's NVENC and VAAPI:
Considerations to take when live streaming:
The following best practice observations apply when using a hardware-based encoder for live streaming to any platform:


Set the buffer size (-bufsize:v) equal to the target bitrate (-b:v). You want to ensure that you're encoding in CBR mode.


Set up the encoders as shown:


(a). Omit the presets, as they'll override the desired rate control mode.
(b). Use a rate control mode that enforces a constant bitrate. That way, we can provide a stream with a fixed bitrate (per variant, where applicable, with multiple outputs as shown in some examples below).
(c). For multiple outputs (where enabled), use the tee muxer to output to multiple streaming services (or to a file, if so desired). Substitute all variables (such as stream keys) with your own.
(d). The thread count is also lowered to ensure that VBV underflows do not occur. Note that higher thread counts for hardware-accelerated encoders and decoders have rapidly diminishing returns.
(e). On consumer SKUs, NVENC is limited to two encode sessions only. This is by design. You can override this artificial limit by using this project. This does not apply to the likes of AMD's VCE (via VAAPI) or Intel's VAAPI and QSV implementations.
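
The rule of keeping the buffer size equal to the target bitrate can be captured in a small helper, so that the four rate-control flags never drift apart. This is a minimal sketch; `cbr_flags` is a hypothetical function name, not part of FFmpeg:

```shell
# cbr_flags BITRATE
# Prints matching CBR rate-control flags for a single target bitrate,
# following the rule above: -bufsize:v equals -b:v (and min/max rates too).
# cbr_flags is a hypothetical helper, not an FFmpeg feature.
cbr_flags() {
    bitrate="$1"   # e.g. 6000k
    printf '%s\n' "-b:v $bitrate -minrate:v $bitrate -maxrate:v $bitrate -bufsize:v $bitrate"
}

# Usage (not executed here); leave the substitution unquoted so it splits into words:
#   ffmpeg ... $(cbr_flags 6000k) -c:v h264_nvenc ...
```

Generating the flags from one value avoids mismatches such as a 2400k target paired with a 6000k buffer.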

With Nvidia's NVENC:

Without scaling the output:
If you want the output video frame size to be the same as the input for Twitch:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-b:v 2400k -minrate:v 2400k -maxrate:v 2400k -bufsize:v 2400k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>
To do the same for Youtube, do:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2  -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
Note: Ensure that the size specified above (-s) matches your screen's resolution. Also, ensure that the stream key used is correct, otherwise your live broadcast will fail.
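
One way to avoid a mismatched -s value is to read the resolution from the X server instead of typing it. The sketch below assumes `xdpyinfo` is installed and prints a `dimensions:` line; `screen_size` is a hypothetical helper name:

```shell
# screen_size
# Reads `xdpyinfo` output on stdin and prints the screen size as WIDTHxHEIGHT.
# Assumes the usual "dimensions:    1920x1080 pixels (...)" line is present.
screen_size() {
    awk '/dimensions:/ { print $2; exit }'
}

# Usage on a running X session (not executed here):
#   ffmpeg -f x11grab -s "$(xdpyinfo | screen_size)" -framerate 60 -i :0.0 ...
```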
You can optionally capture from the first KMS device (as long as you're a member of the video group) as shown:
For Twitch:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-device /dev/dri/card0 -f kmsgrab -i - \
-thread_queue_size 1024 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>
To do the same for Youtube, do:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
Scaling the output:
If you want the output video frame size to be smaller than the input then you can insert the appropriate scale video filter for NVENC:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter:v "hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos" \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>
And for Youtube:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter:v "hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos" \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
Note that the examples above will use the CUDA-based NPP acceleration in the GPU to re-scale the output to a perfect 720p (HD-ready) video stream on broadcast. This will require you to build ffmpeg with support for both NVENC and the CUDA SDK.
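
Before relying on these commands, it may help to confirm that your FFmpeg build actually includes the NVENC encoder and the NPP scaler. A minimal sketch, where `has_component` is a hypothetical helper that searches a listing on stdin:

```shell
# has_component NAME
# Succeeds if NAME appears as a whole word in the listing read from stdin.
# Intended for the output of `ffmpeg -encoders` or `ffmpeg -filters`.
has_component() {
    grep -qw -- "$1"
}

# Usage (not executed here):
#   ffmpeg -hide_banner -encoders | has_component h264_nvenc && echo "NVENC encoder available"
#   ffmpeg -hide_banner -filters  | has_component scale_npp  && echo "scale_npp filter available"
```

If `scale_npp` is missing, the build was most likely configured without libnpp support.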
With webcam overlay:
This will place your webcam overlay in the top right:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter_complex "[0:v][1:v]overlay=W-w-10:10,hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos[out]" \
-map "[out]" -map 2:a \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>
And for Youtube:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter_complex "[0:v][1:v]overlay=W-w-10:10,hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos[out]" \
-map "[out]" -map 2:a \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r 60 -g 120 -bf:v 3 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
You can see additional details about your webcam with something like:
ffmpeg -f v4l2 -list_formats all -i /dev/video0 
Or with:
v4l2-ctl --list-formats-ext
See the documentation on the video4linux2 (v4l2) input device for more info.
With webcam overlay and logo:
This will place your webcam overlay in the top right, and a logo in the bottom right:
ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-i logo.png -filter_complex \
"[0:v]scale=1024:-1,setpts=PTS-STARTPTS[bg]; \
[1:v]scale=120:-1,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10,format=yuv420p,hwupload_cuda[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>
And for Youtube:
ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-i logo.png -filter_complex \
"[0:v]scale=1024:-1,setpts=PTS-STARTPTS[bg]; \
[1:v]scale=120:-1,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10,format=yuv420p,hwupload_cuda[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
Streaming a file to Youtube:
ffmpeg -re -threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-i input.mkv \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
Outputting to multiple streaming services & local file:
You can use the tee muxer to efficiently stream to multiple sites and save a local copy if desired. Using tee will allow you to encode only once and send the same data to multiple outputs. Using the onfail option will allow the other streams to continue if one fails.
ffmpeg -threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-i input -map 0 -bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f tee \
"[f=flv:onfail=ignore]rtmp://live.twitch.tv/app/<stream key>|  [f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>|local_file.mkv"
The example above will stream to both YouTube and Twitch and, at the same time, store a copy of the video stream on the local filesystem. Modify paths as needed.
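
Assembling the tee output string by hand gets error-prone as targets are added. The following is a rough sketch of a builder; `tee_targets` is a hypothetical helper, and it assumes every RTMP target should get `f=flv:onfail=ignore` while anything else (such as a local file) is passed through unchanged:

```shell
# tee_targets TARGET...
# Prints a tee muxer output string with targets joined by "|".
# RTMP URLs are prefixed with [f=flv:onfail=ignore]; other targets are kept as-is.
# tee_targets is a hypothetical helper, not part of FFmpeg.
tee_targets() {
    out=""
    for target in "$@"; do
        case "$target" in
            rtmp://*) entry="[f=flv:onfail=ignore]$target" ;;
            *)        entry="$target" ;;
        esac
        if [ -z "$out" ]; then
            out="$entry"
        else
            out="$out|$entry"
        fi
    done
    printf '%s\n' "$out"
}

# Usage (not executed here):
#   ffmpeg ... -f tee "$(tee_targets "rtmp://live.twitch.tv/app/$TWITCH_KEY" local_file.mkv)"
```

Targets containing a literal `|` would need escaping, which this sketch does not handle.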
Notes:
You can use xwininfo | grep geometry to select the target window and get placement coordinates. For example, an output of -geometry 800x600+284+175 would result in using -video_size 800x600 -i :0.0+284,175. You can also use it to automatically enter the input screen size: -video_size $(xwininfo -root | awk '/-geo/{print $2}').
The pulse input device (requires --enable-libpulse) can be an alternative to the ALSA input device, as in:
-f pulse -i default
As per your preference.
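
The geometry-to-arguments conversion described in the xwininfo note can be sketched as a small function. `geometry_to_x11grab` is a hypothetical helper; it assumes a geometry of the form WxH+X+Y with non-negative offsets and defaults the display to :0.0:

```shell
# geometry_to_x11grab GEOMETRY [DISPLAY]
# Converts an xwininfo geometry such as 800x600+284+175 into x11grab arguments.
# Assumes the WxH+X+Y form; negative offsets (WxH-X-Y) are not handled.
geometry_to_x11grab() {
    geometry="$1"
    display="${2:-:0.0}"
    size="${geometry%%+*}"      # part before the first "+": 800x600
    offsets="${geometry#*+}"    # part after the first "+": 284+175
    x="${offsets%%+*}"
    y="${offsets#*+}"
    printf '%s\n' "-video_size $size -i $display+$x,$y"
}

# Usage (not executed here):
#   geometry_to_x11grab 800x600+284+175
```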
Part 2: Using Intel's VAAPI to achieve the same:
Using x11grab as shown below:
Without scaling the output:
If you want the output video frame size to be the same as the input for Twitch:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>
To do the same for Youtube, do:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
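
If the VAAPI commands above fail at encoder initialization, it is worth checking whether the driver exposes an H.264 encode entry point at all. A minimal sketch, assuming `vainfo` (from libva-utils) is installed; `vaapi_has_h264_encode` is a hypothetical helper that inspects its output:

```shell
# vaapi_has_h264_encode
# Reads `vainfo` output on stdin and succeeds if any H.264 profile lists an
# encode entry point (VAEntrypointEncSlice, which also matches EncSliceLP).
vaapi_has_h264_encode() {
    grep -Eq 'VAProfileH264.*VAEntrypointEncSlice'
}

# Usage (not executed here):
#   vainfo 2>/dev/null | vaapi_has_h264_encode && echo "H.264 VAAPI encode available"
```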
With scaling the output:
If you want the output video frame size to be scaled for Twitch:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload,scale_vaapi=w=1920:h=1080:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>
To do the same for Youtube, do:
ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload,scale_vaapi=w=1920:h=1080:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
Note: Ensure that you've set the CAP_SYS_ADMIN capability on your FFmpeg binary (via setcap) if you intend to capture from KMS devices:
sudo setcap cap_sys_admin+ep /path/to/ffmpeg

This applies to the kmsgrab examples below. Note that KMS capture is highly prone to failure on multi-GPU systems when using direct device derivation for VAAPI encoder contexts. Where possible, stick to x11grab instead.
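
To confirm that the capability was applied, the output of `getcap` can be checked. This is a small sketch; `has_cap_sys_admin` is a hypothetical helper, and the exact `getcap` output format varies between libcap versions, so it only looks for the capability name:

```shell
# has_cap_sys_admin
# Reads `getcap` output on stdin and succeeds if cap_sys_admin is listed.
has_cap_sys_admin() {
    grep -q 'cap_sys_admin'
}

# Usage (not executed here):
#   getcap /path/to/ffmpeg | has_cap_sys_admin || echo "run setcap first"
```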
Tips with using KMS:
KMS surfaces can be mapped directly to VAAPI, as shown in the example below, with scaling enabled:
ffmpeg -threads:v 2 -threads:a 8 -filter_threads 2 \
-framerate 60 -f kmsgrab -i - \
-vf 'hwmap=derive_device=vaapi,scale_vaapi=w=1920:h=1080:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16  \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Also, ensure that the stream key used is correct, otherwise your live broadcast will fail.
Scaling the output:
If you want the output video frame size to be smaller than the input then you can insert the appropriate scale video filter for VAAPI:
ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-f kmsgrab -i - \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-vf 'hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>
And for Youtube:
ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-f kmsgrab -framerate 60 -i - \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-vf 'hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
Note that the examples above will use Intel's VAAPI-based scaling engines (available since Sandy Bridge) in the GPU to re-scale the output to a 720p (HD-ready) video stream on broadcast.
With webcam overlay:
This will place your webcam overlay in the top right:
ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 1024 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-filter_complex "[0:v][1:v]overlay=W-w-10:10,format=nv12,hwupload,scale_vaapi=w=1920:h=1080[out]" \
-map "[out]" -map 2:a \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>
And for Youtube:
ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 60 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-filter_complex "[0:v][1:v]overlay=W-w-10:10,format=nv12,hwupload,scale_vaapi=w=1920:h=1080[out]" \
-map "[out]" -map 2:a \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
You can see additional details about your webcam with something like:
ffmpeg -f v4l2 -list_formats all -i /dev/video0

Or with:
v4l2-ctl --list-formats-ext

See the documentation on the video4linux2 (v4l2) input device for more info.
Note: Your webcam may natively support whatever frame size you want to overlay onto the main video, so scaling the webcam video as shown in this example can be omitted (just set the appropriate v4l2 -video_size and remove the scale=120:-1,). See this answer on superuser for more details on the same.
With webcam overlay and logo:
This will place your webcam overlay in the top right, and a logo in the bottom right:
ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-i logo.png -filter_complex \
"[0:v]scale=1024:-1,setpts=PTS-STARTPTS[bg]; \
[1:v]scale=120:-1,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10,format=nv12,hwupload[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>
And for Youtube:
ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-i logo.png -filter_complex \
"[0:v]scale=1024:-1,setpts=PTS-STARTPTS[bg]; \
[1:v]scale=120:-1,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10,format=nv12,hwupload[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
Streaming a file to Youtube:
ffmpeg -threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-re -i input.mkv -init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-c:a aac -b:a 160k -ac 2 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>
Outputting to multiple streaming services & local file:
You can use the tee muxer to efficiently stream to multiple sites and save a local copy if desired. Using tee will allow you to encode only once and send the same data to multiple outputs. Using the onfail option will allow the other streams to continue if one fails.
ffmpeg -threads:v 2 -threads:a 8 -filter_threads 2 \
-re -i input.mkv -init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-c:a aac -b:a 160k -ac 2 -map 0 \
-f tee \
"[f=flv:onfail=ignore]rtmp://live.twitch.tv/app/<stream key>|[f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>|local_file.mkv"
The example above will stream to both YouTube and Twitch and, at the same time, store a copy of the video stream on the local filesystem. Modify paths as needed.
Note: For VAAPI, use the scaler on local file encodes if, and only if, you need to use it, and for best results, stick to the original video stream dimensions, or downscale if needed (scale_vaapi=w=x:h=y) as up-scaling can introduce artifacting.
To determine the video stream's properties, you can run:
ffprobe -i path-to-video.ext

Where path-to-video.ext refers to the absolute path and name of the file.
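
The advice to downscale only when needed, and never upscale, can be turned into a small decision helper. This is a sketch under stated assumptions: `probe_size` and `vaapi_filter` are hypothetical names, and `probe_size` assumes `ffprobe` is installed and the file has at least one video stream:

```shell
# probe_size FILE
# Prints WIDTHxHEIGHT of the first video stream (requires ffprobe).
probe_size() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=width,height -of csv=s=x:p=0 "$1"
}

# vaapi_filter SRC_SIZE DST_SIZE
# Prints a VAAPI filter chain. The scaler is added only when the target is
# smaller than the source in both dimensions; otherwise frames are uploaded
# unscaled, which avoids up-scaling artifacts.
vaapi_filter() {
    src_w="${1%x*}"; src_h="${1#*x}"
    dst_w="${2%x*}"; dst_h="${2#*x}"
    if [ "$dst_w" -lt "$src_w" ] && [ "$dst_h" -lt "$src_h" ]; then
        printf '%s\n' "format=nv12,hwupload,scale_vaapi=w=$dst_w:h=$dst_h"
    else
        printf '%s\n' "format=nv12,hwupload"
    fi
}

# Usage (not executed here):
#   ffmpeg ... -vf "$(vaapi_filter "$(probe_size input.mkv)" 1280x720)" -c:v h264_vaapi ...
```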
Extras:
Outputting to multiple outputs can be achieved as shown, using a sample for both NVENC and VAAPI: (As requested by @semeion below, substitute the scale values with the resolutions you want).
(a). NVENC:
ffmpeg -loglevel $LOGLEVEL \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-f x11grab -thread_queue_size 512 -s "$INRES" -framerate "$FPS" -i :0.0 \
-f alsa -thread_queue_size 512 -ac 2 -i hw:0,0 \
-filter_complex "split=2[a][b]; \
[a]hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos[c]; \
[b]hwupload_cuda,scale_npp=w=1024:h=768:interp_algo=lanczos[d]" \
-b:v:0 2400k -minrate:v:0 2400k -maxrate:v:0 2400k -bufsize:v:0 2400k -c:v:0 h264_nvenc -qp:v:0 19  \
-profile:v:0 high -rc:v:0 cbr_ld_hq -level:v:0 4.1 -r:v 60 -g:v 120 -bf:v:0 3 -refs:v 16 \
-b:v:1 1500k -minrate:v:1 1500k -maxrate:v:1 1500k -bufsize:v:1 1500k -c:v:1 h264_nvenc -qp:v:1 19  \
-profile:v:1 high -rc:v:1 cbr_ld_hq -level:v:1 4.1 -r:v 60 -g:v 120 -bf:v:1 3 -refs:v 16 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-map "[c]" -map "[d]" -map "1:a" \
-f tee \
"[select=\'v:0,a\':f=flv:onfail=ignore]rtmp://$SERVER.twitch.tv/app/$STREAM_KEY|\
[select=\'v:1,a\':f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>"
(b). VAAPI:
ffmpeg -loglevel $LOGLEVEL \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-f x11grab -thread_queue_size 512 -s "$INRES" -framerate "$FPS" -i :0.0 \
-f alsa -thread_queue_size 512 -ac 2 -i hw:0,0 \
-filter_complex "split=2[a][b]; \
[a]format=nv12,hwupload,scale_vaapi=w=1280:h=720[c]; \
[b]format=nv12,hwupload,scale_vaapi=w=1024:h=768[d]" \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v:0 2400k -minrate:v:0 2400k -maxrate:v:0 2400k -bufsize:v:0 2400k -c:v:0 h264_vaapi -qp:v:0 19  \
-profile:v:0 high -level:v:0 4.1 -r:v 60 -g:v 120 -bf:v:0 3 -refs:v 16 \
-b:v:1 1500k -minrate:v:1 1500k -maxrate:v:1 1500k -bufsize:v:1 1500k -c:v:1 h264_vaapi -qp:v:1 19  \
-profile:v:1 high -level:v:1 4.1 -r:v 60 -g:v 120 -bf:v:1 3 -refs:v 16 \
-map "[c]" -map "[d]" -map "1:0" \
-f tee \
"[select=\'v:0,a\':f=flv:onfail=ignore]rtmp://$SERVER.twitch.tv/app/$STREAM_KEY|\
[select=\'v:1,a\':f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>"
Substitute the used variables with your own values.
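
For reference, the variables used in the two multi-output examples could be defined along these lines. The values are illustrative assumptions only (in particular, `SERVER="live"` simply reproduces the live.twitch.tv host used elsewhere in this document), and `check_stream_vars` is a hypothetical helper that refuses to proceed when any of them is empty:

```shell
# Illustrative values; replace with your own.
LOGLEVEL="info"
INRES="1920x1080"
FPS="60"
SERVER="live"                      # yields rtmp://live.twitch.tv/app/...
STREAM_KEY="${STREAM_KEY:-}"       # take the key from the environment, never hard-code it

# check_stream_vars
# Returns non-zero (and names the culprit on stderr) if any required variable is empty.
check_stream_vars() {
    for name in LOGLEVEL INRES FPS SERVER STREAM_KEY; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            return 1
        fi
    done
}

# Usage (not executed here):
#   check_stream_vars && ffmpeg -loglevel "$LOGLEVEL" ...
```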
Todo: Document the use of QuickSync (QSV) encoders in live-streaming.