@sidsethupathi
Last active April 21, 2020 17:30
gstreamer nvcodec host vs GL memory profiled with nvprof

Running GStreamer 1.17, built with gst-build (gst-plugins-bad@b5a28df0f312f6e603d0c729f9f799a25a1f0a87).

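  • Low res (320x240): using host memory is faster than using GL memory.
  • High res (3840x2160): using GL memory is faster than using host memory.
  • nvprof's memory copy timings don't explain that behavior: at low res, the GL memory run spends far less GPU time on copies (about 79 ms DtoD versus about 669 ms HtoD + DtoH) yet takes almost twice as long overall.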
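
To put numbers on the first two points, here is a minimal Python sketch that converts the four execution times reported in this gist to seconds and prints the GL/host ratio for each resolution. The helper name `to_seconds` and the dictionary layout are illustrative assumptions, not part of any GStreamer tooling.

```python
def to_seconds(hms: str) -> float:
    """Convert an H:MM:SS.fraction string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Execution times copied from the four runs in this gist.
runs = {
    ("low", "host"): "0:00:11.252269640",
    ("low", "gl"): "0:00:20.584277338",
    ("high", "host"): "0:03:20.462018560",
    ("high", "gl"): "0:02:18.106101429",
}

ratios = {}
for res in ("low", "high"):
    host = to_seconds(runs[(res, "host")])
    gl = to_seconds(runs[(res, "gl")])
    # A ratio above 1 means the GL memory run took longer than the host run.
    ratios[res] = gl / host
    print(f"{res}: host {host:.2f}s, GL {gl:.2f}s, GL/host = {ratios[res]:.2f}")
```

With these inputs the GL memory run takes roughly 1.8x as long as the host memory run at low res, and roughly 0.7x as long at high res.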

Low resolution

# gst-discoverer-1.0 low_res.ts 

Properties:
  Duration: 0:09:59.997352000
  Seekable: yes
  Live: no
  container: MPEG-2 Transport Stream
    video: H.264 (Main Profile)
      Stream ID: 0332177a9fc52fecf7b4f60fea697c811a71987e037b0ae394d0273ffbc8abca:1/00000041
      Width: 320
      Height: 240
      Depth: 24
      Frame rate: 30/1
      Pixel aspect ratio: 1/1
      Interlaced: false
      Bitrate: 0
      Max bitrate: 0
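
As a sanity check on the call counts in the profiles that follow, the frame count can be estimated from the duration and frame rate reported above. This is a small sketch; the variable names are illustrative, and the one-call-per-frame reading is an assumption rather than something the profile states.

```python
from fractions import Fraction

# Values copied from the gst-discoverer-1.0 output above.
duration_text = "0:09:59.997352000"
frame_rate = Fraction("30/1")

hours, minutes, seconds = duration_text.split(":")
duration = int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Round to the nearest whole frame.
frames = round(duration * frame_rate)
print(f"about {frames} frames in {duration:.3f}s at {frame_rate} fps")
```

The estimate equals the 18000 calls reported for ConvertNV24toNV12 in the profiles, which suggests one call per frame, with the memcpy and Convert_PL2BL entries at 36000 calls being two per frame.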

Low resolution transcode using host memory

Execution time: 0:00:11.252269640

gst-launch-1.0 filesrc location=low_res.ts ! tsdemux ! h264parse ! nvh264dec ! "video/x-raw" ! nvh264enc ! fakesink

            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   49.35%  368.89ms     36000  10.246us  5.9200us  19.872us  [CUDA memcpy HtoD]
                   40.12%  299.85ms     36000  8.3290us  5.3120us  24.256us  [CUDA memcpy DtoH]
                    6.34%  47.353ms     36000  1.3150us  1.2150us  9.3440us  Convert_PL2BL
                    4.19%  31.291ms     18000  1.7380us  1.6640us  2.2400us  ConvertNV24toNV12
                    0.01%  77.632us        68  1.1410us     704ns  2.6240us  [CUDA memset]
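
The Time column mixes units (ms and us here, s and ns elsewhere in this gist), so summing it needs a small conversion step. The sketch below is one way to do that for the table above and to compare the total with the run's execution time; `parse_time` and `UNIT_SECONDS` are hypothetical helpers written for this note, not part of nvprof.

```python
import re

# Unit suffixes seen in the nvprof summaries in this gist.
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_time(text: str) -> float:
    """Convert a value such as '368.89ms' or '77.632us' to seconds."""
    match = re.fullmatch(r"([0-9.]+)(ns|us|ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_SECONDS[unit]


# Time column of the low resolution, host memory profile above.
host_low = ["368.89ms", "299.85ms", "47.353ms", "31.291ms", "77.632us"]
gpu_total = sum(parse_time(t) for t in host_low)

wall_clock = 11.252269640  # execution time of the same run, in seconds
print(f"GPU activities: {gpu_total:.3f}s of {wall_clock:.3f}s "
      f"({100 * gpu_total / wall_clock:.1f}%)")
```

For this run the profiled GPU activities add up to well under a second of an 11 second execution, so most of the elapsed time is spent outside the operations listed in the table.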

Low resolution transcode using GL memory

Execution time: 0:00:20.584277338

gst-launch-1.0 filesrc location=low_res.ts ! tsdemux ! h264parse ! nvh264dec ! "video/x-raw(memory:GLMemory)" ! nvh264enc ! fakesink

            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   50.84%  79.086ms     72000  1.0980us     864ns  14.208us  [CUDA memcpy DtoD]
                   27.69%  43.070ms     36000  1.1960us     991ns  14.016us  Convert_PL2BL
                   21.41%  33.308ms     18000  1.8500us  1.5030us  2.8480us  ConvertNV24toNV12
                    0.05%  78.944us        68  1.1600us     672ns  2.6560us  [CUDA memset]

High resolution

# gst-discoverer-1.0 hi_res.ts 

Properties:
  Duration: 0:09:59.998824481
  Seekable: yes
  Live: no
  container: MPEG-2 Transport Stream
    video: H.264 (Main Profile)
      Stream ID: 431405b912470c7752b10402f2c5e9e93da618a677918195c637c8d0371e5414:1/00000041
      Width: 3840
      Height: 2160
      Depth: 24
      Frame rate: 30/1
      Pixel aspect ratio: 1/1
      Interlaced: false
      Bitrate: 0
      Max bitrate: 0

High resolution transcode using host memory

Execution time: 0:03:20.462018560

gst-launch-1.0 filesrc location=hi_res.ts ! tsdemux ! h264parse ! nvh264dec ! "video/x-raw" ! nvh264enc ! fakesink

            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   54.40%  46.3980s     36000  1.2888ms  738.27us  2.7441ms  [CUDA memcpy HtoD]
                   42.74%  36.4568s     36000  1.0127ms  599.52us  3.0454ms  [CUDA memcpy DtoH]
                    1.47%  1.25313s     18000  69.618us  67.584us  72.192us  ConvertNV24toNV12
                    1.39%  1.18157s     36000  32.821us  23.328us  45.856us  Convert_PL2BL
                    0.00%  81.504us        66  1.2340us     704ns  2.6560us  [CUDA memset]

High resolution transcode using GL memory

Execution time: 0:02:18.106101429

gst-launch-1.0 filesrc location=hi_res.ts ! tsdemux ! h264parse ! nvh264dec ! "video/x-raw(memory:GLMemory)" ! nvh264enc ! fakesink

            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   50.48%  2.63958s     72000  36.660us  22.976us  58.976us  [CUDA memcpy DtoD]
                   25.11%  1.31285s     36000  36.468us  23.744us  49.024us  Convert_PL2BL
                   24.41%  1.27668s     18000  70.926us  67.585us  71.872us  ConvertNV24toNV12
                    0.00%  81.536us        66  1.2350us     704ns  2.9120us  [CUDA memset]
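
To make the mismatch between copy time and execution time concrete, this sketch compares, for each resolution, how much memcpy time the GL memory path saves against how much execution time it saves. All figures are copied from the tables and execution times in this gist and converted to seconds; the dictionary layout is an illustrative assumption. Note that nvprof's GPU activity times need not add up to wall-clock time, so this is only a rough comparison.

```python
# Per run: total memcpy time (HtoD + DtoH, or DtoD) and execution time, in seconds.
runs = {
    "low": {
        "host": {"copies": 0.36889 + 0.29985, "wall": 11.252269640},
        "gl": {"copies": 0.079086, "wall": 20.584277338},
    },
    "high": {
        "host": {"copies": 46.3980 + 36.4568, "wall": 200.462018560},
        "gl": {"copies": 2.63958, "wall": 138.106101429},
    },
}

deltas = {}
for res, run in runs.items():
    # Positive values mean the GL memory run spent less time than the host run.
    copy_saved = run["host"]["copies"] - run["gl"]["copies"]
    wall_saved = run["host"]["wall"] - run["gl"]["wall"]
    deltas[res] = (copy_saved, wall_saved)
    print(f"{res}: copy time saved {copy_saved:.3f}s, "
          f"execution time saved {wall_saved:.3f}s")
```

At high res the GL memory path saves about 80 s of copy time and about 62 s of execution time. At low res it saves about 0.6 s of copy time but the execution takes about 9 s longer, which is the part the memcpy figures do not account for.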