
@ilia-cher
Created December 7, 2020 21:23
iliacher@devgpu083:~/fbcode (20ae4497)$ ./buck-out/gen/caffe2/binaries/record_function_benchmark
Warm up
Tensor GEMM benchmark (1x1, 10000): 22792 us.
Tensor GEMM benchmark (16x16, 10000): 31387 us.
Pure RecordFunction benchmark (10000): 44 us.
Running without observers
Tensor GEMM benchmark (1x1, 10000): 7626 us.
Tensor GEMM benchmark (16x16, 10000): 10927 us.
Pure RecordFunction benchmark (10000): 84 us.
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1207 13:21:49.587466 1378905 record_function.cpp:85] Adding a non-sampled callback / callback with high sampling frequency can cause singnificant runtime overhead
Running with empty non-sampled observer
Tensor GEMM benchmark (1x1, 10000): 14189 us.
Tensor GEMM benchmark (16x16, 10000): 17925 us.
Pure RecordFunction benchmark (10000): 1042 us.
Running with empty sampled observer
Tensor GEMM benchmark (1x1, 10000): 8669 us.
Tensor GEMM benchmark (16x16, 10000): 12231 us.
Pure RecordFunction benchmark (10000): 174 us.
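The per-call cost of a bare RecordFunction in each configuration can be read off the totals above by dividing by the iteration count. The sketch below does only that arithmetic. It assumes the number in parentheses (10000) is the iteration count, and it takes the totals from single runs, so the ratios are rough. `ns_per_iteration` is a hypothetical helper written for this illustration and is not part of the benchmark.

```python
# Totals copied from the "Pure RecordFunction benchmark (10000)" lines of the
# log, in microseconds. Assumption: 10000 is the iteration count.
ITERATIONS = 10_000
pure_record_function_us = {
    "no observers": 84,
    "empty non-sampled observer": 1042,
    "empty sampled observer": 174,
}


def ns_per_iteration(total_us: float, iterations: int) -> float:
    """Convert a total wall time in microseconds to nanoseconds per iteration."""
    return total_us * 1000.0 / iterations


baseline_ns = ns_per_iteration(pure_record_function_us["no observers"], ITERATIONS)
for config, total_us in pure_record_function_us.items():
    per_iter = ns_per_iteration(total_us, ITERATIONS)
    # Ratio relative to the run with no observers registered.
    print(f"{config:28s} {per_iter:7.1f} ns/iter  ({per_iter / baseline_ns:5.2f}x baseline)")
```

By this arithmetic, an empty non-sampled observer costs roughly 104 ns per call, about 12x the no-observer baseline. An empty sampled observer costs about 17 ns per call, about 2x the baseline. This is consistent with the warning the binary prints before the non-sampled run. The GEMM rows can be converted the same way.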
Checking number of sampled observer invocations
Pure RecordFunction runtime of 10000000 iterations: 155283 us, number of callback invocations: 998, expected number: ~1000 invocations
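The last check compares 998 observed callback invocations against an expected ~1000 over 10,000,000 calls. That ratio suggests a sampling probability of about 1e-4, but the log does not state the probability, so treat it as an inferred assumption. A second assumption is that each call is sampled independently. Under both assumptions, the invocation count follows a binomial distribution, and the sketch below measures how far 998 lies from the mean in standard deviations.

```python
import math

ITERATIONS = 10_000_000
EXPECTED_INVOCATIONS = 1_000
OBSERVED_INVOCATIONS = 998

# Assumption: each call independently triggers the sampled observer with
# probability p. p is inferred from the log's expected count, not stated in it.
p = EXPECTED_INVOCATIONS / ITERATIONS

# Binomial(n, p) mean and standard deviation.
mean = ITERATIONS * p
std_dev = math.sqrt(ITERATIONS * p * (1 - p))

# How many standard deviations the observed count is from the mean.
z = (OBSERVED_INVOCATIONS - mean) / std_dev
print(f"p = {p:g}, mean = {mean:.1f}, std dev = {std_dev:.1f}, z = {z:.3f}")
```

The standard deviation comes out to about 31.6 invocations, so 998 is well within one standard deviation of 1000. Under these assumptions, that is consistent with the sampler working as intended.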