Naotoshi Seo sonots

🤗
View GitHub Profile
@sonots
sonots / preprocessor.pyx
Created October 13, 2017 15:54
preprocessor in cython
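# A trick to embed C preprocessor directives in Cython.
# Each "function" is declared with a C name that is really a preprocessor
# directive; the trailing "//" comments out the "();" that Cython emits.
cdef extern from *:
    cdef void EMIT_IF_PYTHON_VERSION_HEX_LT_37 "#if PY_VERSION_HEX < 0x03070000 //" ()
    cdef void EMIT_ELSE "#else //" ()
    cdef void EMIT_ENDIF "#endif //" ()
EMIT_IF_PYTHON_VERSION_HEX_LT_37()
# ... code for Python < 3.7 goes here ...
EMIT_ELSE()
# ... code for Python >= 3.7 goes here ...
EMIT_ENDIF()  # close the #if so the generated C stays balanced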
sonots / slack_integration.md
Last active September 28, 2017 06:16
What happens to slack integration when a member is deactivated?
| Integration | Auto-Disable? | Notify? | What we need to do | NOTE |
| --- | --- | --- | --- | --- |
| email | No | | | Amazingly, it is not disabled |
| github | Yes | Yes | Click enable link on the notification | |
| google calendar | don't know | | | |
| hubot | don't know | | | |
sonots / bm_string_interpolation.rb
Created September 20, 2017 08:38
Ruby string interpolation performance: 2.4.1 vs trunk (2.5.0dev)
b = 'b'
1_000_000.times { "a#{b}c" }
sonots / cudaStreamCallback.md
Last active November 2, 2018 14:12
Is concurrency lost when using CUDA stream callbacks (cudaStreamAddCallback)?
$ nvcc simpleCallback.cu -O2 -o simpleCallback
$ nvprof -f -o simpleCallback.nvvp ./simpleCallback | grep elapsed
No callback: elapsed time = 1.534s
One callback: elapsed time = 1.498s
Two callback: elapsed time = 3.718s
Four callback: elapsed time = 5.194s

As the number of callbacks increases, the elapsed time grows.

sonots / nvvp.md
Last active March 11, 2024 00:53
How to use NVIDIA profiler

The profiler binaries are usually located at /usr/local/cuda/bin.

Non-Visual Profiler

$ nvprof python train_mnist.py

I prefer to use --print-gpu-trace, which prints each kernel launch and memory copy in chronological order instead of a summary.

https://github.com/sonots/embulk-filter-timestamp_format/tree/master/bench

🍣  $ embulk run -I lib bench/config_java.yml
2017-07-10 17:58:32.524 +0900: Embulk v0.8.27
2017-07-10 17:58:33.709 +0900 [INFO] (0001:transaction): Loaded plugin embulk/filter/timestamp_format from a load path
2017-07-10 17:58:33.727 +0900 [INFO] (0001:transaction): Listing local files at directory 'bench' filtering filename by prefix 'dummy'
2017-07-10 17:58:33.729 +0900 [INFO] (0001:transaction): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2017-07-10 17:58:33.734 +0900 [INFO] (0001:transaction): Loading files [bench/dummy.csv]
2017-07-10 17:58:33.792 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=16 / output tasks 8 = input tasks 1 * 8
sonots / test.pyx
Created June 28, 2017 10:11
I want to print a stack trace in Cython that includes .pyx frames
import traceback
traceback.print_stack()  # <= only shows frames down to the Python layer
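The note above about print_stack() can be illustrated with a small pure-Python sketch. It contrasts the two standard-library ways of getting a trace: `traceback.print_stack()` walks the live Python frames, while `traceback.format_exc()` formats the traceback attached to an active exception. The assumption here (not confirmed by the gist) is that Cython-compiled functions do not create Python frames by default, which would explain the missing .pyx lines, whereas Cython does add .pyx entries to the traceback of a propagating exception. The function names below are hypothetical.

```python
import io
import traceback


def inner():
    # print_stack() walks the live Python frames of the current thread.
    buf = io.StringIO()
    traceback.print_stack(file=buf)
    return buf.getvalue()


def failing():
    raise ValueError("boom")


def capture_exception_traceback():
    # format_exc() formats the traceback attached to the active exception.
    try:
        failing()
    except ValueError:
        return traceback.format_exc()


stack_text = inner()
exc_text = capture_exception_traceback()
print(stack_text)
print(exc_text)
```

If the assumption holds, raising and catching an exception inside the .pyx code and printing `traceback.format_exc()` may be one way to see .pyx line numbers; this is a guess to try, not a verified fix.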
sonots / cudaMallocVScuMemAllocBench.cu
Last active July 9, 2017 11:55
nvcc cudaMallocVScuMemAllocBench.cu -L /usr/local/cuda/lib64 -l cuda
#include <sys/time.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <stdio.h>
#define CHECK(call) \
{ \
    const cudaError_t error = call; \
    if (error != cudaSuccess) \
    { \
sonots / cudaMallocGetAlignment.cu
Last active September 2, 2019 02:39
Investigation of cudaMalloc alignment => aligned to at least **512** bytes
#include <sys/time.h>
#include <cuda_runtime.h>
#include <stdio.h>
void test(int size)
{
    float *d1, *d2;
    cudaMalloc(&d1, size);
    cudaMalloc(&d2, size);
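The "aligned to at least 512 bytes" conclusion in the description can be checked from the pointer values alone. The following is a minimal sketch, assuming the device pointers have been printed and copied as integers; the helper names and the sample addresses are hypothetical and are not measurements from the gist.

```python
def alignment_of(address: int) -> int:
    """Return the largest power of two that divides address."""
    if address <= 0:
        raise ValueError("address must be a positive integer")
    # In two's complement, x & -x isolates the lowest set bit of x.
    return address & -address


def common_alignment(addresses):
    """Return the largest power-of-two alignment shared by every address."""
    return min(alignment_of(a) for a in addresses)


# Hypothetical device pointers, NOT measured values.
sample_addresses = [0x7F0A00000000, 0x7F0A00000200, 0x7F0A00000600]
print(common_alignment(sample_addresses))  # -> 512
```

Running the check over many allocations of different sizes gives a lower bound on the guaranteed alignment: the smallest value seen is the alignment every observed pointer satisfies.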
sonots / cudaMallocBench.cu
Last active January 4, 2023 09:28
Benchmark of cudaMalloc. Allocates 1 MB of memory in total using several block sizes
#include <sys/time.h>
#include <cuda_runtime.h>
#include <stdio.h>
inline double seconds()
{
    struct timeval tp;
    struct timezone tzp;
    int i = gettimeofday(&tp, &tzp);
    return ((double)tp.tv_sec + (double)tp.tv_usec * 1.e-6);