@takagi
takagi / read.lisp
Created March 26, 2015 11:51
Comparing efficiency of READ-BYTE with READ-SEQUENCE.
(require :sb-sprof)

(defun test-read-byte0 ()
  (with-open-file (in "data" :direction :input
                             :element-type '(unsigned-byte 8))
    (loop repeat (* 4 1024 1024)
          do (read-byte in))))

(defun profile-read-byte0 ()
  (sb-sprof:with-profiling (:max-samples 100
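The preview above shows only the READ-BYTE side of the comparison. A READ-SEQUENCE counterpart might look like the sketch below. The function name, the optional arguments, and the 4 KiB buffer size are assumptions for illustration and are not taken from the gist.

```lisp
;; Hypothetical counterpart to TEST-READ-BYTE0 (name and buffer size assumed).
;; Reads up to TOTAL octets from PATH in 4 KiB chunks and returns the
;; number of octets actually read.
(defun test-read-sequence0 (&optional (path "data") (total (* 4 1024 1024)))
  (with-open-file (in path :direction :input
                           :element-type '(unsigned-byte 8))
    (let ((buffer (make-array 4096 :element-type '(unsigned-byte 8)))
          (remaining total))
      (loop while (plusp remaining)
            do (let ((n (read-sequence buffer in
                                       :end (min remaining (length buffer)))))
                 ;; READ-SEQUENCE returns the index of the first element
                 ;; not updated, so 0 means end of file.
                 (when (zerop n) (return))
                 (decf remaining n)))
      (- total remaining))))
```

Unlike the READ-BYTE loop, which signals an error at end of file, this sketch stops quietly when the file is shorter than TOTAL, so the return value should be checked before comparing timings.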
takagi / tsuru.lisp
Last active August 29, 2015 14:17
Tsuru Capital recruiting test code sample.
;;;
;;; Fundamental WORD/INT types and readers
;;;

(deftype word8 ()
  `(unsigned-byte 8))

(deftype word16 ()
  `(unsigned-byte 16))
takagi / flexi-streams.lisp
Created September 9, 2015 14:03
flexi-streams's external format
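(with-open-file (in "/Users/mtakagi/Desktop/bin" :direction :input
                                                 :element-type '(unsigned-byte 8))
  (let* ((buffer (make-array 256 :element-type '(unsigned-byte 8)))
         ;; READ-SEQUENCE returns the index of the first element not updated,
         ;; so decode only the octets that were actually read.
         (end (read-sequence buffer in)))
    (flexi-streams:octets-to-string buffer :end end :external-format :utf-8)))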
https://github.com/takagi/cl-cuda/tree/issue/49.symbol-macro
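・The time-consuming steps are update-density and update-force, which together account for more than 90% of the total.
・Taking the algorithm as given, is there room to speed things up at the level of how the GPU is used?
 → Memory accesses do not seem to have much locality.
 → Access to global memory is the rate-limiting step, so is there nothing more that can be done?
・How should grids and blocks be assigned?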
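The notes above leave the last question, on grid and block assignment, open. One common starting point, assumed here rather than taken from the notes, is one thread per element with a fixed block size and enough blocks to cover all elements. The helper name and the default of 128 threads per block are hypothetical, and the function is not part of cl-cuda.

```lisp
;; Hypothetical helper (not a cl-cuda function): computes a 1D launch
;; configuration with one thread per element.
(defun launch-dims (n &optional (threads-per-block 128))
  "Return grid and block dimensions as two (x y z) lists covering N elements."
  (check-type n (integer 1))
  (check-type threads-per-block (integer 1))
  ;; Round up so that the final, partially filled block is still launched.
  (let ((blocks (ceiling n threads-per-block)))
    (values (list blocks 1 1)
            (list threads-per-block 1 1))))
```

Because the grid is rounded up, the kernel has to skip thread indices that are not below N. The block size is a tuning parameter, and the best value depends on the kernel and the device.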
takagi / dot.lisp
Last active December 13, 2016 17:43
(defun make-dvec (input-dimension initial-element)
  (make-array input-dimension :element-type 'double-float :initial-element initial-element))

(defmacro dovec (vec var &body body)
  `(loop for ,var fixnum from 0 to (1- (length ,vec)) do ,@body))

(defun dot (x y)
  (declare (type (simple-array double-float) x y)
           (optimize (speed 3) (safety 0)))
  (let ((result 0.0d0))
takagi / cifar_fp16
Last active June 10, 2019 06:54
Comparison of Chainer's cifar example between in FP32 mode and in FP16 mode
$ CHAIENR_DTYPE=float16 python train_cifar.py -d 0
Device: @cupy:0
# Minibatch-size: 64
# epoch: 300
Using CIFAR10 dataset.
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           2.32253     2.12119               0.175971       0.192178                  23.2918
2           1.7554      1.87858               0.304497       0.302747                  49.5816
3           1.46396     1.61379               0.450664       0.416202                  75.7544
takagi / dcgan_fp16
Last active June 10, 2019 07:25
Comparison of Chainer's dcgan example between in FP32 mode and in FP16 mode
$ CHAINER_DTYPE=float16 python train_dcgan.py -d 0
Device: @cupy:0
# Minibatch-size: 50
# n_hidden: 100
# epoch: 1000
epoch       iteration   gen/loss    dis/loss
0           100         nan         nan
0           200         nan         nan
0           300         nan         nan
takagi / mnist_fp16
Created June 10, 2019 08:41
Comparison of Chainer's mnist example between in FP32 mode and in FP16 mode
$ CHAINER_DTYPE=float16 python train_mnist.py -d 0
Device: @cupy:0
# unit: 1000
# Minibatch-size: 100
# epoch: 20
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           nan         nan                   0.0994271      0.0980225                 3.91818
2           nan         nan                   0.0997917      0.0980225                 6.22553
3           nan         nan                   0.0995833      0.0980225                 8.72424
takagi / memnn_fp16
Last active June 14, 2019 02:14
Comparison of Chainer's memnn example between in FP32 mode and in FP16 mode
$ CHAINER_DTYPE=float16 python train_memnn.py tasks_1-20_v1-2/en-10k/qa1_single-supporting-fact_train.txt tasks_1-20_v1-2/en-10k/qa1_single-supporting-fact_test.txt -d 0
Training data: tasks_1-20_v1-2/en-10k/qa1_single-supporting-fact_train.txt: 2000
Test data: tasks_1-20_v1-2/en-10k/qa1_single-supporting-fact_test.txt: 200
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy
1           nan         nan                   0.0017004      0
2           nan         nan                   0              0
3           nan         nan                   0              0
4           nan         nan                   0              0
5           nan         nan                   0              0
6           nan         nan                   0              0
$ CHAINER_DTYPE=float16 python postagging.py -d 0
[nltk_data] Downloading package brown to /home/ext-
[nltk_data] mtakagi/nltk_data...
[nltk_data] Package brown is already up-to-date!
# of sentences: 57340
# of words: 56057
# of pos: 472
epoch main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time
0 244.875 18.3736
0 373.75 34.9924