y2q-actionman/agenda.org

## agenda.org

      
    Raw
  

              agenda.org
            
          
    型宣言を付けたら遅くなった話

概要

概要

Common Lisp コードの最適化方法と、その実例の紹介

「最適化」のつもりが「遅く」なってしまった事例を元に、 Common Lisp コー
  ド最適化の技法を概説したいと思います。
自己紹介

自己紹介


  名前

  所属

  仕事
何らかの Common Lisp コードを書く
  SNS account
twitter.com/y2q_actionman

最適化の概要

そもそも最適化とは


  Design Level
    早いアルゴリズムとデータ構造を使う
  
  Source Code Level
    早いコードを書く
  
  Build Level
    ビルド対象の環境に応じて早くなるよう調整
  
  Compiler Level
    最適化コンパイラを使う
  
  Assembly Level
    早い CPU 命令を使う
  
  Run Time
    実行時情報を元に最適化
  

出典: Wikipedia: Program Optimization
今回の話の対象


  Design Level
    早いアルゴリズムとデータ構造を使う
  
  Source Code Level
    早いコードを書く
  
  Build Level
    ビルド対象の環境に応じて早くなるよう調整
  
  Compiler Level
    最適化コンパイラを使う
  
  Assembly Level
    早い CPU 命令を使う
  
  Run Time
    実行時情報を元に最適化
  

ここでは、 2. 3. 4. の話をします
Common Lisp の最適化コンパイラの使用

Common Lisp の最適化コンパイラの使用


  最適化宣言を付ける
  型宣言を付ける
  コンパイラの出力や、 disassemble, time を見て調節
  環境依存の技を駆使 (面倒)

コード例: 最適化前

(defun make-test-array (size)
  (loop with arr = (make-array (list size)
                               :element-type 'double-float)
     for i from 0 below size
     do (setf (aref arr i) (coerce i 'double-float))
     finally (return arr)))

(defun test-array-sum/no-optimize
    (size &optional (times 10))
  ;; この関数を高速化したい
  (flet ((summing (arr)
           (loop for d across arr
                 sum d)))
    (let ((arr (make-test-array size)))
      (dotimes (i times) (summing arr)))))
最適化宣言を付ける

コード例: 最適化宣言追加

(defun test-array-sum/add-declaration
    (size &optional (times 10))
  (flet ((summing (arr)
           ;; 追加
           (declare (optimize (speed 3) (safety 0)
                              (debug 0)))
           (loop for d across arr
                 sum d)))
    (let ((arr (make-test-array size)))
      (dotimes (i times) (summing arr)))))
OPTIMIZE 宣言

OPTIMIZE 宣言 を付けることで、コンパイラに指示ができる。

  compilation-speed
speed of the compilation process
  debug
ease of debugging
  safety
run-time error checking
  space
both code size and run-time space
  speed
speed of the object code

型宣言を付ける

コード例: 引数への型宣言追加

(defun test-array-sum/add-types-1
    (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0)
                              (debug 0))
                    ;; 変数 arr への型宣言
                    (type (simple-array double-float)
                          arr))
           (loop for d across arr
                 sum d)))
    (let ((arr (make-test-array size)))
      (dotimes (i times) (summing arr)))))
コード例: Loop 変数への型宣言追加

(defun test-array-sum/add-types-2
    (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0)
                              (debug 0))
                    (type (simple-array double-float)
                          arr))
           ;; Loop 変数への型宣言
           (loop for d double-float across arr
                 sum d double-float)))
    (let ((arr (make-test-array size)))
      (dotimes (i times) (summing arr)))))
型宣言の方法


  TYPE 宣言
    宣言の仕組みを使って、変数の型を指定。
  
  THE 特殊オペレータ
    式から戻ってくる値の型を指定できる。
    例
    
      (+ x y) 指定なし
      (the fixnum (+ x y)) fixnum であると指定
    
  
  LOOP マクロ中での指定
    LOOP マクロの構文上、 type-spec が置ける所に置ける。
      (詳しくは、 LOOP マクロの構文を参照..)
  
  配列の要素の型を指定
    
      (make-array 10)

        何でも入る、 10 要素の配列
      (make-array 10 :element-type 'double-float)

        double-float 専用の、 10 要素の配列
    
  
計測結果

計測結果


  real non-GC GC
  何もなし 0.194 0.172 0.025
  最適化宣言追加 0.224 0.195 0.032
  引数に型宣言 0.156 0.134 0.024
  Loop 変数に型宣言 0.023 0.023 0

time で見てみるデモ

本題 「型宣言を付けたら遅くなった」例

本題 「型宣言を付けたら遅くなった」例

直感的には:

  型宣言を付けると
  コンパイラが最適化に使える情報が増えて
  最適化されたコードが出てきて
  早いプログラムになる

そう上手くいかなかった例をご紹介します

対象の処理系は、 Allegro CL と SBCL です。
出典

出典

ある機械学習アルゴリズムを実装するコードの一部

  あらかじめ、いくつかの数値 (double-float) を計算。
  vector-push-extend で、配列に入れておく。
  以後頻繁に、その値を読みだして各種計算を行う。

配列を読み出す部分を高速化 しようとして、 型を指定 すると・・
  遅くなった？？
早い順番に並べてみよう

(simple-array double-float)

(defun test-sarray-dfloat (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0)
                              (debug 0))
                    ;;(simple-array double-float)で
                    (type (simple-array double-float (*))
                          arr))
           (loop for d double-float across arr
                 sum d double-float)))
    (let ((arr (make-array (list size) :element-type 'double-float)))
      (check-type arr (simple-array double-float (*)))
      (loop for i from 0 below size
            do (setf (aref arr i) (coerce i 'double-float)))
      (format t "measuring: ~a~%" '(simple-array double-float (*)))
      (time (dotimes (i times) (summing arr)))
      (terpri))))
(simple-array t)

(defun test-sarray-t (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0)
                              (debug 0))
                    ;;(simple-array t)で集計
                    (type (simple-array t (*))
                          arr))
           (loop for d double-float across arr
                 sum d double-float)))
    (let ((arr (make-array (list size) :element-type 't)))
      (check-type arr (simple-array t (*)))
      (loop for i from 0 below size
            do (setf (aref arr i) (coerce i 'double-float)))
      (format t "measuring: ~a~%" '(simple-array t (*)))
      (time (dotimes (i times) (summing arr)))
      (terpri))))
(array double-float)

(defun test-array-dfloat (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0)
                              (debug 0))
                    ;;(array double-float)で集計
                    (type (array double-float (*))
                          arr))
           (loop for d double-float across arr
                 sum d double-float)))
    (let ((arr (make-array (list size) :element-type 'double-float)))
      (check-type arr (array double-float (*)))
      (loop for i from 0 below size
            do (setf (aref arr i) (coerce i 'double-float)))
      (format t "measuring: ~a~%" '(array double-float (*)))
      (time (dotimes (i times) (summing arr)))
      (terpri))))
(array t)

(defun test-array-t (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0)
                              (debug 0))
                    ;;(array t)で集計
                    (type (array t (*))
                          arr))
           (loop for d double-float across arr
                 sum d double-float)))
    (let ((arr (make-array (list size) :element-type 't)))
      (check-type arr (array t (*)))
      (loop for i from 0 below size
            do (setf (aref arr i) (coerce i 'double-float)))
      (format t "measuring: ~a~%" '(array t (*)))
      (time (dotimes (i times) (summing arr)))
      (terpri))))
予想

予想


  double-float t
  simple-array 最速 
  array 最遅

結果

結果

10000000 要素の配列について、実時間を計測

  double-float t
  simple-array 0.112 0.168
  array 1.021 0.526

Demo

何がコードに起こったか？

コンパイラの診断メッセージを見る

コンパイラの警告を見てみる

; file: /Users/y2q/acl_caverts/test.lisp
; in: DEFINE-ARRAY-SUM-TEST TEST-ARRAY-DFLOAT
;     (DEFINE-ARRAY-SUM-TEST TEST-ARRAY-DFLOAT (ARRAY DOUBLE-FLOAT (*)))
; --> DEFUN PROGN SB-IMPL::%DEFUN BLOCK FLET BLOCK LOOP BLOCK LET 
; --> SB-LOOP::WITH-SUM-COUNT LET SB-LOOP::LOOP-BODY TAGBODY 
; --> SB-LOOP::LOOP-REALLY-DESETQ SETQ THE AREF 
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
; 
; note: unable to
;   avoid runtime dispatch on array element type
; due to type uncertainty:
;   The first argument is a (VECTOR DOUBLE-FLOAT), not a SIMPLE-ARRAY.

simple-array でないので、実行時の dispatch を回避できなかった。
array と simple-array

simple-array とは?

simple-array とは

  adjustable ではない
  displaced array ではない
  fill-pointer を持たない

adjustable array とは


  adjust-array 関数
配列の次元を変更することが出来る関数。
  adjustable array
adjust-array に便利な内部構造が使用される。


  adjustable array
adjust-array を適用する場合に、
    その配列が直接変更される。
  非 adjustable array
adjust-array を適用すると、
    配列がコピーされる場合がある。

displaced array とは


  自分自身の記憶領域を持たず、別の配列を指している配列
  対象の配列について、指定した offset 分だけずらした
    index を使って参照することが出来る。

fill-pointer とは

fill-pointer とは
vector の要素を、 active とそうでないものに分けることが出来る。

以下の関数と一緒に使うと、 vector をスタックのように使用できる。

  vector-pop
vector の fill-pointer を減らして、
    値を削除する。
  vector-push
vector の fill-pointer を増やして、
    値を追加する。
  vector-push-extend
vector-push と同様。
    ただし、 fill-pointer が vector の末尾にまで来た場合、
    adjust-array で vector を拡大する。

fill-pointer 例 (初期状態)

(make-array 10 :initial-contents '(1 2 3 4 5 0 0 0 0 0)
                :fill-pointer 5)
;; => #(1 2 3 4 5)

+--+--+--+--+--+--+--+--+--+--+
| 1| 2| 3| 4| 5| 0| 0| 0| 0| 0|
+--+--+--+--+--+--+--+--+--+--+
                |               
                +-- fill-pointer

fill-pointer 例 (push)

(vector-push 6 *)
;; => 5
 **
;; => #(1 2 3 4 5 6)

+--+--+--+--+--+--+--+--+--+--+
| 1| 2| 3| 4| 5| 6| 0| 0| 0| 0|
+--+--+--+--+--+--+--+--+--+--+
                   |               
                   +-- fill-pointer

fill-pointer 例 (pop)

(vector-pop *)
;; => 6
 **
;; => #(1 2 3 4 5)

+--+--+--+--+--+--+--+--+--+--+
| 1| 2| 3| 4| 5| 6| 0| 0| 0| 0|
+--+--+--+--+--+--+--+--+--+--+
                |               
                +-- fill-pointer

array と simple-array の速度差

simple-array とは

  adjustable ではない
  displaced array ではない
  fill-pointer を持たない


  上記三つがどれか一つでもあると、 配列参照はそれなりに面倒な処理
    になる
  実行時の判断が必要。最適化が難しく、 inline 化などもされない

general array と specialized array

specialized array

specizlized array とは

  指定された型のオブジェクトしか入れられないような array
  指定した型を効率よく格納できる構造が使われる

例) general array に double-float を格納

配列 #(1.0 2.0 3.0) の内部表現の模式図
+---+---+---+--- 
|   |   |   | ...
+-+-+-+-+-+-+---
  |   |   |     +-------+
  |   |   +---->|tag|3.0|
  |   |         +-------+
  |   |   +-------+     
  |   +-->|tag|2.0|
  |       +-------+
  |   +-------+
  +-->|tag|1.0|
      +-------+
              

例) specialized array に double-float を格納

配列 #(1.0 2.0 3.0) の内部表現の模式図
+---+---+---+--- 
|1.0|2.0|3.0| ...
+---+---+---+---

Array Upgrading


  make-array で :element-type を指定すると、格納できる要素
    の型を指定できる。
  あまりに複雑な型を要素に指定すると、指定通りにいかない場合も。
    
      general array と同等になったりする。
    
  
upgraded-array-element-type で確認可能。実装ごとに異なる。
数値の boxing

boxing とは


  オブジェクトをマシンに適した表現から、 Lisp オブジェクト
    に変えること
  直感的には..
    
      unboxed double-float

        64 個の bit 列
      boxed double-float

        上記 bit 列を指すポインタと、型を示す tag の組
    
  
  Java でいうところの以下の関係
    
      primitive type (int, float, …)
      それらのオブジェクト表現 (Integer, Float, …)
    
  
型宣言と boxing

型宣言を付けると、指定した型専用のコードが出力される。

  boxing が省略される場合がある。
  unboxed number だけで計算が行われるようになったり

array と boxing

数値 (number) を格納する時に..

  general array
boxed number を格納
  specialized array
unboxed number を格納
    -> 空間効率が改善
  

関数 と boxing


  関数の引数と戻り値は、任意の Lisp オブジェクトで
    ある可能性がある。
  通常、 unboxed number は、関数に渡す時や
    関数から戻す時に boxing される。

inlining と boxing


  関数が inline 展開される場合は
  Lisp 関数の呼び出し sequence が省略されるので、
  boxing が省略されることがある

-> 今回の問題の核心
真相

実験に使ったコード

(defun test-sarray-dfloat (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0)
                              (debug 0))
                    (type (simple-array double-float (*))
                          arr))
           (loop for d double-float across arr
                 sum d double-float)))
    ...

  配列を集計する部分には、型宣言を付けまくっている
    -> unboxed double-float で計算される
  配列参照時の、数値の受け渡しのみが違う。

(simple-array double-float)

(defun test-sarray-dfloat (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0)
                              (debug 0))
                    ;;(simple-array double-float)で
                    (type (simple-array double-float (*))
                          arr))
           (loop for d double-float across arr
                 sum d double-float)))
    ...

  配列の内部表現
unboxed double-float
  配列参照
simple-array なので inlining される

(simple-array double-float) つづき


  コンパイラから見える処理
    
      配列から unboxed double-float を取り出す。
      それをそのまま unboxed double-float で計算するコードに渡す
    
  
  一切の boxing がない

(simple-array t)

(defun test-sarray-t (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0)
                              (debug 0))
                    ;; (simple-array t) で集計
                    (type (simple-array t (*))
                          arr))
           (loop for d double-float across arr
                 sum d double-float)))
    ...

  配列の内部表現
boxed double-float
  配列参照
simple-array なので inlining される

(simple-array t) つづき


  コンパイラから見える処理
    
      配列から boxed double-float を取り出す。
      それを開けて、 unboxed double-float を取り出す
      その数値を、 unboxed double-float を計算するコードに渡す
    
  
  配列参照に伴う boxing はない。

(array t)

(defun test-array-t (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0)
                              (debug 0))
                    ;; (array t) で集計
                    (type (array t (*))
                          arr))
           (loop for d double-float across arr
                 sum d double-float)))
    ...

  配列の内部表現
boxed double-float
  配列参照
inlining されない

(array t) つづき


  コンパイラから見える処理
    
      配列参照の関数呼び出しをする。
        
          配列要素の boxed double-float を、そのまま
            Lisp オブジェクトとして返す。
        
      
      返ってきた boxed double-float を開けて、
        unboxed double-float を取り出す。
      その数値を、 unboxed double-float を計算するコードに渡す
    
  
  関数呼び出しが挟まってしまうが、配列参照に伴う
    boxing はない。

(array double-float)

(defun test-array-dfloat (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0)
                              (debug 0))
                    ;; (array double-float) で集計
                    (type (array double-float (*))
                          arr))
           (loop for d double-float across arr
                 sum d double-float)))
    ...

  配列の内部表現
unboxed double-float
  配列参照
inlining されない

(array double-float) つづき


  コンパイラから見える処理
    
      配列参照の関数呼び出しをする。
        
          配列要素の unboxed double-float を取り出す。
          それを boxing して Lisp オブジェクトに変えて
            返す。
        
      
      返ってきた boxed double-float を開けて、
        unboxed double-float を取り出す。
      その数値を、 unboxed double-float を計算するコードに渡す
    
  
  関数呼び出しがされてしまい、さらに呼び出し規約
    の都合で boxing が挟まる。
  一回の配列参照ごとに、 double-float の boxing が起こる!

対策

何がいけなかったのか？

何がいけなかったのか？

(array double-float) という指定:

  格納時の空間効率を改善
  参照時の効率は悪化

測定対象のコード:

  配列参照ばかり

指定が噛み合っていなかった
対策

対策1 : simple-array にする

Pros:

  一番早い
  コードも自明

Cons:

  array の高級な機能が使えない
    
      fill-pointer
      displaced array
      adjustable array
    
    simple-array にして、対応する機能を手書きすることも
      考えられるが、二度手間
  

対策2 : (array t) にする

Pros:

  そこそこ早い
  array の高級な機能が使える

Cons:

  コードが全く自明じゃない

対策3 : 処理系拡張を使う

配列の内部の記憶領域を、 simple-array として取り出す
  手段がある。

  Allegro CL : with-underlying-simple-vector
  SBCL : array-storage-vector

Pros:

  早くなる
  コードの見た目が どうみても汚く なる
    (汚いコードは汚く見えるべき)

Cons:

  とても危険

出典コードには

出典元 (再掲)

ある機械学習アルゴリズムを実装するコードの一部

  あらかじめ、いくつかの数値 (double-float) を計算。
  vector-push-extend で、配列に入れておく。
  以後、その値を頻繁に読みだし、各種計算を行う。

方針

ある機械学習アルゴリズムを実装するコードの一部

  あらかじめ、いくつかの数値 (double-float) を計算。
  vector-push-extend で、配列に入れておく。
    -> simple-array が使えない。
  
  以後、その値を頻繁に読みだし、各種計算を行う。
    -> simple-array が有利。
  

結果


  あらかじめ、いくつかの数値 (double-float) を計算。
  vector-push-extend で、配列に入れておく。
  coerce で、 simple-array に変換 (<- 追加)
  以後、その値を頻繁に読みだし、各種計算を行う。

まとめ

まとめ


  実例をもとに、最適化の手法を紹介。
  Common Lisp は、十分 CPU の性能を生かすコードを
    生成することが出来る。
  処理系の深淵に潜る場合もあるが、手段は充実している。
    意味不明な挙動に困らされることも(あまり)ない。
  Common Lisp を使って、早いコードを書こう!!

おまけ：雑多なトピック

Common Lisp の提供する解析手段


  cl:time
コードの実行に要した時間、
    メモリ使用量を計測する
  cl:disassemble
逆アセンブル結果を見る。
    コードのインライン化などが確認可能。

処理系ごとの手段


  コンパイラの警告や解析結果を見る
    
      Allegro CL
Compiler Explanations
      SBCL
診断メッセージ
    
  
  プロファイラ
    
      Allegro CL
runtime-analyzer
      SBCL
二種類のプロファイラがある
    
  
  外部ライブラリ
    Linux perftools など

性能改善手段


  define-compiler-macro
    
      コンパイル時に適用されるマクロ。
      コンパイル時の情報を利用し、効率改善のために使う。
      inline 関数の代わりにも
    
  
  Allegro CL の sys::immed-args-call
    
      関数の呼び出し規約を書き換える手段。
      引数、または返り値に、Lisp オブジェクトではなく
        unboxed number を使用できるようにする。
      とてもあぶない
        (実例: defmethod した関数に適用したら SIGBUS 喰らった)
    
  
## test_add_type.lisp
(defun make-test-array (size)
  (loop with arr = (make-array (list size) :element-type 'double-float)
     for i from 0 below size
     do (setf (aref arr i) (coerce i 'double-float))
     finally (return arr)))

(defun test-array-sum/no-optimize (size &optional (times 10))
  (flet ((summing (arr)
           (loop for d across arr
                 sum d)))
    (let ((arr (make-test-array size)))
      (dotimes (i times) (summing arr)))))

(defun test-array-sum/add-declaration (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0) (debug 0))) ;
           (loop for d across arr
                 sum d)))
    (let ((arr (make-test-array size)))
      (dotimes (i times) (summing arr)))))

(defun test-array-sum/add-types-1 (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0) (debug 0))
                    (type (simple-array double-float) arr)) ;
           (loop for d across arr
                 sum d)))
    (let ((arr (make-test-array size)))
      (dotimes (i times) (summing arr)))))

(defun test-array-sum/add-types-2 (size &optional (times 10))
  (flet ((summing (arr)
           (declare (optimize (speed 3) (safety 0) (debug 0))
                    (type (simple-array double-float) arr))
           (loop for d double-float across arr ;
                 sum d double-float)))         ;
    (let ((arr (make-test-array size)))
      (dotimes (i times) (summing arr)))))

#|
SBCL example

CL-USER> (gc)
NIL
CL-USER> (time (test-array-sum/no-optimize 1000000))
Evaluation took:
  0.194 seconds of real time
  0.196797 seconds of total run time (0.188522 user, 0.008275 system)
  [ Run times consist of 0.025 seconds GC time, and 0.172 seconds non-GC time. ]
  101.55% CPU
  446,380,801 processor cycles
  328,012,272 bytes consed

NIL
CL-USER> (gc)
NIL
CL-USER> (time (test-array-sum/add-declaration 1000000))
Evaluation took:
  0.224 seconds of real time
  0.226986 seconds of total run time (0.196372 user, 0.030614 system)
  [ Run times consist of 0.032 seconds GC time, and 0.195 seconds non-GC time. ]
  101.34% CPU
  515,238,374 processor cycles
  328,012,240 bytes consed

NIL
CL-USER> (gc)
NIL
CL-USER> (time (test-array-sum/add-types-1 1000000))
Evaluation took:
  0.156 seconds of real time
  0.157867 seconds of total run time (0.149668 user, 0.008199 system)
  [ Run times consist of 0.024 seconds GC time, and 0.134 seconds non-GC time. ]
  101.28% CPU
  357,313,858 processor cycles
  328,011,392 bytes consed

NIL
CL-USER> (gc)
NIL
CL-USER> (time (test-array-sum/add-types-2 1000000))
Evaluation took:
  0.023 seconds of real time
  0.023060 seconds of total run time (0.019489 user, 0.003571 system)
  100.00% CPU
  52,123,820 processor cycles
  8,000,016 bytes consed

NIL
CL-USER>
|#

## test_curious.lisp
(defmacro define-array-sum-test (test-name array-typespec)
  (destructuring-bind (array-type element-type &rest dimensions) array-typespec
    (declare (ignore array-type dimensions))
    `(defun ,test-name (size &optional (times 10))
       (flet ((summing (arr)
		(declare (optimize (speed 3) (safety 0) (debug 0))
			 (type ,array-typespec arr))
		(loop for d double-float across arr
		   sum d double-float)))
	 (let ((arr (make-array (list size) :element-type ',element-type)))
	   (check-type arr ,array-typespec)
	   ;; array setup
	   (loop for i from 0 below size
	      do (setf (aref arr i) (coerce i 'double-float)))
	   ;; measuring
	   (format t "measuring: ~A~%" ',array-typespec)
	   (time (dotimes (i times)
		   (summing arr)))
	   (terpri))))))

(define-array-sum-test test-sarray-dfloat
    (simple-array double-float (*)))

(define-array-sum-test test-sarray-t
    (simple-array t (*)))

(define-array-sum-test test-array-dfloat
    (array double-float (*)))

(define-array-sum-test test-array-t
    (array t (*)))

(defun run-test (size)
  (test-sarray-dfloat size)
  (test-sarray-t size)
  (test-array-dfloat size)
  (test-array-t size))

#|
Allegro CL example.

CL-USER> (lisp-implementation-type)

"International Allegro CL Enterprise Edition"
CL-USER> (lisp-implementation-version)

"9.0 [64-bit Linux (x86-64) *SMP*] (Apr 30, 2014 12:14)"
("lisp_build 40")
CL-USER> (run-test 10000000)


measuring: (SIMPLE-ARRAY DOUBLE-FLOAT (*))
; cpu time (non-gc) 0.100006 sec user, 0.000000 sec system
; cpu time (gc)     0.000000 sec user, 0.000000 sec system
; cpu time (total)  0.100006 sec user, 0.000000 sec system
; cpu time (thread) 0.096006 sec user, 0.000000 sec system
; real time  0.097259 sec (102.8%)
; space allocation:
;  2,535 cons cells, 3,392 other bytes, 0 static bytes
; Page Faults: major: 0 (gc: 0), minor: 0 (gc: 0)

measuring: (SIMPLE-ARRAY T (*))
; cpu time (non-gc) 0.280017 sec user, 0.000000 sec system
; cpu time (gc)     0.000000 sec user, 0.000000 sec system
; cpu time (total)  0.280017 sec user, 0.000000 sec system
; cpu time (thread) 0.276017 sec user, 0.000000 sec system
; real time  0.276685 sec (101.2%)
; space allocation:
;  507 cons cells, 34,656 other bytes, 0 static bytes
; Page Faults: major: 0 (gc: 0), minor: 0 (gc: 0)

measuring: (ARRAY DOUBLE-FLOAT (*))
; cpu time (non-gc) 2.424151 sec user, 0.000000 sec system
; cpu time (gc)     0.160010 sec user, 0.000000 sec system
; cpu time (total)  2.584161 sec user, 0.000000 sec system
; cpu time (thread) 2.384149 sec user, 0.000000 sec system
; real time  2.551673 sec (101.3%)
; space allocation:
;  -2,897 cons cells, 3,200,050,288 other bytes, 0 static bytes
; Page Faults: major: 0 (gc: 11), minor: 11 (gc: 11)

measuring: (ARRAY T (*))
; cpu time (non-gc) 0.760048 sec user, 0.000000 sec system
; cpu time (gc)     0.000000 sec user, 0.000000 sec system
; cpu time (total)  0.760048 sec user, 0.000000 sec system
; cpu time (thread) 0.752047 sec user, 0.000000 sec system
; real time  0.750244 sec (101.3%)
; space allocation:
;  0 cons cells, 5,888 other bytes, 0 static bytes
; Page Faults: major: 0 (gc: 0), minor: 0 (gc: 0)

NIL
|#

#|
SBCL example

CL-USER> (run-test 10000000)
measuring: (SIMPLE-ARRAY DOUBLE-FLOAT (*))
Evaluation took:
  0.112 seconds of real time
  0.112888 seconds of total run time (0.112202 user, 0.000686 system)
  100.89% CPU
  257,649,342 processor cycles
  0 bytes consed


measuring: (SIMPLE-ARRAY T (*))
Evaluation took:
  0.168 seconds of real time
  0.169145 seconds of total run time (0.168348 user, 0.000797 system)
  100.60% CPU
  386,746,516 processor cycles
  0 bytes consed


measuring: (ARRAY DOUBLE-FLOAT (*))
Evaluation took:
  1.021 seconds of real time
  1.038737 seconds of total run time (0.963884 user, 0.074853 system)
  [ Run times consist of 0.166 seconds GC time, and 0.873 seconds non-GC time. ]
  101.76% CPU
  2,348,581,080 processor cycles
  1,600,028,672 bytes consed


measuring: (ARRAY T (*))
Evaluation took:
  0.526 seconds of real time
  0.527937 seconds of total run time (0.525773 user, 0.002164 system)
  100.38% CPU
  1,209,749,122 processor cycles
  0 bytes consed


NIL
CL-USER>
|#
`(+ x y)`	指定なし
`(the fixnum (+ x y))`	`fixnum` であると指定
	real	non-GC	GC
何もなし	0.194	0.172	0.025
最適化宣言追加	0.224	0.195	0.032
引数に型宣言	0.156	0.134	0.024
Loop 変数に型宣言	0.023	0.023	0
	(defun make-test-array (size)
	(loop with arr = (make-array (list size) :element-type 'double-float)
	for i from 0 below size
	do (setf (aref arr i) (coerce i 'double-float))
	finally (return arr)))

	(defun test-array-sum/no-optimize (size &optional (times 10))
	(flet ((summing (arr)
	(loop for d across arr
	sum d)))
	(let ((arr (make-test-array size)))
	(dotimes (i times) (summing arr)))))

	(defun test-array-sum/add-declaration (size &optional (times 10))
	(flet ((summing (arr)
	(declare (optimize (speed 3) (safety 0) (debug 0))) ;
	(loop for d across arr
	sum d)))
	(let ((arr (make-test-array size)))
	(dotimes (i times) (summing arr)))))

	(defun test-array-sum/add-types-1 (size &optional (times 10))
	(flet ((summing (arr)
	(declare (optimize (speed 3) (safety 0) (debug 0))
	(type (simple-array double-float) arr)) ;
	(loop for d across arr
	sum d)))
	(let ((arr (make-test-array size)))
	(dotimes (i times) (summing arr)))))

	(defun test-array-sum/add-types-2 (size &optional (times 10))
	(flet ((summing (arr)
	(declare (optimize (speed 3) (safety 0) (debug 0))
	(type (simple-array double-float) arr))
	(loop for d double-float across arr ;
	sum d double-float))) ;
	(let ((arr (make-test-array size)))
	(dotimes (i times) (summing arr)))))

	#\|
	SBCL example

	CL-USER> (gc)
	NIL
	CL-USER> (time (test-array-sum/no-optimize 1000000))
	Evaluation took:
	0.194 seconds of real time
	0.196797 seconds of total run time (0.188522 user, 0.008275 system)
	[ Run times consist of 0.025 seconds GC time, and 0.172 seconds non-GC time. ]
	101.55% CPU
	446,380,801 processor cycles
	328,012,272 bytes consed

	NIL
	CL-USER> (gc)
	NIL
	CL-USER> (time (test-array-sum/add-declaration 1000000))
	Evaluation took:
	0.224 seconds of real time
	0.226986 seconds of total run time (0.196372 user, 0.030614 system)
	[ Run times consist of 0.032 seconds GC time, and 0.195 seconds non-GC time. ]
	101.34% CPU
	515,238,374 processor cycles
	328,012,240 bytes consed

	NIL
	CL-USER> (gc)
	NIL
	CL-USER> (time (test-array-sum/add-types-1 1000000))
	Evaluation took:
	0.156 seconds of real time
	0.157867 seconds of total run time (0.149668 user, 0.008199 system)
	[ Run times consist of 0.024 seconds GC time, and 0.134 seconds non-GC time. ]
	101.28% CPU
	357,313,858 processor cycles
	328,011,392 bytes consed

	NIL
	CL-USER> (gc)
	NIL
	CL-USER> (time (test-array-sum/add-types-2 1000000))
	Evaluation took:
	0.023 seconds of real time
	0.023060 seconds of total run time (0.019489 user, 0.003571 system)
	100.00% CPU
	52,123,820 processor cycles
	8,000,016 bytes consed

	NIL
	CL-USER>
	\|#
	(defmacro define-array-sum-test (test-name array-typespec)
	(destructuring-bind (array-type element-type &rest dimensions) array-typespec
	(declare (ignore array-type dimensions))
	`(defun ,test-name (size &optional (times 10))
	(flet ((summing (arr)
	(declare (optimize (speed 3) (safety 0) (debug 0))
	(type ,array-typespec arr))
	(loop for d double-float across arr
	sum d double-float)))
	(let ((arr (make-array (list size) :element-type ',element-type)))
	(check-type arr ,array-typespec)
	;; array setup
	(loop for i from 0 below size
	do (setf (aref arr i) (coerce i 'double-float)))
	;; measuring
	(format t "measuring: ~A~%" ',array-typespec)
	(time (dotimes (i times)
	(summing arr)))
	(terpri))))))

	(define-array-sum-test test-sarray-dfloat
	(simple-array double-float (*)))

	(define-array-sum-test test-sarray-t
	(simple-array t (*)))

	(define-array-sum-test test-array-dfloat
	(array double-float (*)))

	(define-array-sum-test test-array-t
	(array t (*)))

	(defun run-test (size)
	(test-sarray-dfloat size)
	(test-sarray-t size)
	(test-array-dfloat size)
	(test-array-t size))

	#\|
	Allegro CL example.

	CL-USER> (lisp-implementation-type)

	"International Allegro CL Enterprise Edition"
	CL-USER> (lisp-implementation-version)

	"9.0 [64-bit Linux (x86-64) SMP] (Apr 30, 2014 12:14)"
	("lisp_build 40")
	CL-USER> (run-test 10000000)


	measuring: (SIMPLE-ARRAY DOUBLE-FLOAT (*))
	; cpu time (non-gc) 0.100006 sec user, 0.000000 sec system
	; cpu time (gc) 0.000000 sec user, 0.000000 sec system
	; cpu time (total) 0.100006 sec user, 0.000000 sec system
	; cpu time (thread) 0.096006 sec user, 0.000000 sec system
	; real time 0.097259 sec (102.8%)
	; space allocation:
	; 2,535 cons cells, 3,392 other bytes, 0 static bytes
	; Page Faults: major: 0 (gc: 0), minor: 0 (gc: 0)

	measuring: (SIMPLE-ARRAY T (*))
	; cpu time (non-gc) 0.280017 sec user, 0.000000 sec system
	; cpu time (gc) 0.000000 sec user, 0.000000 sec system
	; cpu time (total) 0.280017 sec user, 0.000000 sec system
	; cpu time (thread) 0.276017 sec user, 0.000000 sec system
	; real time 0.276685 sec (101.2%)
	; space allocation:
	; 507 cons cells, 34,656 other bytes, 0 static bytes
	; Page Faults: major: 0 (gc: 0), minor: 0 (gc: 0)

	measuring: (ARRAY DOUBLE-FLOAT (*))
	; cpu time (non-gc) 2.424151 sec user, 0.000000 sec system
	; cpu time (gc) 0.160010 sec user, 0.000000 sec system
	; cpu time (total) 2.584161 sec user, 0.000000 sec system
	; cpu time (thread) 2.384149 sec user, 0.000000 sec system
	; real time 2.551673 sec (101.3%)
	; space allocation:
	; -2,897 cons cells, 3,200,050,288 other bytes, 0 static bytes
	; Page Faults: major: 0 (gc: 11), minor: 11 (gc: 11)

	measuring: (ARRAY T (*))
	; cpu time (non-gc) 0.760048 sec user, 0.000000 sec system
	; cpu time (gc) 0.000000 sec user, 0.000000 sec system
	; cpu time (total) 0.760048 sec user, 0.000000 sec system
	; cpu time (thread) 0.752047 sec user, 0.000000 sec system
	; real time 0.750244 sec (101.3%)
	; space allocation:
	; 0 cons cells, 5,888 other bytes, 0 static bytes
	; Page Faults: major: 0 (gc: 0), minor: 0 (gc: 0)

	NIL
	\|#

	#\|
	SBCL example

	CL-USER> (run-test 10000000)
	measuring: (SIMPLE-ARRAY DOUBLE-FLOAT (*))
	Evaluation took:
	0.112 seconds of real time
	0.112888 seconds of total run time (0.112202 user, 0.000686 system)
	100.89% CPU
	257,649,342 processor cycles
	0 bytes consed


	measuring: (SIMPLE-ARRAY T (*))
	Evaluation took:
	0.168 seconds of real time
	0.169145 seconds of total run time (0.168348 user, 0.000797 system)
	100.60% CPU
	386,746,516 processor cycles
	0 bytes consed


	measuring: (ARRAY DOUBLE-FLOAT (*))
	Evaluation took:
	1.021 seconds of real time
	1.038737 seconds of total run time (0.963884 user, 0.074853 system)
	[ Run times consist of 0.166 seconds GC time, and 0.873 seconds non-GC time. ]
	101.76% CPU
	2,348,581,080 processor cycles
	1,600,028,672 bytes consed


	measuring: (ARRAY T (*))
	Evaluation took:
	0.526 seconds of real time
	0.527937 seconds of total run time (0.525773 user, 0.002164 system)
	100.38% CPU
	1,209,749,122 processor cycles
	0 bytes consed


	NIL
	CL-USER>
	\|#