@y-ken
Forked from sonots/bench_out_parser.rb
Last active August 29, 2015 14:02
A sample benchmark using fluent-plugin-parser.
require_relative './test/helper'
require 'benchmark'
Fluent::Test.setup
def create_driver(config, tag = 'foo.bar')
  Fluent::Test::OutputTestDriver.new(Fluent::ParserOutput, tag).configure(config)
end
# setup
time = Time.now.to_i
CONFIG = %[
add_prefix parsed
key_name message
]
ltsv_message = {'message' => "time:2013-11-20 23:39:42 +0900\tlevel:ERROR\tmethod:POST\turi:/api/v1/people\treqtime:3.1983877060667103"}
ltsv_driver = create_driver(CONFIG + %[format ltsv])
tsv_message = {'message' => "2013-11-20 23:39:42 +0900\tERROR\tPOST\t/api/v1/people\t3.1983877060667103"}
tsv_driver = create_driver(CONFIG + %[format tsv\nkeys time,level,method,uri,reqtime])
regex_message = {'message' => "time:2013-11-20 23:39:42 +0900\tlevel:ERROR\tmethod:POST\turi:/api/v1/people\treqtime:3.1983877060667103"}
regex_driver = create_driver(CONFIG + %[format /^(?<time>[^\t]*)(?<level>[^\t]*)(?<method>[^\t]*)(?<uri>[^\t]*)(?<reqtime>[^\t]*)/])
apache2_message = {'message' => '172.21.65.11 - - [07/Jan/2014:16:09:26 +0900] "GET /mypage HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.33 Safari/535.11"'}
fast_regex_driver = create_driver(CONFIG + %[format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/])
greedy_regex_driver = create_driver(CONFIG + %[format /^(?<host>.*) .* (?<user>.*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>.*) +\S*)?" (?<code>.*) (?<size>.*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/])
apache2_ltsv_message = {'message' => 'host:172.21.65.11 user:- time:07/Jan/2014:16:09:26 +0900 method:GET path:/mypage code:302 size:- referer:- agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.33 Safari/535.11'}
# bench
n = 100000
Benchmark.bm(7) do |x|
  x.report("apache_ltsv") { ltsv_driver.run { n.times { ltsv_driver.emit(apache2_ltsv_message, time) } } }
  x.report("fast_regex") { fast_regex_driver.run { n.times { fast_regex_driver.emit(apache2_message, time) } } }
  x.report("greedy_regex") { greedy_regex_driver.run { n.times { greedy_regex_driver.emit(apache2_message, time) } } }
end
#                    user     system      total        real
# apache_ltsv   27.210000   5.230000  32.440000 ( 33.003831)
# fast_regex    10.880000   1.060000  11.940000 ( 12.438561)
# greedy_regex  43.230000   0.850000  44.080000 ( 44.597733)
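
The gap between fast_regex and greedy_regex can be explored without Fluentd at all. The following is a minimal sketch using only Ruby's standard `Benchmark` module. The two patterns are the ones from the script, written here as regex literals, so they are not guaranteed to be character-for-character what the plugin compiles from the config string. The names `LINE`, `SPECIFIC`, `GREEDY`, and `compare_patterns` are made up for this illustration.

```ruby
require 'benchmark'

# Sample Apache access-log line, copied from the gist.
LINE = '172.21.65.11 - - [07/Jan/2014:16:09:26 +0900] "GET /mypage HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.33 Safari/535.11"'

# Negated character classes: each field stops at its first delimiter.
SPECIFIC = /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^"]*)" "(?<agent>[^"]*)")?$/

# Greedy `.*`: each field first consumes the rest of the line, then backtracks.
GREEDY = /^(?<host>.*) .* (?<user>.*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>.*) +\S*)?" (?<code>.*) (?<size>.*)(?: "(?<referer>[^"]*)" "(?<agent>[^"]*)")?$/

# Time `n` matches of each pattern against LINE.
# Returns a Hash of label => elapsed seconds.
def compare_patterns(n)
  { 'specific' => SPECIFIC, 'greedy' => GREEDY }.each_with_object({}) do |(label, re), out|
    out[label] = Benchmark.realtime { n.times { re.match(LINE) } }
  end
end

compare_patterns(5_000).each { |label, sec| puts format('%-8s %.4fs', label, sec) }
```

It is worth inspecting the captures as well as the timings: with `.*`, a field such as `path` can end up spanning more of the line than intended, so the two patterns are not necessarily interchangeable.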

sonots commented Jul 17, 2014

In my environment, ltsv was faster. Tested on Mac and CentOS 5.8 with Ruby 2.1.2.

$ bundle exec ruby bench_out_parser.rb
/Users/seo.naotoshi/src/github.com/tagomoris/fluent-plugin-parser/lib/fluent/plugin/fixed_parser.rb:249: warning: character class has ']' without escape
/Users/seo.naotoshi/src/github.com/tagomoris/fluent-plugin-parser/lib/fluent/plugin/fixed_parser.rb:249: warning: character class has ']' without escape
              user     system      total        real
apache_ltsv  1.770000   0.050000   1.820000 (  2.323165)
/Users/seo.naotoshi/src/github.com/tagomoris/fluent-plugin-parser/lib/fluent/plugin/fixed_parser.rb:75: warning: character class has ']' without escape
fast_regex  4.920000   0.060000   4.980000 (  5.472021)
/Users/seo.naotoshi/src/github.com/tagomoris/fluent-plugin-parser/lib/fluent/plugin/fixed_parser.rb:75: warning: character class has ']' without escape
greedy_regex 34.550000   0.070000  34.620000 ( 35.164849)
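
One possible reason LTSV can come out ahead is that the format needs no regular expression: a line can be split on tabs, and each field split once on its first colon. The sketch below shows that idea in plain Ruby. It is not the plugin's actual implementation, and `parse_ltsv` is a hypothetical helper name; the sample line is the `ltsv_message` from the script.

```ruby
# Parse one LTSV line into a Hash.
# Fields are separated by tabs; each field has the form "label:value".
def parse_ltsv(line)
  line.split("\t").each_with_object({}) do |field, record|
    # Limit of 2 so that values containing ':' (such as timestamps) stay intact.
    label, value = field.split(':', 2)
    record[label] = value
  end
end

line = "time:2013-11-20 23:39:42 +0900\tlevel:ERROR\tmethod:POST\turi:/api/v1/people\treqtime:3.1983877060667103"
p parse_ltsv(line)
```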


sonots commented Jul 17, 2014

Just to be sure, I also tried it with Ruby 1.9.3, and the results were similar.


y-ken commented Jul 17, 2014

@sonots
Thanks for the quick response!
How do the results look with the following measurement method, which calls Fluentd's parser directly?

$ git clone https://github.com/y-ken/fluentd_parser_benchmark.git
$ cd fluentd_parser_benchmark/
$ bundle install --path vendor/bundle
$ bundle exec ruby benchmark.rb
              user     system      total        real
ltsv     16.260000   0.040000  16.300000 ( 16.303698)
faster_regexp 10.040000   0.020000  10.060000 ( 10.072818)
greedy_regexp 40.450000   0.060000  40.510000 ( 40.872464)
$ ruby -v
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]


sonots commented Jul 17, 2014

@y-ken I reran it with your benchmark, and ltsv was indeed slower there.

$ bundle exec ruby benchmark.rb
              user     system      total        real
ltsv     17.870000   0.060000  17.930000 ( 17.994229)
faster_regexp 11.350000   0.030000  11.380000 ( 11.430420)
greedy_regexp 41.920000   0.110000  42.030000 ( 42.467583)

However, once I tuned the benchmark as follows so that it behaves the same way as the actual parser, ltsv became faster. => https://gist.github.com/sonots/dd16bd7a65b7dd79f3c8

$ bundle exec ruby benchmark.rb
              user     system      total        real
ltsv      1.330000   0.000000   1.330000 (  1.328088)
faster_regexp  2.390000   0.010000   2.400000 (  2.406384)
greedy_regexp 31.560000   0.060000  31.620000 ( 31.788143)

Hmm, I wonder whether LabeledTSVParser.new is the expensive part.

EDIT: Here are the results of the benchmark with parser.call commented out, so that nothing is actually parsed:

              user     system      total        real
ltsv      9.520000   0.030000   9.550000 (  9.570326)
faster_regexp  4.450000   0.020000   4.470000 (  4.470674)
greedy_regexp  4.940000   0.020000   4.960000 (  4.987224)
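
The numbers above suggest that a large share of the ltsv time is spent outside the parse call itself. A generic way to separate setup cost from per-call cost is sketched below, assuming (as the comment speculates) that the parser is constructed inside the measured loop. `HypotheticalParser` is a made-up stand-in with a simulated constructor cost; it is not Fluentd's LabeledTSVParser, and `bench_setup_cost` is likewise an illustrative name.

```ruby
require 'benchmark'

# Hypothetical stand-in for a parser whose constructor does non-trivial work.
# The setup cost here is simulated and says nothing about Fluentd's classes.
class HypotheticalParser
  def initialize
    # Simulated setup work: build a small lookup table.
    @lookup = (1..200).map { |i| ["key#{i}", i] }.to_h
  end

  # Minimal "parse": split a tab-separated line into fields.
  def call(text)
    text.split("\t")
  end
end

TEXT = "a:1\tb:2\tc:3"

# Measure three variants over `n` iterations and return elapsed seconds:
#   :per_call   -> construct a parser and parse on every iteration
#   :hoisted    -> construct once outside the loop, parse on every iteration
#   :setup_only -> construct on every iteration, never parse
def bench_setup_cost(n)
  per_call = Benchmark.realtime { n.times { HypotheticalParser.new.call(TEXT) } }
  parser = HypotheticalParser.new
  hoisted = Benchmark.realtime { n.times { parser.call(TEXT) } }
  setup_only = Benchmark.realtime { n.times { HypotheticalParser.new } }
  { per_call: per_call, hoisted: hoisted, setup_only: setup_only }
end

bench_setup_cost(5_000).each { |label, sec| puts format('%-10s %.4fs', label, sec) }
```

If `setup_only` is close to `per_call`, construction dominates, and hoisting it out of the loop is what changes the ranking rather than the parsing logic.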
