@MnO2
Last active July 15, 2019 16:04
Benchmarks comparing cedarwood and aho-corasick

Background:

  • I reverted a change to the DiGraph representation so that the comparison is as close to apples-to-apples as I know how to make it.
  • For the aho-corasick test case, I removed the code for Jieba's dynamic insertion feature, since aho-corasick doesn't support dynamically rebuilding its state graph. The removal doesn't affect the code paths used in the benchmarks.

cedarwood:

Gnuplot not found, disabling plotting
jieba cut/no hmm        time:   [5.7790 us 5.7979 us 5.8200 us]
                        thrpt:  [19.664 MiB/s 19.738 MiB/s 19.803 MiB/s]
                 change:
                        time:   [+7.6765% +9.1326% +10.534%] (p = 0.00 < 0.05)
                        thrpt:  [-9.5305% -8.3684% -7.1292%]
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
jieba cut/with hmm      time:   [8.0036 us 8.0301 us 8.0610 us]
                        thrpt:  [14.197 MiB/s 14.252 MiB/s 14.299 MiB/s]
                 change:
                        time:   [+6.1904% +8.0564% +10.755%] (p = 0.00 < 0.05)
                        thrpt:  [-9.7104% -7.4558% -5.8296%]
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe
jieba cut/cut_all       time:   [4.0113 us 4.0206 us 4.0306 us]
                        thrpt:  [28.393 MiB/s 28.464 MiB/s 28.530 MiB/s]
                 change:
                        time:   [+2.0405% +3.7239% +5.4264%] (p = 0.00 < 0.05)
                        thrpt:  [-5.1471% -3.5902% -1.9997%]
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe
jieba cut/cut_for_search
                        time:   [9.1305 us 9.1718 us 9.2167 us]
                        thrpt:  [12.417 MiB/s 12.477 MiB/s 12.534 MiB/s]
                 change:
                        time:   [+4.4348% +6.5492% +8.7662%] (p = 0.00 < 0.05)
                        thrpt:  [-8.0596% -6.1467% -4.2465%]
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe
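
As a sanity check on the cedarwood run above, the `thrpt` lines can be related to the `time` lines. This is a minimal sketch, assuming that `us` means microseconds and that the throughput is computed as bytes processed per iteration divided by the time per iteration; the helper `implied_input_bytes` is a hypothetical name, not part of any benchmark code. Multiplying the median throughput by the median time gives the input size each iteration is implied to process.

```rust
/// Bytes in one MiB, the unit used in the `thrpt` lines.
const MIB: f64 = 1024.0 * 1024.0;

/// Input size in bytes implied by a throughput (MiB/s) and a time per
/// iteration (microseconds). Hypothetical helper, for illustration only.
fn implied_input_bytes(thrpt_mib_s: f64, time_us: f64) -> f64 {
    thrpt_mib_s * MIB * time_us * 1e-6
}

fn main() {
    // (benchmark, median throughput in MiB/s, median time in us),
    // copied from the cedarwood run.
    let rows = [
        ("no hmm", 19.738, 5.7979),
        ("with hmm", 14.252, 8.0301),
        ("cut_all", 28.464, 4.0206),
        ("cut_for_search", 12.477, 9.1718),
    ];
    for (name, thrpt, time) in rows {
        println!("{:<16} {:.1} bytes", name, implied_input_bytes(thrpt, time));
    }
}
```

Under these assumptions all four rows come out at roughly 120 bytes, which is consistent with every benchmark segmenting the same short input.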

aho-corasick (NFA):

Gnuplot not found, disabling plotting
jieba cut/no hmm        time:   [8.8387 us 8.8741 us 8.9140 us]
                        thrpt:  [12.838 MiB/s 12.896 MiB/s 12.948 MiB/s]
                 change:
                        time:   [+52.915% +56.293% +60.675%] (p = 0.00 < 0.05)
                        thrpt:  [-37.763% -36.018% -34.604%]
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe
jieba cut/with hmm      time:   [11.271 us 11.311 us 11.359 us]
                        thrpt:  [10.075 MiB/s 10.117 MiB/s 10.153 MiB/s]
                 change:
                        time:   [+37.538% +41.531% +45.852%] (p = 0.00 < 0.05)
                        thrpt:  [-31.437% -29.344% -27.293%]
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
jieba cut/cut_all       time:   [4.2747 us 4.3068 us 4.3479 us]
                        thrpt:  [26.321 MiB/s 26.572 MiB/s 26.771 MiB/s]
                 change:
                        time:   [+4.8126% +6.6805% +8.4397%] (p = 0.00 < 0.05)
                        thrpt:  [-7.7829% -6.2622% -4.5916%]
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe
jieba cut/cut_for_search
                        time:   [12.902 us 12.959 us 13.030 us]
                        thrpt:  [8.7829 MiB/s 8.8311 MiB/s 8.8697 MiB/s]
                 change:
                        time:   [+38.426% +42.113% +45.865%] (p = 0.00 < 0.05)
                        thrpt:  [-31.443% -29.633% -27.759%]
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe

aho-corasick (DFA enabled):

     Running target/release/deps/jieba_benchmark-b1dbba6c9adea591
Gnuplot not found, disabling plotting
jieba cut/no hmm        time:   [8.0768 us 8.4702 us 9.0067 us]
                        thrpt:  [12.706 MiB/s 13.511 MiB/s 14.169 MiB/s]
                 change:
                        time:   [-5.7439% +0.4358% +7.1091%] (p = 0.91 > 0.05)
                        thrpt:  [-6.6372% -0.4339% +6.0939%]
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe
jieba cut/with hmm      time:   [10.260 us 10.324 us 10.424 us]
                        thrpt:  [10.979 MiB/s 11.085 MiB/s 11.154 MiB/s]
                 change:
                        time:   [-9.6201% -8.7676% -7.9562%] (p = 0.00 < 0.05)
                        thrpt:  [+8.6439% +9.6102% +10.644%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
jieba cut/cut_all       time:   [4.1104 us 4.1196 us 4.1305 us]
                        thrpt:  [27.706 MiB/s 27.779 MiB/s 27.842 MiB/s]
                 change:
                        time:   [-9.2606% -7.7680% -6.2018%] (p = 0.00 < 0.05)
                        thrpt:  [+6.6119% +8.4223% +10.206%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
jieba cut/cut_for_search
                        time:   [11.844 us 11.958 us 12.126 us]
                        thrpt:  [9.4373 MiB/s 9.5705 MiB/s 9.6621 MiB/s]
                 change:
                        time:   [-12.098% -9.8210% -7.4515%] (p = 0.00 < 0.05)
                        thrpt:  [+8.0514% +10.891% +13.763%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe
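
To compare the three runs directly, the median times (the middle value of each `time` triple) can be turned into ratios against cedarwood. The sketch below only does arithmetic on the numbers reported above; `slowdown` is a hypothetical helper name, and the `change` lines are deliberately ignored because they compare each run with whatever run was previously saved on the same machine.

```rust
/// How many times slower `other_us` is than `baseline_us`.
/// Hypothetical helper, for illustration only.
fn slowdown(other_us: f64, baseline_us: f64) -> f64 {
    other_us / baseline_us
}

fn main() {
    // (benchmark, cedarwood, aho-corasick NFA, aho-corasick DFA),
    // median time per iteration as reported in the logs above.
    let medians = [
        ("no hmm", 5.7979, 8.8741, 8.4702),
        ("with hmm", 8.0301, 11.311, 10.324),
        ("cut_all", 4.0206, 4.3068, 4.1196),
        ("cut_for_search", 9.1718, 12.959, 11.958),
    ];
    for (name, cedar, nfa, dfa) in medians {
        println!(
            "{:<16} NFA {:.2}x  DFA {:.2}x",
            name,
            slowdown(nfa, cedar),
            slowdown(dfa, cedar)
        );
    }
}
```

By these medians, cedarwood is the fastest in all four benchmarks: the NFA run takes about 1.53x, 1.41x, 1.07x and 1.41x as long, and the DFA run about 1.46x, 1.29x, 1.02x and 1.30x. The DFA figure for `no hmm` is the least certain, since its confidence interval (8.0768 us to 9.0067 us) is much wider than the others.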