@zamazan4ik
Created January 7, 2024 21:45
Tokenizers: Instrumented compared to Release
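
What follows is raw Criterion.rs output from the tokenizers crate's benchmark suite. The "change" lines compare each measurement against a previously saved Criterion baseline; per the title, the baseline run was presumably the plain release build and this run the instrumented one, so the percentages show the slowdown introduced by instrumentation. A likely way to reproduce the pair of runs (an assumption, not stated in this gist) is to run cargo bench on the release build first, then rebuild and rerun with rustc's instrumentation flag, e.g. RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo bench.

For orientation, the benchmarks are ordinary Criterion bench functions. A minimal sketch in the style of the "WordPiece BERT encode" case, assuming the crate's public Tokenizer::from_file / encode API and a hypothetical tokenizer JSON path, would look roughly like this (not the repository's actual benchmark code):

use criterion::{criterion_group, criterion_main, Criterion};
use tokenizers::Tokenizer;

fn bench_wordpiece_encode(c: &mut Criterion) {
    // Hypothetical tokenizer file and input line; the real benchmark sources are
    // the bert_benchmark / bpe_benchmark / layout_benchmark / unigram_benchmark
    // targets invoked in the Running lines below.
    let tokenizer = Tokenizer::from_file("data/bert-base-uncased-tokenizer.json").unwrap();
    let line = "Hello, y'all! How are you doing today?";

    c.bench_function("WordPiece BERT encode", |b| {
        // Encode one line per iteration, without adding special tokens.
        b.iter(|| tokenizer.encode(line, false).unwrap())
    });
}

criterion_group!(benches, bench_wordpiece_encode);
criterion_main!(benches);
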
Running `/home/zamazan4ik/open_source/tokenizers/tokenizers/target/x86_64-unknown-linux-gnu/release/deps/bert_benchmark-265329d2ad95192b --bench`
Benchmarking WordPiece BERT encode
Benchmarking WordPiece BERT encode: Warming up for 3.0000 s
Benchmarking WordPiece BERT encode: Collecting 20 samples in estimated 5.0017 s (208k iterations)
Benchmarking WordPiece BERT encode: Analyzing
WordPiece BERT encode time: [24.207 µs 24.210 µs 24.213 µs]
change: [+61.027% +61.214% +61.393%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
1 (5.00%) low mild
1 (5.00%) high severe
Benchmarking WordPiece BERT encode batch
Benchmarking WordPiece BERT encode batch: Warming up for 3.0000 s
Benchmarking WordPiece BERT encode batch: Collecting 20 samples in estimated 5.7485 s (100 iterations)
Benchmarking WordPiece BERT encode batch: Analyzing
WordPiece BERT encode batch
time: [57.032 ms 57.226 ms 57.431 ms]
change: [+1575.8% +1585.2% +1593.9%] (p = 0.00 < 0.05)
Performance has regressed.
Benchmarking WordPiece Train vocabulary (small)
Benchmarking WordPiece Train vocabulary (small): Warming up for 3.0000 s
Benchmarking WordPiece Train vocabulary (small): Collecting 10 samples in estimated 6.8133 s (110 iterations)
Benchmarking WordPiece Train vocabulary (small): Analyzing
WordPiece Train vocabulary (small)
time: [60.248 ms 60.786 ms 61.230 ms]
change: [+106.76% +109.96% +112.70%] (p = 0.00 < 0.05)
Performance has regressed.
Benchmarking WordPiece Train vocabulary (big)
Benchmarking WordPiece Train vocabulary (big): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 38.3s.
Benchmarking WordPiece Train vocabulary (big): Collecting 10 samples in estimated 38.301 s (10 iterations)
Benchmarking WordPiece Train vocabulary (big): Analyzing
WordPiece Train vocabulary (big)
time: [3.7710 s 3.7951 s 3.8188 s]
change: [+328.18% +334.44% +340.41%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) low mild
Running `/home/zamazan4ik/open_source/tokenizers/tokenizers/target/x86_64-unknown-linux-gnu/release/deps/bpe_benchmark-6e02138d48001b9d --bench`
Benchmarking BPE GPT2 encode
Benchmarking BPE GPT2 encode: Warming up for 3.0000 s
Benchmarking BPE GPT2 encode: Collecting 20 samples in estimated 5.0007 s (346k iterations)
Benchmarking BPE GPT2 encode: Analyzing
BPE GPT2 encode time: [14.469 µs 14.509 µs 14.558 µs]
change: [+36.503% +37.071% +37.816%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 20 measurements (35.00%)
1 (5.00%) low severe
2 (10.00%) low mild
4 (20.00%) high severe
Benchmarking BPE GPT2 encode batch
Benchmarking BPE GPT2 encode batch: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 10.
Benchmarking BPE GPT2 encode batch: Collecting 20 samples in estimated 5.1965 s (210 iterations)
Benchmarking BPE GPT2 encode batch: Analyzing
BPE GPT2 encode batch time: [24.331 ms 24.474 ms 24.650 ms]
change: [+483.43% +487.49% +491.52%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
2 (10.00%) high mild
Benchmarking BPE GPT2 encode, no cache
Benchmarking BPE GPT2 encode, no cache: Warming up for 3.0000 s
Benchmarking BPE GPT2 encode, no cache: Collecting 20 samples in estimated 5.0020 s (185k iterations)
Benchmarking BPE GPT2 encode, no cache: Analyzing
BPE GPT2 encode, no cache
time: [26.906 µs 26.917 µs 26.927 µs]
change: [+36.510% +36.733% +36.942%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
1 (5.00%) low mild
1 (5.00%) high mild
Benchmarking BPE GPT2 encode batch, no cache
Benchmarking BPE GPT2 encode batch, no cache: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 9.5s, enable flat sampling, or reduce sample count to 10.
Benchmarking BPE GPT2 encode batch, no cache: Collecting 20 samples in estimated 9.5234 s (210 iterations)
Benchmarking BPE GPT2 encode batch, no cache: Analyzing
BPE GPT2 encode batch, no cache
time: [45.166 ms 45.350 ms 45.531 ms]
change: [+1041.3% +1046.3% +1050.9%] (p = 0.00 < 0.05)
Performance has regressed.
Benchmarking BPE Train vocabulary (small)
Benchmarking BPE Train vocabulary (small): Warming up for 3.0000 s
Benchmarking BPE Train vocabulary (small): Collecting 10 samples in estimated 6.6338 s (110 iterations)
Benchmarking BPE Train vocabulary (small): Analyzing
BPE Train vocabulary (small)
time: [59.898 ms 60.111 ms 60.491 ms]
change: [+125.30% +128.63% +132.18%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking BPE Train vocabulary (big)
Benchmarking BPE Train vocabulary (big): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 38.8s.
Benchmarking BPE Train vocabulary (big): Collecting 10 samples in estimated 38.804 s (10 iterations)
Benchmarking BPE Train vocabulary (big): Analyzing
BPE Train vocabulary (big)
time: [3.8338 s 3.8530 s 3.8725 s]
change: [+330.01% +335.95% +341.73%] (p = 0.00 < 0.05)
Performance has regressed.
Running `/home/zamazan4ik/open_source/tokenizers/tokenizers/target/x86_64-unknown-linux-gnu/release/deps/layout_benchmark-1b6429727c41181c --bench`
Benchmarking TemplateProcessing single encode
Benchmarking TemplateProcessing single encode: Warming up for 3.0000 s
Benchmarking TemplateProcessing single encode: Collecting 20 samples in estimated 5.0000 s (2.5M iterations)
Benchmarking TemplateProcessing single encode: Analyzing
TemplateProcessing single encode
time: [1.2197 µs 1.2248 µs 1.2322 µs]
change: [+28.641% +34.412% +40.339%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
1 (5.00%) high mild
1 (5.00%) high severe
Benchmarking TemplateProcessing pair encode
Benchmarking TemplateProcessing pair encode: Warming up for 3.0000 s
Benchmarking TemplateProcessing pair encode: Collecting 20 samples in estimated 5.0006 s (1.3M iterations)
Benchmarking TemplateProcessing pair encode: Analyzing
TemplateProcessing pair encode
time: [2.8160 µs 2.8354 µs 2.8682 µs]
change: [+23.616% +31.051% +38.425%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 20 measurements (15.00%)
1 (5.00%) high mild
2 (10.00%) high severe
Running `/home/zamazan4ik/open_source/tokenizers/tokenizers/target/x86_64-unknown-linux-gnu/release/deps/unigram_benchmark-60e4f6f5f848a010 --bench`
Benchmarking Unigram Train vocabulary (small)
Benchmarking Unigram Train vocabulary (small): Warming up for 3.0000 s
Benchmarking Unigram Train vocabulary (small): Collecting 10 samples in estimated 5.6442 s (330 iterations)
Benchmarking Unigram Train vocabulary (small): Analyzing
Unigram Train vocabulary (small)
time: [16.982 ms 17.052 ms 17.117 ms]
change: [+112.62% +115.04% +117.39%] (p = 0.00 < 0.05)
Performance has regressed.
Benchmarking Unigram Train vocabulary (medium)
Benchmarking Unigram Train vocabulary (medium): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 35.8s.
Benchmarking Unigram Train vocabulary (medium): Collecting 10 samples in estimated 35.817 s (10 iterations)
Benchmarking Unigram Train vocabulary (medium): Analyzing
Unigram Train vocabulary (medium)
time: [3.5309 s 3.5400 s 3.5500 s]
change: [+453.13% +459.35% +465.02%] (p = 0.00 < 0.05)
Performance has regressed.
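
Reading the change figures: Criterion reports change as roughly (new − baseline) / baseline, so each percentage can be inverted to estimate the release-build time the instrumented measurement is being compared against (assuming, as above, that the saved baseline is the release run). For example, for the WordPiece BERT encode result:

change ≈ instrumented / release − 1
release ≈ 24.210 µs / (1 + 0.612) ≈ 15.0 µs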