@sweeneyde
Last active October 26, 2021 21:47
store_subscr results with 3 specializations

3 Specialized opcodes:

* STORE_SUBSCR_LIST_INT
* STORE_SUBSCR_DICT_UNICODE
* STORE_SUBSCR_BYTEARRAY_INT
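
For context, `STORE_SUBSCR` is the generic bytecode instruction CPython emits for `container[key] = value`, and the three opcodes above are specialized forms of it. A minimal sketch using the standard `dis` module to confirm which instruction an item assignment compiles to (the helper name `store_item` is only illustrative):

```python
import dis

def store_item(container, key, value):
    # A single item assignment; this is the statement the specializations target.
    container[key] = value

# Collect the opcode names of the compiled function body.
opnames = [ins.opname for ins in dis.get_instructions(store_item)]
print(opnames)

# The same generic instruction serves every container type; the interpreter
# can only tell list/dict/bytearray apart by inspecting operands at run time.
for container, key, value in ([0] * 10, 2, 5), ({}, "foobar", 5), (bytearray(10), 2, 5):
    store_item(container, key, value)
    print(type(container).__name__, container[key])
```

`dis` reports the unspecialized instruction by default, so the specialized names listed above will not appear in this output.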

All benchmarks below are with GCC PGO on WSL, but without CPU isolation.

Pyperformance:

| Benchmark | 1025_main | store_subscr3 |
|---|:---:|:---:|
| nbody | 126 ms | 106 ms: 1.19x faster |
| unpack_sequence | 45.4 ns | 41.9 ns: 1.08x faster |
| meteor_contest | 110 ms | 106 ms: 1.03x faster |
| logging_format | 8.91 us | 8.61 us: 1.03x faster |
| logging_simple | 7.85 us | 7.62 us: 1.03x faster |
| richards | 65.7 ms | 64.1 ms: 1.02x faster |
| logging_silent | 123 ns | 121 ns: 1.02x faster |
| json_loads | 29.3 us | 28.8 us: 1.02x faster |
| regex_compile | 160 ms | 158 ms: 1.02x faster |
| sympy_integrate | 24.5 ms | 24.2 ms: 1.01x faster |
| pickle_list | 4.31 us | 4.26 us: 1.01x faster |
| nqueens | 97.2 ms | 96.0 ms: 1.01x faster |
| scimark_lu | 151 ms | 150 ms: 1.01x faster |
| scimark_sor | 167 ms | 166 ms: 1.01x faster |
| hexiom | 8.02 ms | 7.96 ms: 1.01x faster |
| sympy_sum | 198 ms | 197 ms: 1.01x faster |
| python_startup_no_site | 6.35 ms | 6.36 ms: 1.00x slower |
| xml_etree_iterparse | 111 ms | 112 ms: 1.01x slower |
| unpickle_pure_python | 298 us | 301 us: 1.01x slower |
| regex_v8 | 23.4 ms | 23.6 ms: 1.01x slower |
| xml_etree_process | 62.5 ms | 63.2 ms: 1.01x slower |
| dulwich_log | 84.1 ms | 85.2 ms: 1.01x slower |
| deltablue | 5.21 ms | 5.28 ms: 1.01x slower |
| pathlib | 21.2 ms | 21.5 ms: 1.02x slower |
| pickle_dict | 27.8 us | 28.2 us: 1.02x slower |
| pickle_pure_python | 406 us | 413 us: 1.02x slower |
| mako | 12.5 ms | 12.8 ms: 1.02x slower |
| chaos | 87.8 ms | 89.7 ms: 1.02x slower |
| scimark_sparse_mat_mult | 5.96 ms | 6.09 ms: 1.02x slower |
| pyflate | 566 ms | 579 ms: 1.02x slower |
| unpickle_list | 5.12 us | 5.25 us: 1.03x slower |
| json_dumps | 13.4 ms | 13.7 ms: 1.03x slower |
| scimark_fft | 400 ms | 414 ms: 1.03x slower |
| fannkuch | 463 ms | 479 ms: 1.03x slower |
| telco | 6.73 ms | 6.97 ms: 1.04x slower |
| chameleon | 7.60 ms | 7.99 ms: 1.05x slower |
| pickle | 12.0 us | 12.6 us: 1.05x slower |
| Geometric mean | (ref) | 1.00x faster |

Benchmark hidden because not significant (18): go, django_template, xml_etree_parse, raytrace, sympy_expand, regex_effbot, sympy_str, crypto_pyaes, python_startup, pidigits, regex_dna, 2to3, float, xml_etree_generate, tornado_http, scimark_monte_carlo, spectral_norm, unpickle

Microbenchmarks:

from pyperf import Runner
runner = Runner()

runner.timeit("dict[str]=...",
    setup="from itertools import repeat; d = dict()",
    stmt="for s in repeat('foobar', 100_000):\n"
         "    d[s] = d[s] = d[s] = d[s] = d[s] = d[s] = d[s] = d[s] = d[s] = d[s] = s"
)
runner.timeit("dict[tuple]=...",
    setup="from itertools import repeat; d = dict()",
    stmt="for s in repeat((1, 2), 100_000):\n"
         "    d[s] = d[s] = d[s] = d[s] = d[s] = d[s] = d[s] = d[s] = d[s] = d[s] = s"
)
runner.timeit("list[int]=...",
    setup="from itertools import repeat; L = [0]*10",
    stmt="for i in repeat(2, 100_000):\n"
         "    L[i] = L[i] = L[i] = L[i] = L[i] = L[i] = L[i] = L[i] = L[i] = L[i] = i"
)
runner.timeit("bytearray[int]=...",
    setup="from itertools import repeat; a = bytearray([0]*10)",
    stmt="for i in repeat(2, 100_000):\n"
         "    a[i] = a[i] = a[i] = a[i] = a[i] = a[i] = a[i] = a[i] = a[i] = a[i] = i"
)

| Benchmark | main_subscr_micro | subscr_micro |
|---|:---:|:---:|
| bytearray[int]=... | 16.6 ms | 8.14 ms: 2.03x faster |
| list[int]=... | 15.6 ms | 7.77 ms: 2.01x faster |
| dict[str]=... | 18.5 ms | 13.2 ms: 1.41x faster |
| dict[tuple]=... | 23.5 ms | 23.1 ms: 1.02x faster |
| Geometric mean | (ref) | 1.56x faster |
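
As a sanity check on the summary row, the geometric mean can be approximated from the four speedup factors printed above. This is only a sketch: pyperf works from unrounded timings, so a value recomputed from the rounded factors may differ in the last digit.

```python
import math

# Speedup factors as printed in the microbenchmark results (already rounded).
speedups = {
    "bytearray[int]=...": 2.03,
    "list[int]=...": 2.01,
    "dict[str]=...": 1.41,
    "dict[tuple]=...": 1.02,
}

# Geometric mean: the n-th root of the product of the n factors.
geometric_mean = math.prod(speedups.values()) ** (1 / len(speedups))
print(f"{geometric_mean:.2f}x faster")
```

The `dict[tuple]` case is close to 1.00x, as expected, since none of the three specializations covers tuple keys.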

Specialization details:

Summary:
      bm_crypto_pyaes: 99.7%
   bm_django_template: 77.0%
          bm_fannkuch: 33.5%
    bm_meteor_contest: 99.9%
             bm_nbody: 99.98%
           bm_nqueens: 66.3%
     bm_regex_compile: 98.2%
         bm_regex_dna: 99.87%
          bm_regex_v8: 99.47%
           bm_scimark: 0.01% <-- array.array throws things off
  
        weighted mean: 46.9%
  mean of percentages: 77.4%
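
The gist does not say how these percentages were derived. One reading that agrees with the per-benchmark counters listed below, to within the last printed digit, is `hit` divided by the sum of all seven counters. A sketch under that assumption, with the counter values copied from those listings:

```python
# Assumed formula (not stated in the gist): rate = hit / sum of all counters.
# Tuple order: specialization_success, specialization_failure, hit,
#              deferred, miss, deopt, unquickened
stats = {
    "bm_crypto_pyaes":    (60, 47, 428697, 616, 7, 0, 363),
    "bm_django_template": (91, 1359, 290305, 83763, 119, 0, 1200),
    "bm_fannkuch":        (52, 108223, 3538352, 6925551, 7, 0, 353),
    "bm_meteor_contest":  (51, 20, 1030974, 571, 7, 0, 337),
    "bm_nbody":           (62, 20, 6001629, 572, 7, 0, 376),
    "bm_nqueens":         (53, 6870, 878448, 438965, 7, 0, 349),
    "bm_regex_compile":   (56, 540, 1866375, 33863, 46, 0, 329),
    "bm_regex_dna":       (53, 20, 815135, 596, 7, 0, 337),
    "bm_regex_v8":        (55, 34, 332275, 1335, 13, 0, 326),
    "bm_scimark":         (50, 146953, 1666, 9403966, 7, 0, 407),
}
HIT = 2  # index of the "hit" counter in each tuple

# Per-benchmark rate, as a percentage.
rates = {name: 100 * c[HIT] / sum(c) for name, c in stats.items()}

# Unweighted average of the per-benchmark percentages.
mean_of_percentages = sum(rates.values()) / len(rates)

# Pooled rate: total hits over total counter events across all benchmarks.
weighted_mean = (100 * sum(c[HIT] for c in stats.values())
                 / sum(sum(c) for c in stats.values()))

for name, rate in rates.items():
    print(f"{name:>20}: {rate:.2f}%")
print(f"{'weighted mean':>20}: {weighted_mean:.1f}%")
print(f"{'mean of percentages':>20}: {mean_of_percentages:.1f}%")
```

The weighted mean sits far below the mean of percentages because bm_fannkuch and bm_scimark contribute most of the counter events and have the lowest rates.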


bm_crypto_pyaes: 99.7%
    store_subscr.specialization_success : 60
    store_subscr.specialization_failure : 47
    store_subscr.hit : 428697
    store_subscr.deferred : 616
    store_subscr.miss : 7
    store_subscr.deopt : 0
    store_subscr.unquickened : 363

bm_django_template: 77.0%
    store_subscr.specialization_success : 91
    store_subscr.specialization_failure : 1359
    store_subscr.hit : 290305 
    store_subscr.deferred : 83763
    store_subscr.miss : 119
    store_subscr.deopt : 0
    store_subscr.unquickened : 1200

bm_fannkuch: 33.5%
    store_subscr.specialization_success : 52
    store_subscr.specialization_failure : 108223
    store_subscr.hit : 3538352
    store_subscr.deferred : 6925551
    store_subscr.miss : 7
    store_subscr.deopt : 0
    store_subscr.unquickened : 353

bm_meteor_contest: 99.9%
    store_subscr.specialization_success : 51
    store_subscr.specialization_failure : 20
    store_subscr.hit : 1030974
    store_subscr.deferred : 571
    store_subscr.miss : 7
    store_subscr.deopt : 0
    store_subscr.unquickened : 337

bm_nbody: 99.98%
    store_subscr.specialization_success : 62
    store_subscr.specialization_failure : 20
    store_subscr.hit : 6001629
    store_subscr.deferred : 572
    store_subscr.miss : 7
    store_subscr.deopt : 0
    store_subscr.unquickened : 376

bm_nqueens: 66.3%
    store_subscr.specialization_success : 53
    store_subscr.specialization_failure : 6870
    store_subscr.hit : 878448
    store_subscr.deferred : 438965
    store_subscr.miss : 7
    store_subscr.deopt : 0
    store_subscr.unquickened : 349

bm_regex_compile: 98.2%
    store_subscr.specialization_success : 56
    store_subscr.specialization_failure : 540
    store_subscr.hit : 1866375
    store_subscr.deferred : 33863
    store_subscr.miss : 46
    store_subscr.deopt : 0
    store_subscr.unquickened : 329

bm_regex_dna: 99.87%
    store_subscr.specialization_success : 53
    store_subscr.specialization_failure : 20
    store_subscr.hit : 815135
    store_subscr.deferred : 596
    store_subscr.miss : 7
    store_subscr.deopt : 0
    store_subscr.unquickened : 337

bm_regex_v8: 99.47%
    store_subscr.specialization_success : 55
    store_subscr.specialization_failure : 34
    store_subscr.hit : 332275
    store_subscr.deferred : 1335
    store_subscr.miss : 13
    store_subscr.deopt : 0
    store_subscr.unquickened : 326

bm_scimark: 0.01%
    store_subscr.specialization_success : 50
    store_subscr.specialization_failure : 146953
    store_subscr.hit : 1666
    store_subscr.deferred : 9403966
    store_subscr.miss : 7
    store_subscr.deopt : 0
    store_subscr.unquickened : 407
@sweeneyde (author) commented:

Another run:

| Benchmark | 1026_main | 1026_store_subscr |
|---|:---:|:---:|
| nbody | 128 ms | 102 ms: 1.25x faster |
| pickle_dict | 30.4 us | 28.4 us: 1.07x faster |
| hexiom | 8.35 ms | 8.03 ms: 1.04x faster |
| pickle_list | 4.54 us | 4.38 us: 1.03x faster |
| django_template | 46.5 ms | 45.5 ms: 1.02x faster |
| json_dumps | 13.8 ms | 13.5 ms: 1.02x faster |
| pidigits | 180 ms | 177 ms: 1.02x faster |
| fannkuch | 468 ms | 461 ms: 1.02x faster |
| sympy_integrate | 24.5 ms | 24.1 ms: 1.01x faster |
| meteor_contest | 111 ms | 109 ms: 1.01x faster |
| scimark_lu | 152 ms | 150 ms: 1.01x faster |
| unpack_sequence | 40.5 ns | 40.3 ns: 1.01x faster |
| regex_dna | 200 ms | 201 ms: 1.01x slower |
| python_startup_no_site | 6.35 ms | 6.40 ms: 1.01x slower |
| python_startup | 9.02 ms | 9.09 ms: 1.01x slower |
| mako | 12.8 ms | 12.9 ms: 1.01x slower |
| go | 176 ms | 178 ms: 1.01x slower |
| unpickle_pure_python | 298 us | 302 us: 1.01x slower |
| json_loads | 28.7 us | 29.2 us: 1.02x slower |
| xml_etree_iterparse | 110 ms | 112 ms: 1.02x slower |
| pickle_pure_python | 409 us | 417 us: 1.02x slower |
| dulwich_log | 83.8 ms | 85.6 ms: 1.02x slower |
| chameleon | 7.65 ms | 7.83 ms: 1.02x slower |
| regex_effbot | 3.17 ms | 3.24 ms: 1.02x slower |
| xml_etree_parse | 153 ms | 157 ms: 1.03x slower |
| deltablue | 5.16 ms | 5.33 ms: 1.03x slower |
| logging_silent | 121 ns | 126 ns: 1.04x slower |
| telco | 6.74 ms | 7.01 ms: 1.04x slower |
| pathlib | 20.7 ms | 21.7 ms: 1.05x slower |
| unpickle_list | 4.95 us | 5.26 us: 1.06x slower |
| Geometric mean | (ref) | 1.00x faster |

Benchmark hidden because not significant (25): pickle, tornado_http, crypto_pyaes, sympy_expand, nqueens, regex_compile, scimark_sparse_mat_mult, xml_etree_process, 2to3, sympy_sum, scimark_monte_carlo, logging_simple, raytrace, chaos, float, scimark_sor, xml_etree_generate, sympy_str, pyflate, spectral_norm, regex_v8, richards, scimark_fft, logging_format, unpickle
