Skip to content

Instantly share code, notes, and snippets.

@andresriancho
Last active August 29, 2015 14:21
Show Gist options
  • Save andresriancho/cf2fa1ce239b30f37bd9 to your computer and use it in GitHub Desktop.
Save andresriancho/cf2fa1ce239b30f37bd9 to your computer and use it in GitHub Desktop.
Ordered dicts are hard (?)

TL;DR Use ruamel.ordereddict instead of Python's OrderedDict. Significant improvement in memory and CPU usage.

I was having memory issues with w3af so I started to experiment with different libraries for ordered dicts (since profiling was showing some strange things in that area). These are the results of some memory profiling tests I run:

  • Lower memory usage: 11.574 MiB ruamel.ordereddict
  • Higher memory usage: 69.742 MiB SQLMap's ordered dict; almost the same memory usage as the Python's collections.OrderedDict

When running the tests 100 times using timeit this is what I got on my workstation:

  • Faster: ruamel.ordereddict 7.66 seconds
  • Slowest: Python's collections.OrderedDict 166 seconds

Since ruamel.ordereddict is implemented in C it makes a lot of sense to see it win, but I wasn't expecting 7sec vs. 166!

My test suite was really quick and easy; maybe Python's OrderedDict performs well in other tests?

Memory usage test suite and raw results

C implementation of OrderedDict

[pablo:/tmp] [dic] $ python -m memory_profiler test_ruamel_odict.py 
Filename: test_ruamel_odict.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   10.789 MiB    0.000 MiB   @profile
     5                             def test():
     6   10.793 MiB    0.004 MiB       store = []
     7   29.180 MiB   18.387 MiB       for _ in xrange(50000):
     8   29.180 MiB    0.000 MiB           store.append(OrderedDict())
     9                             
    10                             
    11   11.301 MiB  -17.879 MiB       store = []
    12   28.953 MiB   17.652 MiB       for _ in xrange(50000):
    13   28.953 MiB    0.000 MiB           o = OrderedDict()
    14   28.953 MiB    0.000 MiB           o['abc'] = 'def'
    15   28.953 MiB    0.000 MiB           o['def'] = '123'
    16   28.953 MiB    0.000 MiB           store.append(o)
    17                             
    18                             
    19   11.703 MiB  -17.250 MiB       store = []
    20   11.703 MiB    0.000 MiB       for _ in xrange(50000):
    21   11.574 MiB   -0.129 MiB           o = OrderedDict()
    22   11.574 MiB    0.000 MiB           o['abc'] = 'def'
    23   11.574 MiB    0.000 MiB           del o['abc']

The original OrderedDict we used in w3af

[pablo:/tmp] [dic] $ python -m memory_profiler test_w3af_odict.py 
Filename: test_w3af_odict.py

Line #    Mem usage    Increment   Line Contents
================================================
     4    9.773 MiB    0.000 MiB   @profile
     5                             def test():
     6    9.777 MiB    0.004 MiB       store = []
     7   42.895 MiB   33.117 MiB       for _ in xrange(50000):
     8   42.895 MiB    0.000 MiB           store.append(OrderedDict())
     9                             
    10                             
    11   10.512 MiB  -32.383 MiB       store = []
    12   44.832 MiB   34.320 MiB       for _ in xrange(50000):
    13   44.832 MiB    0.000 MiB           o = OrderedDict()
    14   44.832 MiB    0.000 MiB           o['abc'] = 'def'
    15   44.832 MiB    0.000 MiB           o['def'] = '123'
    16   44.832 MiB    0.000 MiB           store.append(o)
    17                             
    18                             
    19   13.082 MiB  -31.750 MiB       store = []
    20   13.086 MiB    0.004 MiB       for _ in xrange(50000):
    21   13.086 MiB    0.000 MiB           o = OrderedDict()
    22   13.086 MiB    0.000 MiB           o['abc'] = 'def'
    23   13.086 MiB    0.000 MiB           del o['abc']

The OrderedDict used in sqlmap

[pablo:/tmp] [dic] 43s $ python -m memory_profiler test_sqlmap_odict.py 
Filename: test_sqlmap_odict.py

Line #    Mem usage    Increment   Line Contents
================================================
     4    9.848 MiB    0.000 MiB   @profile
     5                             def test():
     6    9.855 MiB    0.008 MiB       store = []
     7   59.648 MiB   49.793 MiB       for _ in xrange(50000):
     8   59.648 MiB    0.000 MiB           store.append(OrderedDict())
     9                             
    10                             
    11   59.266 MiB   -0.383 MiB       store = []
    12   69.738 MiB   10.473 MiB       for _ in xrange(50000):
    13   69.738 MiB    0.000 MiB           o = OrderedDict()
    14   69.738 MiB    0.000 MiB           o['abc'] = 'def'
    15   69.738 MiB    0.000 MiB           o['def'] = '123'
    16   69.738 MiB    0.000 MiB           store.append(o)
    17                             
    18                             
    19   69.738 MiB    0.000 MiB       store = []
    20   69.742 MiB    0.004 MiB       for _ in xrange(50000):
    21   69.742 MiB    0.000 MiB           o = OrderedDict()
    22   69.742 MiB    0.000 MiB           o['abc'] = 'def'
    23   69.742 MiB    0.000 MiB           del o['abc']

An ordered dict implemented in cython

[pablo:/tmp] [dic] $ python -m memory_profiler test_cython_odict.py 
Filename: test_cython_odict.py

Line #    Mem usage    Increment   Line Contents
================================================
     4    9.824 MiB    0.000 MiB   @profile
     5                             def test():
     6    9.832 MiB    0.008 MiB       store = []
     7   45.680 MiB   35.848 MiB       for _ in xrange(50000):
     8   45.680 MiB    0.000 MiB           store.append(OrderedDict())
     9                             
    10                             
    11   45.297 MiB   -0.383 MiB       store = []
    12   55.773 MiB   10.477 MiB       for _ in xrange(50000):
    13   55.773 MiB    0.000 MiB           o = OrderedDict()
    14   55.773 MiB    0.000 MiB           o['abc'] = 'def'
    15   55.773 MiB    0.000 MiB           o['def'] = '123'
    16   55.773 MiB    0.000 MiB           store.append(o)
    17                             
    18                             
    19   55.773 MiB    0.000 MiB       store = []
    20   55.773 MiB    0.000 MiB       for _ in xrange(50000):
    21   55.773 MiB    0.000 MiB           o = OrderedDict()
    22   55.773 MiB    0.000 MiB           o['abc'] = 'def'
    23   55.773 MiB    0.000 MiB           del o['abc']

Python's OrderedDict

[pablo:/tmp] [dic] 3s $ python -m memory_profiler test_collections_odict.py 
Filename: test_collections_odict.py

Line #    Mem usage    Increment   Line Contents
================================================
     4    9.707 MiB    0.000 MiB   @profile
     5                             def test():
     6    9.711 MiB    0.004 MiB       store = []
     7   59.520 MiB   49.809 MiB       for _ in xrange(50000):
     8   59.520 MiB    0.000 MiB           store.append(OrderedDict())
     9                             
    10                             
    11   59.137 MiB   -0.383 MiB       store = []
    12   69.621 MiB   10.484 MiB       for _ in xrange(50000):
    13   69.621 MiB    0.000 MiB           o = OrderedDict()
    14   69.621 MiB    0.000 MiB           o['abc'] = 'def'
    15   69.621 MiB    0.000 MiB           o['def'] = '123'
    16   69.621 MiB    0.000 MiB           store.append(o)
    17                             
    18                             
    19   69.621 MiB    0.000 MiB       store = []
    20   69.625 MiB    0.004 MiB       for _ in xrange(50000):
    21   69.625 MiB    0.000 MiB           o = OrderedDict()
    22   69.625 MiB    0.000 MiB           o['abc'] = 'def'
    23   69.625 MiB    0.000 MiB           del o['abc']

CPU tests

All look like this, but changing the imported module:

import timeit
from ruamel.ordereddict import ordereddict as OrderedDict


def test():
    store = []
    for _ in xrange(50000):
        store.append(OrderedDict())


    store = []
    for _ in xrange(50000):
        o = OrderedDict()
        o['abc'] = 'def'
        o['def'] = '123'
        store.append(o)


    store = []
    for _ in xrange(50000):
        o = OrderedDict()
        o['abc'] = 'def'
        del o['abc']


if __name__ == '__main__':
    print(timeit.timeit("test()", setup="from __main__ import test", number=100))
    #test()

Test runs

[pablo:/tmp] [dic] 1 $ python test_ruamel_odict.py 
7.6665160656
[pablo:/tmp] [dic] 7s $ python test_cython_odict.py 
78.6574699879
[pablo:/tmp] [dic] 1m24s $ python test_collections_odict.py 
162.376011133
[pablo:/tmp] [dic] 2m48s $ python test_w3af_odict.py 
120.46101284
[pablo:/tmp] [dic] 2m0s $ python test_sqlmap_odict.py 
157.93473196
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment