TL;DR: use ruamel.ordereddict instead of Python's collections.OrderedDict for a significant improvement in memory and CPU usage.
I was having memory issues with w3af, so I started to experiment with different ordered dict libraries (profiling was showing some strange things in that area). These are the results of some memory profiling tests I ran:
- Lowest memory usage: 11.574 MiB with ruamel.ordereddict
- Highest memory usage: 69.742 MiB with SQLMap's ordered dict, almost identical to Python's collections.OrderedDict
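That gap isn't shocking: on Python 2, collections.OrderedDict is pure Python and tracks key order with a doubly linked list on top of the underlying dict. A rough per-object comparison using sys.getsizeof (just a sketch; on Python 3 OrderedDict is itself implemented in C, so the absolute numbers differ from the Python 2 runs above):

```python
import sys
from collections import OrderedDict

d = {'abc': 'def', 'def': '123'}
od = OrderedDict(d)

# getsizeof reports only the container's own footprint, not the keys and
# values, so it understates the real cost of the order-tracking structures.
print('dict       :', sys.getsizeof(d), 'bytes')
print('OrderedDict:', sys.getsizeof(od), 'bytes')
```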
When running the tests 100 times using timeit, this is what I got on my workstation:
- Fastest: ruamel.ordereddict at 7.66 seconds
- Slowest: Python's collections.OrderedDict at 162 seconds
Since ruamel.ordereddict is implemented in C it makes a lot of sense to see it win, but I wasn't expecting 7 seconds vs. 162! My test suite was really quick and easy; maybe Python's OrderedDict performs better in other workloads?
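Since ruamel.ordereddict is close to a drop-in replacement, switching is mostly an import change. A hedged sketch of the fallback pattern I'd use (ruamel.ordereddict is a Python 2 C extension and may not be installed, hence the try/except):

```python
# Prefer the C ordereddict when available; fall back to the stdlib otherwise.
try:
    from ruamel.ordereddict import ordereddict as OrderedDict
except ImportError:
    from collections import OrderedDict

o = OrderedDict()
o['abc'] = 'def'
o['def'] = '123'
print(list(o))  # keys come back in insertion order
```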
[pablo:/tmp] [dic] $ python -m memory_profiler test_ruamel_odict.py
Filename: test_ruamel_odict.py
Line # Mem usage Increment Line Contents
================================================
4 10.789 MiB 0.000 MiB @profile
5 def test():
6 10.793 MiB 0.004 MiB store = []
7 29.180 MiB 18.387 MiB for _ in xrange(50000):
8 29.180 MiB 0.000 MiB store.append(OrderedDict())
9
10
11 11.301 MiB -17.879 MiB store = []
12 28.953 MiB 17.652 MiB for _ in xrange(50000):
13 28.953 MiB 0.000 MiB o = OrderedDict()
14 28.953 MiB 0.000 MiB o['abc'] = 'def'
15 28.953 MiB 0.000 MiB o['def'] = '123'
16 28.953 MiB 0.000 MiB store.append(o)
17
18
19 11.703 MiB -17.250 MiB store = []
20 11.703 MiB 0.000 MiB for _ in xrange(50000):
21 11.574 MiB -0.129 MiB o = OrderedDict()
22 11.574 MiB 0.000 MiB o['abc'] = 'def'
23 11.574 MiB 0.000 MiB del o['abc']
[pablo:/tmp] [dic] $ python -m memory_profiler test_w3af_odict.py
Filename: test_w3af_odict.py
Line # Mem usage Increment Line Contents
================================================
4 9.773 MiB 0.000 MiB @profile
5 def test():
6 9.777 MiB 0.004 MiB store = []
7 42.895 MiB 33.117 MiB for _ in xrange(50000):
8 42.895 MiB 0.000 MiB store.append(OrderedDict())
9
10
11 10.512 MiB -32.383 MiB store = []
12 44.832 MiB 34.320 MiB for _ in xrange(50000):
13 44.832 MiB 0.000 MiB o = OrderedDict()
14 44.832 MiB 0.000 MiB o['abc'] = 'def'
15 44.832 MiB 0.000 MiB o['def'] = '123'
16 44.832 MiB 0.000 MiB store.append(o)
17
18
19 13.082 MiB -31.750 MiB store = []
20 13.086 MiB 0.004 MiB for _ in xrange(50000):
21 13.086 MiB 0.000 MiB o = OrderedDict()
22 13.086 MiB 0.000 MiB o['abc'] = 'def'
23 13.086 MiB 0.000 MiB del o['abc']
[pablo:/tmp] [dic] 43s $ python -m memory_profiler test_sqlmap_odict.py
Filename: test_sqlmap_odict.py
Line # Mem usage Increment Line Contents
================================================
4 9.848 MiB 0.000 MiB @profile
5 def test():
6 9.855 MiB 0.008 MiB store = []
7 59.648 MiB 49.793 MiB for _ in xrange(50000):
8 59.648 MiB 0.000 MiB store.append(OrderedDict())
9
10
11 59.266 MiB -0.383 MiB store = []
12 69.738 MiB 10.473 MiB for _ in xrange(50000):
13 69.738 MiB 0.000 MiB o = OrderedDict()
14 69.738 MiB 0.000 MiB o['abc'] = 'def'
15 69.738 MiB 0.000 MiB o['def'] = '123'
16 69.738 MiB 0.000 MiB store.append(o)
17
18
19 69.738 MiB 0.000 MiB store = []
20 69.742 MiB 0.004 MiB for _ in xrange(50000):
21 69.742 MiB 0.000 MiB o = OrderedDict()
22 69.742 MiB 0.000 MiB o['abc'] = 'def'
23 69.742 MiB 0.000 MiB del o['abc']
[pablo:/tmp] [dic] $ python -m memory_profiler test_cython_odict.py
Filename: test_cython_odict.py
Line # Mem usage Increment Line Contents
================================================
4 9.824 MiB 0.000 MiB @profile
5 def test():
6 9.832 MiB 0.008 MiB store = []
7 45.680 MiB 35.848 MiB for _ in xrange(50000):
8 45.680 MiB 0.000 MiB store.append(OrderedDict())
9
10
11 45.297 MiB -0.383 MiB store = []
12 55.773 MiB 10.477 MiB for _ in xrange(50000):
13 55.773 MiB 0.000 MiB o = OrderedDict()
14 55.773 MiB 0.000 MiB o['abc'] = 'def'
15 55.773 MiB 0.000 MiB o['def'] = '123'
16 55.773 MiB 0.000 MiB store.append(o)
17
18
19 55.773 MiB 0.000 MiB store = []
20 55.773 MiB 0.000 MiB for _ in xrange(50000):
21 55.773 MiB 0.000 MiB o = OrderedDict()
22 55.773 MiB 0.000 MiB o['abc'] = 'def'
23 55.773 MiB 0.000 MiB del o['abc']
[pablo:/tmp] [dic] 3s $ python -m memory_profiler test_collections_odict.py
Filename: test_collections_odict.py
Line # Mem usage Increment Line Contents
================================================
4 9.707 MiB 0.000 MiB @profile
5 def test():
6 9.711 MiB 0.004 MiB store = []
7 59.520 MiB 49.809 MiB for _ in xrange(50000):
8 59.520 MiB 0.000 MiB store.append(OrderedDict())
9
10
11 59.137 MiB -0.383 MiB store = []
12 69.621 MiB 10.484 MiB for _ in xrange(50000):
13 69.621 MiB 0.000 MiB o = OrderedDict()
14 69.621 MiB 0.000 MiB o['abc'] = 'def'
15 69.621 MiB 0.000 MiB o['def'] = '123'
16 69.621 MiB 0.000 MiB store.append(o)
17
18
19 69.621 MiB 0.000 MiB store = []
20 69.625 MiB 0.004 MiB for _ in xrange(50000):
21 69.625 MiB 0.000 MiB o = OrderedDict()
22 69.625 MiB 0.000 MiB o['abc'] = 'def'
23 69.625 MiB 0.000 MiB del o['abc']
All the test scripts look like this, only the imported module changes:
import timeit

from ruamel.ordereddict import ordereddict as OrderedDict


def test():
    store = []
    for _ in xrange(50000):
        store.append(OrderedDict())

    store = []
    for _ in xrange(50000):
        o = OrderedDict()
        o['abc'] = 'def'
        o['def'] = '123'
        store.append(o)

    store = []
    for _ in xrange(50000):
        o = OrderedDict()
        o['abc'] = 'def'
        del o['abc']

if __name__ == '__main__':
    print(timeit.timeit("test()", setup="from __main__ import test", number=100))
    #test()
[pablo:/tmp] [dic] 1 $ python test_ruamel_odict.py
7.6665160656
[pablo:/tmp] [dic] 7s $ python test_cython_odict.py
78.6574699879
[pablo:/tmp] [dic] 1m24s $ python test_collections_odict.py
162.376011133
[pablo:/tmp] [dic] 2m48s $ python test_w3af_odict.py
120.46101284
[pablo:/tmp] [dic] 2m0s $ python test_sqlmap_odict.py
157.93473196
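If you want to reproduce the memory numbers without installing memory_profiler, a rough stdlib alternative on Python 3 is tracemalloc. This is a sketch of the second test case (50,000 two-key dicts kept alive), not the exact scripts above:

```python
import tracemalloc
from collections import OrderedDict


def peak_mib(n=50000):
    """Peak MiB allocated while keeping n two-key OrderedDicts alive."""
    tracemalloc.start()
    store = []
    for _ in range(n):
        o = OrderedDict()
        o['abc'] = 'def'
        o['def'] = '123'
        store.append(o)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 2 ** 20


print('peak: %.1f MiB' % peak_mib())
```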