@methane methane/python-ram-usage.md Secret
Last active Apr 17, 2017


Investigating Python memory usage of one real Web application

preface

I read this blog post: https://engineering.instagram.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172#.lenebvdgn After reading it, I started thinking about how Python could reduce the memory usage of Web applications. My company develops a medium-sized API server using Flask, SQLAlchemy, and typing heavily. So let's investigate the memory usage of a real-world web application.

investigating overall memory usage

import app  # the application, using Flask and SQLAlchemy
import sys
from collections import Counter

allobjs = sys.getobjects(0)  # requires a Python built with --with-pydebug

mem_stats = Counter()
count_stats = Counter()

for o in allobjs:
    # Py_TRACE_REFS adds 16 bytes to each object in a pydebug build.
    mem_stats[type(o)] += max(0, sys.getsizeof(o) - 16)
    count_stats[type(o)] += 1

for t, s in mem_stats.most_common(20):
    c = count_stats[t]
    print(f"{t} {s} ({s/1024:.2f}KB) / {c} = {s/c:.3f}bytes")

Note that this script counts only static memory usage. It ignores the dynamic memory used while processing real requests.
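On a release build without `sys.getobjects()`, a rough approximation of the same tally can be sketched with `gc.get_objects()`. It is not equivalent: the GC only tracks container objects, so atomic types such as str, bytes, and int are invisible here, and the `import app` line is dropped so the sketch runs standalone.

```python
# invmem_gc.py -- rough approximation on a release (non-pydebug) build.
# gc.get_objects() only returns GC-tracked containers, so str, bytes,
# int and other atomic objects do not appear in this tally.
import gc
import sys
from collections import Counter

mem_stats = Counter()
count_stats = Counter()

for o in gc.get_objects():
    mem_stats[type(o)] += sys.getsizeof(o)
    count_stats[type(o)] += 1

for t, s in mem_stats.most_common(10):
    c = count_stats[t]
    print(f"{t} {s} ({s/1024:.2f}KB) / {c} = {s/c:.3f}bytes")
```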

$ python3 invmem.py
<class 'str'> 7767161 (7585.12KB) / 71099 = 109.244bytes
<class 'dict'> 7467344 (7292.33KB) / 16535 = 451.608bytes
<class 'tuple'> 5526072 (5396.55KB) / 73754 = 74.926bytes
<class 'code'> 2437824 (2380.69KB) / 16859 = 144.601bytes
<class 'bytes'> 2416128 (2359.50KB) / 34519 = 69.994bytes
<class 'function'> 2347632 (2292.61KB) / 17262 = 136.000bytes
<class 'type'> 2298816 (2244.94KB) / 2370 = 969.965bytes
<class 'set'> 1345376 (1313.84KB) / 4349 = 309.353bytes
<class 'weakref'> 833520 (813.98KB) / 10419 = 80.000bytes
<class 'int'> 527360 (515.00KB) / 17877 = 29.499bytes
<class 'list'> 427400 (417.38KB) / 4436 = 96.348bytes
<class 'typing.GenericMeta'> 315408 (308.02KB) / 355 = 888.473bytes
<class 'sqlalchemy.sql.visitors.VisitableType'> 276984 (270.49KB) / 254 = 1090.488bytes
<class '_io.BufferedWriter'> 266768 (260.52KB) / 3 = 88922.667bytes
<class 'collections.deque'> 213616 (208.61KB) / 338 = 632.000bytes
<class 'getset_descriptor'> 165312 (161.44KB) / 2296 = 72.000bytes
<class '_io.BufferedReader'> 131248 (128.17KB) / 1 = 131248.000bytes
<class 'frozenset'> 129408 (126.38KB) / 324 = 399.407bytes
<class 'enum.EnumMeta'> 115992 (113.27KB) / 110 = 1054.473bytes
<class 'cell'> 110784 (108.19KB) / 2308 = 48.000bytes

As you can see, str is the biggest memory eater. I found that SQLAlchemy uses very long docstrings, so let's use -OO to strip them.
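A quick way to confirm what -OO does: docstrings are dropped at compile time, so the strings are never allocated at all. A minimal check, to be run both as `python3 docstrings.py` and `python3 -OO docstrings.py`:

```python
# docstrings.py -- compare `python3 docstrings.py` vs `python3 -OO docstrings.py`
import sys

def f():
    """This docstring is dropped entirely when compiled with -OO."""

if sys.flags.optimize >= 2:
    print(f.__doc__)  # None: the string was never created
else:
    print(len(f.__doc__), "bytes of docstring kept")
```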

$ python3 -OO invmem.py
<class 'dict'> 7462960 (7288.05KB) / 16531 = 451.452bytes
<class 'tuple'> 5463120 (5335.08KB) / 73437 = 74.392bytes
<class 'str'> 4644814 (4535.95KB) / 63233 = 73.456bytes
<class 'code'> 2423712 (2366.91KB) / 16761 = 144.604bytes
<class 'bytes'> 2376452 (2320.75KB) / 34323 = 69.238bytes
<class 'function'> 2347360 (2292.34KB) / 17260 = 136.000bytes
<class 'type'> 2298240 (2244.38KB) / 2370 = 969.722bytes
<class 'set'> 1345376 (1313.84KB) / 4349 = 309.353bytes
<class 'weakref'> 833520 (813.98KB) / 10419 = 80.000bytes
<class 'int'> 507924 (496.02KB) / 17292 = 29.373bytes
<class 'list'> 427096 (417.09KB) / 4435 = 96.301bytes
<class 'typing.GenericMeta'> 315408 (308.02KB) / 355 = 888.473bytes
<class 'sqlalchemy.sql.visitors.VisitableType'> 276984 (270.49KB) / 254 = 1090.488bytes
<class '_io.BufferedWriter'> 266768 (260.52KB) / 3 = 88922.667bytes
<class 'collections.deque'> 213616 (208.61KB) / 338 = 632.000bytes
<class 'getset_descriptor'> 165312 (161.44KB) / 2296 = 72.000bytes
<class '_io.BufferedReader'> 131248 (128.17KB) / 1 = 131248.000bytes
<class 'frozenset'> 129408 (126.38KB) / 324 = 399.407bytes
<class 'enum.EnumMeta'> 115992 (113.27KB) / 110 = 1054.473bytes
<class 'cell'> 110784 (108.19KB) / 2308 = 48.000bytes

Now dict and tuple eat the most memory.

investigating dicts

# invdict.py
import app
import sys

alldicts = sys.getobjects(0, dict)
for d in alldicts:
    print(d.keys())

and result

# python3 invdict.py | sort | uniq -c | sort -nr > dicts.txt
1618 dict_keys([])
1198 dict_keys(['data', '_remove', '_pending_removals', '_iterating'])
 712 dict_keys(['_value_', '_name_', '__objclass__'])
 622 dict_keys(['name', 'loader', 'origin', 'loader_state', 'submodule_search_locations', '_set_fileattr', '_cached', '_initializing'])
 601 dict_keys(['name', 'path'])
 382 dict_keys(['return'])
 338 dict_keys(['class_', 'key', 'impl', 'comparator', '_of_type', '__doc__'])
 325 dict_keys(['key', 'name', 'table', 'type', 'is_literal', 'primary_key', 'nullable', 'default', 'server_default', 'server_onupdate', 'index', 'unique', 'system', 'doc', 'onupdate', 'autoincrement', 'constraints', 'foreign_keys', '_creation_order', 'dispatch', 'comparator', 'proxy_set'])
 295 dict_keys(['display_width', 'unsigned', 'zerofill'])
 272 dict_keys(['__module__', '__doc__'])
 256 dict_keys(['__wrapped__'])
 231 dict_keys(['__module__', '__slots__', '__doc__', '__abstractmethods__', '_abc_registry', '_abc_cache', '_abc_negative_cache', '_abc_negative_cache_version', '__parameters__', '__args__', '__origin__', '__extra__', '__next_in_mro__', '__orig_bases__', '__subclasshook__', '__tree_hash__'])
 166 dict_keys(['name', 'mod', 'attr'])
 162 dict_keys(['fget', '__doc__', '__name__'])
 161 dict_keys(['dict_', 'return'])
 161 dict_keys(['__module__', '__init__', 'serialize', 'deserialize', '__dict__', '__weakref__', '__doc__'])
 140 dict_keys(['for_update', 'arg', 'dispatch', 'column'])
 129 dict_keys(['cond', 'param', 'return'])
 119 dict_keys(['args', 'self_arg', 'apply_pos', 'apply_kw'])
 111 dict_keys(['__module__', '__slots__', '__new__', '__doc__', '__abstractmethods__', '_abc_registry', '_abc_cache', '_abc_negative_cache', '_abc_negative_cache_version', '__parameters__', '__args__', '__origin__', '__extra__', '__next_in_mro__', '__orig_bases__', '__subclasshook__', '__tree_hash__'])
 110 dict_keys(['name'])
 109 dict_keys(['__isabstractmethod__'])
 106 dict_keys(['data', '_remove', '_pending_removals', '_iterating', '_dirty_len'])
 103 dict_keys(['rule', 'is_leaf', 'map', 'strict_slashes', 'subdomain', 'host', 'defaults', 'build_only', 'alias', 'methods', 'endpoint', 'redirect_to', 'arguments', '_trace', '_converters', '_regex', '_weights', 'provide_automatic_options'])
  98 dict_keys(['_fields', '__module__', '__doc__'])
  86 dict_keys(['name', 'mod'])
  78 dict_keys(['__module__', '__init__', '__dict__', '__weakref__', '__doc__'])
  74 dict_keys(['_list'])
  69 dict_keys(['_loaders', 'path', '_path_mtime', '_path_cache', '_relaxed_path_cache'])

notes

  • 1/10 of the dicts are empty.
    • How many of them are shared-key dicts?
  • ['data', '_remove', '_pending_removals', '_iterating'] looks like WeakSet. It is used by typing, through ABCMeta.
  • ['_value_', '_name_', '__objclass__'] looks like an enum value. Since we use a code generator to share types with the client code (C#), that's not so surprising. And it's not a general issue.
  • 622 dict_keys(['name', 'loader', 'origin', 'loader_state', 'submodule_search_locations', '_set_fileattr', '_cached', '_initializing']) This is importlib.ModuleSpec.
    • Could ModuleSpec use __slots__?
  • 601 dict_keys(['name', 'path']) Maybe importlib._bootstrap_external.FileLoader?
    • Could it use __slots__ too?
  • 382 dict_keys(['return']), 161 dict_keys(['dict_', 'return']), etc…
    • These look like functions' __annotations__, but we use annotations only for PyCharm and mypy. Could building the dict be delayed?
  • A SQLAlchemy-based application uses a lot of dicts.
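The ('return',) pattern is easy to reproduce: CPython materializes a fresh __annotations__ dict for every annotated function object, even if nothing but a type checker ever reads it. A minimal illustration:

```python
# Every annotated function carries its own __annotations__ dict,
# keyed by parameter names plus 'return'.
def set_name(name: str) -> bool:
    return True

def get_name() -> str:
    return "x"

print(set_name.__annotations__)  # {'name': <class 'str'>, 'return': <class 'bool'>}
print(get_name.__annotations__)  # {'return': <class 'str'>}
```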

investigating shared key dicts

How many of them are shared-key? Let’s investigate.

import app
import sys
import _testcapi

alldicts = sys.getobjects(0, dict)

for d in alldicts:
    if _testcapi.dict_hassplittable(d):
        print("shared", tuple(d.keys()))
    else:
        print(tuple(d.keys()))

top30 are:

1618 ()
1198 shared ('data', '_remove', '_pending_removals', '_iterating')
 712 shared ('_value_', '_name_', '__objclass__')
 622 shared ('name', 'loader', 'origin', 'loader_state', 'submodule_search_locations', '_set_fileattr', '_cached', '_initializing')
 601 shared ('name', 'path')
 382 ('return',)
 338 shared ('class_', 'key', 'impl', 'comparator', '_of_type', '__doc__')
 325 ('key', 'name', 'table', 'type', 'is_literal', 'primary_key', 'nullable', 'default', 'server_default', 'server_onupdate', 'index', 'unique', 'system', 'doc', 'onupdate', 'autoincrement', 'constraints', 'foreign_keys', '_creation_order', 'dispatch', 'comparator', 'proxy_set')
 295 shared ('display_width', 'unsigned', 'zerofill')
 272 ('__module__', '__doc__')
 256 ('__wrapped__',)
 231 ('__module__', '__slots__', '__doc__', '__abstractmethods__', '_abc_registry', '_abc_cache', '_abc_negative_cache', '_abc_negative_cache_version', '__parameters__', '__args__', '__origin__', '__extra__', '__next_in_mro__', '__orig_bases__', '__subclasshook__', '__tree_hash__')
 166 shared ('name', 'mod', 'attr')
 162 shared ('fget', '__doc__', '__name__')
 161 ('dict_', 'return')
 161 ('__module__', '__init__', 'serialize', 'deserialize', '__dict__', '__weakref__', '__doc__')
 140 shared ('for_update', 'arg', 'dispatch', 'column')
 129 ('cond', 'param', 'return')
 119 ('args', 'self_arg', 'apply_pos', 'apply_kw')
 111 ('__module__', '__slots__', '__new__', '__doc__', '__abstractmethods__', '_abc_registry', '_abc_cache', '_abc_negative_cache', '_abc_negative_cache_version', '__parameters__', '__args__', '__origin__', '__extra__', '__next_in_mro__', '__orig_bases__', '__subclasshook__', '__tree_hash__')
 109 ('__isabstractmethod__',)
 106 shared ('data', '_remove', '_pending_removals', '_iterating', '_dirty_len')
 104 shared ('name',)
 103 shared ('rule', 'is_leaf', 'map', 'strict_slashes', 'subdomain', 'host', 'defaults', 'build_only', 'alias', 'methods', 'endpoint', 'redirect_to', 'arguments', '_trace', '_converters', '_regex', '_weights', 'provide_automatic_options')
  98 ('_fields', '__module__', '__doc__')
  86 shared ('name', 'mod')
  78 ('__module__', '__init__', '__dict__', '__weakref__', '__doc__')
  74 shared ('_list',)
  69 shared ('_loaders', 'path', '_path_mtime', '_path_cache', '_relaxed_path_cache')
  68 ('user_id',)
  • None of the empty dicts are shared!
    • Optimizing empty dicts looks worthwhile.
      • ma_keys could be allocated lazily.
  • As intended, instance namespaces are shared well.
  • But other namespaces and annotations, like ('return',) or ('__wrapped__',), are not shared.
    • A more compact dict optimized for namespaces looks worthwhile.
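For context, this is PEP 412 key-sharing at work: all instances of a class share a single keys table, so an instance __dict__ is smaller than a plain dict with the same keys. A quick sketch (exact sizes vary across CPython versions and builds):

```python
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p1 = Point(1, 2)
p2 = Point(3, 4)

# p1 and p2 share one keys table; the literal dict stores its own.
print(sys.getsizeof(p1.__dict__))
print(sys.getsizeof({"x": 1, "y": 2}))
```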

investigating tuple

Memory usage of tuple is surprising: there are even more tuples than strs. Let's investigate!
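As background, a tuple costs a fixed header plus one pointer per element, so tens of thousands of tiny tuples add up fast. A quick check (absolute sizes vary with CPython version and platform):

```python
import sys

# Fixed header plus one pointer per element (8 bytes on 64-bit builds).
for t in [(), (None,), (None, None), (None, None, None)]:
    print(len(t), sys.getsizeof(t))
```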

# invtuple.py
import app
import sys

allobj = sys.getobjects(0, tuple)
for o in reversed(allobj):
    print(o)

Maybe tuple's repr doesn't check for NULL items: since sys.getobjects() returns objects in "most recently created" order, printing them directly hits partially initialized tuples and segfaults. That's why I used reversed().

# python3 invtuple.py | sort | uniq -c | sort -nr > tuples.txt

results are

8146 (None,)
3126 ('self',)
1055 (<class 'object'>,)
 352 ('__class__',)
 343 ('instrument', True)
 342 ('deferred', False)
 341 (('deferred', False), ('instrument', True))
 317 ('NotImplementedError',)
 314 (None, 0)
 261 (None, None)
 241 ('.0', 'x')
 233 ('self', 'other')
 211 (None, False)
 187 (None, True)
 184 (False,)

Notes

  • More than 10% of tuples are (None,). Optimizing them looks worthwhile.
    • But where are they allocated? Can this be optimized easily?
      • If they come from def func(a=None), we can optimize it in the compiler and in marshal.loads.
  • 3126 ('self',) looks like functions' co_varnames. Interning it seems worthwhile.
  • 1055 (<class 'object'>,) is probably an mro. How about interning it too?
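One confirmed source is default-argument tuples: `def f(a=None)` stores (None,) in the function's `__defaults__`. Within one module the compiler may deduplicate equal constants, but tuples loaded from different .pyc files via marshal are always distinct objects, so every module contributes its own copies. A small illustration (the cross-module duplication itself can't be shown in one file):

```python
# Default-argument values live in __defaults__ as a tuple built from
# the module's marshalled constants.
def connect(timeout=None):
    pass

def fetch(retries=None):
    pass

print(connect.__defaults__)  # (None,)
print(fetch.__defaults__)    # (None,)
```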

trying reuse (None,) objects

I failed to find where the (None,)s come from. (tracemalloc doesn't report a traceback for most of them, and a segv happened while investigating.)

So I wrote a small patch and watched its effect.

patch:

diff -r f44f44b14dfc Python/marshal.c
--- a/Python/marshal.c	Fri Jan 20 08:35:18 2017 +0200
+++ b/Python/marshal.c	Fri Jan 20 19:11:05 2017 +0900
@@ -1190,6 +1190,34 @@ r_object(RFILE *p)
         if (v == NULL)
             break;
 
+        if (n == 1) {
+            v2 = r_object(p);
+            if (v2 == NULL) {
+                if (!PyErr_Occurred())
+                    PyErr_SetString(PyExc_TypeError,
+                        "NULL object in marshal data for tuple");
+                Py_CLEAR(v);
+                break;
+            }
+            PyTuple_SET_ITEM(v, 0, v2);
+
+            if (v2 == Py_None) {
+                static PyObject *none_tuple = NULL;
+                if (none_tuple == NULL) {
+                    Py_INCREF(v);
+                    none_tuple = v;
+                }
+                else {
+                    Py_DECREF(v);
+                    Py_INCREF(none_tuple);
+                    v = none_tuple;
+                }
+            }
+
+            retval = v;
+            break;
+        }
+
         for (i = 0; i < n; i++) {
             v2 = r_object(p);
             if ( v2 == NULL ) {

effect:

$ python3 -OO invtuple.py | sort | uniq -c | sort -nr > tuples2.txt
3126 ('self',)
1055 (<class 'object'>,)
 401 (None,)
 352 ('__class__',)
 343 ('instrument', True)

8146 -> 401, nice!

@louisom commented Apr 17, 2017:

Thanks for sharing this.