@nascheme
Last active September 9, 2017 23:00
CPython dev sprint 2017: Startup speed: idea, lazy creation of module definitions, global values
See:
https://public.etherpad-mozilla.org/p/cpython-dev-sprint-2017
https://github.com/warsaw/lazyimport/blob/master/lazy_compile.py
https://github.com/warsaw/lazyimport/blob/master/lazy_helper.py
This idea is based on a comment from Larry Hastings. PHP got a good
speedup by not creating all functions defined in the source. Python
could do something similar for classes and functions, perhaps without
too many backwards compatibility problems. This would be a huge win for
the startup time and memory usage of command-line tools that use large
libraries but where each invocation only uses a small subset of them.
Lazy loading per module helps, but doing it per function and per class
would be much more powerful.
Could be prototyped using an AST transformer. Make the global variable
naming each function a property: when accessed, actually create the
function (from a marshal byte string stored in memory). Should be safe
to do because the AST analysis finds code with side-effect potential
and does not make it lazy. A sketch follows.
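A minimal sketch of the creation-on-access idea (the names LazyFunction,
LazyModule and payload are invented here for illustration). A non-data
descriptor is used rather than a plain property so the created function
can be cached in the module __dict__:

import builtins
import marshal
import types

class LazyFunction:
    # Non-data descriptor: builds the function from marshalled bytes on
    # first access and caches it in the module __dict__, so later
    # accesses never reach this hook again.
    def __init__(self, name, code_bytes):
        self.name = name
        self.code_bytes = code_bytes

    def __get__(self, module, objtype=None):
        code = marshal.loads(self.code_bytes)
        func = types.FunctionType(code, module.__dict__, self.name)
        module.__dict__[self.name] = func
        return func

def greet():
    print("hello")

payload = marshal.dumps(greet.__code__)  # what the compiler would store
del greet

class LazyModule(types.ModuleType):
    pass

LazyModule.greet = LazyFunction("greet", payload)

mod = LazyModule("m")
mod.__dict__["__builtins__"] = builtins.__dict__  # let greet() find print
mod.greet()                      # function object created only here
assert "greet" in mod.__dict__   # cached; descriptor no longer consulted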
Analyzer prototype: https://github.com/warsaw/lazyimport (lazy_analyze.py)
Same issues as the lazy module load safety check:
- from .. import ... could raise an error
- class A(B): metaclass side-effects (illustrated below)
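To illustrate the metaclass point: the class statement executes the
base's metaclass immediately, so making 'class A(B)' lazy would move
that side effect to first access.

class Meta(type):
    def __new__(mcls, name, bases, ns):
        print("side effect at class creation:", name)
        return super().__new__(mcls, name, bases, ns)

class B(metaclass=Meta):   # prints at import time
    pass

class A(B):                # Meta runs again for the subclass
    pass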
Safe with a command-line flag to turn it on per app? Otherwise, mark
each module as safe using a compiler directive, e.g.
from __future__ import __lazy__
If it will be future behavior (I think it should be), provide
mechanisms to do non-lazy side effects. If I understand Guido's
position, this should be done by calling a module function to get the
side effects you want. I.e. the person using a library should determine
when side effects happen; they should not happen merely as a
side-effect of import. Idea from Barry Warsaw: allow an __init__
function that marks the whole module as lazy-safe and gets called on
import. A sketch follows.
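A sketch of what that convention might look like (the module-level
__init__ hook is a proposal from the discussion, not an existing
feature):

# mylib.py -- hypothetical layout under the proposed convention
_plugins = {}

class DefaultPlugin:
    pass

def __init__():
    # Import-time side effects are collected here; under the proposal
    # the import machinery would call this once on (lazy) import, so
    # the rest of the module body stays lazy-safe.
    _plugins["default"] = DefaultPlugin()

# consumer, with today's semantics: trigger the side effects explicitly
# import mylib
# mylib.__init__()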
Prototype of an alternative compiler that produces modules loaded this
way:
https://github.com/warsaw/lazyimport/blob/master/lazy_compile.py
Doesn't quite work, for a few reasons:
- Inspecting the module __dict__ directly will not show the lazy global
  definitions. Could break inspection tools and IDEs. Maybe not a fatal
  problem; dir() still works if done as a property on the module.
- LOAD_NAME is a serious problem. A load of a global from within the
  module itself is done with the LOAD_NAME opcode, which does a direct
  PyDict_GetItem. There is no place to hook a property and wake up the
  lazy object. The fact that properties don't work for LOAD_NAME means
  that PEP 549 has a similar issue (I think). See the disassembly below.
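To see this, disassemble a module-level name load; it compiles to
LOAD_NAME rather than any attribute access:

import dis
# A bare global read at module level uses LOAD_NAME, which fetches
# straight from the namespace dict and never consults descriptors.
dis.dis(compile("x", "<module>", "exec"))
# ... 0 LOAD_NAME  0 (x) ...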
How to fix the LOAD_NAME problem? In retrospect, LOAD_NAME should not
exist; there should have been just LOAD_ATTR. LOAD_NAME is there for
historical reasons. It is faster than LOAD_ATTR, but the bigger problem
is how 'globals' gets passed around. E.g. exec(code, module) does not
work; you have to pass exec() the module __dict__. Then, inside ceval,
you don't have access to the module (you could look it up or keep a
circular reference, but that's ugly). So there is no way to look for a
property and no way to override __getattr__.
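The exec() limitation is easy to demonstrate:

import types

mod = types.ModuleType("m")
code = compile("y = 1", "<m>", "exec")

try:
    exec(code, mod)          # TypeError: globals must be a dict
except TypeError as err:
    print(err)

exec(code, mod.__dict__)     # works, but ceval only ever sees the dict,
print(mod.y)                 # never the module object itself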
This problem also exists when assigning __class__ on modules. You can
do

import a
a.b  # b is a property, resolved through attribute access

but this does not work:

# implementation of a
import sys

class MyClass(type(sys)):
    @property
    def x(self):
        ...

sys.modules[__name__].__class__ = MyClass

# try to use x from within module a itself
print(x)  # 'x' does not exist in the __dict__; LOAD_NAME fails
Some ideas on how to fix this: allow exec() to take a module, and make
the module __dict__ a subtype that holds a weakref to the module. Then
LOAD_NAME can get the module and do a LOAD_ATTR-style lookup.
Optimization idea: don't set the dict's reference to the module unless
properties are defined (i.e. keep using plain LOAD_NAME dict lookups
unless properties are defined).
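A rough pure-Python sketch of that structure (ModuleDict is an invented
name; a module's __dict__ cannot actually be replaced today, so the
real fix would have to live in ceval):

import builtins
import types
import weakref

class ModuleDict(dict):
    # Namespace dict keeping a weak reference back to its module, so a
    # failed name lookup can fall back to attribute access, where
    # properties defined on the module's class are honoured.
    def __init__(self, module):
        super().__init__()
        self.module_ref = weakref.ref(module)

    def __missing__(self, key):
        module = self.module_ref()
        if module is not None:
            try:
                return getattr(module, key)
            except AttributeError:
                pass
        raise KeyError(key)

class MyModule(types.ModuleType):
    @property
    def x(self):
        return 42

mod = MyModule("m")
ns = ModuleDict(mod)
ns["__builtins__"] = builtins.__dict__

# exec() with a dict subclass routes LOAD_NAME misses through
# __missing__; LOAD_GLOBAL inside functions still bypasses it, which is
# why the real fix needs ceval changes.
exec("print(x)", ns)   # prints 42 via the property on MyModule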