@hi-ogawa
Last active April 15, 2023 17:22
Reading cpython

cpython development

  • misc tips (building, testing, editor setup)
  • module/import system
  • byte code
    • compilation
    • execution
  • generator/coroutine/async/await implementation
  • thread management
    • interpreter initialization
    • acquire/release GIL
  • object/class system
  • memory management

misc tips

building and testing

# Building out of tree
mkdir -p __out__ && cd __out__
../configure --with-pydebug --prefix="$PWD/__prefix__" CC=clang CXX=clang++
make -j
make install

# Testing (cf. Tools/scripts/run_tests.py, Lib/test/libregrtest)
make test TESTOPTS=--help            # via `make test`
./python -m test --help              # directly run `Lib/test` module
./python -m test -v test_generators -m GeneratorTest  # filter by module and test case name

editor setup

  • ignore directories e.g. __out__, Doc, Misc, PC, PCbuild, Tools/msi, Mac

  • .vscode/settings.json

{
  "python.defaultInterpreterPath": "${workspaceFolder}/__out__/python",
  "python.pythonPath": "${workspaceFolder}/__out__/python"
}
  • .vscode/c_cpp_properties.json
{
  "configurations": [
    {
      "name": "Linux",
      "includePath": [
        "${workspaceFolder}/Include/internal",
        "${workspaceFolder}/Include",
        "${workspaceFolder}/Objects",
        "${workspaceFolder}/Python",
        "${workspaceFolder}",
        "${workspaceFolder}/__out__"
      ],
      "defines": [],
      "compilerPath": "/usr/bin/clang",
      "cStandard": "c17",
      "cppStandard": "c++14",
      "intelliSenseMode": "linux-clang-x64"
    }
  ],
  "version": 4
}
  • .vscode/launch.json (for Linux)
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "cppdbg",
      "type": "cppdbg",
      "request": "launch",
      "program": "${workspaceFolder}/__out__/python",
      "args": [],
      "stopAtEntry": false,
      "cwd": "${fileDirname}",
      "environment": [],
      "externalConsole": false,
      "MIMode": "gdb",
      "miDebuggerPath": "/usr/bin/gdb"
    }
  ]
}

debugging

  • with lldb
lldb ./python -o "b _PyEval_EvalFrameDefault" -o "r"
  • with vscode
    • use launch.json above

interpreter architecture

interpreter state

_PyRuntimeState (global variable defined in `pylifecycle.c`)
  main_thread (id of main thread)

  pyinterpreters interpreters
    PyInterpreterState* main, head (as linked list)

  _gilstate_runtime_state gilstate
    PyThreadState* tstate_current (atomic address)
    PyInterpreterState*

  _ceval_runtime_state
    _gil_runtime_state (synchronization primitives e.g. locked state variable, condition variable for unlock notification)


PyInterpreterState
  _PyRuntimeState* (back pointer)

  pythreads threads
    PyThreadState* (as doubly linked list)

  _ceval_state ceval
    atomic int gil_drop_request

  _gc_runtime_state gc (TODO)


PyThreadState
  PyInterpreterState* (back pointer)
  CFrame* cframe
  PyObject** datastack_top

interpreter startup

pymain_main =>
  pymain_init => Py_InitializeFromConfig =>
    _PyRuntime_Initialize => _PyRuntimeState_Init => init_runtime =>
      _PyEval_InitRuntimeState => _gil_initialize (initialize `_gil_runtime_state`)
    pyinit_core => pyinit_config =>
      pycore_init_runtime
      pycore_create_interpreter =>
        _PyGILState_Init
        PyInterpreterState_New =>
          alloc_interpreter (on `_PyRuntimeState.interpreters.main`)
          init_interpreter
        PyThreadState_New =>
          new_threadstate (allocate on `PyInterpreterState.threads`)
          _PyThreadState_SetCurrent (set current `PyThreadState` as `_PyRuntimeState.gilstate.autoTSSKey`)
        PyThreadState_Swap => _PyThreadState_Swap (set `_PyRuntimeState.gilstate.tstate_current` global)
        init_interp_create_gil =>
          _PyEval_InitGIL =>
            create_gil (on `_PyRuntimeState.ceval.gil` global) =>
              initialize e.g. condition variable, mutex, then set locked = 0
            take_gil(tstate) =>
              (while locked)
                wait for gil condition variable with timeout
                if timed out, SET_GIL_DROP_REQUEST (set PyInterpreterState.ceval.gil_drop_request = 1)
              (otherwise)
                set locked = 1
      pycore_interp_init(tstate) =>
        pycore_init_global_objects
        _PyGC_Init
        pycore_init_types (define builtin types e.g. `type`, `tuple`, `Exception`)
        pycore_init_builtins => _PyBuiltin_Init (define `builtins` module)
        init_importlib =>
          PyImport_ImportFrozenModule("_frozen_importlib") (aka. importlib/_bootstrap.py)
          _PyImport_BootstrapImp (aka "_imp" c extension (cf. PyInit__imp))
          PyObject_CallMethod (call "_frozen_importlib._install") (append default importers to `sys.meta_path`)
    pyinit_main => init_interp_main =>
      init_importlib_external (call "_frozen_importlib._install_external_importers") =>
        import _frozen_importlib_external (aka importlib/_bootstrap_external.py)
        _frozen_importlib_external._install =>
          append `FileFinder` to `sys.path_hooks`
          append `PathFinder` to `sys.meta_path`
      add_main_module

  Py_RunMain => pymain_run_python =>
    pymain_run_module (when "python -m ...") =>
      invoke "runpy._run_module_as_main" via C API
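
The effects of the startup trace above can be observed from a running interpreter. The following is a minimal sketch, assuming a standard CPython build; the variable names are illustrative only, and the exact contents of `sys.path_hooks` and `sys.meta_path` can differ if other tools have installed their own importers.

```python
import importlib
import sys

# The frozen bootstrap module imported by init_importlib is reused as
# importlib._bootstrap once the importlib package itself is imported.
frozen_bootstrap = sys.modules["_frozen_importlib"]
bootstrap_is_frozen = importlib._bootstrap is frozen_bootstrap

# init_importlib_external registered a FileFinder hook and PathFinder.
path_hook_names = [
    getattr(hook, "__qualname__", repr(hook)) for hook in sys.path_hooks
]
meta_path_names = [
    getattr(finder, "__name__", type(finder).__name__)
    for finder in sys.meta_path
]

# add_main_module created the __main__ module before any user code ran.
has_main = "__main__" in sys.modules

print(path_hook_names)
print(meta_path_names)
```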

module system (importlib)

  • default finder/loader (importlib/_bootstrap_external.py)
[python]
importlib._bootstrap._find_and_load (cf. `TARGET(IMPORT_NAME)` in ceval.c below) =>
  _find_and_load_unlocked =>
    _find_spec => PathFinder.find_spec => _get_spec
      FileFinder.find_spec =>
        (iterate loaders e.g. SourceFileLoader)
        _get_spec => spec_from_file_location

    _load_unlocked =>
      module_from_spec (setup module.__name__, etc...)
      SourceFileLoader.exec_module (_LoaderBasics.exec_module) =>
        get_code (SourceLoader.get_code) =>
          cache_from_source (return __pycache__/(name).cpython-39.pyc)
          (if .pyc is found) _compile_bytecode => marshal.loads
          (otherwise)
            source_to_code => compile (with "exec" flag)
            _cache_bytecode
        exec(code, module.__dict__)
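
The find/load steps above can be reproduced by hand with the public `importlib` API. This is a rough sketch rather than what the import statement does internally: `demo_mod` is a hypothetical module written to a temporary directory just for the example, and the `sys.modules` bookkeeping is skipped.

```python
import importlib.machinery
import importlib.util
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # demo_mod is a hypothetical module created only for this example.
    pathlib.Path(tmp, "demo_mod.py").write_text("VALUE = 6 * 7\n")

    # PathFinder.find_spec consults sys.path_hooks and ends up in
    # FileFinder.find_spec, which picks a loader by file suffix.
    spec = importlib.machinery.PathFinder.find_spec("demo_mod", [tmp])
    loader_name = type(spec.loader).__name__

    # Where get_code looks for (and writes) the cached bytecode.
    pyc_path = importlib.util.cache_from_source(spec.origin)

    # The same two steps as _load_unlocked, without touching sys.modules.
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    value = module.VALUE

print(loader_name, value)
```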

byte code compilation

_PyAST_Compile =>
  _PyAST_Optimize
  _PySymtable_Build =>
    (1st pass: traverse AST with `symtable_visit_???` and construct `PySTEntryObject`)
      symtable_enter_block
      symtable_visit_stmt/expr =>
        (e.g. ClassDef)
        symtable_add_def(<class name>, DEF_LOCAL)
        symtable_enter_block
        VISIT_SEQ(st, stmt, <class body>) => ...
        symtable_exit_block
      symtable_exit_block
    (2nd pass: traverse `PySTEntryObject` by `analyze_block` and `analyze_child_block`)
      symtable_analyze =>
        analyze_block
          analyze_name
          analyze_child_block
  compiler_mod =>
    (traverse AST with `compiler_???`)
    compiler_body => compiler_visit_stmt =>
      (e.g. FunctionDef)
      compiler_function => ...
    assemble (returns PyCodeObject) =>
      insert_prefix_instructions (GEN_START opcode for generator like function)
      normalization, optimization, ...?
      makecode =>
        compute_code_flags (e.g. CO_GENERATOR, CO_COROUTINE, ...)
        ...
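
The results of `compute_code_flags` and of the symbol table passes are visible from Python. A small sketch, assuming only the standard `inspect` and `symtable` modules; the source snippet and variable names are made up for illustration.

```python
import inspect
import symtable

source = '''
def gen():
    yield 1

async def coro(awaitable):
    await awaitable
'''

module_code = compile(source, "<demo>", "exec")
# Function bodies are nested code objects stored in the module's co_consts.
nested = {c.co_name: c for c in module_code.co_consts if inspect.iscode(c)}

is_generator = bool(nested["gen"].co_flags & inspect.CO_GENERATOR)
is_coroutine = bool(nested["coro"].co_flags & inspect.CO_COROUTINE)

# The symtable module exposes the blocks built by the symbol table passes.
table = symtable.symtable(source, "<demo>", "exec")
module_symbols = sorted(sym.get_name() for sym in table.get_symbols())
child_blocks = sorted(child.get_name() for child in table.get_children())

print(is_generator, is_coroutine, child_blocks)
```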
  • Example: FunctionDef
compiler_function =>
  compiler_default_arguments =>
    compiler_visit_defaults =>
      VISIT_SEQ (compiler_visit_expr for default argument expressions)
      ADDOP_I(... BUILD_TUPLE ...) => compiler_addop_i =>
        compiler_addop_i_line (construct `struct instr` on `compiler ~> compiler_unit ~> basicblock`)
  compiler_enter_scope =>
    (construct `compiler_unit` on `compiler ~> compiler_unit`)
    compiler_new_block
  VISIT_IN_SCOPE (compiler_visit_stmt for body)
  assemble => ...
  compiler_exit_scope
  compiler_make_closure =>
    ADDOP_LOAD_CONST (for PyCodeObject)
    ADDOP_I MAKE_FUNCTION
  compiler_nameop (Store `name`) => STORE_NAME
  • Example: ClassDef
compiler_visit_stmt => compiler_class =>
  compiler_enter_scope
  ... compiler_body and assemble to PyCodeObject ...
  compiler_exit_scope
  opcode LOAD_BUILD_CLASS
  compiler_make_closure with class body PyCodeObject => ... opcode MAKE_FUNCTION
  compiler_call_helper => ... (see compiler_call below)
  • Example: expr_ty
(symtable)
symtable_visit_expr =>
  (if `yield` or `yield from`)
    set PySTEntryObject.ste_generator
  (if `await`)
    set PySTEntryObject.ste_coroutine

(compiler)
compiler_visit_expr =>
  compiler_visit_expr1 =>
    (if function call)
      compiler_call =>
        visit callee expression
        compiler_call_helper =>
          visit argument expressions
          ADDOP_I CALL_FUNCTION (with number of arguments)
          (for "star" argument (e.g. *args, **kwargs) CALL_FUNCTION_EX)

    (if await expression)
      visit the operand of `await`
      ADDOP GET_AWAITABLE
      ADDOP_LOAD_CONST Py_None
      ADDOP YIELD_FROM

    (if yield expression)
      visit the operand of `yield`
      ADDOP YIELD_VALUE

    (if identifier (aka Name_kind))
      compiler_nameop

byte code execution

  • LOAD_NAME
  • CALL_FUNCTION
  • MAKE_FUNCTION
  • RETURN_VALUE
  • LOAD_BUILD_CLASS
PyEval_EvalCode (given PyCodeObject) =>
  _PyThreadState_GET
  _PyFunction_FromConstructor
  _PyEval_Vector =>
    _PyEvalFramePushAndInit
    _PyEval_EvalFrame =>
      _PyEvalFramePushAndInit (take `InterpreterFrame` from `.datastack_top`)
      _PyEval_EvalFrameDefault => ...
    _PyEvalFrameClearAndPop


_PyEval_EvalFrameDefault
  local variables: opcode, oparg, retval, next_instr, stack_pointer, ...
  initialize `next_instr` by `InterpreterFrame.f_lasti`
  initialize `stack_pointer` by `_PyFrame_GetStackPointer` (stack machine with `PyObject`s on its stack)
  DISPATCH =>
    NEXTOPARG =>
      set `opcode` and `oparg` from next_instr
      INSTRUCTION_START =>
    DISPATCH_GOTO (goto based on `opcode_targets` array e.g. TARGET_CALL_FUNCTION)


TARGET(LOAD_NAME) =>
  INSTRUCTION_START (update `InterpreterFrame.f_lasti` and `next_instr`)
  load `name` from `locals` dict
  PUSH (push found PyObject to stack_pointer)
  DISPATCH => ...


TARGET(CALL_FUNCTION) =>
  PEEK (get callable PyObject from stack at `oparg + 1`)
  PyObject_Vectorcall (with passing `stack_pointer` as arguments) => ...
  PUSH (push result to stack_pointer)
  (if result == NULL) goto error (to propagate exception)


TARGET(RETURN_VALUE) =>
  POP to `retval`
  (if InterpreterFrame.depth > 0 i.e. "internal function call" without growing c stack e.g. via `CALL_FUNCTION_PY_SIMPLE`)
    _PyFrame_StackPush (push `retval` to the previous frame's stacktop)
    _PyEvalFrameClearAndPop
    goto resume_frame
  (otherwise)
  return retval


TARGET(LOAD_BUILD_CLASS)
  push builtin function "__build_class__"
  which will be called with class body closure


TARGET(MAKE_FUNCTION) =>
  POP PyCodeObject from stack
  PyFunction_New => PyFunction_NewWithQualName =>
    construct PyFunctionObject with vectorcall = _PyFunction_Vectorcall
  ...


_PyFunction_Vectorcall =>
  _PyEval_Vector (actually the same one called from `PyEval_EvalCode`) =>
    (if CO_GENERATOR, CO_COROUTINE, etc...) make_coro =>
      make_coro_frame (not sure what's special)
      _Py_MakeCoro => make_gen (with PyGen_Type, PyAsyncGen_Type, or PyCoro_Type) =>
        set `InterpreterFrame.generator`
    (otherwise)
      _PyEvalFramePushAndInit
      _PyEval_EvalFrame => ...
      _PyEvalFrameClearAndPop


TARGET(GET_AWAITABLE) =>
  _PyCoro_GetAwaitableIter =>
    (if coroutine) return coroutine
    (otherwise) call `tp_as_async->am_await` (whose result must pass `PyIter_Check`)
  SET_TOP iterator
  ...

TARGET(IMPORT_NAME) => import_name =>
  PyImport_ImportModuleLevelObject (if `builtins.__import__` not changed) =>
    import_get_module (check if already in sys.modules)
    import_find_and_load (delegate to "importlib._bootstrap._find_and_load")
    ...

(occasionally jump to `check_eval_breaker` for `gil_drop_request` etc...)
eval_frame_handle_pending =>
  (if gil_drop_request)
    drop_gil
    take_gil (will block if other threads took gil first)
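
To make the dispatch loop concrete, here is a toy stack machine in Python. It is a deliberately simplified sketch, not CPython's implementation: the `run` function and the `(opcode, oparg)` tuple encoding are hypothetical, and only the opcode names are borrowed from the notes above.

```python
def run(instructions, names):
    """Evaluate a list of (opcode, oparg) pairs on a value stack."""
    stack = []
    pc = 0  # plays the role of next_instr
    while True:
        opcode, oparg = instructions[pc]  # NEXTOPARG
        pc += 1
        if opcode == "LOAD_NAME":
            stack.append(names[oparg])
        elif opcode == "LOAD_CONST":
            stack.append(oparg)
        elif opcode == "CALL_FUNCTION":
            # The callable sits just below its `oparg` arguments.
            args = [stack.pop() for _ in range(oparg)][::-1]
            func = stack.pop()
            stack.append(func(*args))
        elif opcode == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


# Roughly the shape of bytecode for `return max(3, 7)`.
program = [
    ("LOAD_NAME", "max"),
    ("LOAD_CONST", 3),
    ("LOAD_CONST", 7),
    ("CALL_FUNCTION", 2),
    ("RETURN_VALUE", None),
]
result = run(program, {"max": max})
print(result)
```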

object/class system

Summary

  • user defined class
    • user defined slots (e.g. __init__)
  • construction
    • constructor (tp_init, tp_new via type_call)
  • destruction
    • destructor (tp_dealloc, tp_finalize, tp_free, ...)
  • memory layout
    • PyObject_HEAD
    • PyObject_VAR_HEAD
    • PyGC_Head for GC types
[ Data structure ]

PyObject
  ob_refcnt
  ob_type


PyVarObject
  ...


PyTypeObject
  ...
  tp_basicsize
  tp_itemsize


[ Demo code to debug into ]
class C:
  def __init__(self):
    self.x = 0

x = C()

del x


[ Example: defining class (cf. `compiler_class` and `LOAD_BUILD_CLASS` above)]
builtin___build_class__ =>
  set metaclass from 1. "metaclass" kwarg; 2. type of the first base class; 3. `PyType_Type`
  _PyEval_Vector (call class body closure)
  PyObject_VectorcallDict (call metaclass e.g. PyType_Type) => ... =>
    type_call =>
      type_new (as PyType_Type.tp_new) =>
        type_new_get_bases (if no bases, then use `PyBaseObject_Type`)
        type_new_impl =>
          type_new_init =>
            type_new_alloc =>
              PyType_GenericAlloc (as metatype->tp_alloc (TODO: how was this setup?))
              setup slots e.g.
                tp_flags (Py_TPFLAGS_HAVE_GC)
                tp_dealloc (subtype_dealloc)
          PyType_Ready => type_ready =>
            type_ready_set_new (e.g. copy tp_new from base class (e.g. PyBaseObject_Type))
          fixup_slot_dispatchers =>
            update_one_slot (e.g. for __init__) =>
              find_name_in_mro (find "__init__" method in the class)
              if found, set `tp_init` to `slot_tp_init`
          type_new_init_subclass (calls `__init_subclass__` of the parent class, looked up via `PySuper_Type`)
      type_init (as PyType_Type.tp_init)


[ Example: instantiating user defined class (here `callable` is user defined class)]
PyObject_Vectorcall(callable) => _PyObject_VectorcallTstate => _PyObject_MakeTpCall =>
  Py_TYPE(callable)->tp_call(callable, ...) (i.e. PyType_Type.tp_call (type_call) for user defined type) =>
    type_call =>
      object_new =>
        PyType_GenericAlloc (as tp_alloc) =>
          _PyType_AllocNoTrack
          _PyObject_GC_TRACK (for GC type)
        _PyObject_InitializeDict => PyDict_New
      slot_tp_init (object_init if not overridden) =>
        _PyObject_Call (for user defined "__init__" function)


[ Example: object destruction ]
TARGET(DELETE_NAME) => PyObject_DelItem (from local namespace dict) => ... => delitem_common =>
  Py_DECREF =>
    (if refcnt = 0) _Py_Dealloc =>
      subtype_dealloc (for heap types) => ... see "memory management"
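
The order of calls in the class definition and instantiation traces above can be observed from Python by logging in the corresponding hooks. A minimal sketch; `Meta`, `C`, `D` and the `calls` list are illustrative names only.

```python
calls = []


class Meta(type):
    def __new__(mcls, name, bases, namespace):
        # Reached via type_call when __build_class__ calls the metaclass.
        calls.append("Meta.__new__")
        return super().__new__(mcls, name, bases, namespace)


class C(metaclass=Meta):
    def __new__(cls):
        calls.append("C.__new__")
        return super().__new__(cls)

    def __init__(self):
        # Called through the tp_init slot after __new__ returns.
        calls.append("C.__init__")
        self.x = 0


class D(C):
    # No explicit metaclass: it is derived from the type of the first base.
    pass


x = C()
print(calls, type(D).__name__)
```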

memory management

[ Allocation ]
PyType_GenericAlloc (via tp_alloc) =>
  _PyType_AllocNoTrack =>
    PyObject_Malloc
    _PyObject_GC_Link =>
      increment `gc_generation.count`
      if it exceeds `gc_generation.threshold`, then run `gc_collect_generations` (see below)
  _PyObject_GC_TRACK =>
    insert PyGC_Head (cast from PyObject) to
    `PyInterpreterState._gc_runtime_state.generation0` linked list


[ Deallocation (after refcnt = 0) ]
subtype_dealloc =>
  PyObject_GC_UnTrack (remove `PyObject` from the linked list)
  _PyObject_FreeInstanceAttributes =>
    `Py_XDECREF` for each value of `_PyObject_ValuesPointer`
  object_dealloc (as basedealloc) =>
    PyObject_GC_Del (as tp_free) =>
      decrement `gc_generation.count`
      PyObject_Free


[ Garbage collection for container types ]
gc_collect_generations => gc_collect_with_callback => gc_collect_main =>
  deduce_unreachable(base, ...) =>
    update_refs
    subtract_refs =>
      run tp_traverse with visit_decref.
      the objects with refcnt = 0 are not reachable from "outside".
      note that those might still be reachable from other `base` which is reachable from "outside".
    move_unreachable => ...
  finalize_garbage
  handle_resurrected_objects
  delete_garbage
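
The difference between the refcount-driven deallocation path and the cycle collector can be shown with weak references. A sketch assuming CPython's reference counting semantics; `Node` and the variable names are made up for the example.

```python
import gc
import weakref


class Node:
    pass


# Acyclic case: the refcount reaches zero on `del`, so the object is
# deallocated immediately without involving the garbage collector.
node = Node()
node_ref = weakref.ref(node)
del node
freed_by_refcount = node_ref() is None

# Cyclic case: the two objects keep each other alive until a collection
# subtracts the internal references and finds them unreachable.
was_enabled = gc.isenabled()
gc.disable()  # keep automatic collections from interfering
a, b = Node(), Node()
a.other, b.other = b, a
cycle_ref = weakref.ref(a)
del a, b
survived_del = cycle_ref() is not None
unreachable_found = gc.collect()
freed_by_gc = cycle_ref() is None
if was_enabled:
    gc.enable()

print(freed_by_refcount, survived_del, freed_by_gc)
```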

threading

spawning thread

[parent thread]
Thread.start (py) => _thread.start_new_thread (with `_bootstrap` callback) =>
  thread_PyThread_start_new_thread (c) =>
    PyMem_NEW (allocate `bootstate`)
    _PyThreadState_Prealloc => new_threadstate (allocate new PyThreadState)
    PyThread_start_new_thread (with `thread_run` callback) => pthread_create


[child thread]
thread_run (c) =>
  setup PyThreadState (e.g. set thread_id)
  _PyThreadState_SetCurrent
  PyEval_AcquireThread =>
    take_gil (see `interpreter startup` above)
    _PyThreadState_Swap
  PyObject_Call (python callable `_bootstrap`) =>
    _bootstrap (py) => _bootstrap_inner =>
      _set_tstate_lock =>
        (c) thread__set_sentinel =>
          newlockobject
          setup PyThreadState.on_delete_data for release_sentinel
      set self to global `_active` dict
      run (run callable given by a user)
      _delete (remove self from global `_active` dict)
  PyThreadState_Clear =>
    PyThreadState.on_delete => release_sentinel (release lock)


[parent thread]
Thread.join (py) => _wait_for_tstate_lock =>
  self._tstate_lock.acquire =>
    (c) lock_PyThread_acquire_lock => acquire_timed =>
      Py_BEGIN_ALLOW_THREADS ~> PyEval_SaveThread =>
        _PyThreadState_Swap (set global `tstate_current` to NULL)
        drop_gil =>
          set locked = 0
          notify waiting thread on `take_gil` via condition variable
      PyThread_acquire_lock_timed => sem_clockwait (this should block until `release_sentinel` in child thread above)
      Py_END_ALLOW_THREADS ~> PyEval_RestoreThread =>
        take_gil => ...
        _PyThreadState_Swap (restore `tstate_current` as before)
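
From the Python side, the whole spawn/join sequence above is hidden behind `Thread.start` and `Thread.join`. A small sketch using only the `threading` and `sys` modules; the `worker` function and `result` dict are illustrative. The switch interval reported by `sys.getswitchinterval()` is the timeout used while waiting for the GIL.

```python
import sys
import threading

result = {}


def worker():
    # Runs in the child thread, with its own thread state, after it has
    # taken the GIL.
    result["ident"] = threading.get_ident()


thread = threading.Thread(target=worker)
thread.start()
thread.join()  # blocks until the child thread has finished

main_ident = threading.get_ident()
switch_interval = sys.getswitchinterval()  # seconds

print(result["ident"] != main_ident, switch_interval)
```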

generator

[initialization]
_Py_MakeCoro => ...


[next (i.e. send(None))]
gen_iternext (PyGen_Type.tp_iternext) =>
  gen_send_ex2 =>
    _PyFrame_StackPush (push "sent" value to stacktop)
    _PyEval_EvalFrame (will resume from `InterpreterFrame.f_lasti` from the last "yield")
    (if result but not _PyFrameHasCompleted)
      return PYGEN_NEXT
    (otherwise)
      clear frame and return PYGEN_RETURN
  (if PYGEN_RETURN)
    _PyGen_SetStopIterationValue
    Py_CLEAR(result) (i.e. `retval` is NULL)


[yield]
TARGET(YIELD_VALUE) =>
  retval = POP()
  set InterpreterFrame.f_state to FRAME_SUSPENDED
  save current stack_pointer as InterpreterFrame.stacktop
  goto exiting


[yield from]
TARGET(YIELD_FROM) =>
  (if tp_iternext defined)
    type.tp_iternext => ...
  (otherwise)
    _PyObject_CallMethodIdOneArg(... "send" ...)
  (if PYGEN_RETURN (i.e. `retval == NULL` and _PyGen_FetchStopIterationValue))
    SET_TOP(retval)
    DISPATCH => ...
  (otherwise PYGEN_NEXT)
    decrement InterpreterFrame.f_lasti to run the same byte code again
    goto exiting;
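
The `send`, `yield` and `yield from` behaviour traced above looks like this from Python. A minimal sketch; `inner` and `outer` are illustrative names.

```python
import inspect


def inner():
    received = yield 1       # suspended here until send()
    return received * 2      # surfaces as StopIteration.value


def outer():
    result = yield from inner()  # receives inner's return value
    yield result


gen = outer()
state_created = inspect.getgeneratorstate(gen)

first = next(gen)        # same as gen.send(None); runs to inner's yield
state_suspended = inspect.getgeneratorstate(gen)

second = gen.send(10)    # forwarded to inner, which returns 20

try:
    next(gen)
except StopIteration as stop:
    final = stop.value   # outer has no return statement

state_closed = inspect.getgeneratorstate(gen)
print(first, second, final)
```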

asyncio coroutine

Goals

  • GET_AWAITABLE opcode
  • "async def" return value
  • coroutine to task/future
  • leaf of "await" chain (aka Future)
  • asyncio.futures.wrap_future

simplified picture

event loop
  run initial task (e.g. via tasks.ensure_future(coroutine)) =>
    e.g.
      - schedule more tasks via `loop.call_soon`
      - yield from ... (down the road, it will reach `await <future>` whose callback will resume this generator)
      - return value (internally via StopIteration)
  run next scheduled task from initial task => ...
  ...

python

  • main event loop
[top level]
asyncio.run =>
  events.new_event_loop =>
    get_event_loop_policy => _init_event_loop_policy =>
      import DefaultEventLoopPolicy (i.e. _UnixDefaultEventLoopPolicy)
    _UnixDefaultEventLoopPolicy.new_event_loop =>
      _UnixSelectorEventLoop.__init__ => BaseSelectorEventLoop.__init__ => selectors.DefaultSelector
  _UnixSelectorEventLoop.run_until_complete => BaseEventLoop.run_until_complete =>
    tasks.ensure_future =>
      (if coroutine) BaseEventLoop.create_task => tasks.Task =>
        BaseEventLoop.call_soon(Task.__step) => _call_soon =>
          _ready.append(events.Handle(...))
    BaseEventLoop.run_forever =>
      events._set_running_loop
      (while True) _run_once =>
        (...compute select timeout based on _scheduled...)
        DefaultSelector.select(timeout)
        BaseSelectorEventLoop._process_events => BaseEventLoop._add_callback => _ready.append
        (...move _scheduled to _ready if timeout exceeded)
        (iterate over _ready) Handle._run => Context.run => ???
  • Task (coroutine as Future)
# python implementation
[Task.__step]
coro.send (i.e. progressing generator frame)
(if StopIteration exception)
  Future.set_result
(if yielded result with "_asyncio_future_blocking" (i.e. met with `Future.__await__` at the end of await chain))
  _asyncio_future_blocking = False
  Future.add_done_callback(Task.__wakeup ...)


[Task.__wakeup]
(if future's result is ready) self.__step => ...
  • Future (a leaf of await chain)
[Future.__await__]
self._asyncio_future_blocking = True
yield self  (see above for how `Task.__step` handles yielded result)
return self.result()  (this can return "resolved" result since this generator resumes via `Task.__wakeup` called from this future's `done_callback`)
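
The interplay of the loop, `Task.__step` and `Future.__await__` described above fits in a few dozen lines. The following is a toy sketch, not asyncio's implementation: `ToyFuture`, `ToyTask`, `ready` and `run_until_complete` are hypothetical stand-ins, and error handling, cancellation and the `_asyncio_future_blocking` flag are left out.

```python
from collections import deque

ready = deque()  # hypothetical stand-in for BaseEventLoop._ready


class ToyFuture:
    def __init__(self):
        self.done = False
        self.result = None
        self.callbacks = []

    def set_result(self, value):
        self.done, self.result = True, value
        for callback in self.callbacks:
            ready.append(callback)  # like loop.call_soon(callback)

    def __await__(self):
        if not self.done:
            yield self  # travels up the await chain to ToyTask.step
        return self.result


class ToyTask(ToyFuture):
    def __init__(self, coro):
        super().__init__()
        self.coro = coro
        ready.append(self.step)

    def step(self):
        try:
            awaited = self.coro.send(None)  # advance the coroutine frame
        except StopIteration as stop:
            self.set_result(stop.value)
        else:
            awaited.callbacks.append(self.step)  # wake up when resolved


def run_until_complete(coro):
    task = ToyTask(coro)
    while not task.done:
        ready.popleft()()
    return task.result


async def child(future):
    return (await future) + 1


async def main():
    future = ToyFuture()
    ready.append(lambda: future.set_result(41))  # resolve on a later turn
    return await child(future)


answer = run_until_complete(main())
print(answer)
```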

cpython

[initialize coroutine]
make_coro => ... make_gen(&PyCoro_Type, ...)


[yield from coroutine (aka await) cf. `TARGET(YIELD_FROM)` above]
"send" => gen_send => ...

threading

[Main thread]
BaseEventLoop.run_in_executor (returns asyncio.Future) =>
  concurrent.futures.ThreadPoolExecutor.submit (returns concurrent.futures.Future)
  wrap_future =>
    BaseEventLoop.create_future (i.e. asyncio.Future)
    _chain_future (concurrent.futures.Future to asyncio.Future) =>
      concurrent.futures.Future.add_done_callback(_call_set_state)
        (callback will be executed in ThreadPoolExecutor's worker)


[Worker thread of ThreadPoolExecutor]
_call_set_state => BaseEventLoop.call_soon_threadsafe(_set_state) (mostly equivalent to call_soon)


[Main thread]
_set_state => ... => asyncio.Future.set_result(concurrent.futures.Future.result())
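
Both entry points into this bridge, `run_in_executor` and `wrap_future`, can be exercised directly. A short sketch using only public asyncio and concurrent.futures APIs; the `main` coroutine and variable names are illustrative.

```python
import asyncio
import concurrent.futures
import threading


async def main():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        # Wrap a concurrent.futures.Future by hand ...
        cf_future = pool.submit(threading.get_ident)
        worker_ident = await asyncio.wrap_future(cf_future)
        # ... or let run_in_executor submit and wrap in one call.
        value = await loop.run_in_executor(pool, pow, 2, 10)
    return worker_ident, value


worker_ident, value = asyncio.run(main())
print(worker_ident != threading.get_ident(), value)
```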