@tonybaloney
Created September 26, 2024 07:31
PyCon JP 2024 Talk Notes - Unlocking the Parallel Universe: Sub Interpreters and Free-Threading in Python 3.13

Prerequisites

  1. PyCon 2023 – Eric Snow talk on sub interpreters
  2. EuroPython 2022 – Sam Gross talk on free-threading
  3. PyCon 2024 – “Sync vs Async in Python”
  4. PyCon 2024 – Building a JIT compiler for CPython
  5. PyCon 2024 – Overcoming GIL with sub interpreters and immutability
  6. “Parallelism and Concurrency” chapter from CPython Internals
  7. My Masters Thesis doi.org/10.25949/23974764.v1

Section 1 - Parallel Execution in Python

Parallel Execution

| Model | Execution | Start-up time | Data Exchange | Best for… |
|---|---|---|---|---|
| threads | Parallel * | small | Any | Small, IO-bound tasks that don’t require multiple CPU cores |
| coroutines | Concurrent | smallest | Any | Small, IO-bound tasks that don’t require multiple CPU cores |
| multiprocessing | Parallel | large | Serialization | Larger, CPU- or IO-bound tasks that require multiple CPU cores |
| Sub Interpreters | Parallel | medium ** | Serialization or Shared Memory | Larger, CPU- or IO-bound tasks that require multiple CPU cores |
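
As a rough illustration of the thread and process rows (not from the talk; the workload here is made up), the standard library exposes both models behind the same concurrent.futures interface, which makes the data-exchange difference easy to see:

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def distance_from_50(n):
    # CPU-bound work: distance of n from 50
    return abs(n - 50)

if __name__ == "__main__":
    data = list(range(100))

    # Threads: cheap to start and share memory freely, but CPU-bound work
    # is serialised by the GIL on a default (non-free-threaded) build.
    with ThreadPoolExecutor() as pool:
        thread_results = list(pool.map(distance_from_50, data))

    # Processes: can use multiple cores, but arguments and results are
    # pickled to cross the process boundary.
    with ProcessPoolExecutor() as pool:
        process_results = list(pool.map(distance_from_50, data))

    assert thread_results == process_results

Swapping ThreadPoolExecutor for ProcessPoolExecutor is a one-line change, but every argument and result then has to be pickled across the process boundary, which is the “Serialization” cost in the table.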

Threading Benchmark

Crude sample:

import numpy

# Create a random array of 100,000 integers between 0 and 100
a = numpy.random.randint(0, 100, 100_000)
for x in a:
    # Distance of each value from 50 (result discarded; this is just work)
    abs(x - 50)

Benchmark code to get the "2x slower" figure:

import numpy
import threading

# Create a random array of 100,000 integers between 0 and 100
a = numpy.random.randint(0, 100, 100_000)

def simple_abs_range(vec):
    # Distance of each value from 50 (result discarded; this is just work)
    for x in vec:
        abs(x - 50)

def f_linear():
    # Calculate the distance for every value in a single thread
    simple_abs_range(a)

def f_threaded():
    threads = []
    # Split the array into 100 blocks and start a thread for each
    for ar in numpy.split(a, 100):
        t = threading.Thread(target=simple_abs_range, args=(ar,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

Sub Interpreters vs Threads vs Multiprocessing Benchmark

The Jupyter Notebook for this sample is here.
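
For reference, a bare-bones sub interpreter call looks roughly like the sketch below. Caveat: the API is still settling. PEP 734 proposes a stdlib interpreters module, and CPython 3.13 only ships a private low-level module, so the module name and function signatures used here are assumptions and may differ between 3.12 and 3.13.

# Sketch only: module name and signatures are assumptions (see note above)
try:
    import _interpreters as interpreters       # CPython 3.13 private module
except ImportError:
    import _xxsubinterpreters as interpreters  # CPython 3.12 spelling

# Each sub interpreter has its own imported modules, its own __main__ and
# (since 3.12) its own GIL
interp_id = interpreters.create()

# Code is submitted as a string; ordinary Python objects are not shared,
# data has to cross via serialization or shared memory
interpreters.run_string(interp_id, "print('hello from a sub interpreter')")

interpreters.destroy(interp_id)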

Demo

The demo code is here.

Terms and Conditions

  1. Specializations are not enabled in free threading (yet)
  2. Some benchmarks are slower with free threading
  3. C extensions need to support multi-phase init to work with sub interpreters [1]
  4. Most of your 3rd-party C extensions aren’t supported yet [1]
  5. Most PyPI C extensions are not thread safe
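
Related to points 1, 2 and 5: check what you are actually running, because a free-threaded build will quietly re-enable the GIL if an imported extension does not declare support for running without it. Both checks below exist in CPython 3.13, although sys._is_gil_enabled() is an underscore-prefixed, unofficial API:

import sys
import sysconfig

# 1 on a free-threaded ("t") build, 0 or None otherwise
print("Py_GIL_DISABLED:", sysconfig.get_config_var("Py_GIL_DISABLED"))

# 3.13+: whether the GIL is actually active right now (a free-threaded
# build can re-enable it, e.g. for an extension that requires it)
if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled:", sys._is_gil_enabled())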