@westurner
Last active June 4, 2018 00:17
Python variables, references, aliases, garbage collection, scope

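Resources:

  • https://www.google.com/search?q=variables+references+in+python
  • https://www.safaribooksonline.com/library/view/python-in-a/0596001886/ch04s03.html
    A Python program accesses data values through references. A reference is a name that refers to the specific location in memory of a value (object). References take the form of variables, attributes, and items. In Python, a variable or other reference has no intrinsic type.
  • https://www.google.com/search?q=variables+references+in+python+site%3Adocs.python.org
  • https://docs.python.org/3/glossary.html#term-reference-count
  • https://docs.python.org/3/glossary.html#term-garbage-collection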

In Python, when you bind a name A to a newly created object, there is one reference to that object: its reference count is then 1. (When you call sys.getrefcount(A), the call itself holds a temporary reference to the object, so it returns 2. We'll ignore that off-by-one for purposes of explanation.)

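In CPython, the memory is freed as soon as the refcount drops to 0. The separate cyclic garbage collector (the gc module) only has to find objects that keep each other alive through reference cycles.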
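
A minimal sketch of when the memory is actually released, assuming CPython (other implementations may behave differently): an object whose refcount reaches zero is freed right away, while objects in a reference cycle wait for the cyclic garbage collector. The Node class and all variable names are hypothetical; weakref.ref is used only to observe whether an object is still alive without adding a reference to it.

```python
import gc
import weakref


class Node:
    """Hypothetical helper class; plain lists cannot be weakly referenced."""

    def __init__(self):
        self.other = None


# Case 1: no cycle. Dropping the last reference frees the object at once.
n = Node()
alive = weakref.ref(n)      # a weak reference does not raise the refcount
del n
freed_immediately = alive() is None

# Case 2: a reference cycle. The refcounts never reach zero on their own,
# so the objects survive until the cyclic garbage collector runs.
gc.disable()                # keep an automatic collection from interfering
a, b = Node(), Node()
a.other, b.other = b, a
alive_cycle = weakref.ref(a)
del a, b
survived_del = alive_cycle() is not None
gc.collect()                # run the cycle collector explicitly
freed_by_gc = alive_cycle() is None
gc.enable()

print(freed_immediately, survived_del, freed_by_gc)
```

On CPython, all three flags are expected to be True.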

Python has no separate variable declaration: a name is created and initialized by the same assignment statement. This binds the name A to a list containing the one-character str 'A':

A = ['A']  # refcount == 1

Lists are mutable in Python. Mutating the list does not change the refcount:

A = ['A']  # refcount == 1
A.append('B')  # refcount == 1

Referencing A in another list increments the refcount:

B = [A]  # refcount(A) == 2

Deleting a variable decrements the refcount by one and removes the variable binding from the scope:

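del A  # refcount == 1: B[0] still refers to the list

The cells below use a small refcount() helper, so it is defined first:

import sys

def refcount(obj, msg=None):
    # sys.getrefcount() also counts temporary references held by the two
    # calls; the offset of 3 matches the CPython 3.6 output shown here and
    # may differ on other versions.
    n = sys.getrefcount(obj) - 3
    print((obj, n) if msg is None else (obj, n, msg))
    return n

A = ['A']
B = [A]
assert refcount(A) == 2
assert refcount(B[0]) == 2
del A
assert B == [['A']]
assert refcount(B[0]) == 1
B.append(B)
assert refcount(B) == 2
(['A'], 2)
(['A'], 2)
(['A'], 1)
([['A'], [...]], 2)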

A = ['A']
assert refcount(A) == 1
B = ['B']
assert refcount(B) == 1
B = A  # refcount(['B']) == 0
assert refcount(A) == refcount(B) == 2

C = A.copy() + A[:] + ['C']
assert refcount(A) == 2
assert C == ['A', 'A', 'C']

D = None
assert refcount(D, '!') > 0
# None = 1

def func():
    assert A == B == ['A']
    a = A
    assert refcount(a) == 3
    
    # global A # SyntaxError: name 'A' is used prior to global declaration
    # A = a # (local a).refcount = 1, (global a).refcount = 2
    #assert A == 3

    assert refcount(C) == 1
    c = C
    assert refcount(C) == refcount(c) == 2
    c.append('here')
    assert c == ['A','A','C', 'here']
    assert refcount(C) == 2

    E = ['E']
    assert refcount(E) == 1
    D = [E]
    assert refcount(E) == 2
    assert refcount(D) == 1
    
    E.append(A)
    assert E == ['E', ['A']]
    assert refcount(A) == 4
    return E

e = func()
assert refcount(e) == 1
assert refcount(A) == 3
(['A'], 1)
(['B'], 1)
(['A'], 2)
(['A'], 2)
(['A'], 2)
(None, 28781, '!')
(['A'], 3)
(['A', 'A', 'C'], 1)
(['A', 'A', 'C'], 2)
(['A', 'A', 'C'], 2)
(['A', 'A', 'C', 'here'], 2)
(['E'], 1)
(['E'], 2)
([['E']], 1)
(['A'], 4)
(['E', ['A']], 1)
(['A'], 3)
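
The aliasing in the cell above (B = A, as opposed to A.copy()) can also be checked without reference counts. This is a small sketch using the identity operator `is`; the names mirror the cell above but the snippet stands on its own.

```python
A = ['A']
B = A            # alias: a second name for the same list object
C = A.copy()     # shallow copy: a new list with equal contents

A.append('B')    # mutate the list through one of its names

same_object = B is A            # the alias sees the mutation
copy_is_distinct = C is not A   # the copy does not

print(B)  # ['A', 'B']
print(C)  # ['A']
```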

Memory allocation and garbage collection are orthogonal concepts to variable declaration and initialization. Variable scope is a tangential concept.

In C, there is no garbage collector: you must free() heap memory that you allocated with malloc() in order to release it. In C++, an object is initialized by a constructor (like object.__init__() in Python) and its resources are released by a destructor (loosely like object.__del__() in Python, although Python does not guarantee when, or whether, __del__ runs). Java has a garbage collector, too.
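
To make the constructor/destructor comparison concrete, here is a hedged sketch, assuming CPython. The Resource class and the events list are hypothetical and exist only to log the object's lifecycle; __del__ is a finalizer, so its timing on other implementations may differ.

```python
events = []


class Resource:
    """Hypothetical class that records when it is created and finalized."""

    def __init__(self, name):
        self.name = name
        events.append('init ' + name)

    def __del__(self):
        events.append('del ' + self.name)


r = Resource('r1')
alias = r
del r                                 # one reference remains: no __del__ yet
events_after_first_del = list(events)
del alias                             # last reference gone: CPython finalizes now

print(events_after_first_del)
print(events)
```

When cleanup must happen at a known point, a `with` block and a context manager are usually a safer choice than relying on __del__.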

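We usually don't write del variable in Python because CPython frees an object as soon as its refcount reaches zero, which normally happens when the last variable that refers to it is rebound or falls out of scope; the cyclic garbage collector cleans up reference cycles whenever it happens to run.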
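
A short sketch of a local variable falling out of scope, assuming CPython. Payload and work() are hypothetical names; the weak reference lets us check afterwards whether the object still exists.

```python
import weakref


class Payload:
    """Hypothetical stand-in for some large object."""


def work():
    data = Payload()             # the only strong reference is the local name
    return weakref.ref(data)     # the local binding disappears on return


ref = work()
freed_on_return = ref() is None  # no explicit del was needed

print(freed_on_return)
```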

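In practice, we name global variables in ALL_CAPS (and may expect them to be constants). We prefix 'private' attribute names with two underscores (__variable, with no trailing underscores) so that subclasses and other code are less likely to overwrite them by accident: 'name mangling' rewrites the name to _ClassName__variable, but the attribute can still be modified under that name. Dunder names (__variable__) are not mangled and are reserved for Python itself. Sometimes, we name variables with a single leading _underscore to indicate, by convention, that they are internal, or with a single trailing underscore_ to avoid a name collision with a keyword or builtin.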
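
The naming conventions can be sketched as follows. Account, MAX_RETRIES, and the attribute names are hypothetical. Note that name mangling applies to attribute names with two leading underscores and no trailing double underscore, and that it renames the attribute rather than protecting it.

```python
MAX_RETRIES = 3  # ALL_CAPS: a constant by convention only


class Account:
    def __init__(self):
        self.balance = 0       # public attribute
        self._cache = {}       # single underscore: internal by convention
        self.__pin = '0000'    # mangled by the compiler to _Account__pin


acct = Account()
has_plain = hasattr(acct, '__pin')             # the unmangled name is absent
has_mangled = hasattr(acct, '_Account__pin')   # the mangled name is present
acct._Account__pin = '1234'                    # still modifiable from outside

print(has_plain, has_mangled, acct._Account__pin)
```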

In practice, we try to avoid using globals because when or if we try to add threads (or port to C/C++), we can never be quite sure whether another thread has modified that global between our read and our write; that's called a race condition. Some languages, particularly functional languages like Haskell and Erlang, make most or all variables immutable, which avoids this kind of race condition (and the variable locking that complicates efforts to remove Python's GIL).
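
A minimal sketch of protecting a shared global with a lock, using the standard threading module. The names counter, counter_lock, and add_many are hypothetical. Without the lock, the read-modify-write in `counter += 1` is not guaranteed to be atomic, so updates from different threads could be lost.

```python
import threading

counter = 0                       # shared global state
counter_lock = threading.Lock()


def add_many(n):
    global counter
    for _ in range(n):
        with counter_lock:        # serialize the read-modify-write
            counter += 1


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Passing values into functions and returning results, instead of mutating a global, avoids the need for the lock altogether.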

Is it a box or a bucket? It's a smart pointer to an allocated section of RAM.

When do we get a new box and throw an old one away? Is there a name for the box and the thing in the bucket? When does the bucket change size?

I think the box/bucket metaphor is confusing and limiting; but I've been doing this for a long time: it's a leaky abstraction.
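
One way to answer those questions without the metaphor is to compare object identities. This is a sketch with hypothetical names: mutation keeps the same object, while rebinding points the name at a different one.

```python
nums = [1, 2]
before = id(nums)

nums.append(3)                    # mutation: same object, new contents
same_after_mutation = id(nums) == before

nums = nums + [4]                 # rebinding: the name now refers to a new list
new_after_rebinding = id(nums) != before

x = 1
y = x                             # two names for the same int object
x += 1                            # ints are immutable: x is rebound, y is untouched

print(same_after_mutation, new_after_rebinding, x, y)
```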

Commands to build this environment:

conda create -n notebooks python==3.6 notebook pip
source activate notebooks
cd notebooks; mkdir -p src/notebooks; cd src/notebooks
jupyter-notebook &
jupyter-nbconvert --to python ./010-variables.ipynb
# coding: utf-8
# # Python variables, references, aliases, garbage collection, scope
# - Objective: teach CS variables, references, and aliases
# - Objective: identify differences between symbolic/mathematical variables and CS variables
# - Objective: teach without using partially-congruent/isomorphic metaphors that may later be confusing or limiting
#
# - Audience:
#
# - Concept: https://en.wikipedia.org/wiki/Variable_(computer_science)
# - Concept: https://en.wikipedia.org/wiki/Variable_(mathematics)
# - Concept: https://en.wikipedia.org/wiki/Variable (disambiguation)
# - Concept: https://en.wikipedia.org/wiki/Reference_(computer_science)
#
# Resources:
#
# - https://www.google.com/search?q=variables+references+in+python
# - https://www.safaribooksonline.com/library/view/python-in-a/0596001886/ch04s03.html
# > A Python program accesses data values through references. A reference is a name that refers to the specific location in memory of a value (object). References take the form of **variables, attributes, and items**. In Python, a variable or other reference has no intrinsic type.
# - https://www.google.com/search?q=variables+references+in+python+site%3Adocs.python.org
#
# - https://docs.python.org/3/glossary.html#term-reference-count
# - https://docs.python.org/3/glossary.html#term-garbage-collection
#
# In Python, when you bind a name ``A`` to a newly created object, there is one reference to that object: its reference count is then 1. (When you call ``sys.getrefcount(A)``, the call itself holds a temporary reference to the object, so it returns *2*. We'll ignore that off-by-one for purposes of explanation.)
#
# In CPython, the memory is freed as soon as the refcount drops to 0. The separate cyclic garbage collector (the ``gc`` module) only has to find objects that keep each other alive through reference cycles.
#
# Python has no separate variable declaration: a name is created and initialized by the same assignment statement. This binds the name ``A`` to a ``list`` containing the one-character ``str`` ``'A'``:
# ```python
# A = ['A'] # refcount == 1
# ```
#
# Lists are **mutable** in Python. Mutating the list does not change the refcount:
# ```python
# A = ['A'] # refcount == 1
# A.append('B') # refcount == 1
# ```
#
# Referencing ``A`` in another ``list`` increments the refcount:
# ```python
# B = [A] # refcount(A) == 2
# ```
#
# Deleting a variable decrements the refcount by one and removes the variable binding from the scope:
# ```python
# del A # refcount == 1: B[0] still refers to the list
# ```
#
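# In[53]:
import sys


def refcount(obj, msg=None):
    # sys.getrefcount() also counts temporary references held by the two
    # calls; the offset of 3 matches the CPython 3.6 output of this notebook
    # and may differ on other versions.
    n = sys.getrefcount(obj) - 3
    print((obj, n) if msg is None else (obj, n, msg))
    return n


# In[63]:
A = ['A']
B = [A]
assert refcount(A) == 2
assert refcount(B[0]) == 2
del A
assert B == [['A']]
assert refcount(B[0]) == 1
B.append(B)
assert refcount(B) == 2


# In[53] (continued):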
A = ['A']
assert refcount(A) == 1
B = ['B']
assert refcount(B) == 1
B = A # refcount(['B']) == 0
assert refcount(A) == refcount(B) == 2
C = A.copy() + A[:] + ['C']
assert refcount(A) == 2
assert C == ['A', 'A', 'C']
D = None
assert refcount(D, '!') > 0
# None = 1
def func():
    assert A == B == ['A']
    a = A
    assert refcount(a) == 3

    # global A # SyntaxError: name 'A' is used prior to global declaration
    # A = a # (local a).refcount = 1, (global a).refcount = 2
    #assert A == 3

    assert refcount(C) == 1
    c = C
    assert refcount(C) == refcount(c) == 2
    c.append('here')
    assert c == ['A','A','C', 'here']
    assert refcount(C) == 2

    E = ['E']
    assert refcount(E) == 1
    D = [E]
    assert refcount(E) == 2
    assert refcount(D) == 1

    E.append(A)
    assert E == ['E', ['A']]
    assert refcount(A) == 4
    return E

e = func()
assert refcount(e) == 1
assert refcount(A) == 3
# Memory allocation and garbage collection are orthogonal concepts to variable declaration and initialization.
# Variable scope is a tangential concept.
#
# In C, there is no garbage collector: you must ``free()`` heap memory that you allocated with ``malloc()`` in order to release it. In C++, an object is initialized by a constructor (like ``object.__init__()`` in Python) and its resources are released by a destructor (loosely like ``object.__del__()`` in Python, although Python does not guarantee when, or whether, ``__del__`` runs). Java has a garbage collector, too.
#
# We usually don't write ``del variable`` in Python because CPython frees an object as soon as its refcount reaches zero, which normally happens when the last variable that refers to it is rebound or falls out of scope; the cyclic garbage collector cleans up reference cycles whenever it happens to run.
#
# In practice, we name global variables in ``ALL_CAPS`` (and may expect them to be constants). We prefix 'private' attribute names with two underscores (``__variable``, with no trailing underscores) so that subclasses and other code are less likely to overwrite them by accident: 'name mangling' rewrites the name to ``_ClassName__variable``, but the attribute can still be modified under that name. Dunder names (``__variable__``) are not mangled and are reserved for Python itself. Sometimes, we name variables with a single leading ``_underscore`` to indicate, by convention, that they are internal, or with a single trailing ``underscore_`` to avoid a name collision with a keyword or builtin.
#
# In practice, we try to avoid using globals because when or if we try to add threads (or port to C/C++), we can never be quite sure whether another thread has modified that global between our read and our write; that's called a *race condition*. Some languages, particularly functional languages like Haskell and Erlang, make most or all variables immutable, which avoids this kind of race condition (and the variable locking that complicates efforts to remove Python's GIL).
#
# Is it a box or a bucket?
# It's a smart pointer to an allocated section of RAM.
#
# When do we get a new box and throw an old one away? Is there a name for the box and the thing in the bucket? When does the bucket change size?
#
# I think the box/bucket metaphor is confusing and limiting; but I've been doing this for a long time: it's a leaky abstraction.
#
# - https://en.wikipedia.org/wiki/Memory_leak
# - https://en.wikipedia.org/wiki/Race_condition
# - https://en.wikipedia.org/wiki/Smart_pointer
# Commands to build this environment:
# ```bash
# conda create -n notebooks python==3.6 notebook pip
# source activate notebooks
# cd notebooks; mkdir -p src/notebooks; cd src/notebooks
# jupyter-notebook &
# jupyter-nbconvert --to python ./010-variables.ipynb
# ```

default: convert-all

convert-all:
	jupyter-nbconvert --to python 010-variables.ipynb
	# jupyter-nbconvert --to html 010-variables.ipynb
	jupyter-nbconvert --to rst 010-variables.ipynb
	jupyter-nbconvert --to markdown 010-variables.ipynb