- Objective: teach CS variables, references, and aliases
- Objective: identify differences between symbolic/mathematical variables and CS variables
- Objective: teach without using partially-congruent/isomorphic metaphors that may later be confusing or limiting
- Audience:
- Concept: https://en.wikipedia.org/wiki/Variable_(computer_science)
- Concept: https://en.wikipedia.org/wiki/Variable_(mathematics)
- Concept: https://en.wikipedia.org/wiki/Variable (disambiguation)
- Concept: https://en.wikipedia.org/wiki/Reference_(computer_science)
Resources:
- https://www.google.com/search?q=variables+references+in+python
- https://www.safaribooksonline.com/library/view/python-in-a/0596001886/ch04s03.html
  "A Python program accesses data values through references. A reference is a name that refers to the specific location in memory of a value (object). References take the form of variables, attributes, and items. In Python, a variable or other reference has no intrinsic type."
- https://www.google.com/search?q=variables+references+in+python+site%3Adocs.python.org
- https://docs.python.org/3/glossary.html#term-reference-count
- https://docs.python.org/3/glossary.html#term-garbage-collection
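The excerpt quoted in the resources says that references take the form of variables, attributes, and items, and that a reference has no intrinsic type. A minimal sketch of both claims follows; the `Holder` class and all variable names are hypothetical, chosen only for illustration.

```python
class Holder:
    """Hypothetical container, used only to show an attribute reference."""


value = ['A']            # a variable refers to the list
holder = Holder()
holder.attr = value      # an attribute refers to the same list
items = {'key': value}   # a dict item refers to the same list

# All three references lead to one and the same object.
same_object = value is holder.attr is items['key']

# The name carries no type of its own: rebinding it to a str is legal.
name = 42
first_type = type(name).__name__
name = 'forty-two'
second_type = type(name).__name__
```

Here `same_object` is `True`, and `first_type` and `second_type` are `'int'` and `'str'`: the type belongs to the object, not to the name.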
In Python, when you assign a new object to a variable A, there is one reference to that allocated section of memory: its reference count is then 1. (When you call sys.getrefcount(A), sys.getrefcount is passed a reference to A, so it returns 2. We'll ignore that one-off for purposes of explanation.)
In CPython, the memory is freed as soon as the refcount drops to 0. The garbage collector exists to find and free groups of objects that reference each other in a cycle and therefore never reach 0 on their own.
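How reference counting and the garbage collector divide the work can be observed with weak references. The sketch below assumes CPython; the `Node` class and the `demo` function are hypothetical names (a small class is needed because plain lists cannot be weakly referenced).

```python
import gc
import weakref


class Node:
    """Hypothetical helper class; instances support weak references."""


def demo():
    gc.disable()  # keep the cyclic collector from running on its own here
    try:
        # Case 1: no cycle. Dropping the last reference frees the object.
        a = Node()
        probe_a = weakref.ref(a)
        del a
        freed_without_gc = probe_a() is None

        # Case 2: the object refers to itself, so its refcount never hits 0.
        b = Node()
        b.self_ref = b
        probe_b = weakref.ref(b)
        del b
        alive_before_collect = probe_b() is not None
        gc.collect()  # the cyclic collector finds and frees the cycle
        freed_after_collect = probe_b() is None
    finally:
        gc.enable()
    return freed_without_gc, alive_before_collect, freed_after_collect
```

On CPython, `demo()` returns `(True, True, True)`: the acyclic object disappears at `del`, while the self-referencing one survives until `gc.collect()` runs. Other Python implementations may time this differently.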
Python has no separate declaration step: an assignment creates the variable and binds it to a value in one statement. This both creates the variable A and initializes it to a list containing the one-character str 'A':
A = ['A'] # refcount == 1
Lists are mutable in Python. Mutating the list does not change the refcount:
A = ['A'] # refcount == 1
A.append('B') # refcount == 1
Referencing A in another list increments the refcount:
B = [A] # refcount(A) == 2
Deleting a variable decrements the refcount by one and removes the variable binding from the scope:
del A # the list's refcount drops by 1; it reaches 0 only if nothing else (such as B) still references the list
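# Uses the refcount() helper defined in the next cell.
A = ['A']
B = [A]                      # B[0] is a second reference to the list
assert refcount(A) == 2
assert refcount(B[0]) == 2
del A                        # removes the name A; the list survives via B[0]
assert B == [['A']]
assert refcount(B[0]) == 1
B.append(B)                  # B now contains itself: a reference cycle
assert refcount(B) == 2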
(['A'], 2)
(['A'], 2)
(['A'], 1)
([['A'], [...]], 2)
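import sys

def refcount(obj, msg=None):
    # Subtract the temporary references added while calling this helper and
    # sys.getrefcount. The offset of 3 matches the CPython 3.6 environment this
    # notebook was built with and may differ on other versions.
    n = sys.getrefcount(obj) - 3
    print((obj, n) if msg is None else (obj, n, msg))
    return n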
A = ['A']
assert refcount(A) == 1
B = ['B']
assert refcount(B) == 1
B = A # refcount(['B']) == 0
assert refcount(A) == refcount(B) == 2
C = A.copy() + A[:] + ['C']
assert refcount(A) == 2
assert C == ['A', 'A', 'C']
D = None
assert refcount(D, '!') > 0
# None = 1
def func():
    assert A == B == ['A']
    a = A
    assert refcount(a) == 3
    # global A # SyntaxError: name 'A' is used prior to global declaration
    # A = a # (local a).refcount = 1, (global a).refcount = 2
    #assert A == 3
    assert refcount(C) == 1
    c = C
    assert refcount(C) == refcount(c) == 2
    c.append('here')
    assert c == ['A','A','C', 'here']
    assert refcount(C) == 2
    E = ['E']
    assert refcount(E) == 1
    D = [E]
    assert refcount(E) == 2
    assert refcount(D) == 1
    E.append(A)
    assert E == ['E', ['A']]
    assert refcount(A) == 4
    return E
e = func()
assert refcount(e) == 1
assert refcount(A) == 3
(['A'], 1)
(['B'], 1)
(['A'], 2)
(['A'], 2)
(['A'], 2)
(None, 28781, '!')
(['A'], 3)
(['A', 'A', 'C'], 1)
(['A', 'A', 'C'], 2)
(['A', 'A', 'C'], 2)
(['A', 'A', 'C', 'here'], 2)
(['E'], 1)
(['E'], 2)
([['E']], 1)
(['A'], 4)
(['E', ['A']], 1)
(['A'], 3)
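The cells above count references; the difference between an alias and a copy can also be checked directly with `is`. This is a minimal sketch with hypothetical variable names, using only the standard `copy` module.

```python
import copy

original = [['A'], 'B']
alias = original                 # a second name for the same list
shallow = original.copy()        # new outer list, shared inner list
deep = copy.deepcopy(original)   # new outer list and new inner list

alias.append('C')                # visible through `original`, not the copies
original[0].append('a')          # inner list is shared with `shallow` only

is_alias = alias is original
is_distinct_copy = shallow is not original
shares_inner = shallow[0] is original[0]
```

After this runs, `original` is `[['A', 'a'], 'B', 'C']`, `shallow` is `[['A', 'a'], 'B']`, and `deep` is still `[['A'], 'B']`. An alias adds a reference to the same object; a shallow copy adds references only to the items inside it.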
Memory allocation and garbage collection are orthogonal concepts to variable declaration and initialization. Variable scope is a tangential concept.
In C, there is no garbage collector: you must free() any memory you allocate on the heap (for example with malloc()) in order to release it, while local variables are released automatically when their scope ends. In C++, an object's members are initialized in its constructor (like object.__init__() in Python) and its resources are released in its destructor (like object.__del__() in Python). Java, like Python, has a garbage collector.
We usually don't del variable in Python because the memory is reclaimed anyway: once a variable falls out of scope, the refcount of its object drops, and CPython frees the object when the refcount reaches zero (objects caught in reference cycles wait for the garbage collector to run).
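The constructor/destructor comparison and the scope rule can be seen together in a small sketch. It assumes CPython, where `__del__` runs as soon as the last reference disappears; the `Resource` class, the `events` list, and `use_resource` are hypothetical names.

```python
events = []  # records lifecycle events so the order can be inspected


class Resource:
    """Hypothetical class that logs its own creation and destruction."""

    def __init__(self, name):
        self.name = name
        events.append('init ' + name)

    def __del__(self):
        events.append('del ' + self.name)


def use_resource():
    r = Resource('r1')
    events.append('using ' + r.name)
    # No explicit `del r`: the local name disappears when the function returns.


use_resource()
events.append('after call')
```

On CPython, `events` ends up as `['init r1', 'using r1', 'del r1', 'after call']`. The language does not guarantee when `__del__` runs, so code that must release a resource at a specific point usually relies on a `with` block instead.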
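In practice, we name global variables in ALL_CAPS (and may expect them to be constants). We prefix 'private' attribute names with two leading underscores (__variable, without trailing underscores) so that other code is less likely to modify those object attributes by accident: 'name mangling' rewrites the name to _ClassName__variable, which discourages outside access but does not prevent it. Names with double underscores on both sides (__variable__) are not mangled; they are reserved for Python's special methods and attributes. By convention, a single leading _underscore marks a name as internal or non-public, and a single trailing underscore_ avoids a name collision with a keyword or builtin.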
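How Python treats these underscore spellings can be checked by inspecting an instance's attribute dictionary. A minimal sketch follows; the `Account` class and its attribute names are hypothetical.

```python
MAX_RETRIES = 3  # ALL_CAPS: a constant by convention only, nothing enforces it


class Account:
    """Hypothetical class showing the three underscore spellings."""

    def __init__(self):
        self._balance = 0        # one leading underscore: non-public by convention
        self.__pin = '0000'      # two leading underscores: name-mangled
        self.__dunder__ = 'x'    # underscores on both sides: not mangled


acct = Account()
stored_names = sorted(vars(acct))

mangled_present = '_Account__pin' in vars(acct)
unmangled_present = '__pin' in vars(acct)
dunder_present = '__dunder__' in vars(acct)

# Mangling discourages access but does not prevent it.
acct._Account__pin = '1234'
```

`stored_names` is `['_Account__pin', '__dunder__', '_balance']`: only the name with two leading underscores and no trailing ones was rewritten, and it is still writable from outside the class.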
In practice, we try to avoid using globals because if we add threads (or port to C/C++), we can never be quite sure whether another thread has modified that global in the middle of our own read-modify-write; that's called a race condition. Some languages, particularly functional languages like Haskell and Erlang, make variables immutable by default, which avoids this class of race condition (and the variable locking that has complicated efforts to remove Python's GIL).
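The locking mentioned above can be sketched with the standard `threading` module. The example shows only the guarded version, because an unguarded race is not reproducible on demand; `counter`, `counter_lock`, `add_many`, and `run` are hypothetical names.

```python
import threading

counter = 0                      # shared global state
counter_lock = threading.Lock()  # guards every update to `counter`


def add_many(n):
    """Increment the shared counter n times, one locked step at a time."""
    global counter
    for _ in range(n):
        # `counter += 1` is a read-modify-write; the lock keeps another
        # thread from interleaving between the read and the write.
        with counter_lock:
            counter += 1


def run(workers=4, n=10000):
    """Start `workers` threads and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n,))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place, `run()` returns `workers * n` every time. The cost is that every update now waits its turn, which is the trade-off that immutable data avoids.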
Is it a box or a bucket? It's a smart pointer to an allocated section of RAM.
When do we get a new box and throw an old one away? Is there a name for the box and another for the thing in the bucket? When does the bucket change size?
I think the box/bucket metaphor is confusing and limiting, but then I've been doing this for a long time: it's a leaky abstraction.
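The "new box" question has a concrete answer in Python: mutation keeps the same object, while rebinding points the name at a different one. A minimal sketch, with hypothetical variable names:

```python
nums = ['A']
first_list = nums          # keep a second reference for comparison

nums.append('B')           # mutation: the same list object grows
mutated_in_place = nums is first_list

nums = nums + ['C']        # rebinding: `+` builds a new list for the name
rebound_to_new_object = nums is not first_list

text = 'A'
first_text = text
text += 'B'                # str is immutable, so this rebinds `text`
text_rebound = text is not first_text
```

All three flags are `True`. After the rebinding, `first_list` is still `['A', 'B']` and `first_text` is still `'A'`: the old objects stay alive for as long as something refers to them.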
- https://en.wikipedia.org/wiki/Memory_leak
- https://en.wikipedia.org/wiki/Race_condition
- https://en.wikipedia.org/wiki/Smart_pointer
Commands to build this environment:
conda create -n notebooks python==3.6 notebook pip
source activate notebooks
cd notebooks; mkdir -p src/notebooks; cd src/notebooks
jupyter-notebook &
jupyter-nbconvert --to python ./010-variables.ipynb