moritz/variable.md Secret

What's a Variable?

When you learn programming, you typically first learn about basic expressions,
like 2 * 21, and then the next topic is control structures or variables. (If
you start with functional programming, maybe it takes you a bit longer to get
to variables).
So, every programmer knows what a variable is, right?
Turns out, it might not be that easy.
Some people like to say that in Ruby, everything is an object. Well, a
variable isn't really an object. The same holds true for other languages.
But let's start from the bottom up. In a low-level programming language like
C, a local variable is a name that the compiler knows, with a type attached.
When the compiler generates code for the function that the variable is in, the
name resolves to an address on the stack, unless the compiler optimizes the
variable away entirely or keeps it in a CPU register.
So in C, the variable only exists as such while the compiler is running. When
the compiler is finished, and the resulting executable runs, there might be
some stack offset or memory location that corresponds to our understanding of
the variable. (And there might be debugging symbols that allow some mapping
back to the variable name, but that's really a special case).
In case of recursion, a local variable can exist once for each time the
function is called.
Closures

In programming languages with closures, local variables can be referenced from
inner functions. They can't generally live on the stack, because the reference
keeps them alive. Consider this piece of Perl 6 code (though we could write
the same in JavaScript, Ruby, Perl 5, Python or most other dynamic languages):
sub outer() {
    my $x = 42;
    return sub inner() {
        say $x;
    }
}

my &callback = outer();
callback();

The outer function has a local (lexical) variable $x, and the inner
function uses it. So once outer has finished running, there's still an
indirect reference to the value stored in this variable.
They say you can solve any problem in computer science through another layer
of indirection, and that's true for the implementation of closures. The
&callback variable, which points to a closure, actually stores two pointers
under the hood. One goes to the static byte code representation of the code,
and the second goes to a run-time data structure called a lexical pad, or
lexpad for short. Each time you invoke the outer function, a new instance of the
lexpad is created, and the closure points to that new instance and to the
same static code as always.
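CPython happens to expose both halves of a closure, so you can check the two-pointer picture yourself. This is only a sketch, and it assumes CPython's `__code__` and `__closure__` function attributes. CPython also keeps one cell per captured variable rather than one lexpad per call, so take it as an analogy, not as a description of how Perl 6 does it:

```python
def outer():
    x = 42

    def inner():
        return x

    return inner


first = outer()
second = outer()

# Both closures point at the same static code object ...
same_code = first.__code__ is second.__code__
# ... but each call to outer() created fresh storage for x.
same_storage = first.__closure__[0] is second.__closure__[0]

print(same_code, same_storage)  # True False
```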
But even in dynamic languages with closures, variables themselves don't need
to be objects. If a language forbids the creation of variables at run time,
the compiler knows what variables exist in each scope, and can for example map
each of them to an array index, so the lexpad becomes a compact array, and an
access to a variable becomes an indexing operation into that array. Lexpads
generally live on the heap, and are garbage collected (or reference counted)
just like other objects.
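CPython does something comparable for the local variables of a function: the set of names is fixed at compile time, and the bytecode addresses them by slot number. A small sketch that peeks at this through the `co_varnames` attribute of the code object (the function `area` is just a made-up example):

```python
def area(width, height):
    scale = 2
    return width * height * scale


# The names the compiler assigned slots to, in slot order.
slots = area.__code__.co_varnames
print(slots)                 # ('width', 'height', 'scale')
print(slots.index("scale"))  # 2
```

At run time the name `scale` plays no role in the lookup; it only survives in this table, mostly for the benefit of debuggers and error messages.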
Lexpads are mostly a performance optimization. You could have a separate runtime
representation of each variable, but then you'd need one allocation for
each variable in each function call you perform, which is generally
much slower than a single allocation of the lexpad.
The Plot Thickens

To summarize, a variable has a name, a scope, and in languages that support
it, a type. Those are properties known to the compiler, but not necessarily
present at run time. At run time, a variable typically resolves to a stack
offset in low-level languages, or to an index into a lexpad in dynamic
languages.
Even in languages that boldly claim that "everything is an object", a variable
often isn't. The value inside a variable may be, but the variable itself
typically is not.
Perl 6 Intricacies

The things I've written above generalize pretty neatly to many programming
languages. I am a Perl 6 developer, so I have some insight into how
Perl 6 implements variables. If you don't resist, I'll share it with you :-).
Variables in Perl 6 typically come with one more level of indirection, which
we call a container. This allows two types of write operations:
assignment, which stores a value inside a container (which in turn might be
referenced by a variable), and binding, which places either a value or a
container directly into a variable.
Here's an example of assignment and binding in action:
my $x;
my $y;
# assignment:
$x = 42;
$y = 'a string';

say $x;     # => 42
say $y;     # => a string

# binding:
$x := $y;

# now $x and $y point to the same container, so that assigning to one
# changes the other:
$y = 21;
say $x;     # => 21
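
If it helps, here is a toy model of the same steps in Python. The `Container` class, the `pad` dictionary and the `assign` and `bind` helpers are all made up for illustration; this is not how Rakudo implements it, just the shape of the idea:

```python
class Container:
    """A mutable box that holds one value (hypothetical model)."""

    def __init__(self, value=None):
        self.value = value


# A stand-in for the lexpad: maps variable names to what they are bound to.
pad = {"x": Container(), "y": Container()}


def assign(name, value):
    # Assignment goes through the variable and writes into its container.
    pad[name].value = value


def bind(name, other):
    # Binding re-points the variable itself at the other variable's container.
    pad[name] = pad[other]


assign("x", 42)          # $x = 42
assign("y", "a string")  # $y = 'a string'
bind("x", "y")           # $x := $y
assign("y", 21)          # $y = 21
print(pad["x"].value)    # 21
```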

Why, I hear you cry?
There are three major reasons.
The first is that it makes assignment something
that's not special. For example in Python, if you assign to anything other
than a plain variable, the compiler translates it to some special method call
(obj.attr = x to setattr(obj, 'attr', x), obj[idx] = x to a
__setitem__ call etc.). In Perl 6, if you want to implement something you
can assign to, you simply return a container from that expression, and then
assignment works naturally.
For example an array is basically just a list in which the elements are
containers. This makes @array[$index] = $value work without any special
cases, and allows you to assign to the return value of methods, functions, or
anything else you can think of, as long as the expression returns a container.
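To see the Python side of that comparison, here is a small sketch in which a class records which hook gets called for each kind of assignment. The `Recorder` class is invented for this example; `__setattr__` and `__setitem__` are the standard hooks:

```python
class Recorder:
    """Logs which special method Python calls for each assignment."""

    def __init__(self):
        # Bypass our own __setattr__ while setting up the log.
        object.__setattr__(self, "log", [])

    def __setattr__(self, name, value):
        self.log.append(("__setattr__", name, value))

    def __setitem__(self, key, value):
        self.log.append(("__setitem__", key, value))


r = Recorder()
r.attr = 1    # attribute assignment
r["idx"] = 2  # subscript assignment
print(r.log)  # [('__setattr__', 'attr', 1), ('__setitem__', 'idx', 2)]
```

Neither statement ever produces something you could hold on to and assign to later, which is exactly the thing a container gives you.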
The second reason for having both binding and assignment is that it makes it
pretty easy to make things read-only. If you bind a non-container into a
variable, you can't assign to it anymore:
my $a := 42;
$a = "hordor";  # => Cannot assign to an immutable value

Perl 6 uses this mechanism to make function parameters read-only by default.
Likewise, returning from a function or method by default strips the container,
which avoids accidental action-at-a-distance (though an is rw annotation can
prevent that, if you really want it).
This automatic stripping of containers also makes expressions like $a + 2 work,
independently of whether $a holds an integer directly, or a container that
holds an integer. (In the implementation of Perl 6's core types, sometimes
this has to be done manually. If you ever wondered what nqp::decont does in
Rakudo's source code, that's what).
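Both effects fit into a few lines of a toy model. The `Container` class and the `decont` and `assign` functions below are illustrative stand-ins written in Python; they are not Rakudo's actual code:

```python
class Container:
    """A mutable box that holds one value (hypothetical model)."""

    def __init__(self, value):
        self.value = value


def decont(thing):
    # Strip the container if there is one, otherwise pass the value through.
    return thing.value if isinstance(thing, Container) else thing


def assign(target, value):
    # Only containers can be assigned to; bare values are immutable.
    if not isinstance(target, Container):
        raise TypeError("Cannot assign to an immutable value")
    target.value = value


boxed = Container(40)  # roughly: my $a = 40
bare = 40              # roughly: my $a := 40

print(decont(boxed) + 2, decont(bare) + 2)  # 42 42

assign(boxed, 41)      # fine, there is a container to write into
try:
    assign(bare, 41)
except TypeError as err:
    print(err)         # Cannot assign to an immutable value
```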
The third reason relates to types.
Perl 6 supports gradual typing, which means you can optionally annotate your
variables (and other things) with types, and Perl 6 enforces them for you. It
detects type errors at compile time where possible, and falls back to run-time
type checks otherwise.
The type of a variable only applies to binding, but the variable passes this type on to
its default container. And the container type is enforced at run time. You can
observe this difference by binding a container with a different constraint:
my Any $x;
my Int $i;
$x := $i;
$x = "foo";     # => Type check failed in assignment to $i; expected Int but got Str ("foo")

Int is a subtype of Any, which is why the binding of $i to $x
succeeds. Now $x and $i share a container that is type-constrained to
Int, so assigning a string to it fails.
Did you notice how the error message mentions $i as the variable name, even though
we've tried to assign to $x? The variable name in the error message is really
a heuristic, which works often enough, but sometimes fails. The container
that's shared between $x and $i has no idea which variable you used to
access it; it just knows the name of the variable that created it, here $i.
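Here is a rough Python model of why the message names $i. The `Container` class with its `name` and `constraint` fields is an assumption made for this sketch, and Python's `object` and `int` stand in for Any and Int:

```python
class Container:
    """A type-constrained box that remembers who created it (toy model)."""

    def __init__(self, name, constraint):
        self.name = name              # the variable that created the container
        self.constraint = constraint  # the type enforced on assignment
        self.value = None

    def store(self, value):
        if not isinstance(value, self.constraint):
            raise TypeError(
                f"Type check failed in assignment to {self.name}; "
                f"expected {self.constraint.__name__} "
                f"but got {type(value).__name__}"
            )
        self.value = value


pad = {"x": Container("$x", object), "i": Container("$i", int)}
pad["x"] = pad["i"]        # $x := $i

try:
    pad["x"].store("foo")  # $x = "foo"
except TypeError as err:
    message = str(err)

print(message)
# Type check failed in assignment to $i; expected int but got str
```

The container is reached through `pad["x"]`, but the only name it has on record is the one it was created with.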
Binding checks the variable type, not the container type, so this code doesn't
complain:
my Any $x;
my Int $i;
$x := $i;
$x := "a string";

This distinction between variable type and container type might seem weird for
scalar variables, but it really starts to make sense for arrays, hashes and
other compound data structures that might want to enforce a type constraint on
their elements:
sub f($x) {
    $x[0] = 7;
}
my Str @s;
f(@s);

This code declares an array whose elements must all be of type Str (or
subtypes thereof). When you pass it to a function, that function has no
compile-time knowledge of the type. But since $x[0] returns a container with
type constraint Str, assigning an integer to it can produce the error you
expect from it.
Summary

Variables typically only exist as objects at compile time. At run time, they
are just some memory location, either on the stack or in a lexical pad.
Perl 6 makes the understanding of the exact nature of variables a bit more
involved by introducing a layer of containers between variables and values.
This offers great flexibility when writing libraries that behave like
built-in classes, but comes with the burden of additional complexity.