When you learn programming, you typically first learn about basic expressions,
like 2 * 21, and then the next topic is control structures or variables. (If
you start with functional programming, maybe it takes you a bit longer to get
to variables.)
So, every programmer knows what a variable is, right?
Turns out, it might not be that easy.
Some people like to say that in Ruby, everything is an object. Well, a variable isn't really an object. The same holds true for other languages.
But let's start from the bottom up. In a low-level programming language like C, a local variable is a name that the compiler knows, with a type attached. When the compiler generates code for the function that the variable is in, the name resolves to an address on the stack, unless the compiler optimizes the variable away entirely or keeps it in a CPU register.
So in C, the variable only exists as such while the compiler is running. When the compiler is finished, and the resulting executable runs, there might be some stack offset or memory location that corresponds to our understanding of the variable. (And there might be debugging symbols that allow some mapping back to the variable name, but that's really a special case.)
In the case of recursion, a local variable can exist once for each active call of the function.
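To make that concrete, here is a small Perl 6 sketch (descend and @seen are made-up names for illustration). Each invocation of the recursive sub gets its own $local, so the inner calls can't clobber the values of the outer ones:

```raku
my @seen;
sub descend(Int $n) {
    my $local = $n * 10;        # a fresh $local for every invocation
    descend($n - 1) if $n > 0;  # the recursive calls create their own $local
    @seen.push($local);         # this call's value is still intact here
}
descend(2);
say @seen;   # [0 10 20]
```

The innermost call finishes first, which is why 0 shows up before 10 and 20.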
In programming languages with closures, local variables can be referenced from inner functions. Such variables generally can't live on the stack, because the inner function's reference keeps them alive after the outer function has returned. Consider this piece of Perl 6 code (though we could write the same in JavaScript, Ruby, Perl 5, Python or most other dynamic languages):
sub outer() {
    my $x = 42;
    return sub inner() {
        say $x;
    }
}
my &callback = outer();
callback();
The outer function has a local (lexical) variable $x, and the inner function
uses it. So once outer has finished running, there's still an indirect
reference to the value stored in this variable.
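The closure doesn't capture a copy of the value; it references the variable itself. A minimal sketch to check that, assuming a variant of outer that hands out two closures (the $getter/$setter and $get/$set names are made up): a write through one closure is visible through the other.

```raku
sub outer() {
    my $x = 42;
    my $getter = sub ()     { $x };          # reads the captured variable
    my $setter = sub ($new) { $x = $new };   # writes to the very same variable
    return $getter, $setter;
}
my ($get, $set) = outer();
$set(23);
say $get();   # 23
```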
They say you can solve any problem in computer science through another layer
of indirection, and that's true for the implementation of closures. The
closure that &callback points to actually stores two pointers under the hood.
One goes to the static byte code representation of the code, and the second
goes to a run-time data structure called a lexical pad, or lexpad for short.
Each time you invoke the outer function, a new instance of the lexpad is
created, and the closure points to the new instance and to the static code,
which is always the same.
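You can observe the per-invocation lexpads from the outside. In this sketch (make-counter is a hypothetical name), every call creates a fresh $count, while all the closures it returns run the same code. The two pointers themselves are an implementation detail that this code doesn't expose.

```raku
sub make-counter() {
    my $count = 0;                # a new $count in a new lexpad on every call
    return sub () { ++$count };   # same static code every time, different lexpad
}
my $first  = make-counter();
my $second = make-counter();
$first(); $first();
say $first();    # 3
say $second();   # 1 (its own $count)
```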
But even in dynamic languages with closures, variables themselves don't need to be objects. If a language forbids the creation of variables at run time, the compiler knows what variables exist in each scope, and can for example map each of them to an array index, so the lexpad becomes a compact array, and an access to a variable becomes an indexing operation into that array. Lexpads generally live on the heap, and are garbage collected (or reference counted) just like other objects.
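To illustrate the array idea, here is a deliberately simplified toy model in Perl 6. ToyClosure, toy-outer and the choice of slot 0 are all made up for this sketch; it is not how Rakudo or any particular VM actually represents closures. The "compiler" has already resolved $x to slot 0, so the inner code reads an array index instead of looking up a name:

```raku
# Toy model only: not how Rakudo or any real VM represents closures.
class ToyClosure {
    has $.code;     # static code, shared by all closures made from it
    has @.lexpad;   # storage for the enclosing scope's variables, one per call
    method invoke() { $!code.(@!lexpad) }
}

# The "compiled" body of inner(): the compiler has mapped $x to slot 0,
# so reading $x is just an array indexing operation.
my $inner-code = -> @pad { @pad[0] };

sub toy-outer($value) {
    # one lexpad per call, rather than a separate object per variable
    ToyClosure.new(code => $inner-code, lexpad => [$value]);
}

my $c1 = toy-outer(42);
my $c2 = toy-outer(7);
say $c1.invoke;   # 42
say $c2.invoke;   # 7
```

Both toy closures point to the same code object but to different lexpads, mirroring the two pointers described above.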
Lexpads are mostly a performance optimization. You could have a separate run-time representation of each variable, but then you'd need an allocation for each variable in each function call you perform, and many small allocations are generally much slower than a single allocation of the lexpad.
To summarize, a variable has a name, a scope, and in languages that support it, a type. Those are properties known to the compiler, but not necessarily present at run time. At run time, a variable typically resolves to a stack offset in low-level languages, or to an index into a lexpad in dynamic languages.
Even in languages that boldly claim that "everything is an object", a variable often isn't. The value inside a variable may be, but the variable itself typically isn't.
The things I've written above generalize pretty neatly to many programming languages. I am a Perl 6 developer, so I have some insight into how Perl 6 implements variables. If you don't resist, I'll share it with you :-).
Variables in Perl 6 typically come with one more level of indirection, which we call a container. This allows two types of write operations: assignment, which stores a value inside a container (which in turn might be referenced by a variable), and binding, which places either a value or a container directly into a variable.
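Containers are regular objects that you can look at. The .VAR method returns the container behind a variable, or just the value if the variable was bound to a value directly. A quick sketch (the variable names are arbitrary):

```raku
my $assigned = 42;          # assignment: the variable holds a Scalar container
my $bound := 42;            # binding: the variable holds the value itself
say $assigned.^name;        # Int    (method calls see the value inside)
say $assigned.VAR.^name;    # Scalar (the container)
say $bound.VAR.^name;       # Int    (there is no container)
```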
Here's an example of assignment and binding in action:
my $x;
my $y;
# assignment:
$x = 42;
$y = 'a string';
say $x; # => 42
say $y; # => a string
# binding:
$x := $y;
# now $x and $y point to the same container, so that assigning to one
# changes the other:
$y = 21;
say $x; # => 21
Why, I hear you cry?
There are three major reasons.
The first is that it makes assignment something that's not special. For
example, in Python, if you assign to anything other than a plain variable, the
compiler translates it to some special method call (obj.attr = x to
setattr(obj, 'attr', x), obj[idx] = x to a __setitem__ call, etc.). In Perl 6,
if you want to implement something you can assign to, you simply return a
container from that expression, and then assignment works naturally.
For example, an array is basically just a list in which the elements are
containers. This makes @array[$index] = $value work without any special
cases, and allows you to assign to the return value of methods, functions, or
anything else you can think of, as long as the expression returns a container.
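Here is a sketch of both cases, with a made-up Point class and a made-up last-slot sub. An is rw attribute gets an accessor that returns the attribute's container, and an is rw sub returns the container from its last expression instead of stripping it, so plain assignment just works:

```raku
class Point {
    has $.x is rw;   # the generated accessor returns the attribute's container
}
my $p = Point.new(x => 1);
$p.x = 5;            # plain assignment to a method's return value
say $p.x;            # 5

my @data = 1, 2, 3;
sub last-slot() is rw { @data[*-1] }   # returns the last element's container
last-slot() = 99;                      # assigns through that container
say @data;                             # [1 2 99]
```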
The second reason for having both binding and assignment is that it makes it pretty easy to make things read-only. If you bind a non-container into a variable, you can't assign to it anymore:
my $a := 42;
$a = "hordor"; # => Cannot assign to an immutable value
Perl 6 uses this mechanism to make function parameters read-only by default.
Likewise, returning from a function or method by default strips the container,
which avoids accidental action-at-a-distance (though an is rw annotation can
keep the container, if you really want it).
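Parameters can opt out of the read-only default, too. In this sketch (doubled and double-in-place are hypothetical names), is copy gives the sub a fresh container of its own, while is rw binds the caller's container, so assigning inside the sub changes the caller's variable:

```raku
# is copy: the sub gets a fresh container of its own
sub doubled($n is copy) { $n *= 2; $n }
# is rw: the sub gets the caller's container
sub double-in-place($n is rw) { $n *= 2 }
my $v = 21;
say doubled($v);      # 42
say $v;               # 21 (unchanged)
double-in-place($v);
say $v;               # 42
```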
This automatic stripping of containers also makes expressions like $a + 2
work, independently of whether $a holds an integer directly or a container
that holds an integer. (In the implementation of Perl 6's core types, this
sometimes has to be done manually. If you ever wondered what nqp::decont does
in Rakudo's source code, that's what it does.)
The third reason relates to types.
Perl 6 supports gradual typing, which means you can optionally annotate your variables (and other things) with types, and Perl 6 enforces them for you. It detects type errors at compile time where possible, and falls back to checking types at run time.
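Here is a minimal sketch of the run-time fallback. The value arrives through an untyped variable (it could just as well come from user input), so it's the Int-constrained container that checks it when the assignment happens. The exception class name in the comment is what I'd expect from current Rakudo:

```raku
my Int $count = 1;
my $input = "three";          # an untyped variable; could just as well be user input
try { $count = $input };      # the Int-constrained container rejects the Str
say $!.^name;                 # X::TypeCheck::Assignment
say $count;                   # 1 (unchanged)
```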
The type of a variable only applies to binding, but the variable passes this type on to its default container, and the container type is enforced at run time. You can observe this difference by binding a container with a different constraint:
my Any $x;
my Int $i;
$x := $i;
$x = "foo"; # => Type check failed in assignment to $i; expected Int but got Str ("foo")
Int is a subtype of Any, which is why the binding of $i to $x succeeds. Now $x
and $i share a container that is type-constrained to Int, so assigning a
string to it fails.
Did you notice how the error message mentions $i as the variable name, even
though we've tried to assign to $x? The variable name in the error message is
really a heuristic, which works often enough, but sometimes fails. The
container that's shared between $x and $i has no idea which variable you used
to access it; it just knows the name of the variable that created it, here $i.
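You can ask the shared container about this yourself: Scalar containers have an of method that returns their type constraint and a name method that returns the name they were created with. A small sketch:

```raku
my Any $x;
my Int $i;
$x := $i;
say $x.VAR.of.^name;   # Int (the shared container's constraint)
say $x.VAR.name;       # $i  (the name the container was created with)
```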
Binding checks the variable type, not the container type, so this code doesn't complain:
my Any $x;
my Int $i;
$x := $i;
$x := "a string";
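The flip side: because binding checks the variable type, binding a string into an Int variable is rejected. A sketch, with the string kept in an untyped variable so that the check happens at run time:

```raku
my Int $i;
my $value = "a string";
# binding checks the variable's own type (Int), so this bind is refused
my $bound = so try { $i := $value; True };
say $bound;          # False
say $i.^name;        # Int (still the original, empty container)
```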
This distinction between variable type and container type might seem weird for scalar variables, but it really starts to make sense for arrays, hashes and other compound data structures that might want to enforce a type constraint on their elements:
sub f($x) {
    $x[0] = 7;
}
my Str @s;
f(@s);
This code declares an array whose elements must all be of type Str (or
subtypes thereof). When you pass it to a function, that function has no
compile-time knowledge of the type. But since $x[0] returns a container with
type constraint Str, assigning an integer to it can produce the error you
expect from it.
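A sketch that makes the travelling constraint visible: try-store (a hypothetical name) declares no type for $x, yet the element container it gets from $x[0] reports Str as its constraint and rejects the integer:

```raku
sub try-store($x) {
    # $x has no declared type, but the element container remembers its constraint
    say $x[0].VAR.of.^name;   # Str
    $x[0] = 7;                # the type check fails here, at run time
}
my Str @s = 'hello';
try try-store(@s);
say @s[0];                    # hello (the Int was rejected)
```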
Variables typically only exist as objects at compile time. At run time, they are just some memory location, either on the stack or in a lexical pad.
Perl 6 makes the understanding of the exact nature of variables a bit more involved by introducing a layer of containers between variables and values. This offers great flexibility when writing libraries that behave like built-in classes, but comes with the burden of additional complexity.