Partial Evaluation, Futamura Projection And Their Applications
What Partial Evaluation Is?
Partial evaluation means to fix some variables in the given code before execution. With a traditional implementation of a compiler or an interpreter, all variables are replaced with its value on each evaluation of that variable. This is because a variable can change at any timing. This is, however, not always true in actual applications. Almost all of large applications has setting variables and data variables. The former is set statically by command-line arguments or setting files, so they does not change on running.
Partial evaluation is formalized as follows. Take an source code
and the set of variables
v can be split into two set
of variables. One is
Vc, whose element has a fixed corresponding
c) before execution. The other is
Vs, whose value cannot be
defined statically, vary on the each execution of that part of the
code. Partial evaluating function
mix has the following type.
mix :: source -> constants, configured values -> s*
s* is a function from data which fix on execution to the
What Futamura Projection Is?
The basic idea of Futamura projection is to see an interpreter and a
source code as input of
mix . A source code is a string. An
interpreter is taken as a 2-ary function.
mix takes a 2-ary
operator and some kind of data.
Let's introduce the notion of the universal evaluator to understand Futamura projection. The most generalized form of evaluation is what takes an interpreter of a language, a source code written in the language, and input data, then outputs the result. With this evaluator, one can run any source code if he has an interpreter for its language (you may think it is obvious, but this characteristic is useful in application. See the next section).
univEval :: interpreter -> source -> input -> output
Futamura projection is equal to partial application on this
mix defined in the previous section. There are 3 kinds of
Futamura projection. The first projection,
mix int s (int: interpreter, s: source)
univEval int s.
The return value is a function with type
input -> output. This is an "executable".
The second is
mix mix int
which corresponds to
The type of the return value is
source -> input -> output, the same as a "compiler".
mix mix1 mix2 (mix2 is for interpreter, from #2).
univEval itself. This is an universal compiler generator.
executable and a
compiler is a bit different from usual
executable is not necessarily written in native code.
It just takes data, then outputs. A
compiler does not necessarily
generates a native code from a source code. It takes an source code
and input and outputs the result.
The merit of this approach is that writing an interpreter is usually easier then writing a compiler from scratch, or using compiler-compiler with a translator description language .
The speed of generated programs is a different question. It is obvious that a native code binary is faster than its interpreter version. However, as mentioned above, Futamura projection is nothing to do with the kind of a generated code. It is up to the implementation. In order to see this point in detail, we will see PyPy, or the RPython toolchain in the next section.
Real World Example - PyPy
PyPy used to have two meanings. Recently, however, the RPython toolchain refers to the translation toolchain for RPython script, and PyPy refers to only the interpreter written in RPython using the toolchain. In this section, I will cover the RPython toolchain, and take PyPy as the example.
The RPython toolchain is practical implementation of
job of the translation toolchain is to translate RPython programs into
an efficient version of that program for one of various target
platforms". What it does is the second Futamura projection. In
our definition, this generates a "compiler" from an interpreter, but
usually the generated program is called as an interpreter, because it
needs a source and data at the same time.
Note that the RPython toolchain can also be used to make an executable,
like a compiler. In this case, the input program is an
ordinal program. Such operation corresponds to the first Futamura
for a hello-world example.
One benefit of this is that one can generate many binaries and byte-codes from a single interpreter implementation, thanks to rich libraries of the toolchain. Currently, C, llvm, JVM and CLI are supported. Moreover, RPython, a reduced Python, is rich enough for fun programming experience. No one want to write enormous C programs, are you?
Another point is that PyPy is faster (about 2 to 5 times) than CPython, the Python implementation in C. These two are interpreters of python, and have almost the same features. The secret is in JIT compiling. Usually, JIT compiler is needed for each interpreter and each platform. The RPython toolchain has only one JIT compiler, but it is applied to the interpreter (written in RPython). Moreover, RPython offers easy ways to configure JIT compilation. By the way, JIT compilation is also an implementation of the first Futamura projection. It takes a byte-code and generate a native code, that is, an executable.
Futamura projection is an abstract notion, so it seems to be difficult
at first glance. However, I found PyPy. It is a widely-used
mix tool. Partial evaluation is easier in
functional languages, because a pure function is free from the
problem of variables' sudden mutation. A strict type system is also
help for compilation into native codes. But Python is a dynamic,
object oriented language. Cool.
I told about Python, but I am not a pythonista. I believe I am a rubyist. However, the JIT / AOT compiling support is miserable for ruby. Especially rails is said to be heavy, memory hungry. Rails is just an web framework. What he does is just to accept a HTTP request, to gather data, then to return something. I am sure that it can be compiled on deployment to be much faster and resource effective.
- Yoshihiko Futamura, "Partial Evaluation of Computation Process—An Approach to a Compiler-Compiler", Higher-Order and Symbolic Computation 12, 381–391 (1999)
- Yoshihiko Futamura, "EL1 Partial Evaluator (Progress Report)"
- PyPy - The RPython Toolchain — PyPy 1.9 documentation
- PyPy - RPython toolchain — pypyja 1.7 documentation
- Motivating JIT Compiler Generation — pypyja 1.7 documentation
- 流行りのJITコンパイラは嫌いですか? — PyPy Advent Calendar 2011 v1.0 documentation
Thanks for the interesting post. I found the "RPython as a Futamura projection" idea very interesting. I have some corrections:
First, you describe third projection as
mix mix1 mix2, where
mix2is second projections(I assume
mix1means first projection, even though you didn't mention it). This is not correct, just like second projection is
mix mix intand not
mix mix1 int, third projection is
mix mix mix. In fact, you can't even have
mix mix1 mix2, here's why:
Second, I don't think RPython is second projection. The reason is that second projection involves self-specialization. RPython is a compiler, and since it's inputs are generally interpreters, it may seem like second projection. But it doesn't involve any self-specializations and so it's not second projection.
Assume that I'm running GCC to compile my interpreter, is that a second projection? If RPython is second projection, than all compilers are second projections.
I agree that RPython is capable of doing first projection. (where mix = RPython, int = RPython program, s = interpreter program)
(One another thing to note here is that, unless RPython is itself written in RPython, it's not possible to go beyond first projection)
EDIT: I thought about this more and I think RPython can't even do first projection.
mixhas to operate on the language that it's written in, and it should output programs in same language. (otherwise you can't self-specialize) But RPython is taking a subset of Python as inputs, and generating
native languageC programs. Also, RPython is not written in this Python subset.