ECL’s compiler source code may be a little hard to read. It relies heavily on global variables and the code has grown over many years of fixes and improvements. These notes are meant to serve as a guide (not as a reference manual or documentation). If you notice that they are out of date, please submit a patch with corrections.
Syntax tree nodes are represented as instances of the c1form structure. Each node has a slot name, which is a symbol denoting the operator, and information about the file and the position in it. Operators are dispatched to functions through tables, each associated with one functionality (e.g. *p1-dispatch-table* is the dispatch table for type propagators associated with c1forms).
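The dispatch scheme described above can be sketched as follows. This is a minimal illustration, not ECL's actual code: the toy- names, the slot layout and the handler are all hypothetical, and only the idea of one table per functionality keyed by operator name comes from the text.

```lisp
;; Hypothetical, simplified stand-in for the real c1form structure.
(defstruct toy-c1form
  name        ; symbol denoting the operator
  args        ; operator-specific arguments
  file        ; source file the form came from
  position)   ; position of the form in that file

;; One table per functionality, keyed by operator name (assumed layout).
(defvar *toy-p1-dispatch-table* (make-hash-table :test #'eq))

(defun toy-p1-constant (form)
  "Toy type propagator: report a coarse type for a constant node."
  (let ((value (first (toy-c1form-args form))))
    (cond ((integerp value) 'integer)
          ((stringp value) 'string)
          (t t))))

(setf (gethash 'constant *toy-p1-dispatch-table*) 'toy-p1-constant)

(defun toy-dispatch (table form)
  "Look up the handler for FORM's operator in TABLE and call it."
  (let ((handler (gethash (toy-c1form-name form) table)))
    (if handler
        (funcall handler form)
        (error "No handler for operator ~S" (toy-c1form-name form)))))
```

Calling (toy-dispatch *toy-p1-dispatch-table* node) on a node whose name is constant runs the toy propagator; a second functionality would get its own table and reuse toy-dispatch unchanged.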
Object references are used for numerous optimizations. They are represented as instances of ref structure descendants:
- var: variable reference
- fun: function reference
- blk: block reference (block/return)
- tag: tag reference (tagbody/go)
Each reference has an identifier, number of references, flags for cross-closure-boundary and cross-local-function-boundary references and a list of nodes (c1forms) which refer to the object.
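A rough sketch of such a hierarchy, using defstruct inclusion, is shown below. The toy- prefix and every slot name are assumptions made for illustration; the real ref structure may name and organise its slots differently.

```lisp
;; Hypothetical base structure holding what every reference records.
(defstruct toy-ref
  identifier          ; name of the referenced object
  (uses 0)            ; number of references
  cross-closure-p     ; true if referenced across a closure boundary
  cross-local-fun-p   ; true if referenced across a local function boundary
  nodes)              ; list of nodes (c1forms) which refer to the object

;; Descendants inherit all of the slots above.
(defstruct (toy-var (:include toy-ref)))
(defstruct (toy-fun (:include toy-ref)))
(defstruct (toy-blk (:include toy-ref)))
(defstruct (toy-tag (:include toy-ref)))

(defun toy-add-reference (ref node &key cross-closure)
  "Record that NODE refers to REF and return REF."
  (incf (toy-ref-uses ref))
  (push node (toy-ref-nodes ref))
  (when cross-closure
    (setf (toy-ref-cross-closure-p ref) t))
  ref)
```

Keeping the counters and flags in the shared base structure means that an optimization which only asks whether an object is referenced at all, or referenced across a closure boundary, does not need to know which kind of reference it is looking at.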
When compiling a file (simplified overview):
First pass:
- Check if the file exists and that we can perform the compilation
- Establish the compilation environment
- Load cmpinit.lsp if present in the same directory
- Initialize a data section and construct the AST (compiler-pass1)
Second pass:
- Compute initialization function name (entry point of the object)
- Propagate types through the AST
- Compile the AST to C source files .c and .eclh (compiler-pass2)
- Dump a data segment in a .data file (symbol compiler_data_text)
- Compile artifacts with the C compiler (compiler-cc and bundle-cc)
- Initialize a data section
The data section contains permanent and temporary objects which are later serialized in a data segment of the compilation artifacts after the second pass. Objects put in the data section are constants, symbols, load-time-value’s etc. The same object may be added several times, in which case it is stored as a location. Not all object types can be dumped to a C file.
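One way to read the remark about repeated objects is that adding an object that is already present yields its existing location instead of a new entry. The sketch below illustrates that reading only; the names, the vector representation and the use of EQUAL for comparison are assumptions, not ECL's actual implementation.

```lisp
;; Hypothetical permanent part of a data section.
(defvar *toy-permanent-objects*
  (make-array 8 :adjustable t :fill-pointer 0))

(defun toy-add-object (object)
  "Return the index of OBJECT in the data section, adding it on first use."
  (or (position object *toy-permanent-objects* :test #'equal)
      ;; VECTOR-PUSH-EXTEND returns the index of the new element.
      (vector-push-extend object *toy-permanent-objects*)))
```

In the generated C shown later in these notes such an index appears as a subscript, as in VV[0] or VV[7].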
- Construct the AST
Each form that is read is passed to t1expr, which creates a c1form. The resulting c1forms are stored in *top-level-forms* and are later used in the second pass. A c1form is created as follows (simplified pseudocode):
(defun t1expr* (form)
  (setq form (maybe-expand-symbol-macro form))
  (when (atom form)
    ;; ignore the form
    (return-from t1expr*))
  (destructuring-bind (op &rest args) form
    (typecase op
      (cons
       (t1ordinary form))
      ((not symbol)
       (error "illegal function"))
      ((eq quote)
       ;; should we ignore the form(?)
       (t1ordinary 'NIL))
      (t1-dispatch-entry
       (top-level-dispatch form))
      (c1-dispatch-entry
       (not-top-level-dispatch form))
      (expandable-compiler-macro
       (add 'macroexpand *current-toplevel-form*)
       (t1expr* (expand-macro form)))
      (expandable-macro
       (add 'macroexpand *current-toplevel-form*)
       (t1expr* (expand-macro form)))
      (otherwise
       (t1ordinary form)))))
Forms are processed recursively with appropriate operator handlers. Functions named t1xxx are top-level form handlers, while c1xxx are handlers for the rest. When an operator is not special, it is processed according to the normal rules, i.e. with c1call.
Function t1ordinary handles top-level forms which do not have special semantics associated with them. It binds the top-levelness flag to NIL, adds a c1form with the name ordinary and stores the result of (c1expr form) in load-time values. Top-level forms may have side effects (e.g. registering a macro in a global compiler environment).
Function c1expr is used to handle forms which are not top-level. The dispatched operator handler may eliminate dead code, perform constant folding, propagate constants and rewrite the form, which is then processed again. A handler may modify the compiler environment (e.g. register a local function or a local variable) and add new objects to the data section. Already created c1forms may be updated, e.g. to note that there is a cross-closure reference.
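To make the terms concrete, here is a toy illustration of constant folding and dead code elimination. It works directly on s-expressions and knows only + and if, whereas the real handlers operate on c1forms and cover far more; the function name and its rules are hypothetical.

```lisp
(defun toy-fold (form)
  "Fold (+ ...) over constant arguments and drop dead IF branches."
  (cond ((atom form) form)
        ((eq (first form) 'if)
         (destructuring-bind (test then &optional else) (rest form)
           (let ((test (toy-fold test)))
             (cond ((null test) (toy-fold else))          ; THEN branch is dead
                   ((or (eq test t) (numberp test))
                    (toy-fold then))                      ; ELSE branch is dead
                   (t (list 'if test (toy-fold then) (toy-fold else)))))))
        ((eq (first form) '+)
         (let ((args (mapcar #'toy-fold (rest form))))
           (if (every #'numberp args)
               (apply #'+ args)     ; every argument is known: fold now
               (cons '+ args))))    ; otherwise keep the partially folded call
        ;; Any other operator is left untouched in this sketch.
        (t form)))
```

For instance, (toy-fold '(if nil (launch) (+ 1 2 (+ 3 4)))) discards the unreachable call and folds the sum to 10, while (toy-fold '(+ x (+ 1 2))) can only simplify the inner sum and returns (+ x 3).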
The second pass is responsible for producing files which are then compiled by the C compiler. For top-level forms we have t2xxx handlers and for the rest c2xxx handlers. Additionally there are other helper tables (p1xxx for type propagation, and the location dispatch tables set-loc and wt-loc with varying handler names).
(defun pass2 ()
  (produce-headers)
  (eclh/produce-data-section-declarations)
  (with-initialization-code () ; this is put at the end of c file
    (include-data-file)
    (produce-initialize-cblock)
    (produce-setf-function-definitions)
    (do-type-propagation *top-level-forms*)
    ;; compiler-phase "t2" starts now
    ;;
    ;; This part is tricky. When we emit top-level form part of it
    ;; lands in the c-file before the initialization code (C function
    ;; definitions) and part is put in the initialization code.
    (emit-top-level-forms *make-forms*)
    (emit-top-level-forms *top-level-forms*))
  (eclh/produce-data-segment-declarations)
  (eclh/produce-setf-function-definers) ; should be inlined in c file?
  (eclh/add-static-constants)           ; CHECKME never triggered?
  (eclh/declare-c-function-table)       ; static table with function data
  ;; compiler-phase "t3" starts now
  (eclh/declare-callback-functions)     ; calls t3-callback
  (data/dump-data-section))
(defun emit-top-level-form (form)
  (with-init ()
    (emit (t2expr form)))
  (do-local-funs (fun)
    ;; t3local-fun may add new local funs to process.
    (emit (t3local-fun fun))))
Example output in pseudocode follows. I’ve put some comments to indicate potential issues and improvement opportunities.
- <file-name>.eclh: static data, declarations and symbol mappings
static cl_object *VV; /* declare data section */
static cl_object Cblock;
#define VM size_of_data_permanent_storage
#define VMtemp size_of_data_temporary_storage
/* Declare functions in this file. They are declared static and held
in Cblock to assure that we can recompile the fasl and load it. */
static cl_object L1ordinary_function(cl_object , cl_object );
static cl_object LC2foobar(cl_object , cl_object );
static cl_object LC3__g0(cl_object , cl_object );
/* In safe code we always go through ecl_fdefinition and then this
macro definition expands to nothing. */
#define ECL_DEFINE_SETF_FUNCTIONS \
  VV[10]=ecl_setf_definition(VV[11],ECL_T); \
  VV[12]=ecl_setf_definition(VV[13],ECL_T);
/* Statically defined constants.
XXX I'm not sure how to trigger constant builders. Needs
investigation if it is not a dead code, and if so whether we should
resurrect it or remove. */
/* exported lisp functions -- installed in Cblock */
#define compiler_cfuns_size 1
static const struct ecl_cfunfixed compiler_cfuns[] = {
/*t,m,narg,padding,name=function-location,block=name-location,entry,entry_fixed,file,file_position*/
{0,0,2,0,ecl_make_fixnum(6),ecl_make_fixnum(0),(cl_objectfn)L1ordinary_function,NULL,ECL_NIL,ecl_make_fixnum(23)},
};
/* callback declarations (functions defined with defcallback). */
#include <ecl/internal.h>
static int ecl_callback_0(int var0,int var1);
- <file-name>.data: data segment
static const struct ecl_base_string compiler_data_text1[] = {
(int8_t)t_base_string, 0, ecl_aet_bc, 0,
ECL_NIL, (cl_index)1065, (cl_index)1065,
(ecl_base_char*)
"common-lisp-user::make-closure common-lisp-user::ordinary-function common-lisp-u"
"ser::+ordinary_constant+ common-lisp-user::*foobar* common-lisp-user::foobar :de"
"lete-methods clossy-package::bam 0 0 si::dodefpackage clos::install-method clos:"
":associate-methods-to-gfun \"CL-USER\" ((optimize (debug 1))) (defun common-lisp-u"
"ser::make-closure) (#1=#P\"/home/jack/test/foobar.lisp\" . 55) (defun common-lisp-"
"user::ordinary-function) (#1# . 132) (common-lisp-user::a common-lisp-user::b) 4"
"2.32 (defconstant common-lisp-user::+ordinary_constant+) (#1# . 175) (defvar com"
"mon-lisp-user::*foobar*) (#1# . 216) (defun common-lisp-user::foobar) (#1# . 237"
") \"CLOSSY-PACKAGE\" (\"CL\") (\"BAM\" \"GENERIC-FUNCTION\") (defgeneric generic-functio"
"n) (#1# . 451) (clossy-package::a clossy-package::b) (defmethod generic-function"
" (clossy-package::a real) (clossy-package::b real)) (real real) (defmethod gener"
"ic-function (clossy-package::a integer) (clossy-package::b integer)) (integer in"
"teger) (defclass clossy-package::bam) (#1# . 582) ((:initform 42 :initargs (:a) "
":name clossy-package::a))" };
static const cl_object compiler_data_text[] = {
(cl_object)compiler_data_text1,
NULL};
- <file-name>.c: function definitions and the initialization code
#include <ecl/ecl-cmp.h>
#include "/absolute/path/to/<file-name>.eclh"
/* Normal functions are defined with DEFUN. Local functions may be
lambdas, closures, methods, callbacks etc.
XXX callback function implementations should be inlined to avoid
indirection.
XXX method function names are named like LCn__g0 and on lisp side
they have names like g0 -- gensymed part of the name should be
produced from the generic function name for easier debugging. */
/* normal function definitions */
static cl_object L1fun (cl_object v1a, cl_object v2b) { /*...*/ }
/* local function definitions */
static cl_object LC2__g0 (cl_object v1a) { /* method */ }
static cl_object LC3__g0 (cl_narg narg, ...) { /* closure */ }
static cl_object LC4foobar (cl_object v1a, cl_object v2b) { /* callback */ }
/* callbacks */
static int ecl_callback_0 (int var0, int var1) { /* calls LC2foobar */ }
#include "/absolute/path/to/<file-name>.data"
ECL_DLLEXPORT void init_fas_CODE(cl_object flag) {
  /* Function is designed to work in two passes. */
  if (flag != OBJNULL) {
    /* The loader passes a cblock as flag for us to initialize. */
    Cblock = flag;
    flag->cblock.data = VV;
    flag->cblock.data_text = compiler_data_text;
    /* ... */
    return;
  }
  /* The loader initializes the module (calls READ on data segment
     elements and initializes cblock.data with results, then installs
     functions and their source information). */
  /* 2. Execute top level code. */
  VVtemp = Cblock->cblock.temp_data;
  ECL_DEFINE_SETF_FUNCTIONS;
  /* Note that mere annotation in a simple file requires plenty of
     function calls so that impacts FASL load time. We should make
     annotations part of the objects themselves (instead of keeping a
     central registry), then maybe we could keep this data static. */
  si_select_package(VVtemp[0]);
  (cl_env_copy->function=(ECL_SYM("MAPC",545)->symbol.gfdef))->cfun.entry(2, ECL_SYM("PROCLAIM",668), VVtemp[1]) /* MAPC */;
  ecl_function_dispatch(cl_env_copy,ECL_SYM("ANNOTATE",1823))(4, VV[0], ECL_SYM("LOCATION",1829), VVtemp[2], VVtemp[3]) /* ANNOTATE */;
  ecl_function_dispatch(cl_env_copy,ECL_SYM("ANNOTATE",1823))(4, VV[0], ECL_SYM("LAMBDA-LIST",1000), ECL_NIL, ECL_NIL) /* ANNOTATE */;
  ecl_cmp_defun(VV[7]); /* MAKE-CLOSURE */
  /* ... */
  si_select_package(VVtemp[14]);
  /* XXX defgeneric should be compiled. */
  (cl_env_copy->function=(ECL_SYM("ENSURE-GENERIC-FUNCTION",944)->symbol.gfdef))->cfun.entry(5, ECL_SYM("GENERIC-FUNCTION",947), VV[5], ECL_T, ECL_SYM("LAMBDA-LIST",1000), VVtemp[19]) /* ENSURE-GENERIC-FUNCTION */;
  clos_load_defclass(VV[6], ECL_NIL, VVtemp[26], ECL_NIL);
  /* ... */
}
Generic functions are not compiled.
The compilation target machine is described in terms of types supported by the target compiler. +representation-types+ covers primitive types which are representable in C (:byte, :fixnum, :float-sse-pack, :bool, :pointer-void etc.). Each type has a corresponding Lisp type, a C type and ways to convert between Lisp and C types (a separate column shows how to perform an unsafe conversion on unboxed values). The list is ordered from the most specific to the least specific.
To describe a concrete machine two variables are used: +all-machines-c-types+, containing types common to all C compilers (without integers), and +this-machine-c-types+, adding integers and types which vary between C compilers (e.g. long long int). Optionally each type has information about the number of bits used (for bit fiddling); that information should be kept separate (imo). The variable *default-machine* is constructed from both of these tables. Alternative machine representations may be created for cross compilation.
Each representation type is represented by an instance of the structure rep-type. That information is used when the C code is generated to manipulate data of a certain type.
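The following sketch shows how such a table could be consulted when emitting C. It is an illustration under stated assumptions: the toy- names and slot layout are hypothetical, and the three entries are examples rather than a copy of +representation-types+. Only the ordering rule (most specific first) is taken from the text above.

```lisp
;; Hypothetical, simplified stand-in for the real rep-type structure.
(defstruct toy-rep-type
  name        ; keyword naming the representation, e.g. :fixnum
  lisp-type   ; corresponding Lisp type
  c-name      ; C type used in the generated code
  to-lisp)    ; name of the C function that boxes a C value, or NIL

;; Illustrative entries only, ordered from most to least specific.
(defvar *toy-rep-types*
  (list (make-toy-rep-type :name :fixnum :lisp-type 'fixnum
                           :c-name "cl_fixnum" :to-lisp "ecl_make_fixnum")
        (make-toy-rep-type :name :double :lisp-type 'double-float
                           :c-name "double" :to-lisp "ecl_make_double_float")
        (make-toy-rep-type :name :object :lisp-type 't
                           :c-name "cl_object" :to-lisp nil)))

(defun toy-rep-type-for (lisp-type)
  "Return the first (most specific) entry able to hold LISP-TYPE."
  (find-if (lambda (rep) (subtypep lisp-type (toy-rep-type-lisp-type rep)))
           *toy-rep-types*))

(defun toy-box (rep c-expression)
  "Return C source text converting C-EXPRESSION of type REP to a Lisp object."
  (let ((boxer (toy-rep-type-to-lisp rep)))
    (if boxer
        (format nil "~A(~A)" boxer c-expression)
        c-expression)))   ; already a cl_object, nothing to convert
```

Because the list is searched front to back, a type such as (integer 0 10) selects the :fixnum entry before the catch-all :object entry is ever considered, which is why the ordering of the table matters.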
Environment objects
http://www.lispworks.com/documentation/HyperSpec/Body/03_aa.htm