Skip to content

Instantly share code, notes, and snippets.

@dkochmanski
Created December 7, 2019 13:33
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dkochmanski/fc361643488c8059a6030f143ac94a6d to your computer and use it in GitHub Desktop.
Save dkochmanski/fc361643488c8059a6030f143ac94a6d to your computer and use it in GitHub Desktop.

ECL compiler

ECL’s comipler source code may be little hard to read. It relies heavily on global variables and the code has grown over many years of fixes and improvements. These notes are meant to serve the purpose of a guide (not a reference manual or a documentation). If you notice that they are not up to date then please submit a patch with corrections.

Abstract syntax tree

Syntax tree nodes are represented as instances of the c1form structure. Each node has a slot name which is a symbol denoting the operator and information about the file and position in it.

Operators are dispatched to functions with appropriate tables associated with a functionality (i.e *p1-dispatch-table* is a dispatch table for type propagators associated with c1form’s).

References

Object references are used for numerous optimizations. They are represented as instances of ref structure descendants:

var
variable reference
fun
function reference
blk
block reference (block/return)
tag
tag reference (tagbody/go)

Each reference has an identifier, number of references, flags for cross-closure-boundary and cross-local-function-boundary references and a list of nodes (c1forms) which refer to the object.

Compilation algorithm (simlified)

When compiling a file (simplified ovierview):

First pass:

  1. Check if the file exists and that we can perform the compilation
  2. Estabilish the compilation environment
  3. Load cmpinit.lsp if present in the same directory
  4. Initialize a data section and construct the AST (compiler-pass1)

Second pass:

  1. Compute initialization function name (entry point of the object)
  2. Propagate types through the AST
  3. Compile AST to a C source files .c and .eclh (compiler-pass2)
  4. Dump a data segment in a .data file (symbol compiler_data_text)
  5. Compile artifacts with the C compiler (compiler-cc and bundle-cc))

compiler-pass1

  1. Initialize a data section

Data section contains permanent and temporary objects which are later serialized in a data segment of the complaition artifacts after the second pass. Objects put in data section are constants, symbols, load-time-value’s etc. The same object may be added few times, then it is stored as a location. Not object types can be dumped to C file.

  1. Construct the AST

Each form which is read is passed to t1expr creates a c1form which are stored in *top-level-forms* which are later used in the second pass. c1form is created as follows (simplified pseudocode):

(defun t1expr* (form)
  (setq form (maybe-expand-symbol-macro form))
  (when (atom form)
    ;; ignore the form
    (return))
  (destructuring-bind (op args) form
    (typecase op
      (cons
       (t1ordinary form))
      ((not symbol)
       (error "illegal function"))
      ((eq quote)
       ;; should we ignore the form(?)
       (t1ordinary 'NIL))
      (t1-dispatch-entry
       (top-level-dispatch form))
      (c1-dispatch-entry
       (not-top-level-dispatch form))
      (expandable-compiler-macro
       (add 'macroexpand *current-toplevel-form*)
       (t1expr* (expand-macro form)))
      (expandable-macro
       (add 'macroexpand *current-toplevel-form*)
       (t1expr* (expand-macro form)))
      (otherwise
       (t1ordinary form)))))

Forms are processed recursively with appropriate operator handlers. Funcations named t1xxx are a top level form handlers while c1xxx are handlers for the rest. When operator is not special it is processed according to normal rules i.e with c1call.

Function t1ordinary handles top-level forms which do not have special semantics associated with them by binding top-levelness flag to NIL and adding a c1form with a name ordinary and storing result of (c1expr form) in load-time values. Top level forms may have side effects (i.e registering a macro in a global compiler environment).

Function c1expr is used to handle forms which are not top-level. Dispatched operator handler may eliminate dead code, perform constant folding and propagate constants and rewrite the form which is processed again. Handler may modify the compiler environment (i.e register a local function or a local variable) and add new objects to a data section. Already created c1forms may be updated i.e to note that there is a cross-closure reference.

compiler-pass2

Second pass is responsible for producing files which are then compiled by the C compiler. For top level forms we have t2xxx handlers and for the rest c2xxx handlers. Additionally there are other helper tables (p1xxx for type propagation and location dispatch tables set-loc and wt-loc with varying handler names).

(defun pass2 ()
  (produce-headers)
  (eclh/produce-data-section-declarations)
  (with-initialization-code () ; this is put at the end of c file
    (include-data-file)
    (produce-initialize-cblock)
    (produce-setf-function-definitions)
    (do-type-propagation *top-level-forms*)
    ;; compiler-phase "t2" starts now
    ;;
    ;; This part is tricky. When we emit top-level form part of it
    ;; lands in the c-file before the initialization code (C function
    ;; definitions) and part is put in the initialization code.
    (emit-top-level-forms *make-forms*)
    (emit-top-level-forms *top-level-forms*))
  (eclh/produce-data-segment-declarations)
  (eclh/produce-setf-function-definers) ; should be inlined in c file?
  (eclh/add-static-constants)           ; CHECKME never triggered?
  (eclh/declare-c-funciton-table)       ; static table with function data
  ;; compiler-phase "t3" starts now
  (eclh/declare-callback-functions)     ; calls t3-callback
  (data/dump-data-section))

(defun emit-top-level-form (form)
  (with-init ()
    (emit (t2expr form)))
  (do-local-funs (fun)
    ;; t3local-fun may add new local funs to process.
    (emit (t3local-fun fun))))

Example output

Example output in pseudocode follows. I’ve put some comments to indicate potential issues and improvement opportunities.

<file-name>.eclh
static data, declarations and symbol mappings
static cl_object *VV;           /* declare data section */
static cl_object Cblock;

#define VM     size_of_data_permanent_storage;
#define VMtemp size_of_data_temporary_storage;

/* Declare functions in this file. They are declared static and hold
   in Cblock to assure that we can recompile the fasl and load it. */
static cl_object L1ordinary_function(cl_object , cl_object );
static cl_object LC2foobar(cl_object , cl_object );
static cl_object LC3__g0(cl_object , cl_object );

/* In safe code we always go through ecl_fdefinition and then this
   macro definition expands to nothing. */
#define ECL_DEFINE_SETF_FUNCTIONS \\
  VV[10]=ecl_setf_definition(VV[11],ECL_T); \\
  VV[12]=ecl_setf_definition(VV[13],ECL_T);

/* Statically defined constants.

   XXX I'm not sure how to trigger constant builders. Needs
   investigation if it is not a dead code, and if so whether we should
   resurrect it or remove. */

/* exported lisp functions -- installed in Cblock */
#define compiler_cfuns_size 1
static const struct ecl_cfunfixed compiler_cfuns[] = {
/*t,m,narg,padding,name=function-location,block=name-location,entry,entry_fixed,file,file_position*/
{0,0,2,0,ecl_make_fixnum(6),ecl_make_fixnum(0),(cl_objectfn)L1ordinary_function,NULL,ECL_NIL,ecl_make_fixnum(23)},
};

/* callback declarations (functions defined with defcallback). */
#include <ecl/internal.h>
static int ecl_callback_0(int var0,int var1);
<file-name>.data
data segment
static const struct ecl_base_string compiler_data_text1[] = {
        (int8_t)t_base_string, 0, ecl_aet_bc, 0,
        ECL_NIL, (cl_index)1065, (cl_index)1065,
        (ecl_base_char*)
"common-lisp-user::make-closure common-lisp-user::ordinary-function common-lisp-u"
 "ser::+ordinary_constant+ common-lisp-user::*foobar* common-lisp-user::foobar :de"
 "lete-methods clossy-package::bam 0 0 si::dodefpackage clos::install-method clos:"
 ":associate-methods-to-gfun \"CL-USER\" ((optimize (debug 1))) (defun common-lisp-u"
 "ser::make-closure) (#1=#P\"/home/jack/test/foobar.lisp\" . 55) (defun common-lisp-"
 "user::ordinary-function) (#1# . 132) (common-lisp-user::a common-lisp-user::b) 4"
 "2.32 (defconstant common-lisp-user::+ordinary_constant+) (#1# . 175) (defvar com"
 "mon-lisp-user::*foobar*) (#1# . 216) (defun common-lisp-user::foobar) (#1# . 237"
 ") \"CLOSSY-PACKAGE\" (\"CL\") (\"BAM\" \"GENERIC-FUNCTION\") (defgeneric generic-functio"
 "n) (#1# . 451) (clossy-package::a clossy-package::b) (defmethod generic-function"
 " (clossy-package::a real) (clossy-package::b real)) (real real) (defmethod gener"
 "ic-function (clossy-package::a integer) (clossy-package::b integer)) (integer in"
 "teger) (defclass clossy-package::bam) (#1# . 582) ((:initform 42 :initargs (:a) "
 ":name clossy-package::a))" };

static const cl_object compiler_data_text[] = {
(cl_object)compiler_data_text1,
NULL};
<file-name>.c
function definitions and the initialization code
#include <ecl/ecl-cmp.h>
#include "/absolute/path/to/<file-name>.eclh"

/* Normal functions are defined with DEFUN. Local functions may be
   lambdas, closures, methods, callbacks etc.

   XXX callback function implementations should be inlined to avoid
   indirection.

   XXX method function names are named like LCn__g0 and on lisp side
   they have names like g0 -- gensymed part of the name should be
   produced from the generic function name for easier debugging. */

/* normal function definitions */
static cl_object L1fun     (cl_object v1a, cl_object v2b) { /*...*/ }
/* local  function definitions */
static cl_object LC2__g0   (cl_object v1a)                { /* method   */ }
static cl_object LC3__g0   (cl_narg narg, ...)            { /* closure  */ }
static cl_object LC4foobar (cl_object v1a, cl_object v2b) { /* callback */ }

/* callbacks */
static int ecl_callback_0 (int var0, int var1) { /* calls LC2foobar */ }

#include "/absolute/path/to/<file-name>.data"
ECL_DLLEXPORT void init_fas_CODE(cl_object flag) {
  /* Function is designed to work in two passes. */
  if (flag != OBJNULL) {
    /* The loader passes a cblock as flag for us to initialize. */
    Cblock = flag->cblock;
    flag->cblock.data = VV;
    flag->cblock.data_text = compiler_data_text;
    /* ... */
    return;
  }
  /* The loader initializes the module (calls READ on data segment
     elements and initializes cblock.data with results, then installs
     functions and their source information. */

  /* 2. Execute top level code. */
  VVtemp = Cblock->cblock.temp_data;
  ECL_DEFINE_SETF_FUNCTIONS;

  /* Note that mere annotation in a simple file requires plenty of
     function calls so that impacts FASL load time. We should make
     annotations part of the objects themself (instead of keeping a
     central registry), then maybe we could keep this data static. */

  si_select_package(VVtemp[0]);
  (cl_env_copy->function=(ECL_SYM("MAPC",545)->symbol.gfdef))->cfun.entry(2, ECL_SYM("PROCLAIM",668), VVtemp[1]) /*  MAPC */;
  ecl_function_dispatch(cl_env_copy,ECL_SYM("ANNOTATE",1823))(4, VV[0], ECL_SYM("LOCATION",1829), VVtemp[2], VVtemp[3]) /*  ANNOTATE */;
  ecl_function_dispatch(cl_env_copy,ECL_SYM("ANNOTATE",1823))(4, VV[0], ECL_SYM("LAMBDA-LIST",1000), ECL_NIL, ECL_NIL) /*  ANNOTATE */;
  ecl_cmp_defun(VV[7]);                           /*  MAKE-CLOSURE    */
  /* ... */
  si_select_package(VVtemp[14]);

  /* XXX defgeneric should be compiled. */
  (cl_env_copy->function=(ECL_SYM("ENSURE-GENERIC-FUNCTION",944)->symbol.gfdef))->cfun.entry(5, ECL_SYM("GENERIC-FUNCTION",947), VV[5], ECL_T, ECL_SYM("LAMBDA-LIST",1000), VVtemp[19]) /*  ENSURE-GENERIC-FUNCTION */;
  clos_load_defclass(VV[6], ECL_NIL, VVtemp[26], ECL_NIL);
  /* ... */
}

Generic functions are not compiled.

Representation types

Compilation target machine is described in terms of types supported by the target compiler. +representation-types+ cover primitives types which are representable in C (:byte, :fixnum, :float-sse-pack, :bool, :pointer-void etc.). Each type has a corresponding Lisp type, C type and ways to convert between Lisp and C types (a separate column shows how to perform an unsafe convertion on unboxed values). List is ordered from the most specific to the least specific.

To describe a concreete machine two variables are used: +all-machines-c-types+ containing common types for all C compilers (without integers) and +this-machine-c-types+ adding integers and types which vary between C compilers (i.e long long int). Optionally each type has information about number of bits used (for bit fiddling), that information should be kept separate (imo). Variable *default-machine* use constructed from these both tables. Alternative machine representations may be created for cross compilation.

Each representation type is represented by an instance of a structure rep-type. That information is used when the C code is generated to manipulate data of certain type.

Environments

Compilation environment

The Global environment

Dynamic environments

Lexical environments

Debug Lexical Environment

Environment objects

http://www.lispworks.com/documentation/HyperSpec/Body/03_aa.htm

Loading FASL files

Cross compilation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment