kini/lab3.org

## lab3.org

      
    Raw
  

              lab3.org
            
          
    Compiler Lab: Generating Assembly Code

Introduction


  Assigned: Oct. 14
  Due: Oct. 30, 11:59 PM

Note: Keshav Kini (krkini@utexas.edu) is the lead TA for this lab
  assignment.
This lab is designed mainly to give you some experience in programming
  in C, and a bit of experience with x86-64 assembly.  You will be
  writing a very basic compiler – in other words, a program that
  produces x86-64 assembly code when you run it.
Your program will take in a reverse Polish notation arithmetic
  expression and produce assembly code for a function that computes the
  value of that expression.  This will be explained in detail later in
  this document.
These instructions are quite long.  Read them carefully!
Logistics

This is an individual project.  You may not share your work on lab
  assignments with other students, but do feel free to ask instructors
  for help if you are stuck, for example during office hours or a
  discussion section.
You will turn in a tarball (a .tar.gz file) containing the source
  code for your program, written in C.  You must include a Makefile,
  such that we can build your code using the make command.  (If you
  did Lab 0, you will already be familiar with writing makefiles.)
  Ensure that your code compiles and runs correctly on
  skipper.cs.utexas.edu (I’m just choosing a fixed machine for
  consistency).
Any clarifications and corrections for this lab will be posted on
  Canvas and Piazza, and if necessary, fixes will be committed to this
  file.
Background

Reverse Polish notation (RPN) is an alternative way of representing
  arithmetic expressions like those you might type into a calculator.
  At one point in time, it was popular to use RPN in physical
  calculators because it was easier to implement a calculator that read
  RPN.  Similarly, it will be easier for you to do this assignment with
  RPN than with the normal style of writing arithmetic expressions.
Here are some examples to give you an idea of how RPN works.

  Normal: 1 + 1
    RPN: 1 1 +
  
  Normal: 3 * (4 + 7)
    RPN: 3 4 7 + *
  
  Normal: (8 + 9) * (4 + 3 * 10) - 5
    RPN: 8 9 + 4 3 10 * + * 5 -
  

Do you see how it works?  In RPN, arithmetic operators like +, *,
  and - come after their two operands, rather than in between them.
  As you enter each number or operator, the RPN calculator proceeds with
  its computation.  When you enter a number, the calculator stores the
  number on a stack.  When you enter an operator, it pops the latest two
  numbers off of the stack, applies the operator to them, and pushes
  the result back onto the stack.
Notice how we didn’t need any parentheses to write expressions in RPN!
  Yes, you might think I cherry-picked examples where RPN didn’t need
  parentheses, but if you think about it for a while longer you’ll see
  that parentheses are never necessary in RPN!  (Hint: it’s because
  all the operators take exactly 2 arguments.)  That certainly makes it
  simpler to design the calculator, since you don’t have to keep track
  of nested parentheses.
Here is an example of how an RPN calculator would process that last
  example from above, as given by how the calculator’s stack looks after
  each new input.
8
8 9
17
17 4
17 4 3
17 4 3 10
17 4 30
17 34
578
578 5
573

You can read more about reverse Polish notation on Wikipedia.
The RPN compiler

We define in this section a simple compiler that takes an expression
  on the command line and dumps assembly code to stdout (standard
  output).  The assembly code will be a function called compute that
  accepts as arguments three values for the variables x, y, and z
  respectively, and returns result of the expression.
Input to your program is an expression (sequence of tokens) given on
  the command line.  This expression may contain integer constants,
  variables, and operators.  Integer constants may be positive, zero, or
  negative, meaning that they will consist of numeric digits, possibly
  with a minus sign at the beginning.  The only variables allowed are
  x, y, and z (because the assembly function your program will
  produce will be a function that accepts three arguments, x, y, and
  z).  The following operators are allowed: +, *, - (plus,
  times, minus).  Expressions are written in reverse Polish notation (as
  described earlier).
A couple of sample invocations of your program might be:
$ ./compiler x 4 + 2 "*" 4 -
[some assembly code is printed here]
$ ./compiler x 4 + 2 4 "*" -
[some assembly code is printed here]

Note: In most shells (including bash, the one you are probably
  using), the * character is used to mean “all the files in the
  current directory”, so unfortunately, the * needs to be put in
  quotation marks for our purposes, otherwise the shell will replace it
  with a list of all the files in the current directory.
If variable x has value 3, these expressions should equal 10 and -1,
  respectively.  But if variable x has value 4, the expressions should
  equal 12 and 0, respectively.
Therefore, the assembly code produced by the first run of your
  compiler should be a function that returns the number 10 when called
  with argument x set to 3, and the number 12 when called with
  argument x set to 4.  Meanwhile, the second piece of assembly code
  should be a function that returns -1 when 3 is passed as argument x,
  and returns 0 when 4 is passed as argument x.
The output of your program will be some x86-64 assembly code, printed
  to stdout (standard output).  This code should contain a function
  named compute which accepts three arguments according to the
  standard x86-64 calling convention.  The function should be designed
  such that it always returns the value of the input RPN expression
  where the values of x, y, and z have been substituted in.  More
  on that below.
First, think about how you evaluate RPN.  Given an RPN expression, do
  the following:

  If you encounter a variable, push its value on the runtime stack.
  If you encounter a constant, push it on the runtime stack.
  If you encounter an operator, pop the right number of arguments,
    apply the operator, and push the result.
  If done, the runtime stack should simply contain the answer, so
    retrieve the answer and return it.

Note that we are directly using “the stack” as our RPN calculation
  stack, rather than making a separate stack data structure or
  something.
Various problems might occur.  For example,

  You might encounter an unrecognizable symbol.
  There may not be enough arguments on the stack for the operator you
    are attempting to apply.
  There may be more than one value left on the stack when you are
    finished.

I’d suggest maintaining an integer variable to keep track of how many
  values would be on the stack at any point during your compilation /
  execution.
If you encounter any of these errors, print a useful error message to
  stderr, not stdout, and terminate with a nonzero exit code.  Examples
  of illegal inputs are:

  a (because the variable a is not allowed – only x, y, and
    z are allowed)
  4 + (because + needs two arguments)
  3 4 (because it will leave two values on the stack)
  3 4 4 2 + (because it will leave three values on the stack)
  + (because + needs two arguments)

Aside: It’s important to print to stderr rather than stdout so that
  the user can separate the assembly code you’re trying to output from
  any error messages you might produce.  For example, when you redirect
  stdout to a file using the special shell directive ”>”, stderr
  messages will still be printed to the screen.  Sample invocations
  would look like this:
$ ./compiler 1 2 3 4 + + + > output1.s
$ ./compiler 1 2 3 4 "*" - 5 6 7 + + - "*" 8 9 "*" + > output2.s
$ ./compiler 2 "*" 2 > output3.s
Error: The * operator requires two arguments
$ ./compiler 2 2 "*" > output4.s

In all four invocations, any assembly instructions printed to stdout
  will be silently put in the specified files, which we are naming with
  the .s extension because that extension is used for assembly code
  files.  But in the third invocation, there was an error message, and
  that didn’t get sent to the output3.s file, but rather to the
  screen, because ”>” only redirects stdout, not stderr.
Now, imagine parsing a legal RPN expression and generating an x86-64
  subroutine that would evaluate it.  Our expressions may contain only
  variables x, y, and z.  We assume that these are passed (in
  order) to your program via the usual calling mechanism.  That is, x,
  y, and z are passed in the registers %rdi, %rsi, and %rdx,
  respectively.  You should return your answer in register %rax, which
  again is specified by the standard calling conventions.
Consider the following RPN expression:
x 4 + 2 * 4 -

A first cut at compiling this might be to generate the following
  sequence of x86-64 instructions (probably without the comments):
  .globl compute                # Note: this marks that the label
                                # "compute:" is a function name
                                # that can be used from other code.
compute:
  pushq %rdi                    # put x on the stack
  pushq $4                      # put 4 on the stack
  popq  %r10                    # . add the top two things
  popq  %r11                    # |
  addq  %r10, %r11              # |
  pushq %r11                    # '
  pushq $2                      # put 2 on the stack
  popq  %r10                    # . multiply the top two things
  popq  %r11                    # |
  imulq %r10, %r11              # |
  pushq %r11                    # '
  pushq $4                      # put 4 on the stack
  popq  %r10                    # . subtract the top two things
  popq  %r11                    # |
  subq  %r10, %r11              # |
  pushq %r11                    # '
  popq  %rax                    # prepare to return the answer
  retq                          # return from the function
But this is very inefficient.  We are pushing numbers on the stack and
  immediately popping them off again into registers.  Also, we’re
  popping numbers off the stack into registers before performing
  arithmetic operations on them.  It’s true that in x86-64, we can’t
  write arithmetic instructions where both the source and destination
  are in memory.  But sometimes we’ll want to operate on an integer
  constant and something on the stack, which can be done with some
  instructions (like add) and not others (like imul).  So maybe we
  can optimize a bit.
Here’s a second attempt:
  .globl compute
compute:
  pushq %rdi                    # put x on the stack
  addq  $4, (%rsp)              # add 4 (directly in memory)
  popq  %r8                     # . multiply by 2 (using a temporary
  imulq $2, %r8                 # |  register)
  pushq %r8                     # '
  subq  $4, (%rsp)              # subtract 4 (directly in memory)
  popq  %rax                    # prepare to return the answer
  retq                          # return from the function
Notice that add and sub can target a memory location, but imul
  must target a register.  Experiment with (or search the web for
  information about) different instructions to see which ones can target
  a memory location and which ones have to use registers.
You can use the as command to call the GNU assembler on your
  assembly code, to check if it is valid x86-64 assembly.  Use the
  --64 flag to ensure you’re assembling as x86-64 as opposed to 32-bit
  x86 or any other kind of assembly code.
Now, there’s even more optimization we might be able to do.  Notice
  how we happened to never put more than one thing on the stack at a
  time.  So why don’t we just use a register for that one location on
  the stack we keep using?  Then we don’t need to access memory at all.
  And in fact we can just use the %rax register, into which we were
  eventually going to pop that stack value anyway.
Here’s a third attempt:
  .globl compute
compute:
  movq  %rdi, %rax              # put x in %rax
  addq  $4, %rax                # add 4 to %rax
  imulq $2, %rax                # multiply %rax by 2
  subq  $4, %rax                # subtract 4 from %rax
  retq                          # return from the function
Note that we’ve started doing optimizations that are pretty specific
  to the exact expression we started out with, x 4 + 2 * 4 -.  They
  might not work for all expressions.  That means that if you want to
  implement these optimizations, your C code that generates this
  assembly will have to be smart and know when it can and can’t do
  certain optimizations.
For your compilation, do not use callee saved registers – in other
  words, only use the following registers:

  %rax: for the return value and/or intermediate storage
  %rsp: the stack pointer
  %rdi, %rsi, and %rdx: the registers used for passing in the
    three arguments
  %rcx, %r8, and %r9: the three unused argument-passing
    registers
  %r10, and %r11: the other two caller-saved registers

You can reuse registers if it’s safe to do so, and you can push things
  onto the stack if you run out of registers at any point.
Your code should be able to handle arbitrary RPN expressions that
  follow our constraints on variables.
Testing your code

You don’t want to submit your code without testing it first, of
  course!  In this section we’ll cover the various ways in which you can
  make sure your code is working correctly.
Your code should compile

This one is obvious.  Make sure your C code doesn’t have any syntax
  errors, etc., and make sure that your Makefile is in order.  You
  should be able to go to the directory where your code is, type make,
  and have an executable file called compiler be created in the
  directory.  Test this on skipper.cs.utexas.edu as well – that
  machine has an environment similar to what we’ll use for grading.
Your program should produce valid assembly code

Make sure that when you run the compiler executable that you
  created, with a valid RPN expression passed in as a series of command
  line arguments, you see valid assembly code printed to the screen.
Try some of the example invocations mentioned earlier in this README,
  and make up some more of your own.
Beyond just eyeballing it, make sure that the output is valid assembly
  code by using the special shell directive ”>” (as mentioned earlier)
  to make your program direct its output into a .s file such as
  test.s, and then run the assembler on it to see if there are any
  syntax errors or other problems:
$ as --64 test.s

(As mentioned before, the --64 is to ensure that the assembly code
  is treated as x86-64 assembly code.)
The assembly code should provide the right function

Your program is required to produce assembly code for a function
  called compute which takes three 64-bit signed integer arguments and
  returns a 64-bit signed integer argument.  In C, that means the
  function would have looked something like this:
#include <stdint.h> // this needs to be at the top of the file

int64_t compute (int64_t x, int64_t y, int64_t z) {
  // do stuff
  return something;
}
(In practice you’d often write long long rather than int64_t, and
  then you wouldn’t need the #include <stdint.h>, but unlike long
  long, the int64_t type is guaranteed to be exactly 64 bits long,
  no matter what compiler or architecture you use.)
But since you’re writing the function in assembly, if you wanted to
  use it in some C code, you would want to sort of “import” it.  This is
  how you do it in C: first, somewhere in your C code, you make an
  extern declaration of the function, like so:
#include <stdint.h> // this needs to be at the top of the file

extern int64_t compute (int64_t x, int64_t y, int64_t z);
Notice that no function body is provided, only the argument types and
  return type and the name of the function, with the word extern (for
  “externally defined”).  Then, when you compile your C code, just
  specify the name of the assembly file which actually contains the
  function, and gcc will hook it all up nicely.  For example,
$ gcc -o full_program some_program.c compute_function.s

So, to test that your assembly code provides the correct function,
  write some random little C program which uses the compute function
  you defined (maybe it calls compute on some arguments and prints the
  result? or whatever), making sure to put in the extern definition
  shown above.
Then compile the program, while telling gcc the name of the file where
  you saved the assembly code output from your compiler.  If there
  were no errors, gcc was able to read your assembly code as a
  definition of a compute function, as desired.  (If you follow the
  format shown in the assembly code examples above, you shouldn’t have
  any problems.)
By the way, if you’re interested in how this linking process works,
  check out Chapter 7 in the textbook, which we may get to later in the
  semester.
The generated functions should work correctly

Try compiling a few RPN expressions of your own choosing.  For each
  one, your program will produce some assembly code.  Create a C program
  that will link with your assembly code (as described in the previous
  section) and test it by calling it with some particular values of x,
  y, and z and printing the result.  Compile the test program
  together with the assembly code and run the resulting executable to
  see if it prints the number you expected.
Make sure to try various different kinds of RPN expressions, to make
  sure you did everything right.  Try ones with many operators, ones
  with many numbers in a row, ones with negative numbers, ones with
  really large numbers, etc.
If you’re feeling really ambitious, you might try to write a
  randomizing test harness that automatically tests your compiler on
  many different randomly generated RPN expressions, and substitutes in
  many different randomly generated values of x, y, and z for each
  RPN expression!
Your submission should be in the correct format

Once you’ve finished programming, it’s time to submit your work.  Add
  a README file explaining the optimization tricks you used to make your
  assembly code faster, if any, and then use the tar utility to create
  an archive (“tarball”) of the directory in which you’ve created your
  code.
Ideally the tarball should only contain source code, so you should get
  rid of any compiled files, generated assembly files, and other such
  cruft before making the tarball.  But really, as long as we can type
  make after extracting your tarball and get a working compiled
  version of your code, that’s good enough.  (Make sure to test that
  your tarball works on skipper.cs.utexas.edu, as mentioned earlier.)
Grading

Your assignment is to write a compiler program as described above.
The basic grading policy for this lab is that you’ll get 90 points if
  you create a program that correctly does what it is supposed to do –
  that is, accepts RPN expressions on the command line and dumps
  assembly code to stdout which defines a function called compute
  that takes arguments x, y, and z and returns the value of the
  RPN expression when those values are substituted in for the variables
  x, y, and z in the expression.
Even if you produce straightforward but inefficient assembly code such
  as that shown in the first example above, you will still earn the 90
  points if the code behaves correctly.
The remaining 10 points will be assigned if you optimize the assembly
  code you generate in some significant way.  Two such ways are as
  follows.

  Avoid using the stack when you can help it, and instead use the
    various registers you have at your disposal, which are much faster
    than accessing memory.
  Use constant folding to reduce the number of actual computations
    that would take place at runtime.
    For example, if your compiler was given an RPN expression that was
      constant (i.e. didn’t involve any variables), then your compiler
      could just figure out the integer value of the expression and
      create an extremely fast, two-instruction function that just wrote
      the correct answer into %rax and returned, without doing any
      work!
    Even for RPN expressions that do involve variables, maybe you could
      do partial constant folding – for example, simplify x 1 2 3
      4 + + + + to x 10 +.
    If you are feeling really ambitious, there are even more
      sophisticated forms of partial constant folding that you might try
      to explore, such as exploiting commutativity, factoring
      polynomial-like expressions, etc.
  

With your submission, you should include a README file explaining what
  optimization techniques your compiler uses.
Beyond the 90 + 10 breakdown given above, determination of further
  partial credit will be left up to the discretion of the grader.