jweese/BF.lhs

## BF.lhs
> module BF () where

This is a brainfuck interpreter.

Brainfuck is "an esoteric programming language noted for its extreme
minimalism" (Wikipedia). It is modeled as a very simple machine, with the
following parts:

* an array of at least 30,000 byte cells;
* a data pointer;
* a program listing;
* an instruction pointer;
* two byte streams for input and output.

We will look at each part of the machine in turn.

The obvious way to implement the data array is to use, well, an array. Since
the data cells are meant to change, we'll use a mutable array. We don't always
want to do this (Haskell discourages mutable datatypes), but since we need a
combination of random access and quick updates, it makes more sense to use a
mutable type instead of performing incremental updates that make us rebuild
the entire array each time it is changed.

There are two obvious choices for the monad in which the mutable array can be
encapsulated: ST and IO. ST makes sense for pure computations that use state
internally (where you'll always get the same result with the same input). This
is appealing, since it would let us restrict the mutability to only updating
array values. But unfortunately, we will eventually need to put things in the
unrestricted IO monad anyway, in order to do character input and output later.

So with that all said, we'll model the data array as a mutable array in the IO
monad:

> import System.Environment (getArgs)
> import System.Exit (exitFailure)
> import Data.Char (ord,chr)
> import Data.Word (Word8,Word16)
> import Data.Array.IArray (Array,listArray,(!),bounds,assocs)
> import Data.Array.IO (IOUArray,newArray,readArray,writeArray)
> import Data.Maybe (catMaybes)

> type BFDataArray = IOUArray Word16 Word8

The indices of the array are going to be Word16, so they will range between
0--65535. This will give us more than enough space. And, since we'll use the
minBound and maxBound of Word16 as the array bounds, it will be impossible to
access the array out of bounds. Note that a new BFDataArray has each cell
initialized to 0 :: Word8.

> newBFDataArray :: IO BFDataArray
> newBFDataArray = newArray (minBound, maxBound) 0

The data pointer is just an index into the data array:

> type DataPointer = Word16

Now let's talk about the program listing. It can be immutable, but we still
need random access (via the instruction pointer), so we'll use an immutable
array to hold the program.

The indices of the array will be simple Ints, and each element of the array
will be an instruction, an instance of the data type BFInstruction.

> type BFProgramListing = Array Int BFInstruction

Here is a list of the possible instructions in a program listing:

> data BFInstruction = IncrementDataPointer

1) Increment data pointer: this moves the data pointer to point at the next
cell in the data array.

>                    | DecrementDataPointer

2) Decrement data pointer: this moves the data pointer to point at the previous
cell in the data array.

>                    | IncrementData

3) Increment data: add 1 to the value at the pointed-to data cell.

>                    | DecrementData

4) Decrement data: subtract 1 from the value at the pointed-to data cell.

>                    | WriteByte

5) Write byte: write the value in the pointed-to cell to the output stream
(interpreted as an ASCII byte).

>                    | ReadByte

6) Read byte: read an ASCII byte from the input stream, and store it in the
pointed-to data cell.

>                    | JumpForward

7) Jump forward: if the value in the pointed-to cell is 0, jump forward to the
corresponding "jump backward" instruction. Otherwise, do nothing.

>                    | JumpBackward
>                    deriving (Eq,Show)

8) Jump backward: if the value in the pointed-to cell is 0, jump backward to the
corresponding "jump forward" instruction. Otherwise, do nothing.

To convert an arbitrary string into a program listing, we map characters to
instructions where possible, and throw away all other characters.

> stringToBFProgramListing :: String -> BFProgramListing
> stringToBFProgramListing s = listArray (1, length is) is
>    where is = catMaybes $ map charToInstruction s

> charToInstruction :: Char -> Maybe BFInstruction
> charToInstruction '>' = Just IncrementDataPointer
> charToInstruction '<' = Just DecrementDataPointer
> charToInstruction '+' = Just IncrementData
> charToInstruction '-' = Just DecrementData
> charToInstruction '.' = Just WriteByte
> charToInstruction ',' = Just ReadByte
> charToInstruction '[' = Just JumpForward
> charToInstruction ']' = Just JumpBackward
> charToInstruction _   = Nothing

The program counter is just an index into the program listing (in this case,
an Int).

> type ProgramCounter = Int

Now that we have all the pieces of the BF-interpretation machine (assuming that
the byte streams are provided for us), we can wrap it all up in one data type:

> data BFMachine = BFMachine {
>    program :: BFProgramListing ,
>    pc :: ProgramCounter ,
>    dataArray :: BFDataArray ,
>    dataPointer :: DataPointer }

And the initial state of a machine:

> newBFMachine :: String -> IO BFMachine
> newBFMachine s =
>    let prog = stringToBFProgramListing s
>    in do arr <- newBFDataArray
>          return BFMachine {
>                   program = prog ,
>                   pc = fst (bounds prog) ,
>                   dataArray = arr ,
>                   dataPointer = minBound
>                 }

And now we come to the heart of the matter: executing a program and changing
the machine state. The first thing we need to do is read the current instruc-
tion. This seems like you should just access the array, but we want to check
the array bounds: if we go out of bounds, we stop execution.

> currentInstruction :: BFMachine -> Maybe BFInstruction
> currentInstruction m =
>    let (a,b) = bounds (program m)
>        i = pc m
>    in if i < a || i > b then Nothing else Just (program m ! i)

The central function of this implementation will take a machine state and
an instruction and change the state. We'll start with the type signature:

> executeInstruction :: BFInstruction -> BFMachine -> IO BFMachine

And assuming we have that function implemented, we can run an entire program
with the following looping function: we execute an instruction, then increment
the program counter. Note that we break out of the loop once currentInstruction
doesn't return a value.

> execute :: BFMachine -> IO ()
> execute m = case currentInstruction m of
>    Just i -> do m' <- executeInstruction i m
>                 execute m' { pc = pc m' + 1 }
>    _ -> return ()

And, as long as we're putting off implementing executeInstruction, let's write
the main function of the program: p is the text of the brainfuck program, read
from a file. We create a new machine with the program p, and then execute it.

> main :: IO ()
> main = getArgs >>= \argv ->
>    case argv of
>         (f:_) -> do p <- readFile f
>                     m <- newBFMachine p
>                     execute m
>         _ -> putStrLn "need program file!" >> exitFailure

Now, the implementation of executeInstruction: we start with some convenience
functions for accessing the data array:

> readData :: BFMachine -> IO Word8
> readData m = readArray (dataArray m) (dataPointer m)

> writeData :: BFMachine -> Word8 -> IO ()
> writeData m v = writeArray (dataArray m) (dataPointer m) v

> modifyData :: BFMachine -> (Word8 -> Word8) -> IO ()
> modifyData m f = readData m >>= \v -> writeData m (f v)

We also need some functions for figuring out where to jump to when we see one
of the jump instructions:

> nextBackwardJumpIndex :: BFMachine -> Maybe ProgramCounter
> nextBackwardJumpIndex m =
>    let prog = assocs $ program m
>    in lookup JumpBackward [(i,c) | (c,i) <- drop (pc m) prog]

> previousForwardJumpIndex :: BFMachine -> Maybe ProgramCounter
> previousForwardJumpIndex m =
>    let prog = assocs $ program m
>    in lookup JumpForward [(i,c) | (c,i) <- reverse $ take (pc m) prog]

And that's the end of the helper functions. Now all we have to do is write
functions for executing each type of instruction.

Modifying the data pointer is easy enough:

> executeInstruction IncrementDataPointer m =
>    return m { dataPointer = dataPointer m + 1 }
> executeInstruction DecrementDataPointer m =
>    return m { dataPointer = dataPointer m - 1 }

And, thanks to our data-access functions, modifying the data is simple.

> executeInstruction IncrementData m = modifyData m (+1) >> return m
> executeInstruction DecrementData m = modifyData m (\v -> v - 1) >> return m

Here's why we're in IO: to read and write bytes. The only complication is
converting a Word8 to and from a Char that we can write.

> executeInstruction WriteByte m =
>    readData m >>= \v -> putChar (chr (fromIntegral v)) >> return m
> executeInstruction ReadByte m =
>    getChar >>= \c -> writeData m (fromIntegral (ord c)) >> return m

For the jumps, the first step is to read the data. For a potential jump forward,
we only jump if the current value is equal to 0. We use our helper function to
look up the index to jump to.

The one questionable design decision here is to raise an error when looking up
the jump index fails. It's probably not a good idea in general, but here there's
really no way to recover from a malformed program.

> executeInstruction JumpForward m = readData m >>= \v ->
>    if v == 0 then case nextBackwardJumpIndex m of
>                       Just v -> return m { pc = v }
>                       _ -> error ("unmatched [ at index " ++ show (pc m))
>              else return m

And executing a jump backward is almost the same:

> executeInstruction JumpBackward m = readData m >>= \v ->
>    if v /= 0 then case previousForwardJumpIndex m of
>                       Just v -> return m { pc = v }
>                       _ -> error ("unmatched ] at index " ++ show (pc m))
>              else return m
	> module BF () where

	This is a brainfuck interpreter.

	Brainfuck is "an esoteric programming language noted for its extreme
	minimalism" (Wikipedia). It is modeled as a very simple machine, with the
	following parts:

	* an array of at least 30,000 byte cells;
	* a data pointer;
	* a program listing;
	* an instruction pointer;
	* two byte streams for input and output.

	We will look at each part of the machine in turn.

	The obvious way to implement the data array is to use, well, an array. Since
	the data cells are meant to change, we'll use a mutable array. We don't always
	want to do this (Haskell discourages mutable datatypes), but since we need a
	combination of random access and quick updates, it makes more sense to use a
	mutable type instead of performing incremental updates that make us rebuild
	the entire array each time it is changed.

	There are two obvious choices for the monad in which the mutable array can be
	encapsulated: ST and IO. ST makes sense for pure computations that use state
	internally (where you'll always get the same result with the same input). This
	is appealing, since it would let us restrict the mutability to only updating
	array values. But unfortunately, we will eventually need to put things in the
	unrestricted IO monad anyway, in order to do character input and output later.

	So with that all said, we'll model the data array as a mutable array in the IO
	monad:

	> import System.Environment (getArgs)
	> import System.Exit (exitFailure)
	> import Data.Char (ord,chr)
	> import Data.Word (Word8,Word16)
	> import Data.Array.IArray (Array,listArray,(!),bounds,assocs)
	> import Data.Array.IO (IOUArray,newArray,readArray,writeArray)
	> import Data.Maybe (catMaybes)

	> type BFDataArray = IOUArray Word16 Word8

	The indices of the array are going to be Word16, so they will range between
	0--65535. This will give us more than enough space. And, since we'll use the
	minBound and maxBound of Word16 as the array bounds, it will be impossible to
	access the array out of bounds. Note that a new BFDataArray has each cell
	initialized to 0 :: Word8.

	> newBFDataArray :: IO BFDataArray
	> newBFDataArray = newArray (minBound, maxBound) 0

	The data pointer is just an index into the data array:

	> type DataPointer = Word16

	Now let's talk about the program listing. It can be immutable, but we still
	need random access (via the instruction pointer), so we'll use an immutable
	array to hold the program.

	The indices of the array will be simple Ints, and each element of the array
	will be an instruction, an instance of the data type BFInstruction.

	> type BFProgramListing = Array Int BFInstruction

	Here is a list of the possible instructions in a program listing:

	> data BFInstruction = IncrementDataPointer

	1) Increment data pointer: this moves the data pointer to point at the next
	cell in the data array.

	> \| DecrementDataPointer

	2) Decrement data pointer: this moves the data pointer to point at the previous
	cell in the data array.

	> \| IncrementData

	3) Increment data: add 1 to the value at the pointed-to data cell.

	> \| DecrementData

	4) Decrement data: subtract 1 from the value at the pointed-to data cell.

	> \| WriteByte

	5) Write byte: write the value in the pointed-to cell to the output stream
	(interpreted as an ASCII byte).

	> \| ReadByte

	6) Read byte: read an ASCII byte from the input stream, and store it in the
	pointed-to data cell.

	> \| JumpForward

	7) Jump forward: if the value in the pointed-to cell is 0, jump forward to the
	corresponding "jump backward" instruction. Otherwise, do nothing.

	> \| JumpBackward
	> deriving (Eq,Show)

	8) Jump backward: if the value in the pointed-to cell is 0, jump backward to the
	corresponding "jump forward" instruction. Otherwise, do nothing.

	To convert an arbitrary string into a program listing, we map characters to
	instructions where possible, and throw away all other characters.

	> stringToBFProgramListing :: String -> BFProgramListing
	> stringToBFProgramListing s = listArray (1, length is) is
	> where is = catMaybes $ map charToInstruction s

	> charToInstruction :: Char -> Maybe BFInstruction
	> charToInstruction '>' = Just IncrementDataPointer
	> charToInstruction '<' = Just DecrementDataPointer
	> charToInstruction '+' = Just IncrementData
	> charToInstruction '-' = Just DecrementData
	> charToInstruction '.' = Just WriteByte
	> charToInstruction ',' = Just ReadByte
	> charToInstruction '[' = Just JumpForward
	> charToInstruction ']' = Just JumpBackward
	> charToInstruction _ = Nothing

	The program counter is just an index into the program listing (in this case,
	an Int).

	> type ProgramCounter = Int

	Now that we have all the pieces of the BF-interpretation machine (assuming that
	the byte streams are provided for us), we can wrap it all up in one data type:

	> data BFMachine = BFMachine {
	> program :: BFProgramListing ,
	> pc :: ProgramCounter ,
	> dataArray :: BFDataArray ,
	> dataPointer :: DataPointer }

	And the initial state of a machine:

	> newBFMachine :: String -> IO BFMachine
	> newBFMachine s =
	> let prog = stringToBFProgramListing s
	> in do arr <- newBFDataArray
	> return BFMachine {
	> program = prog ,
	> pc = fst (bounds prog) ,
	> dataArray = arr ,
	> dataPointer = minBound
	> }

	And now we come to the heart of the matter: executing a program and changing
	the machine state. The first thing we need to do is read the current instruc-
	tion. This seems like you should just access the array, but we want to check
	the array bounds: if we go out of bounds, we stop execution.

	> currentInstruction :: BFMachine -> Maybe BFInstruction
	> currentInstruction m =
	> let (a,b) = bounds (program m)
	> i = pc m
	> in if i < a \|\| i > b then Nothing else Just (program m ! i)

	The central function of this implementation will take a machine state and
	an instruction and change the state. We'll start with the type signature:

	> executeInstruction :: BFInstruction -> BFMachine -> IO BFMachine

	And assuming we have that function implemented, we can run an entire program
	with the following looping function: we execute an instruction, then increment
	the program counter. Note that we break out of the loop once currentInstruction
	doesn't return a value.

	> execute :: BFMachine -> IO ()
	> execute m = case currentInstruction m of
	> Just i -> do m' <- executeInstruction i m
	> execute m' { pc = pc m' + 1 }
	> _ -> return ()

	And, as long as we're putting off implementing executeInstruction, let's write
	the main function of the program: p is the text of the brainfuck program, read
	from a file. We create a new machine with the program p, and then execute it.

	> main :: IO ()
	> main = getArgs >>= \argv ->
	> case argv of
	> (f:_) -> do p <- readFile f
	> m <- newBFMachine p
	> execute m
	> _ -> putStrLn "need program file!" >> exitFailure

	Now, the implementation of executeInstruction: we start with some convenience
	functions for accessing the data array:

	> readData :: BFMachine -> IO Word8
	> readData m = readArray (dataArray m) (dataPointer m)

	> writeData :: BFMachine -> Word8 -> IO ()
	> writeData m v = writeArray (dataArray m) (dataPointer m) v

	> modifyData :: BFMachine -> (Word8 -> Word8) -> IO ()
	> modifyData m f = readData m >>= \v -> writeData m (f v)

	We also need some functions for figuring out where to jump to when we see one
	of the jump instructions:

	> nextBackwardJumpIndex :: BFMachine -> Maybe ProgramCounter
	> nextBackwardJumpIndex m =
	> let prog = assocs $ program m
	> in lookup JumpBackward [(i,c) \| (c,i) <- drop (pc m) prog]

	> previousForwardJumpIndex :: BFMachine -> Maybe ProgramCounter
	> previousForwardJumpIndex m =
	> let prog = assocs $ program m
	> in lookup JumpForward [(i,c) \| (c,i) <- reverse $ take (pc m) prog]

	And that's the end of the helper functions. Now all we have to do is write
	functions for executing each type of instruction.

	Modifying the data pointer is easy enough:

	> executeInstruction IncrementDataPointer m =
	> return m { dataPointer = dataPointer m + 1 }
	> executeInstruction DecrementDataPointer m =
	> return m { dataPointer = dataPointer m - 1 }

	And, thanks to our data-access functions, modifying the data is simple.

	> executeInstruction IncrementData m = modifyData m (+1) >> return m
	> executeInstruction DecrementData m = modifyData m (\v -> v - 1) >> return m

	Here's why we're in IO: to read and write bytes. The only complication is
	converting a Word8 to and from a Char that we can write.

	> executeInstruction WriteByte m =
	> readData m >>= \v -> putChar (chr (fromIntegral v)) >> return m
	> executeInstruction ReadByte m =
	> getChar >>= \c -> writeData m (fromIntegral (ord c)) >> return m

	For the jumps, the first step is to read the data. For a potential jump forward,
	we only jump if the current value is equal to 0. We use our helper function to
	look up the index to jump to.

	The one questionable design decision here is to raise an error when looking up
	the jump index fails. It's probably not a good idea in general, but here there's
	really no way to recover from a malformed program.

	> executeInstruction JumpForward m = readData m >>= \v ->
	> if v == 0 then case nextBackwardJumpIndex m of
	> Just v -> return m { pc = v }
	> _ -> error ("unmatched [ at index " ++ show (pc m))
	> else return m

	And executing a jump backward is almost the same:

	> executeInstruction JumpBackward m = readData m >>= \v ->
	> if v /= 0 then case previousForwardJumpIndex m of
	> Just v -> return m { pc = v }
	> _ -> error ("unmatched ] at index " ++ show (pc m))
	> else return m