Skip to content

Instantly share code, notes, and snippets.

@jda0
Created February 26, 2018 11:09
Show Gist options
  • Save jda0/3256718aedd2cc1955c565986a37bb2c to your computer and use it in GitHub Desktop.
Save jda0/3256718aedd2cc1955c565986a37bb2c to your computer and use it in GitHub Desktop.
Bristol LOLCODE
===============
Author: James Daly (Dec 2016)
This document outlines a parser for a dialect of [LOLCODE][1], an
esoteric imperative language devised in 2007 by [Justin J. Mesa][2]
based on the ["lolspeak" or "chanspeak" language][3] from popular ["lolcats"][4]
and ["Can I Haz Cheezburger?"][5] memes and weblogs.
The grammar in the present document is based on Version 1.2 of the formal
specification available [on GitHub][6]
However, line terminals can be any combination of `\n` and `\r`,
and inline comments must be preceded with a line seperator (`\n`, `\r` or `,`).
In addition: multiline comments are not supported; variables must be
exclusively formed from the uppercase English alphabet and NUMBRs (integers) are
conflated with NUMBARs (floats). String escapes are also unimplemented,
as are arrays and types (although these are also unimplemented in the version
1.2 specification), and carriage returns are not suppressed from VISIBLE input.
There is also no NOOB, the absence of a type is handled using Haskell Maybes.
Grammar
-------
The grammar of a dialect of Lolcode that shall henceforth be called Bristol
Lolcode:
prog: HAI numbar lsep line* KTHXBYE cr
line: cmnt cr | stmt lsep
cmnt: BTW char*
stmt: I HAZ A var (ITZ expr)?
| math
| lgic
| strop
| cast
| varn R expr
| VISIBLE expr
| expr lsep O RLY? cr
YA RLY lsep line*
(MEBBE expr lsep line*)*
(NO WAI lsep line*)?
OIC
| WTF? cr
(OMG literal cr line* GTFO? cr)+
(OMGWTF cr line*)?
OIC
| HOW IZ I varn (YR varn (AN varn)*)?
(line
| FOUND YR expr lsep
| GTFO lsep
)*
IF U SAY SO
| I IZ varn (YR varn (AN YR varn)*)?
expr: rnum | ryarn | rtroof | rtype | varn
rnum: math | numbar
ryarn: strop | yarn
rtroof: lgic | troof
rtype: cast | type
math: SUM OF rnum AN rnum
| DIFF OF rnum AN rnum
| PRODUKT OF rnum AN rnum
| QUOSHUNT OF rnum AN rnum
| MOD OF rnum AN rnum
| BIGGR OF rnum AN rnum
| SMALLR OF rnum AN rnum
lgic: BOTH OF rtroof AN rtroof
| EITHER OF rtroof AN rtroof
| WON OF rtroof AN rtroof
| NOT rtroof
| ALL OF rtroof (AN rtroof)* MKAY
| ANY OF rtroof (AN rtroof)* MKAY
| BOTH SAEM expr AN expr
| DIFFRINT expr AN expr
strop: SMOOSH ryarn (AN ryarn)* MKAY
cast: MAEK varn A rtype
| varn IS NOW A rtype
varn: alpha alpha*
alpha: 'A' | 'B' | ... | 'Z'
numbar: '-'? digit digit* ('.' digit digit*)?
digit: '0' | '1' | ... | '9'
literal: yarn | numbar | troof
yarn: '"' achar* '"'
troof: WIN | FAIL
type: YARN | NUMBAR | TROOF
lsep: cr | ','
achar: any unicode character except cr
cr: ('\n' | '\r')+
The grammar above will be used to produce a parser using Megaparsec,
which is a maintained version of Parsec, a parser combinator library.
Parser
------
A few preliminaries that import modules and language features before
the full parser is defined.
> {-# LANGUAGE StandaloneDeriving #-}
> module Lolcode where
> import Text.Megaparsec
> import Text.Megaparsec.String
> import Data.List (intercalate)
> tok :: String -> Parser String
> tok t = string t <* whitespace
> whitespace :: Parser ()
> whitespace = many (oneOf " \t") *> pure ()
> cr :: Parser [Char]
> cr = many (oneOf "\r\n")
Now each of the production rules in the grammar will be considered
and translated into a corresponding datatype and parser.
The language is defined using deep embedding.
prog: HAI numbar line* KTHXBYE
> data Prog = Prog numbar [Line]
> prog :: Parser Prog
> prog = Prog <$ tok "HAI" <*> numbar <*> lsep <*> many line <* tok "KTHXBYE"
> <* many cr
line: cmnt cr | stmt lsep
> data Line = Cmnt cmnt | Stmt stmt
> line :: Parser Line
> line = (Cmnt <$> cmnt <* cr)
> <|> (Stmt <$> stmt <* cr)
cmnt: BTW char*
> data Cmnt = BTW
> cmnt :: Parser Cmnt
> cmnt = BTW <$ tok "BTW" <* some (noneOf ("\n\r"))
stmt: I HAZ A var (ITZ expr)?
| math
| lgic
| strop
| cast
| varn R expr
| VISIBLE expr
| expr lsep O RLY? lsep
YA RLY lsep line*
(MEBBE expr lsep line*)*
(NO WAI lsep line*)?
OIC
| WTF? lsep
(OMG literal lsep line* GTFO? lsep)+
(OMGWTF lsep line*)?
OIC
| HOW IZ I varn (YR varn (AN varn)*)? lsep
(line
| FOUND YR expr lsep
| GTFO lsep
)*
IF U SAY SO
| I IZ varn (YR varn (AN varn)*)?
> data Stmt = IHAZA Varn (Maybe Expr)
> | Math math
> | Strop strop
> | Cast cast
> | R Varn Expr
> | VISIBLE Expr
> | ORLY Expr YaRly [Mebbe] (Maybe NoWai)
> | WTF [Omg] (Maybe Omgwtf)
> | HOW Varn [Varn] [HowStmt]
> | IIZ Varn [Varn]
> stmt :: Parser Stmt
> stmt = (IHAZA <$ tok "I HAZ A" <*> varn <* optional (tok "ITZ" <*> expr))
> <|> (Math <$> math)
> <|> (Strop <$> strop)
> <|> (Cast <$> cast)
> <|> (R <$> varn <* tok "R" <*> expr)
> <|> (VISIBLE <$ tok "VISIBLE" <*> expr)
> <|> (ORLY <$> expr <* lsep <* tok "O RLY?" <* lsep
> <*> yarly <*> many mebbe <*> optional nowai <* tok "OIC")
> <|> (WTF <$ tok "WTF?" <* lsep <*> some omg <*> optional omgwtf
> <* tok "OIC")
> <|> (HOW <$ tok "HOW IZ I" <*> varn <*> (sepBy (tok "YR" *> varn) "AN")
> <* lsep <*> many howstmt <* tok "IF U SAY SO")
> <|> (IIZ <$ tok "I IZ" <*> varn <*> (sepBy (tok "YR" *> varn) "AN")
> <* tok "MKAY")
> type YaRly = [Line]
> mebbe :: Parser YaRly
> yarly = (YaRly <$ tok "YA RLY" <* lsep <*> many line)
> data Mebbe = Mebbe Expr [Line]
> mebbe :: Parser Mebbe
> mebbe = (Mebbe <$ tok "MEBBE " <*> expr <* lsep <*> many line)
> type NoWai = [Line]
> nowai :: Parser NoWai
> nowai = (NoWai <$ tok "NO WAI" <* lsep <*> many line)
> data Omg = Omg Literal [Line]
> omg :: Parser Omg
> omg = (Omg <$ tok "OMG" <*> literal <* lsep <*> many line
> <* optional (tok "GTFO?") <* lsep)
> type OmgWtf = [Line]
> omgwtf :: Parser OmgWtf
> omgwtf = (OmgWtf <$ lsep <*> many line)
> data HowStmt = Line line
> | FOUND Expr
> | GTFO
> howstmt :: Parser HowStmt
> howstmt = (FOUND <$ tok "FOUND YR" <*> expr)
> <|> (GTFO <$ tok "GTFO")
> <|> (Line <$> line)
expr: rnum | ryarn | rtroof | rtype | varn | IT | NOOB
rnum: math | numbar
ryarn: strop | yarn
rtroof: lgic | troof
rtype: cast | type
> data Expr = RNum rnum
> | RYarn ryarn
> | RTroof rtroof
> | RType rtype
> | Varn varn
> | NOOB
> data RNum = Math math
> | Numbar numbar
> data RYarn = Strop strop
> | Yarn yarn
> data RTroof = Lgic lgic
> | Troof troof
> data RType = Cast cast
> | Typ typ
math: SUM OF rnum AN rnum
| DIFF OF rnum AN rnum
| PRODUKT OF rnum AN rnum
| QUOSHUNT OF rnum AN rnum
| MOD OF rnum AN rnum
| BIGGR OF rnum AN rnum
| SMALLR OF rnum AN rnum
> data Math = SUM RNum RNum
> | DIFF RNum RNum
> | PRODUKT RNum RNum
> | QUOSHUNT RNum RNum
> | MOD RNum RNum
> | BIGGR RNum RNum
> | SMALLR RNum RNum
> math :: Parser Math
> math = (SUM <$ tok "SUM OF" <*> rnum <* tok "AN" <*> rnum)
> <|> (DIFF <$ tok "DIFF OF" <*> rnum <* tok "AN" <*> rnum)
> <|> (PRODUKT <$ tok "PRODUKT OF" <*> rnum <* tok "AN" <*> rnum)
> <|> (QUOSHUNT <$ tok "QUOSHUNT OF" <*> rnum <* tok "AN" <*> rnum)
> <|> (MOD <$ tok "MOD OF" <*> rnum <* tok "AN" <*> rnum)
> <|> (BIGGR <$ tok "BIGGR OF" <*> rnum <* tok "AN" <*> rnum)
> <|> (SMALLR <$ tok "SMALLR OF" <*> rnum <* tok "AN" <*> rnum)
lgic: BOTH OF rtroof AN rtroof
| EITHER OF rtroof AN troof
| WON OF rtroof AN rtroof
| NOT rtroof
| ALL OF rtroof (AN troof)* MKAY
| ANY OF rtroof (AN troof)* MKAY
| BOTH SAEM expr AN expr
| DIFFRINT expr AN expr
> data Lgic = BOTH RTroof RTroof
> | EITHER RTroof RTroof
> | WON RTroof RTroof
> | NOT RTroof
> | ALL [RTroof]
> | ANY [RTroof]
> | BOTH Expr Expr
> | DIFFRINT Expr Expr
> lgic :: Parser Lgic
> lgic = (BOTH <$ tok "BOTH OF" <*> rtroof <* tok "AN" <*> rtroof)
> <|> (EITHER <$ tok "EITHER OF" <*> rtroof <* tok "AN" <*> rtroof)
> <|> (WON <$ tok "WON OF" <*> rtroof <* tok "AN" <*> rtroof)
> <|> (NOT <$ tok "NOT" <*> rtroof)
> <|> (ALL <$ tok "ALL OF" <*> (sepBy1 rtroof "AN") <* tok "MKAY")
> <|> (ANY <$ tok "ANY OF" <*> (sepBy1 rtroof "AN") <* tok "MKAY")
> <|> (BOTH <$ tok "BOTH SAEM" <*> expr <* tok "AN" <*> expr)
> <|> (DIFFRINT <$ tok "DIFFRINT" <*> expr <* tok "AN" <*> expr)
strop: SMOOSH ryarn (AN ryarn)* MKAY
> data Strop = SMOOSH [RYarn]
> strop :: Parser Strop
> strop = (SMOOSH <$ tok "SMOOSH" <*> (sepBy1 ryarn "AN") <* tok "MKAY")
cast: MAEK varn A rtype
| varn IS NOW A type
> data Cast = MAEK Varn RType
> cast :: Parser Cast
> cast = (MAEK <$ tok "MAEK" <*> varn <* tok "A" <*> rtype)
> <|> (MAEK <$ varn <*> tok "IZ NOW A" <*> rtype)
varn: alpha alpha*
alpha: 'A' | 'B' | ... | 'Z'
> type Varn = String
> varn :: Parser Varn
> varn = (some (oneOf ['A' .. 'Z']) >>= return . read) <* whitespace
numbar: '-'? digit digit* ('.' digit digit*)?
digit: '0' | '1' | ... | '9'
> type Numbar = Float
> numbar :: Parser Float
> numbar = (((optional '-') <*> some (oneOf ['0' .. '9']) <*> '.'
> <*> some (oneOf ['0' .. '9'])) >>= return . read) <* whitespace
literal: yarn | numbar | troof
yarn: '"' achar* '"'
troof: WIN | FAIL
type: YARN | NUMBAR | TROOF
> data Literal = Yarn Yarn
> | Numbar Numbar
> | Troof Troof
> literal :: Parser literal
> literal = (Yarn <$> yarn)
> <|> (Numbar <$> numbar)
> <|> (Troof <$> troof)
> type Yarn = String
> yarn :: Parser String
> yarn = tok "\"" *> some (noneOf "\n\r") <* tok "\""
> type Troof = Bool
> troof :: Parser Bool
> troof = (True <$ tok "WIN")
> <|> (False <$ tok "FAIL")
> data Typ = TYARN | TNUMBAR | TTROOF
> typ :: Parser Type
> typ = (TYARN <$ tok "YARN")
> <|> (TNUMBAR <$ tok "NUMBAR")
> <|> (TTROOF <$ tok "TROOF")
lsep: cr | ','
> lsep :: Parser ()
> lsep = many (oneOf "\r\n,") *> pure ()
`achar` is just a `Char`, the definition of `yarn` where `achar` is exclusively
used handles `cr`s.
TODO: pretty printing
> parseFile :: FilePath -> IO ()
> parseFile filePath = do
> file <- readFile filePath
> putStrLn $ case parse prog filePath file of
> Left err -> parseErrorPretty err
> Right prog -> pretty prog
Pretty Printing
---------------
The instances below allow values to be inspected in the terminal.
The default instance that is derived shows all the constructor names.
> deriving instance Show Prog
> deriving instance Show Line
> deriving instance Show Cmnt
> deriving instance Show Stmt
> deriving instance Show YaRly
> deriving instance Show Mebbe
> deriving instance Show NoWai
> deriving instance Show Omg
> deriving instance Show OmgWtf
> deriving instance Show HowStmt
> deriving instance Show Expr
> deriving instance Show RNum
> deriving instance Show RYarn
> deriving instance Show RTroof
> deriving instance Show RType
> deriving instance Show Math
> deriving instance Show Lgic
> deriving instance Show Strop
> deriving instance Show Cast
> deriving instance Show Literal
> deriving instance Show Troof
> deriving instance Show Type
The pretty-printed output gives a version that should be acceptable
LOLCODE stripped of comments. Although using a deep embedding necessitates
a large memory usage for the creation of the tree and wrapping into `Pretty`
classes, this allows the language to be easily extended with extra features.
The pretty function flattens the tree into a shallow embedding: a single string
representing the entire program. A parallel function could be used for code
generation e.g. to generate intermediate code in an intermediate or low-level
language, or machine code. This demonstrates the power of deep embedding to
create versatility.
> class Pretty a where
> pretty :: a -> String
> instance Pretty Prog where
> pretty (Prog v ls) = "HAI " ++ pretty v ++ "\n" ++ (unlines . map pretty) ls
> ++ "KTHXBYE\n"
> instance Pretty Line where
> pretty (Stmt stmt) = pretty stmt
> pretty (Cmnt cmnt) = ""
> instance Pretty Stmt where
> pretty (IHAZA v Nothing) = "I HAZ A " ++ v
> pretty (IHAZA v (Just val)) = "I HAZ A " ++ v ++ pretty val
> pretty (Math math) = pretty math
> pretty (Strop strop) = pretty strop
> pretty (Cast cast) = pretty cast
> pretty (R v expr) = v ++ " R " ++ pretty expr
> pretty (VISIBLE expr) = "VISIBLE " ++ pretty expr
> pretty (ORLY expr y ms Nothing) = pretty expr ++ ", O RLY?\n" ++ pretty y
> ++ "\n" ++ (unlines . map pretty) ms
> ++ "\nOIC"
> pretty (ORLY expr y ms (Just n)) = pretty expr ++ ", O RLY?\n" ++ pretty y
> ++ "\n" ++ (unlines . map pretty) ms ++ "\n" ++ pretty n
> ++ "\nOIC"
> pretty (WTF omgs Nothing) = "WTF?\n" ++ (unlines . map pretty) omgs
> ++ "\nOIC"
> pretty (WTF omgs (Just wtf)) = "WTF?\n" ++ (unlines . map pretty) omgs
> ++ "\n" ++ pretty wtf ++ "\nOIC"
> pretty (HOW v [] hows) = "HOW IZ I " ++ v ++ "\n"
> ++ (unlines . map pretty) hows ++ "\n IF U SAY SO"
> pretty (HOW v vs hows) = "HOW IZ I " ++ v ++ " YR "
> ++ ((intercalate) " AN ") . map pretty) vs ++ "\n"
> ++ (unlines . map pretty) hows ++ "\n IF U SAY SO"
> pretty (IIZ v []) = "I IZ " ++ v
> pretty (IIZ v vs) = "I IZ " ++ v ++ " YR "
> ++ ((intercalate) " AN ") . map pretty) vs
> instance Pretty YaRly where
> pretty ls = "YA RLY\n" ++ (unlines . map pretty) ls
> instance Pretty Mebbe where
> pretty (Mebbe cond ls) = "MEBBE " ++ pretty cond ++ "\n" ++ (unlines . map pretty) ls
> instance Pretty NoWai where
> pretty ls = "NO WAI\n" ++ (unlines . map pretty) ls
> instance Pretty Omg where
> pretty (Omg c ls) = "OMG " ++ pretty c ++ "\n" ++ (unlines . map pretty) ls
> ++ "\nGTFO?"
> instance Pretty OmgWtf where
> pretty ls = "OMGWTF\n" ++ (unlines . map pretty) ls
> instance Pretty HowStmt where
> pretty (Line line) = pretty line
> pretty (FOUND expr) = "FOUND YR " ++ pretty expr
> pretty (GTFO) = "GTFO"
> instance Pretty Expr where
> pretty (RNum rnum) = pretty rnum
> pretty (RYarn ryarn) = pretty ryarn
> pretty (RTroof rtroof) = pretty rtroof
> pretty (RType rtype) = pretty rtype
> pretty (Varn varn) = show varn
> pretty (NOOB) = ""
> instance Pretty RNum where
> pretty (Math math) = pretty math
> pretty (Numbar numbar) = show numbar
> instance Pretty RYarn where
> pretty (Strop strop) = pretty strop
> pretty (Yarn yarn) = yarn
> instance Pretty RTroof where
> pretty (Lgic lgic) = pretty lgic
> pretty (Troof troof) = pretty troof
> instance Pretty RType where
> pretty (Cast cast) = pretty cast
> pretty (Type typ) = pretty typ
> instance Pretty Math where
> pretty (SUM a b) = "SUM OF " ++ pretty a ++ " AN " ++ pretty b
> pretty (DIFF a b) = "DIFF OF " ++ pretty a ++ " AN " ++ pretty b
> pretty (PRODUKT a b) = "PRODUKT OF " ++ pretty a ++ " AN " ++ pretty b
> pretty (QUOSHUNT a b) = "QUOSHUNT OF " ++ pretty a ++ " AN " ++ pretty b
> pretty (MOD a b) = "MOD OF " ++ pretty a ++ " AN " ++ pretty b
> pretty (BIGGR a b) = "BIGGR OF " ++ pretty a ++ " AN " ++ pretty b
> pretty (SMALLR a b) = "SMALLR OF " ++ pretty a ++ " AN " ++ pretty b
> instance Pretty Lgic where
> pretty (BOTH a b) = "BOTH OF " ++ pretty a ++ " AN " ++ pretty b
> pretty (EITHER a b) = "EITHER OF " ++ pretty a ++ " AN " ++ pretty b
> pretty (WON a b) = "WON OF " ++ pretty a ++ " AN " ++ pretty b
> pretty (NOT a) = "NOT " ++ pretty a
> pretty (ALL cs) = "ALL OF " ++ ((intercalate " AN ") . map pretty) cs ++ " MKAY"
> pretty (ANY cs) = "ANY OF " ++ ((intercalate " AN ") . map pretty) cs ++ " MKAY"
> pretty (BOTH a b) = "BOTH SAEM " ++ pretty a ++ " AN " ++ pretty b
> pretty (DIFFRINT a b) = "DIFFRINT " ++ pretty a ++ " AN " ++ pretty b
> instance Pretty Strop where
> pretty (SMOOSH ys) = "SMOOSH " ++ ((intercalate " AN ") . map pretty) cs ++ " MKAY"
> instance Pretty Cast where
> pretty (MAEK v t) = "MAEK " ++ show v ++ " A " ++ pretty t
> instance Pretty Literal where
> pretty (Yarn y) = y
> pretty (Numbar n) = show n
> pretty (Troof t) = pretty t
> instance Pretty Troof where
> pretty True = "WIN"
> pretty False = "FAIL"
> instance Pretty Type where
> pretty (TYARN) = "YARN"
> pretty (TNUMBAR) = "NUMBAR"
> pretty (TTROOF) = "TROOF"
Hopefully this document demonstrates the simplicity of creating a language using
parser combinators. By using a functional language we simplified our workflow
which meant that the creation of this parser took in the tens of man hours
instead of hundreds, the parser follows our language definition closely and
extending both the language and parser is easier, quicker and safer.
References
----------
[1]: http://lolcode.org
[2]: https://github.com/justinmesa
[3]: http://ozwords.org/?p=6054
[4]: http://knowyourmeme.com/memes/lolcats
[5]: https://www.cnet.com/uk/news/the-history-of-i-can-has-cheezburger/
[6]: https://github.com/justinmeza/lolcode-spec/blob/master/v1.2/lolcode-spec-v1.2.md
Based on the work of [Nicolas Wu](http://zenzike.com/) for University of Bristol
undergraduate course [COMS22201 Language Engineering](http://www.cs.bris.ac.uk/Teaching/unitglance.jsp?unit=COMS22201).
All haskell code in this document is currently untested due to technical
problems.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment