Created
February 26, 2018 11:09
-
-
Save jda0/3256718aedd2cc1955c565986a37bb2c to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Bristol LOLCODE | |
=============== | |
Author: James Daly (Dec 2016) | |
This document outlines a parser for a dialect of [LOLCODE][1], an | |
esoteric imperative language devised in 2007 by [Justin J. Mesa][2] | |
based on the ["lolspeak" or "chanspeak" language][3] from popular ["lolcats"][4] | |
and ["Can I Haz Cheezburger?"][5] memes and weblogs. | |
The grammar in the present document is based on Version 1.2 of the formal | |
specification available [on GitHub][6] | |
However, line terminals can be any combination of `\n` and `\r`, | |
and inline comments must be preceded with a line seperator (`\n`, `\r` or `,`). | |
In addition: multiline comments are not supported; variables must be | |
exclusively formed from the uppercase English alphabet and NUMBRs (integers) are | |
conflated with NUMBARs (floats). String escapes are also unimplemented, | |
as are arrays and types (although these are also unimplemented in the version | |
1.2 specification), and carriage returns are not suppressed from VISIBLE input. | |
There is also no NOOB, the absence of a type is handled using Haskell Maybes. | |
Grammar | |
------- | |
The grammar of a dialect of Lolcode that shall henceforth be called Bristol | |
Lolcode: | |
prog: HAI numbar lsep line* KTHXBYE cr | |
line: cmnt cr | stmt lsep | |
cmnt: BTW char* | |
stmt: I HAZ A var (ITZ expr)? | |
| math | |
| lgic | |
| strop | |
| cast | |
| varn R expr | |
| VISIBLE expr | |
| expr lsep O RLY? cr | |
YA RLY lsep line* | |
(MEBBE expr lsep line*)* | |
(NO WAI lsep line*)? | |
OIC | |
| WTF? cr | |
(OMG literal cr line* GTFO? cr)+ | |
(OMGWTF cr line*)? | |
OIC | |
| HOW IZ I varn (YR varn (AN varn)*)? | |
(line | |
| FOUND YR expr lsep | |
| GTFO lsep | |
)* | |
IF U SAY SO | |
| I IZ varn (YR varn (AN YR varn)*)? | |
expr: rnum | ryarn | rtroof | rtype | varn | |
rnum: math | numbar | |
ryarn: strop | yarn | |
rtroof: lgic | troof | |
rtype: cast | type | |
math: SUM OF rnum AN rnum | |
| DIFF OF rnum AN rnum | |
| PRODUKT OF rnum AN rnum | |
| QUOSHUNT OF rnum AN rnum | |
| MOD OF rnum AN rnum | |
| BIGGR OF rnum AN rnum | |
| SMALLR OF rnum AN rnum | |
lgic: BOTH OF rtroof AN rtroof | |
| EITHER OF rtroof AN rtroof | |
| WON OF rtroof AN rtroof | |
| NOT rtroof | |
| ALL OF rtroof (AN rtroof)* MKAY | |
| ANY OF rtroof (AN rtroof)* MKAY | |
| BOTH SAEM expr AN expr | |
| DIFFRINT expr AN expr | |
strop: SMOOSH ryarn (AN ryarn)* MKAY | |
cast: MAEK varn A rtype | |
| varn IS NOW A rtype | |
varn: alpha alpha* | |
alpha: 'A' | 'B' | ... | 'Z' | |
numbar: '-'? digit digit* ('.' digit digit*)? | |
digit: '0' | '1' | ... | '9' | |
literal: yarn | numbar | troof | |
yarn: '"' achar* '"' | |
troof: WIN | FAIL | |
type: YARN | NUMBAR | TROOF | |
lsep: cr | ',' | |
achar: any unicode character except cr | |
cr: ('\n' | '\r')+ | |
The grammar above will be used to produce a parser using Megaparsec, | |
which is a maintained version of Parsec, a parser combinator library. | |
Parser | |
------ | |
A few preliminaries that import modules and language features before | |
the full parser is defined. | |
> {-# LANGUAGE StandaloneDeriving #-} | |
> module Lolcode where | |
> import Text.Megaparsec | |
> import Text.Megaparsec.String | |
> import Data.List (intercalate) | |
> tok :: String -> Parser String | |
> tok t = string t <* whitespace | |
> whitespace :: Parser () | |
> whitespace = many (oneOf " \t") *> pure () | |
> cr :: Parser [Char] | |
> cr = many (oneOf "\r\n") | |
Now each of the production rules in the grammar will be considered | |
and translated into a corresponding datatype and parser. | |
The language is defined using deep embedding. | |
prog: HAI numbar line* KTHXBYE | |
> data Prog = Prog numbar [Line] | |
> prog :: Parser Prog | |
> prog = Prog <$ tok "HAI" <*> numbar <*> lsep <*> many line <* tok "KTHXBYE" | |
> <* many cr | |
line: cmnt cr | stmt lsep | |
> data Line = Cmnt cmnt | Stmt stmt | |
> line :: Parser Line | |
> line = (Cmnt <$> cmnt <* cr) | |
> <|> (Stmt <$> stmt <* cr) | |
cmnt: BTW char* | |
> data Cmnt = BTW | |
> cmnt :: Parser Cmnt | |
> cmnt = BTW <$ tok "BTW" <* some (noneOf ("\n\r")) | |
stmt: I HAZ A var (ITZ expr)? | |
| math | |
| lgic | |
| strop | |
| cast | |
| varn R expr | |
| VISIBLE expr | |
| expr lsep O RLY? lsep | |
YA RLY lsep line* | |
(MEBBE expr lsep line*)* | |
(NO WAI lsep line*)? | |
OIC | |
| WTF? lsep | |
(OMG literal lsep line* GTFO? lsep)+ | |
(OMGWTF lsep line*)? | |
OIC | |
| HOW IZ I varn (YR varn (AN varn)*)? lsep | |
(line | |
| FOUND YR expr lsep | |
| GTFO lsep | |
)* | |
IF U SAY SO | |
| I IZ varn (YR varn (AN varn)*)? | |
> data Stmt = IHAZA Varn (Maybe Expr) | |
> | Math math | |
> | Strop strop | |
> | Cast cast | |
> | R Varn Expr | |
> | VISIBLE Expr | |
> | ORLY Expr YaRly [Mebbe] (Maybe NoWai) | |
> | WTF [Omg] (Maybe Omgwtf) | |
> | HOW Varn [Varn] [HowStmt] | |
> | IIZ Varn [Varn] | |
> stmt :: Parser Stmt | |
> stmt = (IHAZA <$ tok "I HAZ A" <*> varn <* optional (tok "ITZ" <*> expr)) | |
> <|> (Math <$> math) | |
> <|> (Strop <$> strop) | |
> <|> (Cast <$> cast) | |
> <|> (R <$> varn <* tok "R" <*> expr) | |
> <|> (VISIBLE <$ tok "VISIBLE" <*> expr) | |
> <|> (ORLY <$> expr <* lsep <* tok "O RLY?" <* lsep | |
> <*> yarly <*> many mebbe <*> optional nowai <* tok "OIC") | |
> <|> (WTF <$ tok "WTF?" <* lsep <*> some omg <*> optional omgwtf | |
> <* tok "OIC") | |
> <|> (HOW <$ tok "HOW IZ I" <*> varn <*> (sepBy (tok "YR" *> varn) "AN") | |
> <* lsep <*> many howstmt <* tok "IF U SAY SO") | |
> <|> (IIZ <$ tok "I IZ" <*> varn <*> (sepBy (tok "YR" *> varn) "AN") | |
> <* tok "MKAY") | |
> type YaRly = [Line] | |
> mebbe :: Parser YaRly | |
> yarly = (YaRly <$ tok "YA RLY" <* lsep <*> many line) | |
> data Mebbe = Mebbe Expr [Line] | |
> mebbe :: Parser Mebbe | |
> mebbe = (Mebbe <$ tok "MEBBE " <*> expr <* lsep <*> many line) | |
> type NoWai = [Line] | |
> nowai :: Parser NoWai | |
> nowai = (NoWai <$ tok "NO WAI" <* lsep <*> many line) | |
> data Omg = Omg Literal [Line] | |
> omg :: Parser Omg | |
> omg = (Omg <$ tok "OMG" <*> literal <* lsep <*> many line | |
> <* optional (tok "GTFO?") <* lsep) | |
> type OmgWtf = [Line] | |
> omgwtf :: Parser OmgWtf | |
> omgwtf = (OmgWtf <$ lsep <*> many line) | |
> data HowStmt = Line line | |
> | FOUND Expr | |
> | GTFO | |
> howstmt :: Parser HowStmt | |
> howstmt = (FOUND <$ tok "FOUND YR" <*> expr) | |
> <|> (GTFO <$ tok "GTFO") | |
> <|> (Line <$> line) | |
expr: rnum | ryarn | rtroof | rtype | varn | IT | NOOB | |
rnum: math | numbar | |
ryarn: strop | yarn | |
rtroof: lgic | troof | |
rtype: cast | type | |
> data Expr = RNum rnum | |
> | RYarn ryarn | |
> | RTroof rtroof | |
> | RType rtype | |
> | Varn varn | |
> | NOOB | |
> data RNum = Math math | |
> | Numbar numbar | |
> data RYarn = Strop strop | |
> | Yarn yarn | |
> data RTroof = Lgic lgic | |
> | Troof troof | |
> data RType = Cast cast | |
> | Typ typ | |
math: SUM OF rnum AN rnum | |
| DIFF OF rnum AN rnum | |
| PRODUKT OF rnum AN rnum | |
| QUOSHUNT OF rnum AN rnum | |
| MOD OF rnum AN rnum | |
| BIGGR OF rnum AN rnum | |
| SMALLR OF rnum AN rnum | |
> data Math = SUM RNum RNum | |
> | DIFF RNum RNum | |
> | PRODUKT RNum RNum | |
> | QUOSHUNT RNum RNum | |
> | MOD RNum RNum | |
> | BIGGR RNum RNum | |
> | SMALLR RNum RNum | |
> math :: Parser Math | |
> math = (SUM <$ tok "SUM OF" <*> rnum <* tok "AN" <*> rnum) | |
> <|> (DIFF <$ tok "DIFF OF" <*> rnum <* tok "AN" <*> rnum) | |
> <|> (PRODUKT <$ tok "PRODUKT OF" <*> rnum <* tok "AN" <*> rnum) | |
> <|> (QUOSHUNT <$ tok "QUOSHUNT OF" <*> rnum <* tok "AN" <*> rnum) | |
> <|> (MOD <$ tok "MOD OF" <*> rnum <* tok "AN" <*> rnum) | |
> <|> (BIGGR <$ tok "BIGGR OF" <*> rnum <* tok "AN" <*> rnum) | |
> <|> (SMALLR <$ tok "SMALLR OF" <*> rnum <* tok "AN" <*> rnum) | |
lgic: BOTH OF rtroof AN rtroof | |
| EITHER OF rtroof AN troof | |
| WON OF rtroof AN rtroof | |
| NOT rtroof | |
| ALL OF rtroof (AN troof)* MKAY | |
| ANY OF rtroof (AN troof)* MKAY | |
| BOTH SAEM expr AN expr | |
| DIFFRINT expr AN expr | |
> data Lgic = BOTH RTroof RTroof | |
> | EITHER RTroof RTroof | |
> | WON RTroof RTroof | |
> | NOT RTroof | |
> | ALL [RTroof] | |
> | ANY [RTroof] | |
> | BOTH Expr Expr | |
> | DIFFRINT Expr Expr | |
> lgic :: Parser Lgic | |
> lgic = (BOTH <$ tok "BOTH OF" <*> rtroof <* tok "AN" <*> rtroof) | |
> <|> (EITHER <$ tok "EITHER OF" <*> rtroof <* tok "AN" <*> rtroof) | |
> <|> (WON <$ tok "WON OF" <*> rtroof <* tok "AN" <*> rtroof) | |
> <|> (NOT <$ tok "NOT" <*> rtroof) | |
> <|> (ALL <$ tok "ALL OF" <*> (sepBy1 rtroof "AN") <* tok "MKAY") | |
> <|> (ANY <$ tok "ANY OF" <*> (sepBy1 rtroof "AN") <* tok "MKAY") | |
> <|> (BOTH <$ tok "BOTH SAEM" <*> expr <* tok "AN" <*> expr) | |
> <|> (DIFFRINT <$ tok "DIFFRINT" <*> expr <* tok "AN" <*> expr) | |
strop: SMOOSH ryarn (AN ryarn)* MKAY | |
> data Strop = SMOOSH [RYarn] | |
> strop :: Parser Strop | |
> strop = (SMOOSH <$ tok "SMOOSH" <*> (sepBy1 ryarn "AN") <* tok "MKAY") | |
cast: MAEK varn A rtype | |
| varn IS NOW A type | |
> data Cast = MAEK Varn RType | |
> cast :: Parser Cast | |
> cast = (MAEK <$ tok "MAEK" <*> varn <* tok "A" <*> rtype) | |
> <|> (MAEK <$ varn <*> tok "IZ NOW A" <*> rtype) | |
varn: alpha alpha* | |
alpha: 'A' | 'B' | ... | 'Z' | |
> type Varn = String | |
> varn :: Parser Varn | |
> varn = (some (oneOf ['A' .. 'Z']) >>= return . read) <* whitespace | |
numbar: '-'? digit digit* ('.' digit digit*)? | |
digit: '0' | '1' | ... | '9' | |
> type Numbar = Float | |
> numbar :: Parser Float | |
> numbar = (((optional '-') <*> some (oneOf ['0' .. '9']) <*> '.' | |
> <*> some (oneOf ['0' .. '9'])) >>= return . read) <* whitespace | |
literal: yarn | numbar | troof | |
yarn: '"' achar* '"' | |
troof: WIN | FAIL | |
type: YARN | NUMBAR | TROOF | |
> data Literal = Yarn Yarn | |
> | Numbar Numbar | |
> | Troof Troof | |
> literal :: Parser literal | |
> literal = (Yarn <$> yarn) | |
> <|> (Numbar <$> numbar) | |
> <|> (Troof <$> troof) | |
> type Yarn = String | |
> yarn :: Parser String | |
> yarn = tok "\"" *> some (noneOf "\n\r") <* tok "\"" | |
> type Troof = Bool | |
> troof :: Parser Bool | |
> troof = (True <$ tok "WIN") | |
> <|> (False <$ tok "FAIL") | |
> data Typ = TYARN | TNUMBAR | TTROOF | |
> typ :: Parser Type | |
> typ = (TYARN <$ tok "YARN") | |
> <|> (TNUMBAR <$ tok "NUMBAR") | |
> <|> (TTROOF <$ tok "TROOF") | |
lsep: cr | ',' | |
> lsep :: Parser () | |
> lsep = many (oneOf "\r\n,") *> pure () | |
`achar` is just a `Char`, the definition of `yarn` where `achar` is exclusively | |
used handles `cr`s. | |
TODO: pretty printing | |
> parseFile :: FilePath -> IO () | |
> parseFile filePath = do | |
> file <- readFile filePath | |
> putStrLn $ case parse prog filePath file of | |
> Left err -> parseErrorPretty err | |
> Right prog -> pretty prog | |
Pretty Printing | |
--------------- | |
The instances below allow values to be inspected in the terminal. | |
The default instance that is derived shows all the constructor names. | |
> deriving instance Show Prog | |
> deriving instance Show Line | |
> deriving instance Show Cmnt | |
> deriving instance Show Stmt | |
> deriving instance Show YaRly | |
> deriving instance Show Mebbe | |
> deriving instance Show NoWai | |
> deriving instance Show Omg | |
> deriving instance Show OmgWtf | |
> deriving instance Show HowStmt | |
> deriving instance Show Expr | |
> deriving instance Show RNum | |
> deriving instance Show RYarn | |
> deriving instance Show RTroof | |
> deriving instance Show RType | |
> deriving instance Show Math | |
> deriving instance Show Lgic | |
> deriving instance Show Strop | |
> deriving instance Show Cast | |
> deriving instance Show Literal | |
> deriving instance Show Troof | |
> deriving instance Show Type | |
The pretty-printed output gives a version that should be acceptable | |
LOLCODE stripped of comments. Although using a deep embedding necessitates | |
a large memory usage for the creation of the tree and wrapping into `Pretty` | |
classes, this allows the language to be easily extended with extra features. | |
The pretty function flattens the tree into a shallow embedding: a single string | |
representing the entire program. A parallel function could be used for code | |
generation e.g. to generate intermediate code in an intermediate or low-level | |
language, or machine code. This demonstrates the power of deep embedding to | |
create versatility. | |
> class Pretty a where | |
> pretty :: a -> String | |
> instance Pretty Prog where | |
> pretty (Prog v ls) = "HAI " ++ pretty v ++ "\n" ++ (unlines . map pretty) ls | |
> ++ "KTHXBYE\n" | |
> instance Pretty Line where | |
> pretty (Stmt stmt) = pretty stmt | |
> pretty (Cmnt cmnt) = "" | |
> instance Pretty Stmt where | |
> pretty (IHAZA v Nothing) = "I HAZ A " ++ v | |
> pretty (IHAZA v (Just val)) = "I HAZ A " ++ v ++ pretty val | |
> pretty (Math math) = pretty math | |
> pretty (Strop strop) = pretty strop | |
> pretty (Cast cast) = pretty cast | |
> pretty (R v expr) = v ++ " R " ++ pretty expr | |
> pretty (VISIBLE expr) = "VISIBLE " ++ pretty expr | |
> pretty (ORLY expr y ms Nothing) = pretty expr ++ ", O RLY?\n" ++ pretty y | |
> ++ "\n" ++ (unlines . map pretty) ms | |
> ++ "\nOIC" | |
> pretty (ORLY expr y ms (Just n)) = pretty expr ++ ", O RLY?\n" ++ pretty y | |
> ++ "\n" ++ (unlines . map pretty) ms ++ "\n" ++ pretty n | |
> ++ "\nOIC" | |
> pretty (WTF omgs Nothing) = "WTF?\n" ++ (unlines . map pretty) omgs | |
> ++ "\nOIC" | |
> pretty (WTF omgs (Just wtf)) = "WTF?\n" ++ (unlines . map pretty) omgs | |
> ++ "\n" ++ pretty wtf ++ "\nOIC" | |
> pretty (HOW v [] hows) = "HOW IZ I " ++ v ++ "\n" | |
> ++ (unlines . map pretty) hows ++ "\n IF U SAY SO" | |
> pretty (HOW v vs hows) = "HOW IZ I " ++ v ++ " YR " | |
> ++ ((intercalate) " AN ") . map pretty) vs ++ "\n" | |
> ++ (unlines . map pretty) hows ++ "\n IF U SAY SO" | |
> pretty (IIZ v []) = "I IZ " ++ v | |
> pretty (IIZ v vs) = "I IZ " ++ v ++ " YR " | |
> ++ ((intercalate) " AN ") . map pretty) vs | |
> instance Pretty YaRly where | |
> pretty ls = "YA RLY\n" ++ (unlines . map pretty) ls | |
> instance Pretty Mebbe where | |
> pretty (Mebbe cond ls) = "MEBBE " ++ pretty cond ++ "\n" ++ (unlines . map pretty) ls | |
> instance Pretty NoWai where | |
> pretty ls = "NO WAI\n" ++ (unlines . map pretty) ls | |
> instance Pretty Omg where | |
> pretty (Omg c ls) = "OMG " ++ pretty c ++ "\n" ++ (unlines . map pretty) ls | |
> ++ "\nGTFO?" | |
> instance Pretty OmgWtf where | |
> pretty ls = "OMGWTF\n" ++ (unlines . map pretty) ls | |
> instance Pretty HowStmt where | |
> pretty (Line line) = pretty line | |
> pretty (FOUND expr) = "FOUND YR " ++ pretty expr | |
> pretty (GTFO) = "GTFO" | |
> instance Pretty Expr where | |
> pretty (RNum rnum) = pretty rnum | |
> pretty (RYarn ryarn) = pretty ryarn | |
> pretty (RTroof rtroof) = pretty rtroof | |
> pretty (RType rtype) = pretty rtype | |
> pretty (Varn varn) = show varn | |
> pretty (NOOB) = "" | |
> instance Pretty RNum where | |
> pretty (Math math) = pretty math | |
> pretty (Numbar numbar) = show numbar | |
> instance Pretty RYarn where | |
> pretty (Strop strop) = pretty strop | |
> pretty (Yarn yarn) = yarn | |
> instance Pretty RTroof where | |
> pretty (Lgic lgic) = pretty lgic | |
> pretty (Troof troof) = pretty troof | |
> instance Pretty RType where | |
> pretty (Cast cast) = pretty cast | |
> pretty (Type typ) = pretty typ | |
> instance Pretty Math where | |
> pretty (SUM a b) = "SUM OF " ++ pretty a ++ " AN " ++ pretty b | |
> pretty (DIFF a b) = "DIFF OF " ++ pretty a ++ " AN " ++ pretty b | |
> pretty (PRODUKT a b) = "PRODUKT OF " ++ pretty a ++ " AN " ++ pretty b | |
> pretty (QUOSHUNT a b) = "QUOSHUNT OF " ++ pretty a ++ " AN " ++ pretty b | |
> pretty (MOD a b) = "MOD OF " ++ pretty a ++ " AN " ++ pretty b | |
> pretty (BIGGR a b) = "BIGGR OF " ++ pretty a ++ " AN " ++ pretty b | |
> pretty (SMALLR a b) = "SMALLR OF " ++ pretty a ++ " AN " ++ pretty b | |
> instance Pretty Lgic where | |
> pretty (BOTH a b) = "BOTH OF " ++ pretty a ++ " AN " ++ pretty b | |
> pretty (EITHER a b) = "EITHER OF " ++ pretty a ++ " AN " ++ pretty b | |
> pretty (WON a b) = "WON OF " ++ pretty a ++ " AN " ++ pretty b | |
> pretty (NOT a) = "NOT " ++ pretty a | |
> pretty (ALL cs) = "ALL OF " ++ ((intercalate " AN ") . map pretty) cs ++ " MKAY" | |
> pretty (ANY cs) = "ANY OF " ++ ((intercalate " AN ") . map pretty) cs ++ " MKAY" | |
> pretty (BOTH a b) = "BOTH SAEM " ++ pretty a ++ " AN " ++ pretty b | |
> pretty (DIFFRINT a b) = "DIFFRINT " ++ pretty a ++ " AN " ++ pretty b | |
> instance Pretty Strop where | |
> pretty (SMOOSH ys) = "SMOOSH " ++ ((intercalate " AN ") . map pretty) cs ++ " MKAY" | |
> instance Pretty Cast where | |
> pretty (MAEK v t) = "MAEK " ++ show v ++ " A " ++ pretty t | |
> instance Pretty Literal where | |
> pretty (Yarn y) = y | |
> pretty (Numbar n) = show n | |
> pretty (Troof t) = pretty t | |
> instance Pretty Troof where | |
> pretty True = "WIN" | |
> pretty False = "FAIL" | |
> instance Pretty Type where | |
> pretty (TYARN) = "YARN" | |
> pretty (TNUMBAR) = "NUMBAR" | |
> pretty (TTROOF) = "TROOF" | |
Hopefully this document demonstrates the simplicity of creating a language using | |
parser combinators. By using a functional language we simplified our workflow | |
which meant that the creation of this parser took in the tens of man hours | |
instead of hundreds, the parser follows our language definition closely and | |
extending both the language and parser is easier, quicker and safer. | |
References | |
---------- | |
[1]: http://lolcode.org | |
[2]: https://github.com/justinmesa | |
[3]: http://ozwords.org/?p=6054 | |
[4]: http://knowyourmeme.com/memes/lolcats | |
[5]: https://www.cnet.com/uk/news/the-history-of-i-can-has-cheezburger/ | |
[6]: https://github.com/justinmeza/lolcode-spec/blob/master/v1.2/lolcode-spec-v1.2.md | |
Based on the work of [Nicolas Wu](http://zenzike.com/) for University of Bristol | |
undergraduate course [COMS22201 Language Engineering](http://www.cs.bris.ac.uk/Teaching/unitglance.jsp?unit=COMS22201). | |
All haskell code in this document is currently untested due to technical | |
problems. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment