Last active
March 8, 2018 21:03
-
-
Save jarlg/ec1eefe5368a8ec2ef408eb449d3da3c to your computer and use it in GitHub Desktop.
live file (with holes) of parsing, and completed version
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Does everyone have a REPL (ghci) running? | |
* Literal haskell: code is on lines starting with ">". | |
* Run it with "ghci <file>". | |
* https://gist.github.com/jarlg | |
i) Types | |
Every *value* (a sequence of bits in memory) has a *type* (the | |
"meaning" of these bits). | |
Here are types Bool, Maybe' and List'; Show is derived for | |
printing. | |
> data Bool' = True' | False' deriving Show | |
> data Maybe' a = Just' a | Nothing' deriving Show | |
> data List a = Cons a (List a) | Empty deriving Show | |
Maybe' and List are similar in a way: | |
> mapMaybe :: (a -> b) -> Maybe' a -> Maybe' b | |
> mapMaybe f Nothing' = Nothing' | |
> mapMaybe f (Just' a) = Just' (f a) | |
> mapList :: (a -> b) -> List a -> List b | |
> mapList f Empty = Empty | |
> mapList f (Cons x xs) = Cons (f x) (mapList f xs) | |
A typeclass captures this similarity: | |
> class Mappable t where | |
> map' :: (a -> b) -> t a -> t b | |
> instance Mappable Maybe' where | |
> map' = mapMaybe | |
> instance Mappable List where | |
> map' = mapList | |
We want to write generic functions, but sometimes we need our | |
types to support certain operations. For example, the function | |
'max' requires the types to be ordered. | |
NB: "Mappable" above is called Functor, and Maybe, Bool and List | |
all exist in Prelude (List is []). We'll use them from now. | |
ii) Parsers | |
Let's get into parsers. We'll think of them like this: | |
> type Parser' a = String -> m (a, String) | |
Construct some examples of parsers, and try them out. | |
> item' :: Parser' Char -- takes one charater, if there is one | |
> item' "" = Nothing | |
> item' (x:xs) = Just (x, xs) | |
> sat' :: (Char -> Bool) -> Parser' Char | |
> sat' pred (x:xs) | pred x = Just (x, xs) | |
> sat' _ _ = Nothing | |
> char' :: Char -> Parser' Char | |
> char' c = sat' (== c) | |
> space' :: Parser' Char | |
> space' = char' ' ' | |
> digit' :: Parser' Char -- can we do Parser' Int? | |
> digit' (x:xs) | '0' <= x && x <= '9' = Just (x, xs) | |
> digit' _ = Nothing | |
Is Parser mappable/a functor? What would it mean? | |
3. Define mapP (later we'll go into the abstract). | |
> mapP :: (a -> b) -> Parser' a -> Parser' b | |
> mapP f p str = case p str of | |
> Nothing -> Nothing | |
> Just (x, rest) -> Just (f x, rest) | |
A lot of programs consist of parsing and computing on the | |
result. Now we can simply combine parsing and | |
computing/interpreting using mapP! | |
How can we combine parsed results? E.g. parse many numbers? Or a | |
certain string? An idea: usually "building a string" is done by | |
cons'ing on top of a previous string: e.g. ('a' : []) == "a", and | |
('a' : "bc") == "abc". The function "cons'ing 'a' on top" is ((:) | |
'a'). | |
The parser equivalent is (mapP (:) (char 'a')) which parses 'a' | |
(or fails) and returns ((:) 'a') -- it is the parser of the | |
"cons'ing 'a' on top"-function. With usual functions, we just | |
apply the function. Can we apply a parser of a function? | |
One solution: Applicative. | |
4. The following types don't leave room for creativity. Try | |
defining them. | |
> pureP :: a -> Parser' a | |
> pureP x str = Just (x, str) | |
> applyP :: Parser' (a -> b) -> Parser' a -> Parser' b | |
> applyP pf pa str = case pf str of | |
> Nothing -> Nothing | |
> Just (f, rest) -> case pa rest of | |
> Nothing -> Nothing | |
> Just (a, rest') -> Just (f a, rest') | |
5. Use (:) to combine parsers: | |
> string' :: String -> Parser' String | |
> string' "" = pureP "" | |
> string' (x:xs) = | |
> applyP (mapP (:) (char' x)) (string' xs) | |
> count' :: Int -> Parser' a -> Parser' [a] | |
> count' 0 p = pureP [] | |
> count' n p = applyP (mapP (:) p) (count' (n-1) p) | |
How about "choice"? Could we try another parser if the first one | |
fails? | |
6. Implement the following "choice combinator" | |
(from Alternative). | |
> (<|>) :: Parser' a -> Parser' a -> Parser' a | |
> (p <|> q) str = case p str of | |
> Just (x, rest) -> Just (x, rest) | |
> Nothing -> case q str of | |
> Nothing -> Nothing | |
> r -> r | |
Now we can write a pair of useful parsers: | |
> many1' :: Parser' a -> Parser' [a] -- can fail | |
> many1' p = applyP (mapP (:) p) (many' p) | |
> many' :: Parser' a -> Parser' [a] -- cannot fail | |
> many' p = many1' p <|> pureP [] | |
Especially useful: | |
> spaces' = many' space' | |
And finally a proper usage of mapP: | |
> number' :: Parser' Int | |
> number' = mapP read (many1' digit') | |
Let's get general; it'll give us some functions for free. | |
> newtype Parser a = | |
> Parser { parse :: String -> Maybe (a, String) } | |
The general instances are just wrapping/unwrapping the ones we | |
already did. | |
> instance Functor Parser where | |
> fmap f (Parser p) = Parser (mapP f p) | |
> -- f <$> (Parser p) = Parser (mapP f p) | |
> instance Applicative Parser where | |
> pure a = Parser (pureP a) | |
> (Parser pf) <*> (Parser pa) = Parser (applyP pf pa) | |
(Could also define an Alternative instance, for <|>.) | |
Wrap the previously defined functions: | |
> item = Parser item' | |
> sat p = Parser (sat' p) | |
> char c = Parser (char' c) | |
> space = Parser space' | |
> digit = Parser digit' | |
> number = Parser number' | |
> many (Parser p) = Parser (many' p) | |
> spaces = Parser spaces' | |
As defined previously, but in new notation: | |
> string :: String -> Parser String | |
> string "" = pure [] | |
> string (x:xs) = (:) <$> char x <*> string xs | |
> count 0 p = pure [] | |
> count n p = (:) <$> p <*> count (n-1) p | |
Applicative gives us (<*) and (*>) for free; they ignore parsers | |
on left and right. A *tokenized* parser ignores surrounding | |
whitespace. | |
> tokenize :: Parser a -> Parser a | |
> tokenize p = spaces *> p <* spaces | |
The following parser parses a token surrounded by opening and | |
closing | |
> bracketize :: Char -> Parser a -> Char -> Parser a | |
> bracketize open p close = | |
> char open *> tokenize p <* char close | |
> sepBy :: Parser a -> Parser b -> Parser [a] | |
> p `sepBy` sep = (:) <$> p <*> (many (sep *> p)) | |
> list :: Parser a -> Parser [a] | |
> list p = bracketize '[' (tokenize p `sepBy` comma) ']' | |
> where comma = char ',' | |
Until now, our parsers have been deterministic in the sense that | |
a previously parsed value cannot affect the next parser; only the | |
parsed value can be affected (through Applicative). What if we | |
parse either "name:" or "age:" and then want to parse either a | |
string (the name) or an int (the age)? Or e.g. based on a header, | |
change the body parser. | |
One solution: | |
> bind' :: Parser' a -> (a -> Parser' b) -> Parser' b | |
> bind' pa apb str = case pa str of | |
> Nothing -> Nothing | |
> Just (a, rest) -> (apb a) rest | |
> instance Monad Parser where | |
> return = pure | |
> (Parser pa) >>= apb = | |
> Parser $ \str -> case pa str of | |
> Nothing -> Nothing | |
> Just (x, rest) -> parse (apb x) rest | |
Let's illustrate by parsing an abstract "packet" which is a | |
number n and then n characters. | |
> packet :: Parser String | |
> packet = number >>= \n -> count n item | |
Chaining commands with bind (>>=) gets complicated very | |
quick. The solution is "do"-notation: | |
> packet2 :: Parser String | |
> packet2 = do | |
> n <- number | |
> count n item |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Does everyone have a REPL (ghci) running? | |
* Literal haskell: code is on lines starting with ">". | |
* Run it with "ghci <file>". | |
i) Types | |
Every *value* (a sequence of bits in memory) has a *type* (the | |
"meaning" of these bits). | |
Here are types Bool, Maybe' and List'; Show is derived for printing. | |
> data Bool' = True' | False' deriving Show | |
> data Maybe' a = Just' a | Nothing' deriving Show | |
> data List a = Cons a (List a) | Empty deriving Show | |
Maybe' and List are similar in a way: | |
> mapMaybe :: (a -> b) -> Maybe' a -> Maybe' b | |
> mapMaybe = undefined | |
> mapList :: (a -> b) -> List a -> List b | |
> mapList = undefined | |
A typeclass captures this similarity: | |
> class Mappable t where | |
> map' :: (a -> b) -> t a -> t b | |
> instance Mappable Maybe' where | |
> map' = undefined | |
> instance Mappable List where | |
> map' = undefined | |
We want to write generic functions, but sometimes we need our types to | |
support certain operations. For example, the function 'max' requires | |
the types to be ordered. | |
NB: "Mappable" above is called Functor, and Maybe, Bool and List all | |
exist in Prelude (List is []). We'll use them from now. | |
ii) Parsers | |
Let's get into parsers. We'll think of them like this: | |
> type Parser' a = String -> Maybe (a, String) | |
Construct some examples of parsers, and try them out. | |
> item' :: Parser' Char -- takes one charater, if there is one | |
> item' = undefined | |
> sat' :: (Char -> Bool) -> Parser' Char | |
> sat' = undefined | |
> char' :: Char -> Parser' Char | |
> char' = undefined | |
> space' :: Parser' Char | |
> space' = undefined | |
> digit' :: Parser' Char -- can we do Parser' Int? | |
> digit' = undefined | |
Is Parser mappable/a functor? What would it mean? | |
3. Define mapP (later we'll go into the abstract). | |
> mapP :: (a -> b) -> Parser' a -> Parser' b | |
> mapP = undefined | |
A lot of programs consist of parsing and computing on the result. Now | |
we can simply combine parsing and computing/interpreting using mapP! | |
How can we combine parsed results? E.g. parse many numbers? Or a | |
certain string? An idea: usually "building a string" is done by | |
cons'ing on top of a previous string: e.g. ('a' : []) == "a", and ('a' | |
: "bc") == "abc". The function "cons'ing 'a' on top" is ((:) 'a'). | |
The parser equivalent is (mapP (:) (char 'a')) which parses 'a' (or | |
fails) and returns ((:) 'a') -- it is the parser of the "cons'ing 'a' | |
on top"-function. With usual functions, we just apply the | |
function. Can we apply a parser of a function? | |
One solution: Applicative. | |
4. The following types don't leave room for creativity. Try defining | |
them. | |
> pureP :: a -> Parser' a | |
> pureP = undefined | |
> applyP :: Parser' (a -> b) -> Parser' a -> Parser' b | |
> applyP = undefined | |
5. Use (:) to combine parsers: | |
> string' :: String -> Parser' String | |
> string' = undefined | |
> count' :: Int -> Parser' a -> Parser' [a] | |
> count' = undefined | |
How about "choice"? Could we try another parser if the first one fails? | |
6. Implement the following "choice combinator" (from Alternative). | |
> (<|>) :: Parser' a -> Parser' a -> Parser' a | |
> (p <|> q) str = undefined | |
Now we can write a pair of useful parsers: | |
> many1' :: Parser' a -> Parser' [a] -- can fail | |
> many1' p = undefined | |
> many' :: Parser' a -> Parser' [a] -- cannot fail | |
> many' p = undefined | |
Especially useful: | |
> spaces' = many' space' | |
And finally a proper usage of mapP: | |
> number' :: Parser' Int | |
> number' = mapP read (many1' digit') | |
Let's get general; it'll give us some functions for free. | |
> newtype Parser a = Parser { parse :: String -> Maybe (a, String) } | |
The general instances are just wrapping/unwrapping the ones we already | |
did. | |
> instance Functor Parser where | |
> fmap f (Parser p) = Parser (mapP f p) | |
> -- f <$> (Parser p) = Parser (mapP f p) | |
> instance Applicative Parser where | |
> pure a = Parser (pureP a) | |
> (Parser pf) <*> (Parser pa) = Parser (applyP pf pa) | |
(Could also define an Alternative instance, for <|>.) | |
Wrap the previously defined functions: | |
> item = Parser item' | |
> sat p = Parser (sat' p) | |
> char c = Parser (char' c) | |
> space = Parser space' | |
> digit = Parser digit' | |
> number = Parser number' | |
> many (Parser p) = Parser (many' p) | |
> spaces = Parser spaces' | |
As defined previously, but in new notation: | |
> string :: String -> Parser String | |
> string "" = pure [] | |
> string (x:xs) = (:) <$> char x <*> string xs | |
> count 0 p = pure [] | |
> count n p = (:) <$> p <*> count (n-1) p | |
Applicative gives us (<*) and (*>) for free; they ignore parsers on | |
left and right. A *tokenized* parser ignores surrounding whitespace. | |
> tokenize :: Parser a -> Parser a | |
> tokenize p = spaces *> p <* spaces | |
The following parser parses a token surrounded by opening and closing | |
brackets: | |
> bracketize :: Char -> Parser a -> Char -> Parser a | |
> bracketize open p close = undefined | |
> sepBy :: Parser a -> Parser b -> Parser [a] | |
> p `sepBy` sep = undefined | |
> list :: Parser a -> Parser [a] | |
> list p = undefined | |
Until now, our parsers have been deterministic in the sense that a | |
previously parsed value cannot affect the next parser; only the parsed | |
value can be affected (through Applicative). What if we parse either | |
"name:" or "age:" and then want to parse either a string (the name) or | |
an int (the age)? Or e.g. based on a header, change the body parser. | |
One solution: | |
> bind' :: Parser' a -> (a -> Parser' b) -> Parser' b | |
> bind' pa apb str = undefined | |
> instance Monad Parser where | |
> return = pure | |
> (Parser pa) >>= apb = Parser $ \str -> case pa str of | |
> Nothing -> Nothing | |
> Just (x, rest) -> parse (apb x) rest | |
Let's illustrate by parsing an abstract "packet" which is a number n | |
and then n characters. | |
> packet :: Parser String | |
> packet = undefined | |
Chaining commands with bind (>>=) gets complicated very quick. The | |
solution is "do"-notation: | |
> packet2 :: Parser String | |
> packet2 = undefined |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment