jarlg/Parsing-complete.lhs

## Parsing-complete.lhs
  * Does everyone have a REPL (ghci) running?
  * Literal haskell: code is on lines starting with ">".
  * Run it with "ghci <file>".
  * https://gist.github.com/jarlg


i) Types

Every *value* (a sequence of bits in memory) has a *type* (the
"meaning" of these bits).

Here are types Bool, Maybe' and List'; Show is derived for
printing.

> data Bool' = True' | False' deriving Show
> data Maybe' a = Just' a | Nothing' deriving Show
> data List a = Cons a (List a) | Empty deriving Show

Maybe' and List are similar in a way:

> mapMaybe :: (a -> b) -> Maybe' a -> Maybe' b
> mapMaybe f Nothing' = Nothing'
> mapMaybe f (Just' a) = Just' (f a)

> mapList :: (a -> b) -> List a -> List b
> mapList f Empty = Empty
> mapList f (Cons x xs) = Cons (f x) (mapList f xs)

A typeclass captures this similarity:

> class Mappable t where
>   map' :: (a -> b) -> t a -> t b

> instance Mappable Maybe' where
>   map' = mapMaybe

> instance Mappable List where
>   map' = mapList


We want to write generic functions, but sometimes we need our
types to support certain operations. For example, the function
'max' requires the types to be ordered.

NB: "Mappable" above is called Functor, and Maybe, Bool and List
all exist in Prelude (List is []). We'll use them from now.

ii) Parsers

Let's get into parsers. We'll think of them like this:

> type Parser' a = String -> m (a, String)

Construct some examples of parsers, and try them out.

> item' :: Parser' Char -- takes one charater, if there is one
> item' "" = Nothing
> item' (x:xs) = Just (x, xs)

> sat' :: (Char -> Bool) -> Parser' Char
> sat' pred (x:xs) | pred x = Just (x, xs)
> sat' _ _ = Nothing

> char' :: Char -> Parser' Char
> char' c = sat' (== c)

> space' :: Parser' Char
> space' = char' ' '

> digit' :: Parser' Char -- can we do Parser' Int?
> digit' (x:xs) | '0' <= x && x <= '9' = Just (x, xs)
> digit' _ = Nothing

Is Parser mappable/a functor? What would it mean?

3. Define mapP (later we'll go into the abstract).

> mapP :: (a -> b) -> Parser' a -> Parser' b
> mapP f p str = case p str of
>                  Nothing -> Nothing
>                  Just (x, rest) -> Just (f x, rest)

A lot of programs consist of parsing and computing on the
result. Now we can simply combine parsing and
computing/interpreting using mapP!

How can we combine parsed results? E.g. parse many numbers? Or a
certain string? An idea: usually "building a string" is done by
cons'ing on top of a previous string: e.g. ('a' : []) == "a", and
('a' : "bc") == "abc". The function "cons'ing 'a' on top" is ((:)
'a').

The parser equivalent is (mapP (:) (char 'a')) which parses 'a'
(or fails) and returns ((:) 'a') -- it is the parser of the
"cons'ing 'a' on top"-function. With usual functions, we just
apply the function. Can we apply a parser of a function?

One solution: Applicative.

4. The following types don't leave room for creativity. Try
defining them.

> pureP :: a -> Parser' a
> pureP x str = Just (x, str)

> applyP :: Parser' (a -> b) -> Parser' a -> Parser' b
> applyP pf pa str = case pf str of
>                      Nothing -> Nothing
>                      Just (f, rest) -> case pa rest of
>                                          Nothing -> Nothing
>                                          Just (a, rest') -> Just (f a, rest')

5. Use (:) to combine parsers:

> string' :: String -> Parser' String
> string' "" = pureP ""
> string' (x:xs) =
>   applyP (mapP (:) (char' x)) (string' xs)

> count' :: Int -> Parser' a -> Parser' [a]
> count' 0 p = pureP []
> count' n p = applyP (mapP (:) p) (count' (n-1) p)

How about "choice"? Could we try another parser if the first one
fails?

6. Implement the following "choice combinator"
(from Alternative).

> (<|>) :: Parser' a -> Parser' a -> Parser' a
> (p <|> q) str = case p str of
>                   Just (x, rest) -> Just (x, rest)
>                   Nothing -> case q str of
>                                Nothing -> Nothing
>                                r -> r

Now we can write a pair of useful parsers:


> many1' :: Parser' a -> Parser' [a]  -- can fail
> many1' p = applyP (mapP (:) p) (many' p)

> many' :: Parser' a -> Parser' [a] -- cannot fail
> many' p = many1' p <|> pureP []


Especially useful:

> spaces' = many' space'

And finally a proper usage of mapP:

> number' :: Parser' Int
> number' = mapP read (many1' digit')

Let's get general; it'll give us some functions for free.

> newtype Parser a =
>   Parser { parse :: String -> Maybe (a, String) }

The general instances are just wrapping/unwrapping the ones we
already did.

> instance Functor Parser where
>   fmap f (Parser p) = Parser (mapP f p)
> -- f <$> (Parser p) = Parser (mapP f p)

> instance Applicative Parser where
>   pure a = Parser (pureP a)
>   (Parser pf) <*> (Parser pa) = Parser (applyP pf pa)

(Could also define an Alternative instance, for <|>.)

Wrap the previously defined functions:

> item = Parser item'
> sat p = Parser (sat' p)
> char c = Parser (char' c)
> space = Parser space'
> digit = Parser digit'
> number = Parser number'
> many (Parser p) = Parser (many' p)
> spaces = Parser spaces'

As defined previously, but in new notation:

> string :: String -> Parser String
> string "" = pure []
> string (x:xs) = (:) <$> char x <*> string xs

> count 0 p = pure []
> count n p = (:) <$> p <*> count (n-1) p

Applicative gives us (<*) and (*>) for free; they ignore parsers
on left and right. A *tokenized* parser ignores surrounding
whitespace.

> tokenize :: Parser a -> Parser a
> tokenize p = spaces *> p <* spaces

The following parser parses a token surrounded by opening and
closing

> bracketize :: Char -> Parser a -> Char -> Parser a
> bracketize open p close =
>   char open *> tokenize p <* char close

> sepBy :: Parser a -> Parser b -> Parser [a]
> p `sepBy` sep = (:) <$> p <*> (many (sep *> p))

> list :: Parser a -> Parser [a]
> list p = bracketize '[' (tokenize p `sepBy` comma) ']'
>   where comma = char ','

Until now, our parsers have been deterministic in the sense that
a previously parsed value cannot affect the next parser; only the
parsed value can be affected (through Applicative). What if we
parse either "name:" or "age:" and then want to parse either a
string (the name) or an int (the age)? Or e.g. based on a header,
change the body parser.

One solution:

> bind' :: Parser' a -> (a -> Parser' b) -> Parser' b
> bind' pa apb str = case pa str of
>                      Nothing -> Nothing
>                      Just (a, rest) -> (apb a) rest

> instance Monad Parser where
>   return = pure
>   (Parser pa) >>= apb =
>     Parser $ \str -> case pa str of
>                        Nothing -> Nothing
>                        Just (x, rest) -> parse (apb x) rest

Let's illustrate by parsing an abstract "packet" which is a
number n and then n characters.

> packet :: Parser String
> packet = number >>= \n -> count n item

Chaining commands with bind (>>=) gets complicated very
quick. The solution is "do"-notation:

> packet2 :: Parser String
> packet2 = do
>   n <- number
>   count n item

## Parsing-live.lhs
  * Does everyone have a REPL (ghci) running?
  * Literal haskell: code is on lines starting with ">".
  * Run it with "ghci <file>".

i) Types

Every *value* (a sequence of bits in memory) has a *type* (the
"meaning" of these bits).

Here are types Bool, Maybe' and List'; Show is derived for printing.

> data Bool' = True' | False' deriving Show
> data Maybe' a = Just' a | Nothing' deriving Show
> data List a = Cons a (List a) | Empty deriving Show

Maybe' and List are similar in a way:

> mapMaybe :: (a -> b) -> Maybe' a -> Maybe' b
> mapMaybe = undefined

> mapList :: (a -> b) -> List a -> List b
> mapList = undefined

A typeclass captures this similarity:

> class Mappable t where
>   map' :: (a -> b) -> t a -> t b

> instance Mappable Maybe' where
>   map' = undefined

> instance Mappable List where
>   map' = undefined

We want to write generic functions, but sometimes we need our types to
support certain operations. For example, the function 'max' requires
the types to be ordered.

NB: "Mappable" above is called Functor, and Maybe, Bool and List all
exist in Prelude (List is []). We'll use them from now.

ii) Parsers

Let's get into parsers. We'll think of them like this:

> type Parser' a = String -> Maybe (a, String)

Construct some examples of parsers, and try them out.

> item' :: Parser' Char -- takes one charater, if there is one
> item' = undefined

> sat' :: (Char -> Bool) -> Parser' Char
> sat' = undefined

> char' :: Char -> Parser' Char
> char' = undefined

> space' :: Parser' Char
> space' = undefined

> digit' :: Parser' Char -- can we do Parser' Int?
> digit' = undefined

Is Parser mappable/a functor? What would it mean?

3. Define mapP (later we'll go into the abstract).

> mapP :: (a -> b) -> Parser' a -> Parser' b
> mapP = undefined

A lot of programs consist of parsing and computing on the result. Now
we can simply combine parsing and computing/interpreting using mapP!

How can we combine parsed results? E.g. parse many numbers? Or a
certain string? An idea: usually "building a string" is done by
cons'ing on top of a previous string: e.g. ('a' : []) == "a", and ('a'
: "bc") == "abc". The function "cons'ing 'a' on top" is ((:) 'a').

The parser equivalent is (mapP (:) (char 'a')) which parses 'a' (or
fails) and returns ((:) 'a') -- it is the parser of the "cons'ing 'a'
on top"-function. With usual functions, we just apply the
function. Can we apply a parser of a function?

One solution: Applicative.

4. The following types don't leave room for creativity. Try defining
them.

> pureP :: a -> Parser' a
> pureP = undefined

> applyP :: Parser' (a -> b) -> Parser' a -> Parser' b
> applyP = undefined

5. Use (:) to combine parsers:

> string' :: String -> Parser' String
> string' = undefined

> count' :: Int -> Parser' a -> Parser' [a]
> count' = undefined

How about "choice"? Could we try another parser if the first one fails?

6. Implement the following "choice combinator" (from Alternative).

> (<|>) :: Parser' a -> Parser' a -> Parser' a
> (p <|> q) str = undefined

Now we can write a pair of useful parsers:

> many1' :: Parser' a -> Parser' [a]  -- can fail
> many1' p = undefined

> many'  :: Parser' a -> Parser' [a] -- cannot fail
> many' p = undefined

Especially useful:

> spaces' = many' space'

And finally a proper usage of mapP:

> number' :: Parser' Int
> number' = mapP read (many1' digit')

Let's get general; it'll give us some functions for free.

> newtype Parser a = Parser { parse :: String -> Maybe (a, String) }

The general instances are just wrapping/unwrapping the ones we already
did.

> instance Functor Parser where
>   fmap f (Parser p) = Parser (mapP f p)
> -- f <$> (Parser p) = Parser (mapP f p)

> instance Applicative Parser where
>   pure a = Parser (pureP a)
>   (Parser pf) <*> (Parser pa) = Parser (applyP pf pa)

(Could also define an Alternative instance, for <|>.)

Wrap the previously defined functions:

> item = Parser item'
> sat p = Parser (sat' p)
> char c = Parser (char' c)
> space = Parser space'
> digit = Parser digit'
> number = Parser number'
> many (Parser p) = Parser (many' p)
> spaces = Parser spaces'

As defined previously, but in new notation:

> string :: String -> Parser String
> string "" = pure []
> string (x:xs) = (:) <$> char x <*> string xs

> count 0 p = pure []
> count n p = (:) <$> p <*> count (n-1) p

Applicative gives us (<*) and (*>) for free; they ignore parsers on
left and right. A *tokenized* parser ignores surrounding whitespace.

> tokenize :: Parser a -> Parser a
> tokenize p = spaces *> p <* spaces

The following parser parses a token surrounded by opening and closing
brackets:

> bracketize :: Char -> Parser a -> Char -> Parser a
> bracketize open p close = undefined

> sepBy :: Parser a -> Parser b -> Parser [a]
> p `sepBy` sep = undefined

> list :: Parser a -> Parser [a]
> list p = undefined

Until now, our parsers have been deterministic in the sense that a
previously parsed value cannot affect the next parser; only the parsed
value can be affected (through Applicative). What if we parse either
"name:" or "age:" and then want to parse either a string (the name) or
an int (the age)? Or e.g. based on a header, change the body parser.

One solution:

> bind' :: Parser' a -> (a -> Parser' b) -> Parser' b
> bind' pa apb str = undefined

> instance Monad Parser where
>   return = pure
>   (Parser pa) >>= apb = Parser $ \str -> case pa str of
>                                            Nothing -> Nothing
>                                            Just (x, rest) -> parse (apb x) rest

Let's illustrate by parsing an abstract "packet" which is a number n
and then n characters.

> packet :: Parser String
> packet = undefined

Chaining commands with bind (>>=) gets complicated very quick. The
solution is "do"-notation:

> packet2 :: Parser String
> packet2 = undefined
	* Does everyone have a REPL (ghci) running?
	* Literal haskell: code is on lines starting with ">".
	* Run it with "ghci <file>".
	* https://gist.github.com/jarlg


	i) Types

	Every value (a sequence of bits in memory) has a type (the
	"meaning" of these bits).

	Here are types Bool, Maybe' and List'; Show is derived for
	printing.

	> data Bool' = True' \| False' deriving Show
	> data Maybe' a = Just' a \| Nothing' deriving Show
	> data List a = Cons a (List a) \| Empty deriving Show

	Maybe' and List are similar in a way:

	> mapMaybe :: (a -> b) -> Maybe' a -> Maybe' b
	> mapMaybe f Nothing' = Nothing'
	> mapMaybe f (Just' a) = Just' (f a)

	> mapList :: (a -> b) -> List a -> List b
	> mapList f Empty = Empty
	> mapList f (Cons x xs) = Cons (f x) (mapList f xs)

	A typeclass captures this similarity:

	> class Mappable t where
	> map' :: (a -> b) -> t a -> t b

	> instance Mappable Maybe' where
	> map' = mapMaybe

	> instance Mappable List where
	> map' = mapList


	We want to write generic functions, but sometimes we need our
	types to support certain operations. For example, the function
	'max' requires the types to be ordered.

	NB: "Mappable" above is called Functor, and Maybe, Bool and List
	all exist in Prelude (List is []). We'll use them from now.

	ii) Parsers

	Let's get into parsers. We'll think of them like this:

	> type Parser' a = String -> m (a, String)

	Construct some examples of parsers, and try them out.

	> item' :: Parser' Char -- takes one charater, if there is one
	> item' "" = Nothing
	> item' (x:xs) = Just (x, xs)

	> sat' :: (Char -> Bool) -> Parser' Char
	> sat' pred (x:xs) \| pred x = Just (x, xs)
	> sat' _ _ = Nothing

	> char' :: Char -> Parser' Char
	> char' c = sat' (== c)

	> space' :: Parser' Char
	> space' = char' ' '

	> digit' :: Parser' Char -- can we do Parser' Int?
	> digit' (x:xs) \| '0' <= x && x <= '9' = Just (x, xs)
	> digit' _ = Nothing

	Is Parser mappable/a functor? What would it mean?

	3. Define mapP (later we'll go into the abstract).

	> mapP :: (a -> b) -> Parser' a -> Parser' b
	> mapP f p str = case p str of
	> Nothing -> Nothing
	> Just (x, rest) -> Just (f x, rest)

	A lot of programs consist of parsing and computing on the
	result. Now we can simply combine parsing and
	computing/interpreting using mapP!

	How can we combine parsed results? E.g. parse many numbers? Or a
	certain string? An idea: usually "building a string" is done by
	cons'ing on top of a previous string: e.g. ('a' : []) == "a", and
	('a' : "bc") == "abc". The function "cons'ing 'a' on top" is ((:)
	'a').

	The parser equivalent is (mapP (:) (char 'a')) which parses 'a'
	(or fails) and returns ((:) 'a') -- it is the parser of the
	"cons'ing 'a' on top"-function. With usual functions, we just
	apply the function. Can we apply a parser of a function?

	One solution: Applicative.

	4. The following types don't leave room for creativity. Try
	defining them.

	> pureP :: a -> Parser' a
	> pureP x str = Just (x, str)

	> applyP :: Parser' (a -> b) -> Parser' a -> Parser' b
	> applyP pf pa str = case pf str of
	> Nothing -> Nothing
	> Just (f, rest) -> case pa rest of
	> Nothing -> Nothing
	> Just (a, rest') -> Just (f a, rest')

	5. Use (:) to combine parsers:

	> string' :: String -> Parser' String
	> string' "" = pureP ""
	> string' (x:xs) =
	> applyP (mapP (:) (char' x)) (string' xs)

	> count' :: Int -> Parser' a -> Parser' [a]
	> count' 0 p = pureP []
	> count' n p = applyP (mapP (:) p) (count' (n-1) p)

	How about "choice"? Could we try another parser if the first one
	fails?

	6. Implement the following "choice combinator"
	(from Alternative).

	> (<\|>) :: Parser' a -> Parser' a -> Parser' a
	> (p <\|> q) str = case p str of
	> Just (x, rest) -> Just (x, rest)
	> Nothing -> case q str of
	> Nothing -> Nothing
	> r -> r

	Now we can write a pair of useful parsers:


	> many1' :: Parser' a -> Parser' [a] -- can fail
	> many1' p = applyP (mapP (:) p) (many' p)

	> many' :: Parser' a -> Parser' [a] -- cannot fail
	> many' p = many1' p <\|> pureP []


	Especially useful:

	> spaces' = many' space'

	And finally a proper usage of mapP:

	> number' :: Parser' Int
	> number' = mapP read (many1' digit')

	Let's get general; it'll give us some functions for free.

	> newtype Parser a =
	> Parser { parse :: String -> Maybe (a, String) }

	The general instances are just wrapping/unwrapping the ones we
	already did.

	> instance Functor Parser where
	> fmap f (Parser p) = Parser (mapP f p)
	> -- f <$> (Parser p) = Parser (mapP f p)

	> instance Applicative Parser where
	> pure a = Parser (pureP a)
	> (Parser pf) <*> (Parser pa) = Parser (applyP pf pa)

	(Could also define an Alternative instance, for <\|>.)

	Wrap the previously defined functions:

	> item = Parser item'
	> sat p = Parser (sat' p)
	> char c = Parser (char' c)
	> space = Parser space'
	> digit = Parser digit'
	> number = Parser number'
	> many (Parser p) = Parser (many' p)
	> spaces = Parser spaces'

	As defined previously, but in new notation:

	> string :: String -> Parser String
	> string "" = pure []
	> string (x:xs) = (:) <$> char x <*> string xs

	> count 0 p = pure []
	> count n p = (:) <$> p <*> count (n-1) p

	Applicative gives us (<) and (>) for free; they ignore parsers
	on left and right. A tokenized parser ignores surrounding
	whitespace.

	> tokenize :: Parser a -> Parser a
	> tokenize p = spaces > p < spaces

	The following parser parses a token surrounded by opening and
	closing

	> bracketize :: Char -> Parser a -> Char -> Parser a
	> bracketize open p close =
	> char open > tokenize p < char close

	> sepBy :: Parser a -> Parser b -> Parser [a]
	> p `sepBy` sep = (:) <$> p <> (many (sep > p))

	> list :: Parser a -> Parser [a]
	> list p = bracketize '[' (tokenize p `sepBy` comma) ']'
	> where comma = char ','

	Until now, our parsers have been deterministic in the sense that
	a previously parsed value cannot affect the next parser; only the
	parsed value can be affected (through Applicative). What if we
	parse either "name:" or "age:" and then want to parse either a
	string (the name) or an int (the age)? Or e.g. based on a header,
	change the body parser.

	One solution:

	> bind' :: Parser' a -> (a -> Parser' b) -> Parser' b
	> bind' pa apb str = case pa str of
	> Nothing -> Nothing
	> Just (a, rest) -> (apb a) rest

	> instance Monad Parser where
	> return = pure
	> (Parser pa) >>= apb =
	> Parser $ \str -> case pa str of
	> Nothing -> Nothing
	> Just (x, rest) -> parse (apb x) rest

	Let's illustrate by parsing an abstract "packet" which is a
	number n and then n characters.

	> packet :: Parser String
	> packet = number >>= \n -> count n item

	Chaining commands with bind (>>=) gets complicated very
	quick. The solution is "do"-notation:

	> packet2 :: Parser String
	> packet2 = do
	> n <- number
	> count n item