| {-# LANGUAGE ConstraintKinds #-} | |
| {-# LANGUAGE GeneralizedNewtypeDeriving #-} | |
| module ParserCombinators where | |
| {- | |
| We'll build a set of parser combinators from scratch demonstrating how | |
| they arise as a monad transformer stack. Actually, how they arise as a | |
| choice between two different monad transformer stacks! | |
| -} | |
| -------------------------------------------------------------------------------- | |
| import Control.Applicative | |
| import Control.Monad | |
| import Control.Monad.State | |
| {- | |
| Parser combinators are computations which, behind the scenes, consume | |
| a stream of inbound characters producing /parsed partial results/ or | |
| failing due to failing to match the grammar. | |
| We'll describe the most basic interface in a typeclass. A parser | |
| combinator type must be | |
| 1. a 'Functor' where 'fmap' changes the partial result without any side effect, | |
| 2. a 'Monad' and an 'Applicative' where '(>>=)'/'(<*>)' indicates /sequencing/, | |
| 3. an 'Alternative' where '(<|>)' indicates (left-biased) choice, and finally | |
| 4. an instance of 'Parses' defined below which gives us the most basic parser | |
| -} | |
| class Parses p where | |
| satisfy :: (Char -> Bool) -> p Char | |
| -- We'll use ConstraintKinds to describe this overall set of | |
| -- constraints. | |
| type IsParser f | |
| = ( Parses f | |
| , Functor f | |
| , Applicative f | |
| , Alternative f | |
| , MonadPlus f -- this is just for completeness, MonadPlus and | |
| -- Alternative are the same thing! | |
| , Monad f | |
| ) | |
| -- And now we already can begin to write parser combinators, although, | |
| -- honestly, each of these types is *far* too restrictive. | |
| char :: Parses p => Char -> p Char | |
| char c = satisfy (== c) | |
| parensM :: (Parses m, Monad m) => m b -> m b | |
| parensM p = do | |
| char '(' | |
| res <- p | |
| char ')' | |
| return res | |
| many1M :: MonadPlus p => p a -> p [a] | |
| many1M p = do | |
| a <- p | |
| as <- manyM p | |
| return (a:as) | |
| manyM :: MonadPlus p => p a -> p [a] | |
| manyM p = many1M p `mplus` return [] | |
| option :: Alternative p => p a -> p (Maybe a) | |
| option p = fmap Just p <|> pure Nothing | |
| choice :: Alternative p => [p a] -> p a | |
| choice = foldr (<|>) empty | |
| -- The @XM@ names are used to emphasize that these instances are | |
| -- needlessly monadic. | |
| parens :: (Parses p, Alternative p) => p b -> p b | |
| parens p = (\_ a _ -> a) <$> char '(' <*> p <*> char ')' | |
| many1 :: Alternative p => p a -> p [a] | |
| many1 p = liftA2 (:) p (many p <|> pure []) | |
| -- We'll eventually design our parser implementations to satisfy all | |
| -- of 'IsParser', so since all of these combinators demand only | |
| -- subsets of the constraints in 'IsParser' we'll ultimately be able | |
| -- to use them all. | |
| sepBy :: Alternative f => f a -> f s -> f [a] | |
| sepBy p s = liftA2 (:) p ((s *> sepBy1 p s) <|> pure []) <|> pure [] | |
| sepBy1 :: Alternative f => f a -> f s -> f [a] | |
| sepBy1 p s = scan where scan = liftA2 (:) p ((s *> scan) <|> pure []) | |
| -- Also note that these are exactly the definitions used in real | |
| -- parser combinator libraries like Attoparsec. | |
| -- Because our parser combinator basis does not actually demand a | |
| -- specific implementation of the parser type, we're also free to, for | |
| -- instance, parse context sensitive grammars by stacking on | |
| -- constraints like 'MonadState' | |
| type IsContextSensitive s p = ( IsParser p, MonadState s p ) | |
| -- So let's implement this now. | |
| -------------------------------------------------------------------------------- | |
| {- | |
| Above we implemented a basis from which we can operate a | |
| parser. Anything which satisfies the entire 'IsParser' interface | |
| probably isn't lying. Let's examine two of these and see how they | |
| arise as monad transformers. | |
| The most basic parser is one which intakes an imput string and returns | |
| 'Maybe' the parsed interpretation of that string. That would nearly be | |
| a perfect specification, except it doesn't talk about how to handle | |
| "leftovers" if you only need to use part of the input to determine | |
| whether or not this is a valid parse. | |
| Instead of throwing leftovers away, we'll keep them around. | |
| -} | |
| runParser1 :: Parser1 a -> (String -> Maybe (a, String)) | |
| runParser1 (Parser1 go) inp = go inp | |
| newtype Parser1 a = Parser1 (String -> Maybe (a, String)) | |
| {- | |
| We can see that Parser1 implements the entire interface we need. I'll | |
| just write that out, but we'll exmaine how it works in a little | |
| bit. For now, just skip the following code and take it as material | |
| proof that Parser1 is a sufficient design. | |
| -} | |
| instance Functor Parser1 where | |
| fmap f (Parser1 go) = Parser1 $ \inp -> do | |
| -- using the Maybe monad here | |
| (a, outp) <- go inp | |
| return (f a, outp) | |
| instance Applicative Parser1 where | |
| pure = return | |
| p1 <*> p2 = do | |
| f <- p1 | |
| x <- p2 | |
| return (f x) | |
| instance Alternative Parser1 where | |
| empty = Parser1 (\_ -> Nothing) | |
| p1 <|> p2 = Parser1 $ \inp -> | |
| case runParser1 p1 inp of | |
| Nothing -> runParser1 p2 inp | |
| Just x -> Just x | |
| instance Monad Parser1 where | |
| -- passes the input string straight through to the output | |
| return a = Parser1 (\inp -> return (a, inp)) | |
| p >>= f = Parser1 $ \inp -> do | |
| (a, outp1) <- runParser1 p inp | |
| (b, outp2) <- runParser1 (f a) outp1 | |
| return (b, outp2) | |
| instance Parses Parser1 where | |
| satisfy pred = Parser1 $ \inp -> | |
| case inp of | |
| [] -> Nothing | |
| c : cs | pred c -> Just (c, cs) | |
| | otherwise -> Nothing | |
| {- | |
| Phew! | |
| So why does all of the code above work? Why should we believe that we | |
| could ever implement this stuff? | |
| Because if we look carefully at 'Parser1' we see that it's \"just\" | |
| the same as 'Parser2' | |
| -} | |
| newtype Parser2 a = | |
| Parser2 (StateT String Maybe a) | |
| deriving ( Functor, Monad, Applicative, Alternative ) | |
| runParser2 :: Parser2 a -> String -> Maybe (a, String) | |
| runParser2 (Parser2 go) inp = runStateT go inp | |
| instance Parses Parser2 where | |
| satisfy pred = Parser2 $ do | |
| inp <- get | |
| case inp of | |
| [] -> fail "empty input" | |
| c : cs | pred c -> put cs >> return c | |
| | otherwise -> fail "satisfy" | |
| {- | |
| In other words, if we recognize that 'Parser1' is a monad transformer | |
| stack of 'State' and 'Maybe' then all of the interfaces \"write | |
| themselves\". In fact, if you go above and examine the interfaces | |
| against the code for the implementation of 'State' then you'll see a | |
| lot of similarities. | |
| Which leads to the question: why are parsers equal to a combination of | |
| 'State' and 'Maybe' (failure)? Well, that seems to be a reasonable | |
| description of what parsers do---we simply must restrict our notion of | |
| state as being /causal/ in that we try to handle the present state | |
| along a stream and throw it away when we're done: this is not at all | |
| unlike @MonadState (Stream a)@. | |
| This might make one wonder if we could change out the | |
| components. And we can! | |
| For instance, a /backtracking/ monadic parser combinator | |
| implementation which satisfies all of the prior interface is just the | |
| transformer stack of 'State' and '[]'. | |
| -} | |
| newtype Parser3 a = | |
| Parser3 { runParser3 :: StateT String [] a } | |
| deriving ( Functor, Monad, Applicative, Alternative ) | |
| -- Note that this is literally a copy-and-paste job from | |
| -- above... That's because this implementation works for /any/ | |
| -- underlying monadic layer which has a notion of failure. | |
| instance Parses Parser3 where | |
| satisfy pred = Parser3 $ do | |
| inp <- get | |
| case inp of | |
| [] -> fail "empty input" | |
| c : cs | pred c -> put cs >> return c | |
| | otherwise -> fail "satisfy" | |
| {- | |
| Here, we see that the @runX@ function denotes the idea that this | |
| parser succeeds 0-or-more times just like we'd expect from a | |
| non-deterministic, backtracking parse. | |
| The list monad embodies depth-first search by default. What if we want | |
| something breadth-first? Or if we want to ensure fairness in context | |
| of an infinite space of potential parses? | |
| We could rip out the '[]' layer and replace it with a monad from the | |
| @logict@ or @omega@ packages, but instead let's just take advantage of | |
| the idea that all we need to actually rely on is the /failure effect/: | |
| -} | |
| newtype Parser4 m a = | |
| Parser4 { runParser4 :: StateT String m a } | |
| deriving ( Functor, Monad, Applicative, Alternative ) | |
| instance FailWith m => Parses (Parser4 m) where | |
| satisfy pred = Parser4 $ do | |
| inp <- get | |
| case inp of | |
| [] -> failWith "empty input" | |
| c : cs | pred c -> put cs >> return c | |
| | otherwise -> failWith "satisfy" | |
| -- FailWith just let's us be a bit more specific about why we're | |
| -- failing. The 'fail' implementation built into 'Monad' is widely | |
| -- considered to be a mistake since it often has really bad default | |
| -- implementations---not all monads actually have a notion of failure! | |
| class Monad m => FailWith m where | |
| failWith :: String -> m a | |
| instance FailWith Maybe where | |
| failWith _ = Nothing | |
| instance FailWith [] where | |
| failWith _ = [] | |
| instance FailWith m => FailWith (StateT s m) where | |
| failWith reason = lift (failWith reason) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment