Skip to content

Instantly share code, notes, and snippets.

@jisantuc
Created January 9, 2018 20:08
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jisantuc/f4cd17e4ebb517f2e7e282a694069b70 to your computer and use it in GitHub Desktop.
Save jisantuc/f4cd17e4ebb517f2e7e282a694069b70 to your computer and use it in GitHub Desktop.
# This is a dumb graph
/* Look at this comment it's going
* to get multi-line ignored
*/
# This is heavily simplified to ease the parsing problem
digraph exampleGraph {
a -> b;
b -> c;
a -> k;
a -> l;
b -> d;
c -> e;
e -> f;
f -> g;
h -> i;
f -> h;
d -> h;
}
module GraphDot where
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map.Strict as M
import qualified Data.ByteString as BS
import Data.Attoparsec.ByteString.Char8
import qualified Data.Attoparsec.Combinator as Comb
import Control.Applicative
data GraphType = Graph | DiGraph deriving (Eq, Show)
data Node = Node { name :: String
, descendants :: [Node] } deriving (Eq, Show)
multLineCommentParser :: Parser [Char]
multLineCommentParser =
many' space
>> string "/*"
>> manyTill anyChar (string "*/\n")
singleLineCommentParser :: Parser [Char]
singleLineCommentParser =
many' space
>> char '#'
>> manyTill anyChar (char '\n')
commentParser :: Parser [Char]
commentParser =
singleLineCommentParser <|> multLineCommentParser
nodeParser :: Parser Node
nodeParser =
do
_ <- many' space
nodeName <- many' (letter_ascii <|> digit)
child <- (string " -> " >> many' (letter_ascii <|> digit))
end <- (char ';' >> many' space)
return $ Node nodeName [Node child []];
graphTypeParser :: Parser GraphType
graphTypeParser =
do
graphType <- (string "graph" <|> string "digraph")
return $ if (graphType == "graph") then Graph else DiGraph
dotToNodesParser :: Parser [Node]
dotToNodesParser =
do
skipMany commentParser
graphType <- graphTypeParser
graphName <- (space >> manyTill anyChar space <* (char '{' >> endOfLine))
nodes <- many' nodeParser
char '}' >> many' anyChar
return nodes
addToMap :: M.Map String [Node] -> Node -> M.Map String [Node]
addToMap m n =
case M.lookup (name n) m of
Nothing -> M.insert (name n) (descendants n) m
Just nodes -> M.insert (name n) (descendants n ++ nodes) m
mergeNodes :: [Node] -> M.Map String [Node]
mergeNodes nodes =
let
m = M.empty
in
foldl addToMap m nodes
main =
do
inf <- BS.readFile "data/graph.dot"
nodeList <- return $ parseOnly dotToNodesParser inf
return $ mergeNodes <$> nodeList
module InChapterExercises where
import Control.Applicative
import Text.Trifecta
-- some parsers
one :: Parser Char
one = char '1'
two :: Parser Char
two = char '2'
oneTwo :: Parser Char
oneTwo = one >> two
oneOrTwo :: Parser Char
oneOrTwo = one <|> two
-- Parsing Exercises
-- 1. Make a parser fail by not exhausting all the input
oneTwoFail :: Parser Char
oneTwoFail = oneTwo <* eof
-- parseString oneTwoFail mempty "12"
-- parseString oneTwoFail mempty "123"
-- 2. Use string to make a parser that parses "1", "123", and "123"
oneOrOneTwoOrOneTwoThree :: Parser String
oneOrOneTwoOrOneTwoThree =
string "123" <* eof
<|> string "12" <* eof
<|> string "1" <* eof
-- 3. a parser that does what string does, but using char
-- I think this is the idea?
stringLikeChar :: Parser [Char]
stringLikeChar = do
one <- char '1'
two <- char '2'
three <- char '3'
return $ [one, two, three]
-- Unit of Success
-- This isn't a great return type when our intent is obviously to get the integer
aUnit :: Result ()
aUnit = parseString (integer >> eof) mempty "123"
-- making it better by returning the integer
intParser :: Parser Integer
intParser = integer <* eof
anInt :: Result Integer
anInt = parseString intParser mempty "123"
aFailedInt :: Result Integer
aFailedInt = parseString intParser mempty "123abc"
-- Alternatives
type NumberOrString =
Either Integer String
a :: String
a = "blah"
b :: String
b = "123"
c :: String
c = "123blah789"
-- quasiquotes weren't working for me for some reason
eitherOr :: String
eitherOr = "123\n\
\abc\n\
\456\n\
\def\n"
parseNos :: Parser NumberOrString
parseNos =
Left <$> integer
<|> Right <$> (some letter)
-- and quasiquotes not working apparently trickled into this not working the
-- way the book had it written, so i improvised
doTheThing =
let
f = parseString parseNos mempty
in
print $ sequenceA (f <$> lines eitherOr)

Parser Combinators

Important new operators

(<|>) :: Alternative f => f a -> f a -> f a
(<*) :: Applicative f => f a -> f b -> f a
(*>) :: Applicative f => f a -> f b -> f b
  • (<|>) lets us pick whichever of left and right is successful
  • (<*) lets us evaluate f a and f b returning only the result of f a (it's strict in f b and f b)
  • (*>) is (<*) the opposite of (<*)
  • all of these are in Control.Applicative

Parser Combinators

Think of a cursor moving through a string-like object. Parser combinators tell the cursor when to move and what to do with what it encounters.

Example:

char '1'

tells the cursor "give me the character '1'".

Parser combinators themselves don't do anything -- they only describe how to do things. To do something, we need some function (pseudocode) Parser a -> ... -> String -> Result a. This function frequently starts with parse. In trifecta, it's parseString or parseByteString if you're feeling fancy. In attoparsec it's parse and parseOnly ('parseOnly stops when it's done and returns an Either, parse returns a Result holding the remaining things to parse and what was successfully parsed).

Example:

parseString (char '1') mempty :: String -> Result Char
parseString (char '1') mempty "2" :: Result Char
parseString (char '1') mempty "1" :: Result Char

Alternatives

(<|>) is from the Alternative typeclass, with the operator exposed in Control.Applicative.

Here's what it looks like for Maybe:

instance Alternative Maybe where
    empty = Nothing
    Nothing <|> r = r
    l       <|> _ = l

The Alternative typeclass also provides some and many methods. some repeats a parser at least once, while many repeats a parser any number of times (including zero):

parseString (some integer) mempty "a" -- Failure
parseString (some integer) mempty "a" -- Success []

Tokenizing

Just don't. The parsers in Text.Trifecta.Char are tokenizers already. If you need custom tokenizing, worry about how to do that then.

Polymorphic Parsers

Because parser combinators are build off of parsing typeclasses, you can have function signatures like:

parseFraction :: (Monad m, TokenParsing m) => m Rational

and maintain simultaneous support for whatever parsers anyone has written or will write now or in the future.

Marshalling form an AST to a datatype

You can encounter the problem where you want

real world data -> YourDataType

and

YourDataType -> real world data

These are unmarshalling and marshalling, respectively.

We can do this with aeson by providing FromJSON and ToJSON typeclasses for our datatype, but it's sort of boring compared to unmarshalling problems in Advent of Code and they're autoderivable because we live in the future, so skipping.

A parser combinator in the wild

This isn't a very good one but here it is

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment