@benshimmin
Created August 2, 2012 16:36
How to extract all links and link text from a Markdown file using pandoc
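-- adapted from <http://johnmacfarlane.net/pandoc/scripting.html>
-- Targets the pandoc API of 2012 (defaultParserState, queryWith, String-based readers).
import Text.Pandoc
import Text.Pandoc.Shared (stringify)

-- A link yields "text : <url>", an image yields its URL, anything else yields nothing.
extractURL :: Inline -> [String]
extractURL (Link txt (u,_)) = [stringify txt ++ " : <" ++ u ++ ">"]
extractURL (Image _ (u,_)) = [u]
extractURL _ = []

-- Collect the results of extractURL over every inline element in the document.
extractURLs :: Pandoc -> [String]
extractURLs = queryWith extractURL

readDoc :: String -> Pandoc
readDoc = readMarkdown defaultParserState

-- Read Markdown from stdin and print one extracted entry per line on stdout.
main :: IO ()
main = interact (unlines . extractURLs . readDoc)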
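
The script above relies on `defaultParserState` and `queryWith`, which newer pandoc releases no longer provide. The following is a rough port, not the original author's code, and it assumes pandoc 2.8 or later, where the AST uses `Text`, `Link` and `Image` carry an extra attribute field, and readers run in a `PandocMonad`. It keeps the same output format; `query` from `Text.Pandoc.Walk` stands in for `queryWith`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch of the same extractor against a newer pandoc API (2.8+ assumed).
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.Pandoc
import Text.Pandoc.Shared (stringify)
import Text.Pandoc.Walk (query)

-- A link yields "text : <url>", an image yields its URL, anything else nothing.
-- The first constructor field (attributes) is ignored.
extractURL :: Inline -> [T.Text]
extractURL (Link _ txt (u, _)) = [stringify txt <> " : <" <> u <> ">"]
extractURL (Image _ _ (u, _))  = [u]
extractURL _                   = []

-- Collect entries from every inline element, in document order.
extractURLs :: Pandoc -> [T.Text]
extractURLs = query extractURL

main :: IO ()
main = do
  input <- TIO.getContents
  -- readMarkdown can now fail, so it runs in PandocIO; runIOorExplode
  -- exits with an error message if parsing fails.
  doc <- runIOorExplode (readMarkdown def input)
  TIO.putStr (T.unlines (extractURLs doc))
```

With the `pandoc` and `pandoc-types` packages available, saving this under a name such as `ExtractLinks.hs` (hypothetical) and running `runghc ExtractLinks.hs < notes.md` should print one entry per link or image found in `notes.md`.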