Last active
December 19, 2015 17:09
-
-
Save jferris/5989476 to your computer and use it in GitHub Desktop.
Find URLs that redirect in stdin and replace them with their target URL in stdout
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import Control.Applicative | |
import Data.List.Utils | |
import Data.Maybe | |
import Network.HTTP | |
import Network.HTTP.Headers | |
import System.IO | |
import Text.Regex.PCRE | |
main = do | |
line <- getLine | |
replacedLine <- replaceRedirects line | |
putStrLn replacedLine | |
eof <- isEOF | |
if eof then return () else main | |
replaceRedirects :: String -> IO String | |
replaceRedirects line = foldl replaceRedirect (pure line) $ matchUrls line | |
replaceRedirect :: IO String -> String -> IO String | |
replaceRedirect line url = | |
replace <$> pure url <*> targetUrlFromUrl url <*> line | |
targetUrlFromUrl :: String -> IO String | |
targetUrlFromUrl sourceUrl = | |
simpleHTTP (getRequest sourceUrl) >>= | |
return . either (\_ -> sourceUrl) targetUrl | |
where targetUrl = fromMaybe sourceUrl . findHeader HdrLocation | |
matchUrls :: String -> [String] | |
matchUrls line = map head $ line =~ "http://[^ ]+" |
Cool!
To address the main
function, I would just write something like this:
main :: IO ()
main = myInteract replaceRedirects
myInteract :: (String -> IO String) -> IO ()
myInteract f = putStrLn =<< f =<< hGetContents stdin
The other thing I noticed is that you've got a lot of IO
going far into the logic of the program. Ideally, the IO
should occur at only the top level of the program, with the major logic occurring in pure functions. All the Applicative hoops you have to jump through might be an indication something's amiss there.
One way I think you can get to that point would be to make the GET requests and build up a "TODO" of replacements in the same general area you have to do the IO of reading and printing. Then the machinery of making those replacements can be pure functions.
My attempt to go that direction turned out like this:
import Control.Monad
data Replacement = Replacement
{ from :: String
, to :: String
}
main :: IO ()
main = do
lines' <- fmap lines $ hGetContents stdin
forM_ lines' $ \line -> do
replacements <- getReplacements line
putStrLn $ replaceAll line replacements
getReplacements :: String -> IO [Replacement]
getReplacements line = forM (matchUrls line) $ \url -> do
return . Replacement url =<< targetUrlFromUrl url
replaceAll :: String -> [Replacement] -> String
replaceAll = foldl go
where
go :: String -> Replacement -> String
go line (Replacement from to) = replace from to line
I dunno, my 2c anyway.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This checks a file, line by line, for URLs. For each URL, it makes a GET request. If that request redirects, it replaces the original URL with the redirect target.
The
main
function sucks because I couldn't figure out a way to properly liftinteract
(it takesString -> String
and I haveString -> IO String
) and I couldn't another functional way of handling the EOF loop.