@ririw
Last active January 1, 2016 21:39
Space inefficient program
ghc -O2 -rtsopts -threaded -prof -fprof-auto -fforce-recomp reader.hs
time ./reader +RTS -K1G -sstderr -pa -A3M
Mon Dec 30 23:37 2013 Time and Allocation Profiling Report (Final)
reader +RTS -K1G -sstderr -pa -A3M -RTS
total time = 5.73 secs (5726 ticks @ 1000 us, 1 processor)
total alloc = 1,920,697,176 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc ticks bytes
GC GC 52.4 0.0 3001 0
popLink Main 15.3 29.5 877 566607296
generalIndexer Main 12.8 15.7 735 301188920
resourceName Main 8.9 27.7 511 531200856
indexLinks.indexLoop Main 3.2 5.0 185 96016120
link Main 2.7 7.3 157 140799648
comment Main 2.0 6.3 117 121599960
indexLinks.insertLink Main 0.9 4.2 51 79999800
linkLine Main 0.6 1.7 34 31999920
linkLineParser Main 0.5 2.7 30 51200000
OVERHEAD_of PROFILING 0.3 0.0 16 0
SYSTEM SYSTEM 0.2 0.0 11 16352
MAIN MAIN 0.0 0.0 1 6768
IDLE IDLE 0.0 0.0 0 0
PINNED SYSTEM 0.0 0.0 0 0
DONT_CARE MAIN 0.0 0.0 0 0
CAF GHC.Integer.Type 0.0 0.0 0 0
CAF GHC.Integer.Logarithms.Internals 0.0 0.0 0 320
CAF GHC.IO.Encoding.Failure 0.0 0.0 0 0
CAF GHC.Real 0.0 0.0 0 0
CAF GHC.Float 0.0 0.0 0 0
CAF GHC.Event.PSQ 0.0 0.0 0 0
CAF GHC.IO.Handle.Types 0.0 0.0 0 0
CAF GHC.IO.Encoding.UTF8 0.0 0.0 0 0
CAF GHC.IO.Encoding.UTF32 0.0 0.0 0 0
CAF GHC.IO.Encoding.UTF16 0.0 0.0 0 0
CAF GHC.Enum 0.0 0.0 0 0
CAF GHC.Event.Manager 0.0 0.0 0 0
CAF GHC.Event.Clock 0.0 0.0 0 0
CAF Foreign.Marshal.Alloc 0.0 0.0 0 0
CAF Data.Typeable.Internal 0.0 0.0 0 0
CAF GHC.Event.Internal 0.0 0.0 0 32
CAF GHC.Event.EPoll 0.0 0.0 0 0
CAF GHC.Event.Control 0.0 0.0 0 0
CAF GHC.Int 0.0 0.0 0 0
CAF GHC.IO.Encoding.Iconv 0.0 0.0 0 248
CAF GHC.IO.FD 0.0 0.0 0 32
CAF GHC.Conc.Sync 0.0 0.0 0 0
CAF System.Posix.Internals 0.0 0.0 0 0
CAF Data.Maybe 0.0 0.0 0 0
CAF GHC.Show 0.0 0.0 0 0
CAF GHC.IO.Encoding 0.0 0.0 0 3376
CAF GHC.Exception 0.0 0.0 0 0
CAF GHC.Conc.Signal 0.0 0.0 0 672
CAF GHC.Arr 0.0 0.0 0 0
CAF GHC.Event.Thread 0.0 0.0 0 904
CAF GHC.TopHandler 0.0 0.0 0 0
CAF GHC.List 0.0 0.0 0 0
CAF GHC.IO.Handle.Text 0.0 0.0 0 0
CAF GHC.IO.Exception 0.0 0.0 0 0
CAF Control.Exception.Base 0.0 0.0 0 0
CAF GHC.IO.Handle.Internals 0.0 0.0 0 0
CAF GHC.IO.Handle.FD 0.0 0.0 0 34672
CAF GHC.IO.Handle 0.0 0.0 0 0
CAF GHC.ForeignPtr 0.0 0.0 0 0
CAF GHC.Err 0.0 0.0 0 0
CAF Data.ByteString 0.0 0.0 0 0
CAF Data.Map 0.0 0.0 0 0
CAF Data.Attoparsec.ByteString.FastSet 0.0 0.0 0 0
CAF Data.Attoparsec.Internal.Types 0.0 0.0 0 0
CAF Data.Attoparsec.ByteString.Internal 0.0 0.0 0 0
rnf Main 0.0 0.0 0 0
rnf Main 0.0 0.0 0 0
showList Main 0.0 0.0 0 0
showsPrec Main 0.0 0.0 0 0
showList Main 0.0 0.0 0 0
showsPrec Main 0.0 0.0 0 0
showList Main 0.0 0.0 0 0
showsPrec Main 0.0 0.0 0 0
main Main 0.0 0.0 0 19496
indexLinks Main 0.0 0.0 0 224
linkspath Main 0.0 0.0 0 1256
popLink.parsed Main 0.0 0.0 0 0
CAF Main 0.0 0.0 0 304
                                                  individual     inherited
COST CENTRE                 MODULE  no. entries  %time %alloc   %time %alloc  ticks      bytes
MAIN                        MAIN     53       0    0.0    0.0   100.0  100.0      1       6768
 main                       Main    107       0    0.0    0.0    47.1  100.0      0      19496
  indexLinks                Main    109       1    0.0    0.0    47.1  100.0      0        224
   indexLinks.indexLoop     Main    110  400001    3.2    5.0    47.1  100.0    185   96016120
    indexLinks.insertLink   Main    116  400000    0.9    4.2    13.7   19.8     51   79999800
     generalIndexer         Main    123  799998   12.8   15.7    12.8   15.7    735  301188920
    popLink                 Main    111  400000   15.3   29.5    30.1   75.2    877  566607296
     linkLineParser         Main    113       0    0.5    2.7    14.8   45.6     30   51200000
      comment               Main    115       0    2.0    6.3    14.3   43.0    117  121599960
       linkLine             Main    118       0    0.6    1.7    12.3   36.7     34   31999920
        resourceName        Main    120       0    8.9   27.7    11.7   35.0    511  531198672
         link               Main    122       0    2.7    7.3     2.7    7.3    157  140799648
CAF Main 105 0 0.0 0.0 0.0 0.0 0 304
link Main 121 1 0.0 0.0 0.0 0.0 0 0
resourceName Main 119 1 0.0 0.0 0.0 0.0 0 2184
linkLine Main 117 1 0.0 0.0 0.0 0.0 0 0
comment Main 114 1 0.0 0.0 0.0 0.0 0 0
linkLineParser Main 112 1 0.0 0.0 0.0 0.0 0 0
linkspath Main 108 1 0.0 0.0 0.0 0.0 0 1256
main Main 106 1 0.0 0.0 0.0 0.0 0 0
CAF Data.Attoparsec.ByteString.Internal 104 0 0.0 0.0 0.0 0.0 0 0
CAF Data.Attoparsec.Internal.Types 103 0 0.0 0.0 0.0 0.0 0 0
CAF Data.Attoparsec.ByteString.FastSet 102 0 0.0 0.0 0.0 0.0 0 0
CAF Data.Map 101 0 0.0 0.0 0.0 0.0 0 0
CAF Data.ByteString 100 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Err 99 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.ForeignPtr 98 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.IO.Handle 97 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.IO.Handle.FD 96 0 0.0 0.0 0.0 0.0 0 34672
CAF GHC.IO.Handle.Internals 95 0 0.0 0.0 0.0 0.0 0 0
CAF Control.Exception.Base 94 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.IO.Exception 93 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.IO.Handle.Text 92 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.List 91 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.TopHandler 90 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Event.Thread 89 0 0.0 0.0 0.0 0.0 0 904
CAF GHC.Arr 88 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Conc.Signal 87 0 0.0 0.0 0.0 0.0 0 672
CAF GHC.Exception 86 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.IO.Encoding 85 0 0.0 0.0 0.0 0.0 0 3376
CAF GHC.Show 84 0 0.0 0.0 0.0 0.0 0 0
CAF Data.Maybe 83 0 0.0 0.0 0.0 0.0 0 0
CAF System.Posix.Internals 82 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Conc.Sync 81 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.IO.FD 80 0 0.0 0.0 0.0 0.0 0 32
CAF GHC.IO.Encoding.Iconv 79 0 0.0 0.0 0.0 0.0 0 248
CAF GHC.Int 78 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Event.Control 77 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Event.EPoll 76 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Event.Internal 75 0 0.0 0.0 0.0 0.0 0 32
CAF Data.Typeable.Internal 74 0 0.0 0.0 0.0 0.0 0 0
CAF Foreign.Marshal.Alloc 73 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Event.Clock 72 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Event.Manager 71 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Enum 70 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.IO.Encoding.UTF16 69 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.IO.Encoding.UTF32 68 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.IO.Encoding.UTF8 67 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.IO.Handle.Types 66 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Event.PSQ 65 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Float 64 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Real 63 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.IO.Encoding.Failure 62 0 0.0 0.0 0.0 0.0 0 0
CAF GHC.Integer.Logarithms.Internals 61 0 0.0 0.0 0.0 0.0 0 320
CAF GHC.Integer.Type 60 0 0.0 0.0 0.0 0.0 0 0
SYSTEM SYSTEM 59 0 0.2 0.0 0.2 0.0 11 16352
GC GC 58 0 52.4 0.0 52.4 0.0 3001 0
OVERHEAD_of PROFILING 57 0 0.3 0.0 0.3 0.0 16 0
DONT_CARE MAIN 56 0 0.0 0.0 0.0 0.0 0 0
PINNED SYSTEM 55 0 0.0 0.0 0.0 0.0 0 0
IDLE IDLE 54 0 0.0 0.0 0.0 0.0 0 0
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (takeWhile, take)
import Data.Attoparsec.Char8
import Control.Applicative
import qualified Data.ByteString.Char8 as BS
import System.IO hiding (hGetLine)
import Control.Monad.Loops
import Control.Monad
import qualified Data.Map as M
import Data.Maybe
import Data.Hashable
import Control.Parallel
import System.Random
import Debug.Trace
import qualified Data.Foldable (sum)
import Data.List (foldl')
import Data.IORef
import Data.HashTable as HT
import Control.DeepSeq
import Data.Int
data NTEntry = NTEntry Resource Schema Content deriving Show
data NTLink = NTLink From To deriving Show
type From = BS.ByteString
type To = BS.ByteString
type Resource = BS.ByteString
type Schema = BS.ByteString
type Content = BS.ByteString
instance NFData NTEntry where
    rnf (NTEntry r s c) = (rnf r) `seq` (rnf s) `seq` (rnf c) `seq` ()
instance NFData BS.ByteString where
    rnf bs = bs `seq` ()
type Handlemap = M.Map Int [Integer]
data TypedHandleMap a = TypedHandleMap (M.Map Int [Integer]) deriving Show
--instance Hashable BS.ByteString where
--hash = hash . BS.unpack
type IXTable = HT.HashTable Resource [Integer]
main = do
    f <- openFile linkspath ReadMode
    indexLinks f

indexLinks :: Handle -> IO IXTable
indexLinks f = do
    table <- (HT.new (==) (fromIntegral . hash)) :: IO (IXTable)
    indexLoop f table
    return table
  where
    indexLoop ::
        Handle ->
        IXTable ->
        IO ()
    indexLoop f table = do
        ended <- hIsEOF f
        if ended
            then return ()
            else do
                v <- popLink f
                insertLink table v
                indexLoop f table
    insertLink ::
        IXTable
        -> (Maybe (Integer, Maybe NTLink))
        -> IO ()
    insertLink _ Nothing = return ()
    insertLink _ (Just (_, Nothing)) = return ()
    insertLink table (Just (pos, Just (NTLink f t))) = do
        generalIndexer table f pos
        generalIndexer table t pos
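-- Prepend the file offset `pos` to the list of offsets stored for resource `b`.
-- Note: Data.HashTable's `insert` does not remove an existing mapping for the
-- key (`HT.update` replaces it), so repeated keys accumulate old entries.
generalIndexer ::
    IXTable
    -> Resource -> Integer
    -> IO ()
generalIndexer index b pos = do
    l <- HT.lookup index b
    case l of
        Nothing -> insert index b [pos]
        Just ls -> insert index b (pos : ls)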
linkspath = "page_links_en.nt_"
abstractspath = "short_abstracts_en.nt"
popAbstract :: Handle -> IO (Maybe (Integer, Maybe NTEntry))
popAbstract h = do
    end <- hIsEOF h
    if end
        then return Nothing
        else do
            lineStart <- hTell h
            line <- BS.hGetLine h
            let parsed = force (parseOnly abstractParser line) in
                parsed `par` case parsed of
                    Left error -> do
                        hPutStrLn stderr $ show error
                        return $ Just (lineStart, Nothing)
                    Right v ->
                        return $ Just (lineStart, v)
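-- Read one line from the handle and parse it as a link.
-- Returns Nothing at end of file, otherwise the line's start offset paired
-- with the parsed link (Nothing for comments and unparseable lines).
popLink :: Handle -> IO (Maybe (Integer, Maybe NTLink))
popLink h = do
    end <- hIsEOF h
    if end
        then return Nothing
        else do
            lineStart <- hTell h
            line <- BS.hGetLine h
            -- Note: `parsed` is only sparked here; the case below calls
            -- parseOnly again instead of reusing `parsed`, so unless the
            -- compiler shares the two expressions a line may be parsed twice.
            let parsed = parseOnly linkLineParser line in
                parsed `par`
                    case (parseOnly linkLineParser line) of
                        Left error -> do
                            hPutStrLn stderr $ show error
                            return $ Just (lineStart, Nothing)
                        Right v ->
                            return $ Just (lineStart, v)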
lazy_untilM :: Monad m => m a -> m Bool -> m [a]
lazy_untilM action test = do
    a <- action
    t <- test
    if t
        then return [a]
        else do
            r <- lazy_untilM action test
            return $ a:r
linkLineParser :: Parser (Maybe NTLink)
linkLineParser =
    (comment >> return Nothing)
    <|> linkLine

linkLine = do
    char '<'
    from <- resourceName
    char '>'
    skipWhile isSpace
    char '<'
    link
    char '>'
    skipWhile isSpace
    char '<'
    to <- resourceName
    char '>'
    skipWhile ((/=) '\n')
    return . Just $ NTLink from to

resourceName = do
    string "http://dbpedia.org/resource/"
    takeWhile (/= '>')

abstractParser :: Parser (Maybe NTEntry)
abstractParser =
    (comment >> return Nothing)
    <|> entry

comment :: Parser ()
comment = do
    skipWhile isSpace
    char '#'
    skipWhile ((/=) '\n')

entry :: Parser (Maybe NTEntry)
entry = do
    skipWhile isSpace
    char '<'
    resource <- resourceName
    char '>'
    skipWhile isSpace
    char '<'
    schema <- link
    char '>'
    skipWhile isSpace
    char '"'
    content <- quotedString
    char '"'
    char '@'
    lang <- language
    skipWhile ((/=) '\n')
    case lang of
        "en" -> return $ Just $ NTEntry resource schema content
        _ -> return Nothing

link = takeWhile ((/=) '>')
quotedString = takeWhile ((/=) '"')
language = take 2
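
For comparison with generalIndexer above, here is a minimal sketch of the same offset index built as a pure strict map instead of a mutable hash table. Swapping Data.HashTable for Data.Map.Strict is an assumption made for illustration, not something the original program does, and the names Index, addLink and buildIndex are hypothetical. It has not been profiled against the report above.

```haskell
import qualified Data.ByteString.Char8 as BS
import           Data.List             (foldl')
import qualified Data.Map.Strict       as M

-- Hypothetical stand-in for IXTable: resource name -> file offsets, newest first.
type Index = M.Map BS.ByteString [Integer]

-- Record one (from, to) link found at file offset `pos` under both names,
-- mirroring the two generalIndexer calls in insertLink.
addLink :: Index -> (Integer, BS.ByteString, BS.ByteString) -> Index
addLink ix (pos, from, to) = add to (add from ix)
  where
    -- insertWith applies (++) as `new ++ old`, so `pos` is prepended.
    add key = M.insertWith (++) key [pos]

-- Strict left fold: the map is forced after every link rather than
-- building up a chain of unevaluated updates.
buildIndex :: [(Integer, BS.ByteString, BS.ByteString)] -> Index
buildIndex = foldl' addLink M.empty
```

Each key appears in the map exactly once, so updating an existing resource replaces its list rather than adding a second entry.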
# started 2013-08-04T11:34:31Z
<http://dbpedia.org/resource/AccessibleComputing> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Computer_accessibility> .
<http://dbpedia.org/resource/AfghanistanGeography> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Geography_of_Afghanistan> .
<http://dbpedia.org/resource/AfghanistanHistory> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/History_of_Afghanistan> .
<http://dbpedia.org/resource/AfghanistanPeople> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Demography_of_Afghanistan> .
<http://dbpedia.org/resource/AfghanistanCommunications> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Communications_in_Afghanistan> .
<http://dbpedia.org/resource/AfghanistanMilitary> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Military_of_Afghanistan> .
<http://dbpedia.org/resource/AfghanistanTransportations> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Transport_in_Afghanistan> .
<http://dbpedia.org/resource/AfghanistanTransnationalIssues> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Foreign_relations_of_Afghanistan> .
<http://dbpedia.org/resource/AmoeboidTaxa> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Amoeboid> .
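
Every data line in the sample above has the shape `<subject> <predicate> <object> .`, with a leading `#` marking a comment. As a rough sketch of what linkLineParser extracts from such a line, the following uses only plain ByteString operations. It is an illustration under that assumed line shape, not the attoparsec parser from the program, and angleTerm, resourceOf and parseLinkLine are hypothetical names.

```haskell
import qualified Data.ByteString.Char8 as BS

-- Prefix shared by the subject and object IRIs in the sample data.
resourcePrefix :: BS.ByteString
resourcePrefix = BS.pack "http://dbpedia.org/resource/"

-- Take the next <...> term, returning its contents and the remaining input.
angleTerm :: BS.ByteString -> Maybe (BS.ByteString, BS.ByteString)
angleTerm input =
  case BS.uncons (BS.dropWhile (== ' ') input) of
    Just ('<', rest) ->
      let (body, after) = BS.break (== '>') rest
      in if BS.null after then Nothing else Just (body, BS.drop 1 after)
    _ -> Nothing

-- Strip the DBpedia resource prefix; any other IRI is rejected.
resourceOf :: BS.ByteString -> Maybe BS.ByteString
resourceOf iri
  | resourcePrefix `BS.isPrefixOf` iri = Just (BS.drop (BS.length resourcePrefix) iri)
  | otherwise = Nothing

-- Parse one link line into (from, to). Comment lines and malformed
-- lines yield Nothing.
parseLinkLine :: BS.ByteString -> Maybe (BS.ByteString, BS.ByteString)
parseLinkLine line = do
  (subj, afterSubj)  <- angleTerm line
  (_pred, afterPred) <- angleTerm afterSubj
  (obj, _)           <- angleTerm afterPred
  from <- resourceOf subj
  to   <- resourceOf obj
  return (from, to)
```

Applied to the last sample line, parseLinkLine gives the pair of resource names AmoeboidTaxa and Amoeboid; applied to the `# started ...` line it gives Nothing.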