@quasiben
Last active August 29, 2015 14:07
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/quasiben/Research/ContinuumDev/Memex/nutch_application/nutch/runtime/local/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/quasiben/anaconda/envs/nutchpy/lib/python2.7/site-packages/nutchpy/java_libs/seqreader-app-1.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2014-09-29 16:31:55.463 java[17872:5403] Unable to load realm info from SCDynamicStore
URL: /var/folders/1t/t94brwgx7sjcn8jgz4gr3_c00000gq/T/tmpaJeuZN
14/09/29 16:31:56 INFO crawl.Injector: Injector: starting at 2014-09-29 16:31:56
14/09/29 16:31:56 INFO crawl.Injector: Injector: crawlDb: /Users/quasiben/Research/ContinuumDev/Memex/nutchpy/crawl
14/09/29 16:31:56 INFO crawl.Injector: Injector: urlDir: /var/folders/1t/t94brwgx7sjcn8jgz4gr3_c00000gq/T/tmpaJeuZN
14/09/29 16:31:56 INFO crawl.Injector: tempDir: /tmp/hadoop-quasiben/mapred/temp/inject-temp-587370943
14/09/29 16:31:56 INFO crawl.Injector: CONF: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, nutch-default.xml, nutch-site.xml
14/09/29 16:31:56 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
14/09/29 16:31:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/09/29 16:31:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/09/29 16:31:56 WARN snappy.LoadSnappy: Snappy native library not loaded
14/09/29 16:31:56 INFO mapred.FileInputFormat: Total input paths to process : 1
14/09/29 16:31:56 INFO mapred.JobClient: Running job: job_local409772736_0001
14/09/29 16:31:56 INFO mapred.LocalJobRunner: Waiting for map tasks
14/09/29 16:31:56 INFO mapred.LocalJobRunner: Starting task: attempt_local409772736_0001_m_000000_0
14/09/29 16:31:56 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/09/29 16:31:56 INFO mapred.MapTask: Processing split: file:/var/folders/1t/t94brwgx7sjcn8jgz4gr3_c00000gq/T/tmpaJeuZN/seed.txt:0+25
14/09/29 16:31:56 INFO mapred.MapTask: numReduceTasks: 0
14/09/29 16:31:56 INFO plugin.PluginRepository: Plugins: looking in: /Users/quasiben/Research/ContinuumDev/Memex/nutch_application/nutch/runtime/local/plugins
14/09/29 16:31:57 INFO plugin.PluginRepository: Plugin Auto-activation mode: [true]
14/09/29 16:31:57 INFO plugin.PluginRepository: Registered Plugins:
14/09/29 16:31:57 INFO plugin.PluginRepository: the nutch core extension points (nutch-extensionpoints)
14/09/29 16:31:57 INFO plugin.PluginRepository: Basic URL Normalizer (urlnormalizer-basic)
14/09/29 16:31:57 INFO plugin.PluginRepository: Html Parse Plug-in (parse-html)
14/09/29 16:31:57 INFO plugin.PluginRepository: Basic Indexing Filter (index-basic)
14/09/29 16:31:57 INFO plugin.PluginRepository: SOLRIndexWriter (indexer-solr)
14/09/29 16:31:57 INFO plugin.PluginRepository: HTTP Framework (lib-http)
14/09/29 16:31:57 INFO plugin.PluginRepository: Regex URL Filter (urlfilter-regex)
14/09/29 16:31:57 INFO plugin.PluginRepository: Pass-through URL Normalizer (urlnormalizer-pass)
14/09/29 16:31:57 INFO plugin.PluginRepository: Http Protocol Plug-in (protocol-http)
14/09/29 16:31:57 INFO plugin.PluginRepository: Regex URL Normalizer (urlnormalizer-regex)
14/09/29 16:31:57 INFO plugin.PluginRepository: CyberNeko HTML Parser (lib-nekohtml)
14/09/29 16:31:57 INFO plugin.PluginRepository: Tika Parser Plug-in (parse-tika)
14/09/29 16:31:57 INFO plugin.PluginRepository: OPIC Scoring Plug-in (scoring-opic)
14/09/29 16:31:57 INFO plugin.PluginRepository: Anchor Indexing Filter (index-anchor)
14/09/29 16:31:57 INFO plugin.PluginRepository: Regex URL Filter Framework (lib-regex-filter)
14/09/29 16:31:57 INFO plugin.PluginRepository: Registered Extension-Points:
14/09/29 16:31:57 INFO plugin.PluginRepository: Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
14/09/29 16:31:57 INFO plugin.PluginRepository: Nutch Protocol (org.apache.nutch.protocol.Protocol)
14/09/29 16:31:57 INFO plugin.PluginRepository: Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
14/09/29 16:31:57 INFO plugin.PluginRepository: Nutch URL Filter (org.apache.nutch.net.URLFilter)
14/09/29 16:31:57 INFO plugin.PluginRepository: Nutch Index Writer (org.apache.nutch.indexer.IndexWriter)
14/09/29 16:31:57 INFO plugin.PluginRepository: Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
14/09/29 16:31:57 INFO plugin.PluginRepository: HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
14/09/29 16:31:57 INFO plugin.PluginRepository: Nutch Content Parser (org.apache.nutch.parse.Parser)
14/09/29 16:31:57 INFO plugin.PluginRepository: Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
14/09/29 16:31:57 INFO conf.Configuration: found resource regex-normalize.xml at file:/Users/quasiben/Research/ContinuumDev/Memex/nutch_application/nutch/runtime/local/conf/regex-normalize.xml
14/09/29 16:31:57 INFO conf.Configuration: found resource regex-urlfilter.txt at file:/Users/quasiben/Research/ContinuumDev/Memex/nutch_application/nutch/runtime/local/conf/regex-urlfilter.txt
14/09/29 16:31:57 INFO regex.RegexURLNormalizer: can't find rules for scope 'inject', using default
14/09/29 16:31:57 INFO mapred.Task: Task:attempt_local409772736_0001_m_000000_0 is done. And is in the process of commiting
14/09/29 16:31:57 INFO mapred.LocalJobRunner:
14/09/29 16:31:57 INFO mapred.Task: Task attempt_local409772736_0001_m_000000_0 is allowed to commit now
14/09/29 16:31:57 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local409772736_0001_m_000000_0' to file:/tmp/hadoop-quasiben/mapred/temp/inject-temp-587370943
14/09/29 16:31:57 INFO mapred.LocalJobRunner: file:/var/folders/1t/t94brwgx7sjcn8jgz4gr3_c00000gq/T/tmpaJeuZN/seed.txt:0+25
14/09/29 16:31:57 INFO mapred.Task: Task 'attempt_local409772736_0001_m_000000_0' done.
14/09/29 16:31:57 INFO mapred.LocalJobRunner: Finishing task: attempt_local409772736_0001_m_000000_0
14/09/29 16:31:57 INFO mapred.LocalJobRunner: Map task executor complete.
14/09/29 16:31:57 INFO mapred.JobClient: map 100% reduce 0%
14/09/29 16:31:57 INFO mapred.JobClient: Job complete: job_local409772736_0001
14/09/29 16:31:57 INFO mapred.JobClient: Counters: 11
14/09/29 16:31:57 INFO mapred.JobClient: File Input Format Counters
14/09/29 16:31:57 INFO mapred.JobClient: Bytes Read=25
14/09/29 16:31:57 INFO mapred.JobClient: File Output Format Counters
14/09/29 16:31:57 INFO mapred.JobClient: Bytes Written=160
14/09/29 16:31:57 INFO mapred.JobClient: injector
14/09/29 16:31:57 INFO mapred.JobClient: urls_injected=1
14/09/29 16:31:57 INFO mapred.JobClient: FileSystemCounters
14/09/29 16:31:57 INFO mapred.JobClient: FILE_BYTES_READ=546517
14/09/29 16:31:57 INFO mapred.JobClient: FILE_BYTES_WRITTEN=635913
14/09/29 16:31:57 INFO mapred.JobClient: Map-Reduce Framework
14/09/29 16:31:57 INFO mapred.JobClient: Map input records=1
14/09/29 16:31:57 INFO mapred.JobClient: Spilled Records=0
14/09/29 16:31:57 INFO mapred.JobClient: Total committed heap usage (bytes)=515375104
14/09/29 16:31:57 INFO mapred.JobClient: Map input bytes=25
14/09/29 16:31:57 INFO mapred.JobClient: SPLIT_RAW_BYTES=125
14/09/29 16:31:57 INFO mapred.JobClient: Map output records=1
14/09/29 16:31:57 INFO crawl.Injector: Injector: Total number of urls rejected by filters: 0
14/09/29 16:31:57 INFO crawl.Injector: Injector: Total number of urls after normalization: 1
14/09/29 16:31:57 INFO crawl.Injector: Injector: Merging injected urls into crawl db.
14/09/29 16:31:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/09/29 16:31:57 INFO mapred.FileInputFormat: Total input paths to process : 2
14/09/29 16:31:57 INFO mapred.JobClient: Running job: job_local917149307_0002
14/09/29 16:31:57 INFO mapred.LocalJobRunner: Waiting for map tasks
14/09/29 16:31:57 INFO mapred.LocalJobRunner: Starting task: attempt_local917149307_0002_m_000000_0
14/09/29 16:31:57 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/09/29 16:31:57 INFO mapred.MapTask: Processing split: file:/Users/quasiben/Research/ContinuumDev/Memex/nutchpy/crawl/current/part-00000/data:0+148
14/09/29 16:31:57 INFO mapred.MapTask: numReduceTasks: 1
14/09/29 16:31:57 INFO mapred.MapTask: io.sort.mb = 100
14/09/29 16:31:57 INFO mapred.MapTask: data buffer = 79691776/99614720
14/09/29 16:31:57 INFO mapred.MapTask: record buffer = 262144/327680
14/09/29 16:31:57 INFO mapred.MapTask: Starting flush of map output
14/09/29 16:31:57 INFO mapred.MapTask: Finished spill 0
14/09/29 16:31:57 INFO mapred.Task: Task:attempt_local917149307_0002_m_000000_0 is done. And is in the process of commiting
14/09/29 16:31:57 INFO mapred.LocalJobRunner: file:/Users/quasiben/Research/ContinuumDev/Memex/nutchpy/crawl/current/part-00000/data:0+148
14/09/29 16:31:57 INFO mapred.Task: Task 'attempt_local917149307_0002_m_000000_0' done.
14/09/29 16:31:57 INFO mapred.LocalJobRunner: Finishing task: attempt_local917149307_0002_m_000000_0
14/09/29 16:31:57 INFO mapred.LocalJobRunner: Starting task: attempt_local917149307_0002_m_000001_0
14/09/29 16:31:57 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/09/29 16:31:57 INFO mapred.MapTask: Processing split: file:/tmp/hadoop-quasiben/mapred/temp/inject-temp-587370943/part-00000:0+148
14/09/29 16:31:57 INFO mapred.MapTask: numReduceTasks: 1
14/09/29 16:31:57 INFO mapred.MapTask: io.sort.mb = 100
14/09/29 16:31:58 INFO mapred.MapTask: data buffer = 79691776/99614720
14/09/29 16:31:58 INFO mapred.MapTask: record buffer = 262144/327680
14/09/29 16:31:58 INFO mapred.MapTask: Starting flush of map output
14/09/29 16:31:58 INFO mapred.MapTask: Finished spill 0
14/09/29 16:31:58 INFO mapred.Task: Task:attempt_local917149307_0002_m_000001_0 is done. And is in the process of commiting
14/09/29 16:31:58 INFO mapred.LocalJobRunner: file:/tmp/hadoop-quasiben/mapred/temp/inject-temp-587370943/part-00000:0+148
14/09/29 16:31:58 INFO mapred.Task: Task 'attempt_local917149307_0002_m_000001_0' done.
14/09/29 16:31:58 INFO mapred.LocalJobRunner: Finishing task: attempt_local917149307_0002_m_000001_0
14/09/29 16:31:58 INFO mapred.LocalJobRunner: Map task executor complete.
14/09/29 16:31:58 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/09/29 16:31:58 INFO mapred.LocalJobRunner:
14/09/29 16:31:58 INFO mapred.Merger: Merging 2 sorted segments
14/09/29 16:31:58 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 116 bytes
14/09/29 16:31:58 INFO mapred.LocalJobRunner:
14/09/29 16:31:58 INFO crawl.Injector: Injector: overwrite: false
14/09/29 16:31:58 INFO crawl.Injector: Injector: update: false
14/09/29 16:31:58 INFO compress.CodecPool: Got brand-new compressor
14/09/29 16:31:58 INFO mapred.Task: Task:attempt_local917149307_0002_r_000000_0 is done. And is in the process of commiting
14/09/29 16:31:58 INFO mapred.LocalJobRunner:
14/09/29 16:31:58 INFO mapred.Task: Task attempt_local917149307_0002_r_000000_0 is allowed to commit now
14/09/29 16:31:58 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local917149307_0002_r_000000_0' to file:/Users/quasiben/Research/ContinuumDev/Memex/nutchpy/crawl/1532362367
14/09/29 16:31:58 INFO mapred.LocalJobRunner: reduce > reduce
14/09/29 16:31:58 INFO mapred.Task: Task 'attempt_local917149307_0002_r_000000_0' done.
14/09/29 16:31:58 INFO mapred.JobClient: map 100% reduce 100%
14/09/29 16:31:58 INFO mapred.JobClient: Job complete: job_local917149307_0002
14/09/29 16:31:58 INFO mapred.JobClient: Counters: 19
14/09/29 16:31:58 INFO mapred.JobClient: File Input Format Counters
14/09/29 16:31:58 INFO mapred.JobClient: Bytes Read=320
14/09/29 16:31:58 INFO mapred.JobClient: File Output Format Counters
14/09/29 16:31:58 INFO mapred.JobClient: Bytes Written=389
14/09/29 16:31:58 INFO mapred.JobClient: injector
14/09/29 16:31:58 INFO mapred.JobClient: urls_merged=1
14/09/29 16:31:58 INFO mapred.JobClient: FileSystemCounters
14/09/29 16:31:58 INFO mapred.JobClient: FILE_BYTES_READ=3280972
14/09/29 16:31:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3817822
14/09/29 16:31:58 INFO mapred.JobClient: Map-Reduce Framework
14/09/29 16:31:58 INFO mapred.JobClient: Reduce input groups=1
14/09/29 16:31:58 INFO mapred.JobClient: Map output materialized bytes=124
14/09/29 16:31:58 INFO mapred.JobClient: Combine output records=0
14/09/29 16:31:58 INFO mapred.JobClient: Map input records=2
14/09/29 16:31:58 INFO mapred.JobClient: Reduce shuffle bytes=0
14/09/29 16:31:58 INFO mapred.JobClient: Reduce output records=1
14/09/29 16:31:58 INFO mapred.JobClient: Spilled Records=4
14/09/29 16:31:58 INFO mapred.JobClient: Map output bytes=108
14/09/29 16:31:58 INFO mapred.JobClient: Total committed heap usage (bytes)=1546125312
14/09/29 16:31:58 INFO mapred.JobClient: Map input bytes=124
14/09/29 16:31:58 INFO mapred.JobClient: Combine input records=0
14/09/29 16:31:58 INFO mapred.JobClient: Map output records=2
14/09/29 16:31:58 INFO mapred.JobClient: SPLIT_RAW_BYTES=262
14/09/29 16:31:58 INFO mapred.JobClient: Reduce input records=2
14/09/29 16:31:58 INFO crawl.Injector: Injector: URLs merged: 1
14/09/29 16:31:58 INFO crawl.Injector: Injector: Total new urls injected: 0
14/09/29 16:31:58 INFO crawl.Injector: Injector: finished at 2014-09-29 16:31:58, elapsed: 00:00:02