Created
May 11, 2015 15:05
-
-
Save mkemp/6d3e758fdce75f6bd035 to your computer and use it in GitHub Desktop.
Used in CloudCamp Chicago 2015.05.11 presentation.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
$ pyspark | |
Python 2.7.2 (default, Oct 11 2012, 20:14:37) | |
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin | |
... | |
Welcome to | |
____ __ | |
/ __/__ ___ _____/ /__ | |
_\ \/ _ \/ _ `/ __/ '_/ | |
/__ / .__/\_,_/_/ /_/\_\ version 1.1.0 | |
/_/ | |
Using Python version 2.7.2 (default, Oct 11 2012 20:14:37) | |
SparkContext available as sc. | |
>>> from word_count import word_count | |
>>> word_count(sc, 'text.txt', 'text_counts') | |
15/05/11 10:01:28 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=278302556 | |
15/05/11 10:01:28 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 265.3 MB) | |
15/05/11 10:01:29 INFO FileInputFormat: Total input paths to process : 1 | |
15/05/11 10:01:29 INFO SparkContext: Starting job: sortByKey at word_count.py:11 | |
15/05/11 10:01:29 INFO DAGScheduler: Registering RDD 4 (RDD at PythonRDD.scala:261) | |
15/05/11 10:01:29 INFO DAGScheduler: Got job 0 (sortByKey at word_count.py:11) with 2 output partitions (allowLocal=false) | |
15/05/11 10:01:29 INFO DAGScheduler: Final stage: Stage 0(sortByKey at word_count.py:11) | |
15/05/11 10:01:29 INFO DAGScheduler: Parents of final stage: List(Stage 1) | |
15/05/11 10:01:29 INFO DAGScheduler: Missing parents: List(Stage 1) | |
15/05/11 10:01:29 INFO DAGScheduler: Submitting Stage 1 (PairwiseRDD[4] at RDD at PythonRDD.scala:261), which has no missing parents | |
15/05/11 10:01:29 INFO MemoryStore: ensureFreeSpace(7776) called with curMem=163705, maxMem=278302556 | |
15/05/11 10:01:29 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 7.6 KB, free 265.2 MB) | |
15/05/11 10:01:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (PairwiseRDD[4] at RDD at PythonRDD.scala:261) | |
15/05/11 10:01:29 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks | |
15/05/11 10:01:29 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, PROCESS_LOCAL, 1204 bytes) | |
15/05/11 10:01:29 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1204 bytes) | |
15/05/11 10:01:29 INFO Executor: Running task 1.0 in stage 1.0 (TID 1) | |
15/05/11 10:01:29 INFO Executor: Running task 0.0 in stage 1.0 (TID 0) | |
15/05/11 10:01:29 INFO HadoopRDD: Input split: file:/tmp/cloudcamp/examples/text.txt:0+2644 | |
15/05/11 10:01:29 INFO HadoopRDD: Input split: file:/tmp/cloudcamp/examples/text.txt:2644+2645 | |
15/05/11 10:01:29 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id | |
15/05/11 10:01:29 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id | |
15/05/11 10:01:29 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap | |
15/05/11 10:01:29 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition | |
15/05/11 10:01:29 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id | |
15/05/11 10:01:29 INFO PythonRDD: Times: total = 599, boot = 570, init = 29, finish = 0 | |
15/05/11 10:01:30 INFO PythonRDD: Times: total = 610, boot = 567, init = 32, finish = 11 | |
15/05/11 10:01:30 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 1863 bytes result sent to driver | |
15/05/11 10:01:30 INFO Executor: Finished task 1.0 in stage 1.0 (TID 1). 1863 bytes result sent to driver | |
15/05/11 10:01:30 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 805 ms on localhost (1/2) | |
15/05/11 10:01:30 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 833 ms on localhost (2/2) | |
15/05/11 10:01:30 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool | |
15/05/11 10:01:30 INFO DAGScheduler: Stage 1 (RDD at PythonRDD.scala:261) finished in 0.849 s | |
15/05/11 10:01:30 INFO DAGScheduler: looking for newly runnable stages | |
15/05/11 10:01:30 INFO DAGScheduler: running: Set() | |
15/05/11 10:01:30 INFO DAGScheduler: waiting: Set(Stage 0) | |
15/05/11 10:01:30 INFO DAGScheduler: failed: Set() | |
15/05/11 10:01:30 INFO DAGScheduler: Missing parents for Stage 0: List() | |
15/05/11 10:01:30 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[8] at RDD at PythonRDD.scala:43), which is now runnable | |
15/05/11 10:01:30 INFO MemoryStore: ensureFreeSpace(5928) called with curMem=171481, maxMem=278302556 | |
15/05/11 10:01:30 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 5.8 KB, free 265.2 MB) | |
15/05/11 10:01:30 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (PythonRDD[8] at RDD at PythonRDD.scala:43) | |
15/05/11 10:01:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks | |
15/05/11 10:01:30 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 948 bytes) | |
15/05/11 10:01:30 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 948 bytes) | |
15/05/11 10:01:30 INFO Executor: Running task 1.0 in stage 0.0 (TID 3) | |
15/05/11 10:01:30 INFO Executor: Running task 0.0 in stage 0.0 (TID 2) | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 6 ms | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 6 ms | |
15/05/11 10:01:30 INFO PythonRDD: Times: total = 38, boot = 4, init = 34, finish = 0 | |
15/05/11 10:01:30 INFO PythonRDD: Times: total = 39, boot = 7, init = 31, finish = 1 | |
15/05/11 10:01:30 INFO Executor: Finished task 0.0 in stage 0.0 (TID 2). 862 bytes result sent to driver | |
15/05/11 10:01:30 INFO Executor: Finished task 1.0 in stage 0.0 (TID 3). 862 bytes result sent to driver | |
15/05/11 10:01:30 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 52 ms on localhost (1/2) | |
15/05/11 10:01:30 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 58 ms on localhost (2/2) | |
15/05/11 10:01:30 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool | |
15/05/11 10:01:30 INFO DAGScheduler: Stage 0 (sortByKey at word_count.py:11) finished in 0.065 s | |
15/05/11 10:01:30 INFO SparkContext: Job finished: sortByKey at word_count.py:11, took 1.014226 s | |
15/05/11 10:01:30 INFO SparkContext: Starting job: sortByKey at word_count.py:11 | |
15/05/11 10:01:30 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 146 bytes | |
15/05/11 10:01:30 INFO DAGScheduler: Got job 1 (sortByKey at word_count.py:11) with 2 output partitions (allowLocal=false) | |
15/05/11 10:01:30 INFO DAGScheduler: Final stage: Stage 2(sortByKey at word_count.py:11) | |
15/05/11 10:01:30 INFO DAGScheduler: Parents of final stage: List(Stage 3) | |
15/05/11 10:01:30 INFO DAGScheduler: Missing parents: List() | |
15/05/11 10:01:30 INFO DAGScheduler: Submitting Stage 2 (PythonRDD[9] at RDD at PythonRDD.scala:43), which has no missing parents | |
15/05/11 10:01:30 INFO MemoryStore: ensureFreeSpace(5968) called with curMem=177409, maxMem=278302556 | |
15/05/11 10:01:30 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 5.8 KB, free 265.2 MB) | |
15/05/11 10:01:30 INFO DAGScheduler: Submitting 2 missing tasks from Stage 2 (PythonRDD[9] at RDD at PythonRDD.scala:43) | |
15/05/11 10:01:30 INFO TaskSchedulerImpl: Adding task set 2.0 with 2 tasks | |
15/05/11 10:01:30 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, localhost, PROCESS_LOCAL, 948 bytes) | |
15/05/11 10:01:30 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 5, localhost, PROCESS_LOCAL, 948 bytes) | |
15/05/11 10:01:30 INFO Executor: Running task 0.0 in stage 2.0 (TID 4) | |
15/05/11 10:01:30 INFO Executor: Running task 1.0 in stage 2.0 (TID 5) | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms | |
15/05/11 10:01:30 INFO PythonRDD: Times: total = 16, boot = 3, init = 8, finish = 5 | |
15/05/11 10:01:30 INFO PythonRDD: Times: total = 14, boot = 3, init = 8, finish = 3 | |
15/05/11 10:01:30 INFO Executor: Finished task 0.0 in stage 2.0 (TID 4). 1202 bytes result sent to driver | |
15/05/11 10:01:30 INFO Executor: Finished task 1.0 in stage 2.0 (TID 5). 1184 bytes result sent to driver | |
15/05/11 10:01:30 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 30 ms on localhost (1/2) | |
15/05/11 10:01:30 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 5) in 33 ms on localhost (2/2) | |
15/05/11 10:01:30 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool | |
15/05/11 10:01:30 INFO DAGScheduler: Stage 2 (sortByKey at word_count.py:11) finished in 0.041 s | |
15/05/11 10:01:30 INFO SparkContext: Job finished: sortByKey at word_count.py:11, took 0.054668 s | |
15/05/11 10:01:30 INFO SparkContext: Starting job: saveAsTextFile at NativeMethodAccessorImpl.java:-2 | |
15/05/11 10:01:30 INFO DAGScheduler: Registering RDD 11 (RDD at PythonRDD.scala:261) | |
15/05/11 10:01:30 INFO DAGScheduler: Got job 2 (saveAsTextFile at NativeMethodAccessorImpl.java:-2) with 2 output partitions (allowLocal=false) | |
15/05/11 10:01:30 INFO DAGScheduler: Final stage: Stage 4(saveAsTextFile at NativeMethodAccessorImpl.java:-2) | |
15/05/11 10:01:30 INFO DAGScheduler: Parents of final stage: List(Stage 6) | |
15/05/11 10:01:30 INFO DAGScheduler: Missing parents: List(Stage 6) | |
15/05/11 10:01:30 INFO DAGScheduler: Submitting Stage 6 (PairwiseRDD[11] at RDD at PythonRDD.scala:261), which has no missing parents | |
15/05/11 10:01:30 INFO MemoryStore: ensureFreeSpace(6840) called with curMem=183377, maxMem=278302556 | |
15/05/11 10:01:30 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 6.7 KB, free 265.2 MB) | |
15/05/11 10:01:30 INFO DAGScheduler: Submitting 2 missing tasks from Stage 6 (PairwiseRDD[11] at RDD at PythonRDD.scala:261) | |
15/05/11 10:01:30 INFO TaskSchedulerImpl: Adding task set 6.0 with 2 tasks | |
15/05/11 10:01:30 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 6, localhost, PROCESS_LOCAL, 937 bytes) | |
15/05/11 10:01:30 INFO TaskSetManager: Starting task 1.0 in stage 6.0 (TID 7, localhost, PROCESS_LOCAL, 937 bytes) | |
15/05/11 10:01:30 INFO Executor: Running task 0.0 in stage 6.0 (TID 6) | |
15/05/11 10:01:30 INFO Executor: Running task 1.0 in stage 6.0 (TID 7) | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 1 ms | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 1 ms | |
15/05/11 10:01:30 INFO PythonRDD: Times: total = 12, boot = 4, init = 7, finish = 1 | |
15/05/11 10:01:30 INFO PythonRDD: Times: total = 14, boot = 3, init = 9, finish = 2 | |
15/05/11 10:01:30 INFO Executor: Finished task 1.0 in stage 6.0 (TID 7). 996 bytes result sent to driver | |
15/05/11 10:01:30 INFO Executor: Finished task 0.0 in stage 6.0 (TID 6). 996 bytes result sent to driver | |
15/05/11 10:01:30 INFO TaskSetManager: Finished task 1.0 in stage 6.0 (TID 7) in 38 ms on localhost (1/2) | |
15/05/11 10:01:30 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 6) in 45 ms on localhost (2/2) | |
15/05/11 10:01:30 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool | |
15/05/11 10:01:30 INFO DAGScheduler: Stage 6 (RDD at PythonRDD.scala:261) finished in 0.050 s | |
15/05/11 10:01:30 INFO DAGScheduler: looking for newly runnable stages | |
15/05/11 10:01:30 INFO DAGScheduler: running: Set() | |
15/05/11 10:01:30 INFO DAGScheduler: waiting: Set(Stage 4) | |
15/05/11 10:01:30 INFO DAGScheduler: failed: Set() | |
15/05/11 10:01:30 INFO DAGScheduler: Missing parents for Stage 4: List() | |
15/05/11 10:01:30 INFO DAGScheduler: Submitting Stage 4 (MappedRDD[16] at saveAsTextFile at NativeMethodAccessorImpl.java:-2), which is now runnable | |
15/05/11 10:01:30 INFO MemoryStore: ensureFreeSpace(61344) called with curMem=190217, maxMem=278302556 | |
15/05/11 10:01:30 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 59.9 KB, free 265.2 MB) | |
15/05/11 10:01:30 INFO DAGScheduler: Submitting 2 missing tasks from Stage 4 (MappedRDD[16] at saveAsTextFile at NativeMethodAccessorImpl.java:-2) | |
15/05/11 10:01:30 INFO TaskSchedulerImpl: Adding task set 4.0 with 2 tasks | |
15/05/11 10:01:30 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 8, localhost, PROCESS_LOCAL, 948 bytes) | |
15/05/11 10:01:30 INFO TaskSetManager: Starting task 1.0 in stage 4.0 (TID 9, localhost, PROCESS_LOCAL, 948 bytes) | |
15/05/11 10:01:30 INFO Executor: Running task 0.0 in stage 4.0 (TID 8) | |
15/05/11 10:01:30 INFO Executor: Running task 1.0 in stage 4.0 (TID 9) | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks | |
15/05/11 10:01:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms | |
15/05/11 10:01:30 INFO PythonRDD: Times: total = 14, boot = 2, init = 10, finish = 2 | |
15/05/11 10:01:30 INFO PythonRDD: Times: total = 16, boot = 4, init = 10, finish = 2 | |
15/05/11 10:01:30 INFO FileOutputCommitter: Saved output of task 'attempt_201505111001_0004_m_000001_9' to file:/tmp/cloudcamp/examples/text_counts/_temporary/0/task_201505111001_0004_m_000001 | |
15/05/11 10:01:30 INFO FileOutputCommitter: Saved output of task 'attempt_201505111001_0004_m_000000_8' to file:/tmp/cloudcamp/examples/text_counts/_temporary/0/task_201505111001_0004_m_000000 | |
15/05/11 10:01:30 INFO SparkHadoopWriter: attempt_201505111001_0004_m_000001_9: Committed | |
15/05/11 10:01:30 INFO SparkHadoopWriter: attempt_201505111001_0004_m_000000_8: Committed | |
15/05/11 10:01:30 INFO Executor: Finished task 1.0 in stage 4.0 (TID 9). 846 bytes result sent to driver | |
15/05/11 10:01:30 INFO Executor: Finished task 0.0 in stage 4.0 (TID 8). 846 bytes result sent to driver | |
15/05/11 10:01:30 INFO TaskSetManager: Finished task 1.0 in stage 4.0 (TID 9) in 105 ms on localhost (1/2) | |
15/05/11 10:01:30 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 8) in 112 ms on localhost (2/2) | |
15/05/11 10:01:30 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool | |
15/05/11 10:01:30 INFO DAGScheduler: Stage 4 (saveAsTextFile at NativeMethodAccessorImpl.java:-2) finished in 0.116 s | |
15/05/11 10:01:30 INFO SparkContext: Job finished: saveAsTextFile at NativeMethodAccessorImpl.java:-2, took 0.207841 s |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment