@msukmanowsky
Created August 17, 2018 17:01
2018-08-17 13:01:07 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-08-17 13:01:07 INFO SparkContext:54 - Running Spark version 2.3.1
2018-08-17 13:01:07 INFO SparkContext:54 - Submitted application: pandas_udf
2018-08-17 13:01:07 INFO SecurityManager:54 - Changing view acls to: mikesukmanowsky
2018-08-17 13:01:07 INFO SecurityManager:54 - Changing modify acls to: mikesukmanowsky
2018-08-17 13:01:07 INFO SecurityManager:54 - Changing view acls groups to:
2018-08-17 13:01:07 INFO SecurityManager:54 - Changing modify acls groups to:
2018-08-17 13:01:07 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mikesukmanowsky); groups with view permissions: Set(); users with modify permissions: Set(mikesukmanowsky); groups with modify permissions: Set()
2018-08-17 13:01:08 INFO Utils:54 - Successfully started service 'sparkDriver' on port 51078.
2018-08-17 13:01:08 INFO SparkEnv:54 - Registering MapOutputTracker
2018-08-17 13:01:08 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-08-17 13:01:08 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-08-17 13:01:08 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-08-17 13:01:08 INFO DiskBlockManager:54 - Created local directory at /private/var/folders/07/gnp1f3hs7kn7p0j7kcs2g4s80000gn/T/blockmgr-82f6215e-9b12-4124-a513-cb8d5a0bb750
2018-08-17 13:01:08 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-08-17 13:01:08 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-08-17 13:01:08 INFO log:192 - Logging initialized @2046ms
2018-08-17 13:01:08 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2018-08-17 13:01:08 INFO Server:414 - Started @2108ms
2018-08-17 13:01:08 INFO AbstractConnector:278 - Started ServerConnector@780d9633{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-08-17 13:01:08 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7d97f5ee{/jobs,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@12309e0b{/jobs/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@407108cf{/jobs/job,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@366763bc{/jobs/job/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@203163bd{/stages,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@68ce3004{/stages/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3c4a7dfc{/stages/stage,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@36d7f08{/stages/stage/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6ca74704{/stages/pool,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@315dbe09{/stages/pool/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7abe7a83{/storage,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1243dfe3{/storage/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@448a25c8{/storage/rdd,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@187b76a6{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@11987d48{/environment,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1d28145f{/environment/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@899c657{/executors,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4b945fa4{/executors/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@19b419a3{/executors/threadDump,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@44e37c26{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@20eb193{/static,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3dc2eef3{/,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@287b3151{/api,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4e5c8d1{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@687d1e94{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://192.168.1.9:4040
2018-08-17 13:01:08 INFO SparkContext:54 - Added file file:/Users/mikesukmanowsky/code/parsely/engineering/casterisk-realtime/spark-simple.py at file:/Users/mikesukmanowsky/code/parsely/engineering/casterisk-realtime/spark-simple.py with timestamp 1534525268603
2018-08-17 13:01:08 INFO Utils:54 - Copying /Users/mikesukmanowsky/code/parsely/engineering/casterisk-realtime/spark-simple.py to /private/var/folders/07/gnp1f3hs7kn7p0j7kcs2g4s80000gn/T/spark-5fc00be2-3067-4905-8c0e-dc8516137fad/userFiles-e52c2c94-708f-4ec6-9c51-b27035b5f307/spark-simple.py
2018-08-17 13:01:08 INFO Executor:54 - Starting executor ID driver on host localhost
2018-08-17 13:01:08 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51079.
2018-08-17 13:01:08 INFO NettyBlockTransferService:54 - Server created on 192.168.1.9:51079
2018-08-17 13:01:08 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-08-17 13:01:08 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, 192.168.1.9, 51079, None)
2018-08-17 13:01:08 INFO BlockManagerMasterEndpoint:54 - Registering block manager 192.168.1.9:51079 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.9, 51079, None)
2018-08-17 13:01:08 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, 192.168.1.9, 51079, None)
2018-08-17 13:01:08 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, 192.168.1.9, 51079, None)
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@540e144{/metrics/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO SharedState:54 - Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/Users/mikesukmanowsky/code/parsely/engineering/casterisk-realtime/spark-warehouse/').
2018-08-17 13:01:08 INFO SharedState:54 - Warehouse path is 'file:/Users/mikesukmanowsky/code/parsely/engineering/casterisk-realtime/spark-warehouse/'.
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1032332a{/SQL,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1ee0e9de{/SQL/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2522c537{/SQL/execution,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@758a303a{/SQL/execution/json,null,AVAILABLE,@Spark}
2018-08-17 13:01:08 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@8f9f390{/static/sql,null,AVAILABLE,@Spark}
2018-08-17 13:01:09 INFO StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
/Users/mikesukmanowsky/.pyenv/versions/3.6.6/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
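
[Editor's note: this RuntimeWarning is separate from the crash further down. It typically means a compiled extension (here most likely pyarrow or pandas) was built against a different numpy ABI than the numpy currently installed, and it is usually harmless. A quick hypothetical check, assuming all three packages are importable:

    # Print the installed versions; installing a numpy release matching the one
    # pyarrow/pandas were built against normally silences this warning.
    import numpy, pandas, pyarrow

    print(numpy.__version__, pandas.__version__, pyarrow.__version__)
]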
2018-08-17 13:01:11 INFO ContextCleaner:54 - Cleaned accumulator 1
2018-08-17 13:01:12 INFO CodeGenerator:54 - Code generated in 144.472757 ms
2018-08-17 13:01:12 INFO CodeGenerator:54 - Code generated in 20.947228 ms
2018-08-17 13:01:12 INFO CodeGenerator:54 - Code generated in 17.026646 ms
2018-08-17 13:01:12 INFO CodeGenerator:54 - Code generated in 9.059243 ms
2018-08-17 13:01:12 INFO SparkContext:54 - Starting job: showString at NativeMethodAccessorImpl.java:0
2018-08-17 13:01:12 INFO DAGScheduler:54 - Registering RDD 7 (showString at NativeMethodAccessorImpl.java:0)
2018-08-17 13:01:12 INFO DAGScheduler:54 - Got job 0 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
2018-08-17 13:01:12 INFO DAGScheduler:54 - Final stage: ResultStage 1 (showString at NativeMethodAccessorImpl.java:0)
2018-08-17 13:01:12 INFO DAGScheduler:54 - Parents of final stage: List(ShuffleMapStage 0)
2018-08-17 13:01:12 INFO DAGScheduler:54 - Missing parents: List(ShuffleMapStage 0)
2018-08-17 13:01:12 INFO DAGScheduler:54 - Submitting ShuffleMapStage 0 (MapPartitionsRDD[7] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
2018-08-17 13:01:12 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 12.4 KB, free 366.3 MB)
2018-08-17 13:01:12 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 6.4 KB, free 366.3 MB)
2018-08-17 13:01:12 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 192.168.1.9:51079 (size: 6.4 KB, free: 366.3 MB)
2018-08-17 13:01:12 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1039
2018-08-17 13:01:12 INFO DAGScheduler:54 - Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[7] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1))
2018-08-17 13:01:12 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks
2018-08-17 13:01:12 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7874 bytes)
2018-08-17 13:01:12 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7905 bytes)
2018-08-17 13:01:12 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-08-17 13:01:12 INFO Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2018-08-17 13:01:12 INFO Executor:54 - Fetching file:/Users/mikesukmanowsky/code/parsely/engineering/casterisk-realtime/spark-simple.py with timestamp 1534525268603
2018-08-17 13:01:12 INFO Utils:54 - /Users/mikesukmanowsky/code/parsely/engineering/casterisk-realtime/spark-simple.py has been previously copied to /private/var/folders/07/gnp1f3hs7kn7p0j7kcs2g4s80000gn/T/spark-5fc00be2-3067-4905-8c0e-dc8516137fad/userFiles-e52c2c94-708f-4ec6-9c51-b27035b5f307/spark-simple.py
2018-08-17 13:01:13 INFO CodeGenerator:54 - Code generated in 11.482855 ms
2018-08-17 13:01:13 INFO CodeGenerator:54 - Code generated in 13.829726 ms
2018-08-17 13:01:13 INFO PythonRunner:54 - Times: total = 508, boot = 505, init = 2, finish = 1
2018-08-17 13:01:13 INFO PythonRunner:54 - Times: total = 505, boot = 497, init = 8, finish = 0
2018-08-17 13:01:13 INFO Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 2115 bytes result sent to driver
2018-08-17 13:01:13 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 2115 bytes result sent to driver
2018-08-17 13:01:13 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 731 ms on localhost (executor driver) (1/2)
2018-08-17 13:01:13 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 723 ms on localhost (executor driver) (2/2)
2018-08-17 13:01:13 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2018-08-17 13:01:13 INFO DAGScheduler:54 - ShuffleMapStage 0 (showString at NativeMethodAccessorImpl.java:0) finished in 0.869 s
2018-08-17 13:01:13 INFO DAGScheduler:54 - looking for newly runnable stages
2018-08-17 13:01:13 INFO DAGScheduler:54 - running: Set()
2018-08-17 13:01:13 INFO DAGScheduler:54 - waiting: Set(ResultStage 1)
2018-08-17 13:01:13 INFO DAGScheduler:54 - failed: Set()
2018-08-17 13:01:13 INFO DAGScheduler:54 - Submitting ResultStage 1 (MapPartitionsRDD[13] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
2018-08-17 13:01:13 INFO MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 21.3 KB, free 366.3 MB)
2018-08-17 13:01:13 INFO MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 10.8 KB, free 366.3 MB)
2018-08-17 13:01:13 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on 192.168.1.9:51079 (size: 10.8 KB, free: 366.3 MB)
2018-08-17 13:01:13 INFO SparkContext:54 - Created broadcast 1 from broadcast at DAGScheduler.scala:1039
2018-08-17 13:01:13 INFO DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[13] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
2018-08-17 13:01:13 INFO TaskSchedulerImpl:54 - Adding task set 1.0 with 1 tasks
2018-08-17 13:01:13 INFO TaskSetManager:54 - Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, PROCESS_LOCAL, 7754 bytes)
2018-08-17 13:01:13 INFO Executor:54 - Running task 0.0 in stage 1.0 (TID 2)
2018-08-17 13:01:13 INFO ShuffleBlockFetcherIterator:54 - Getting 0 non-empty blocks out of 2 blocks
2018-08-17 13:01:13 INFO ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 6 ms
2018-08-17 13:01:13 INFO CodeGenerator:54 - Code generated in 9.873753 ms
2018-08-17 13:01:13 INFO CodeGenerator:54 - Code generated in 6.998625 ms
2018-08-17 13:01:13 INFO CodeGenerator:54 - Code generated in 8.157707 ms
/Users/mikesukmanowsky/.pyenv/versions/3.6.6/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
objc[42215]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.
objc[42215]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
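
[Editor's note: the two objc lines above are the actual crash. Since macOS High Sierra, the Objective-C runtime aborts a forked child that touches ObjC state between fork() and exec(), and PySpark forks its Arrow workers from pyspark.daemon, so the worker is killed and the JVM sees only the EOF reported below. A minimal sketch of the commonly cited mitigation follows; it is an assumption, not something shown in this log. Exporting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES in the shell before spark-submit is the more reliable route, since a variable set inside the script only reaches processes launched after that point.

    import os

    # Disable the ObjC fork-safety check before any JVM or worker
    # processes are spawned (workaround assumption, see note above).
    os.environ["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("pandas_udf")
        # Forward the variable to the environment used when launching
        # Python workers; spark.executorEnv.* is standard Spark config.
        .config("spark.executorEnv.OBJC_DISABLE_INITIALIZE_FORK_SAFETY", "YES")
        .getOrCreate()
    )
]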
2018-08-17 13:01:13 ERROR Executor:91 - Exception in task 0.0 in stage 1.0 (TID 2)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:333)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:322)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:177)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:121)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:158)
... 24 more
2018-08-17 13:01:13 WARN TaskSetManager:66 - Lost task 0.0 in stage 1.0 (TID 2, localhost, executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:333)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:322)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:177)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:121)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:158)
... 24 more
2018-08-17 13:01:13 ERROR TaskSetManager:70 - Task 0 in stage 1.0 failed 1 times; aborting job
2018-08-17 13:01:13 INFO TaskSchedulerImpl:54 - Removed TaskSet 1.0, whose tasks have all completed, from pool
2018-08-17 13:01:13 INFO TaskSchedulerImpl:54 - Cancelling stage 1
2018-08-17 13:01:13 INFO DAGScheduler:54 - ResultStage 1 (showString at NativeMethodAccessorImpl.java:0) failed in 0.594 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost, executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:333)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:322)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:177)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:121)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:158)
... 24 more
Driver stacktrace:
2018-08-17 13:01:13 INFO DAGScheduler:54 - Job 0 failed: showString at NativeMethodAccessorImpl.java:0, took 1.507249 s
Traceback (most recent call last):
File "/Users/mikesukmanowsky/code/parsely/engineering/casterisk-realtime/spark-simple.py", line 25, in <module>
.apply(normalize)
File "/Users/mikesukmanowsky/.opt/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 350, in show
File "/Users/mikesukmanowsky/.opt/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/Users/mikesukmanowsky/.opt/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/Users/mikesukmanowsky/.opt/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o59.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost, executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:333)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:322)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:177)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:121)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:158)
... 24 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1602)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1590)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1589)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1589)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1823)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:363)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3273)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2484)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2484)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3254)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3253)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2484)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2698)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:333)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:322)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:177)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:121)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:158)
... 24 more
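
[Editor's note: the Python traceback above points at line 25 of spark-simple.py calling .apply(normalize), i.e. a grouped-map pandas_udf, which is what routes each group through the forked Arrow worker that crashes. Below is a minimal sketch of such a script using the Spark 2.3 grouped-map pattern; the schema, column names, and the body of normalize are illustrative guesses, not the original code.

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    spark = SparkSession.builder.appName("pandas_udf").getOrCreate()

    df = spark.createDataFrame(
        [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
        ("id", "v"),
    )

    @pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)
    def normalize(pdf):
        # Each group arrives as a pandas DataFrame inside a forked Python
        # worker; on macOS that fork is what trips the objc abort above.
        return pdf.assign(v=pdf.v - pdf.v.mean())

    # Corresponds to the ".apply(normalize)" call at spark-simple.py line 25.
    df.groupby("id").apply(normalize).show()
]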
2018-08-17 13:01:13 INFO SparkContext:54 - Invoking stop() from shutdown hook
2018-08-17 13:01:13 INFO AbstractConnector:318 - Stopped Spark@780d9633{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-08-17 13:01:13 INFO SparkUI:54 - Stopped Spark web UI at http://192.168.1.9:4040
2018-08-17 13:01:13 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-08-17 13:01:13 INFO MemoryStore:54 - MemoryStore cleared
2018-08-17 13:01:13 INFO BlockManager:54 - BlockManager stopped
2018-08-17 13:01:13 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2018-08-17 13:01:13 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-08-17 13:01:13 INFO SparkContext:54 - Successfully stopped SparkContext
2018-08-17 13:01:13 INFO ShutdownHookManager:54 - Shutdown hook called
2018-08-17 13:01:13 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/07/gnp1f3hs7kn7p0j7kcs2g4s80000gn/T/spark-5fc00be2-3067-4905-8c0e-dc8516137fad/pyspark-4ae4108c-ea13-44d9-a714-3a42c0778703
2018-08-17 13:01:13 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/07/gnp1f3hs7kn7p0j7kcs2g4s80000gn/T/spark-89c6c777-5431-4f30-b020-dcc25e322b3a
2018-08-17 13:01:13 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/07/gnp1f3hs7kn7p0j7kcs2g4s80000gn/T/spark-5fc00be2-3067-4905-8c0e-dc8516137fad