- Spark 2.0.0 pre-built for Hadoop 2.7
- Mac OS X 10.11
- Python 3.5.2
Use S3 within PySpark with minimal hassle.
by Bjørn Friese
Beautiful is better than ugly. Explicit is better than implicit.
I frequently deal with collections of things in the programs I write. Collections of droids, jedis, planets, lightsabers, starfighters, etc. When programming in Python, these collections of things are usually represented as lists, sets and dictionaries. Oftentimes, what I want to do with a collection is transform it in some way. Comprehensions are a powerful syntax for doing just that. I use them extensively, and they are one of the things that keep me coming back to Python. Let me show you a few examples of the incredible usefulness of comprehensions.
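As a minimal sketch of the three comprehension forms mentioned above (list, set and dictionary), here is a small example. The `droids` data and all variable names are made up for illustration and are not taken from the original post.

```python
# Hypothetical sample data: a small collection of droids.
droids = [
    {"name": "R2-D2", "kind": "astromech"},
    {"name": "C-3PO", "kind": "protocol"},
    {"name": "BB-8", "kind": "astromech"},
]

# List comprehension: transform every item into something else.
names = [droid["name"] for droid in droids]

# Set comprehension: collect the distinct values of a field.
kinds = {droid["kind"] for droid in droids}

# Dict comprehension: build a lookup table from the collection.
kind_by_name = {droid["name"]: droid["kind"] for droid in droids}

# A trailing `if` clause filters items while transforming them.
astromechs = [droid["name"] for droid in droids if droid["kind"] == "astromech"]
```

Each form reads left to right as "what to produce, for each item, optionally under a condition", which is why a comprehension can often replace a short loop that appends to a list.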
|from __future__ import print_function|
|if isinstance(node, ast.BinOp):|
|if isinstance(node.op, (ast.Mult, ast.Div)):|
I've been using the Anaconda Python distribution from continuum.io recently and found it to be a good way to get all the complex compiled libs you need for a scientific Python environment. Even better, their conda tool lets you create environments much like virtualenv, but without having to re-compile stuff like numpy, which gets old very, very quickly with virtualenv and can be a nightmare to set up correctly on OS X.
The only thing missing was an easy way to switch environments - their docs suggest running python executables from the install folder, which I find a bit of a pain. Coincidentally I came across this article - Virtualenv's bin/activate is Doing It Wrong - which describes a simple way to launch a sub-shell with certain environment variables set. Now simple was the key word for me since my bash-fu isn't very strong, but I managed to come up with the script below. Put this in a text file called conda-work.
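For illustration only, the sub-shell idea can also be sketched in Python. This is not the conda-work script itself. It assumes the environment lives in a directory with a `bin` folder inside it, and the names `build_env`, `launch_subshell` and the `CONDA_WORK_ENV` marker variable are hypothetical.

```python
import os
import subprocess


def build_env(env_root, base_env=None):
    """Return a copy of the environment with env_root/bin first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.join(env_root, "bin")
    old_path = env.get("PATH", "")
    # Prepend the environment's bin directory so its python wins the lookup.
    env["PATH"] = bin_dir + os.pathsep + old_path if old_path else bin_dir
    # Hypothetical marker so the sub-shell can tell which environment is active.
    env["CONDA_WORK_ENV"] = env_root
    return env


def launch_subshell(env_root):
    """Start the user's shell with the modified environment and wait for it."""
    shell = os.environ.get("SHELL", "/bin/sh")
    return subprocess.call([shell], env=build_env(env_root))
```

Calling `launch_subshell` with the path to an environment would drop you into a new shell. Exiting that shell returns you to the original one with its environment untouched, which is the main appeal of this approach over modifying the current shell in place.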
|# Get current version|
|# Get new version|
|ver=`ls $src | grep linux- | sort -V | tail -1`|
|from mesos.interface import Scheduler|
|from mesos.native import MesosSchedulerDriver|
|from mesos.interface import mesos_pb2|
|STARTUP_MSG: java = 1.8.0_25|
|14/10/28 06:27:07 INFO mapred.JobTracker: registered UNIX signal handlers for [TERM, HUP, INT]|
|14/10/28 06:27:08 FATAL mapred.JobTracker: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local|