The best way to run pyspark on a machine in a virtualized environment is to use Docker. Docker is a container technology
that lets developers 'package' their software and ship it, taking away the headache of things like setting up the
environment properly, configuring logging, setting the right options, breaking the machine, and so on. Basically, it removes
the classic excuse: "It works on my machine".
I'll stop babbling about Docker and containers. If you're interested in knowing more, head here: http://unix.stackexchange.com/questions/254956/what-is-the-difference-between-docker-lxd-and-lxc
Step 1: Installing Docker on Mac
- Download & install the docker dmg file: https://download.docker.com/mac/stable/Docker.dmg
- Download & install docker toolbox: https://download.docker.com/mac/stable/DockerToolbox.pkg