@poldeuce-sys
Created September 19, 2017 08:58
Docker Spark Notes

Networking

Docker containers get their own internal IP addresses on a bridged network. When the worker (slave) container connects to the master, it reports this internal address as its location. By default that address is not reachable from a master container running on another machine, so the worker is not visible to the master.

Some examples work around this by parsing the hosts file and so forth, but newer versions of Docker do not require this. Instead, the Docker swarm functionality can effectively create DNS entries for the containers and make them visible to each other without modifying the containers themselves. Note that the steps below are for running interactively. A service defined in docker-compose.yml will have to do some things differently, but I will get to that later.

  • Create a swarm overlay network (the host must already be in swarm mode, e.g. after docker swarm init): docker network create --driver=overlay --attachable spark-network

  • Run the master on this network, with the required ports open and a network alias (optional, but depending on the container it may make things easier): docker run -it -p 4040:4040 -p 8080:8080 -p 7077:7077 -h spark-master --name=spark-master --network spark-network --network-alias spark <spark container repo/image>

  • Run the workers on the same network: docker run -it -p 8081:8081 -h spark-worker --name spark-worker --network spark-network --network-alias spark-worker <spark container repo/image>

    You can then start the Spark master and the Spark workers inside their containers. Workers can connect to the master using the URL: spark://spark:7077

    Note that the various web GUIs will work, but the links between them won't work from outside the containers. The master's link to a worker uses the worker container's IP address in the URL, which is not reachable from a desktop or any other machine outside the overlay network. I suspect this could be addressed with a proxy or similar running in a container in the swarm.
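
The three steps above can be collected into a small shell sketch. This is only an illustration, not a tested deployment script: the function names, the DRY_RUN switch and the SPARK_IMAGE variable are my own inventions, and the image name is left as the same placeholder used in the steps. It defaults to a dry run that just prints the docker commands, and it assumes the host is already in swarm mode.

```shell
#!/bin/sh
# Sketch of the interactive steps above (hypothetical helper names).
# Defaults to a dry run that only prints the docker commands;
# set DRY_RUN=0 to actually execute them.
# Assumes the host is already a swarm node, since overlay networks
# need swarm mode.

DRY_RUN="${DRY_RUN:-1}"
# Placeholder kept from the notes: substitute your own Spark image.
SPARK_IMAGE="${SPARK_IMAGE:-<spark container repo/image>}"
NETWORK="spark-network"

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: attachable overlay network so plain "docker run" can join it.
create_network() {
    run docker network create --driver=overlay --attachable "$NETWORK"
}

# Step 2: master, reachable on the network under the alias "spark".
start_master() {
    run docker run -it -p 4040:4040 -p 8080:8080 -p 7077:7077 \
        -h spark-master --name=spark-master \
        --network "$NETWORK" --network-alias spark "$SPARK_IMAGE"
}

# Step 3: worker on the same network.
start_worker() {
    run docker run -it -p 8081:8081 \
        -h spark-worker --name spark-worker \
        --network "$NETWORK" --network-alias spark-worker "$SPARK_IMAGE"
}

# URL the worker uses to reach the master through the network alias.
master_url() {
    echo "spark://spark:7077"
}
```

Because the containers run with -it, the master and worker each need their own terminal. One way to use the sketch is to source it in each terminal, set DRY_RUN=0, and call create_network once, then start_master in one terminal and start_worker in another.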
