@abhioncbr
Created January 12, 2019 22:29
Docker image of Apache Superset

A couple of days ago, I wrote a post about running Apache Superset in a production environment to serve hundreds or thousands of users. I am thankful to the Superset community members and users who appreciated that post. However, on the Superset Slack and Gitter channels, many users asked how to set up Superset as a Docker container and how to run it. In this post, I explore the Docker image of Superset in more detail, and I hope that after reading it you will have a conceptual understanding of setting up Superset as a Docker container and of the benefits of doing so.

Container Image

First, let's quickly understand what exactly the terms container and image mean and how they relate to Docker.

  • As per Wikipedia, any structure that holds a product for storage, packaging, and shipping is a container. The same applies to a container in the software world.

A container is a standard unit of software that packages up the code and all its dependencies, so the application runs quickly and reliably from one computing environment to another.

  • Now, let's look at what an image means.

A container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, run-time, system tools, system libraries, and settings.

  • Finally, the relationship with Docker.

Container images become containers at runtime and in the case of Docker containers - images become containers when they run on Docker Engine. Available for both Linux and Windows-based applications, containerized software will always run the same, regardless of the infrastructure.

There are many other container runtime environments, but Docker is the most popular among them.

Back to Superset Docker image

There are multiple active repositories and images of Superset available on GitHub and Docker Hub.

Why so many repositories? Are they different? Aren't they supposed to be the same and provide the same functionality, i.e., packaging Superset and its dependencies? Yes, ideally they would be identical. But there are several ways and modes to start Superset, and a single image would need to be generic enough to handle every method and command. That is not the case today, which is why there are multiple repositories.

I started working on Superset with the goal of running it in a completely distributed manner so that hundreds or thousands of users can access it concurrently. In the beginning, I explored the Apache Superset code, but I realized that several changes were required to run Superset across multiple containers in a distributed architecture, and that is why I decided to maintain a separate repository.

Features of the Docker image of Superset

  • Multiple ways to start the container: either with docker-compose or with the docker run command.
  • All Superset components, i.e., the web application, the Celery worker, and the Celery Flower UI, can run in the same container or in different containers.
  • All database plugins and packages are installed by default.
  • On its first run, the container sets up the required Superset metadata database along with sample data, and creates the Fabmanager user account with the credentials username: admin & password: admin.
  • Apart from the Superset config file packaged in the container image, a custom config file, i.e., superset_config.py, can be mounted into the container. There is no need to rebuild the image to change the configuration.
  • The default configuration uses MySQL as the Superset metadata database and Redis as the cache & Celery broker; both can be easily replaced.
  • Starting the container using docker-compose starts three containers: mysql5.7 as the metadata database, redis3.4 as the cache & Celery broker, and the Superset container.
    • It expects multiple environment variables defined in the docker-compose.yml file. Default values are present in the .env file.
    • Default environment variables can be overridden either by editing the .env file or by passing them on the command line, like SUPERSET_ENV.
    • The permissible values of SUPERSET_ENV are local and prod.
    • In local mode, one Celery worker and the Flask-based Superset web application run.
    • In prod mode, two Celery workers and the Gunicorn-based Superset web application run.
  • Starting the container using docker run can be used for a completely distributed setup; it requires the metadata database URL & the Redis URL to start the container.
    • A single server container or multiple server containers (behind a load balancer) can be spawned. In the server, the Gunicorn-based Superset web application runs.
    • Multiple Celery worker containers can run on the same or on different machines. In a worker, the Celery worker & the Flower UI run.
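
To make the "mountable config" idea above more concrete, here is a minimal sketch of what a custom superset_config.py could look like when the metadata database and Redis are swapped out. The setting names (SQLALCHEMY_DATABASE_URI, CACHE_CONFIG, CELERY_CONFIG) are standard Superset settings, but the environment variable names SUPERSET_METADATA_DB_URL and SUPERSET_REDIS_URL and the default URLs are assumptions made for illustration; they are not necessarily what this image's .env file uses.

```python
import os

# Hypothetical environment variable names and placeholder defaults;
# check the image's .env file for the names it really expects.
METADATA_DB_URL = os.environ.get(
    "SUPERSET_METADATA_DB_URL",
    "mysql://superset:superset@mysql:3306/superset",
)
REDIS_URL = os.environ.get("SUPERSET_REDIS_URL", "redis://redis:6379/0")

# Superset reads the location of its metadata database from this setting.
SQLALCHEMY_DATABASE_URI = METADATA_DB_URL

# Flask-Caching style cache settings, backed by Redis.
CACHE_CONFIG = {
    "CACHE_TYPE": "redis",
    "CACHE_DEFAULT_TIMEOUT": 300,
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": REDIS_URL,
}


class CeleryConfig(object):
    # Redis acts as both the Celery broker and the result backend.
    BROKER_URL = REDIS_URL
    CELERY_IMPORTS = ("superset.sql_lab",)
    CELERY_RESULT_BACKEND = REDIS_URL


CELERY_CONFIG = CeleryConfig
```

Because the file only reads environment variables, the same mounted config can be reused across server and worker containers while the URLs are supplied per deployment.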

How to run

  • First, copy the superset_config.py, docker-compose.yml, and .env files into your execution environment. Please follow the directory structure below:
      docker-superset
         |__config
         |    |__superset_config.py
         |
         |__docker-files
         |    |__docker-compose.yml
         |    |__.env   
    
  • Running a container using the docker-compose command:

    • Starting the Superset image as a Superset container in local mode:
      cd docker-superset/docker-files/ && docker-compose up -d
    • Starting the Superset image as a Superset container in prod mode:
      cd docker-superset/docker-files/ && SUPERSET_ENV=prod SUPERSET_VERSION=<version-tag> docker-compose up -d
  • Running a container using the docker run command:

    • Starting the Superset image as a server container:
      cd docker-superset && docker run -p 8088:8088 -v "$(pwd)/config":/home/superset/config/ abhioncbr/docker-superset:<version-tag> cluster server <superset_metadata_db_url> <redis_url>
    • Starting the Superset image as a worker container:
      cd docker-superset && docker run -p 5555:5555 -v "$(pwd)/config":/home/superset/config/ abhioncbr/docker-superset:<version-tag> cluster worker <superset_metadata_db_url> <redis_url>
  • Note: There is no need to build an image if you are not making changes to it. You can pull the image from Docker Hub using the command below:

      docker pull abhioncbr/docker-superset:<version-tag>

    where <version-tag> can be any Superset version or latest.
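
If you script the distributed setup, the docker run invocations above can be assembled programmatically. The following is only a sketch: the helper name docker_run_args and the example URLs are hypothetical, and the port mapping per role (8088 for the server, 5555 for the worker's Flower UI) is taken from the commands in this post. It resolves the config directory to an absolute path, since Docker treats a non-absolute -v source as a named volume rather than a host directory.

```python
import shlex
from pathlib import Path

IMAGE = "abhioncbr/docker-superset"
# Ports as used in the commands above: web application and Flower UI.
PORTS = {"server": 8088, "worker": 5555}


def docker_run_args(role, version_tag, metadata_db_url, redis_url,
                    config_dir="config"):
    """Build the argument list for starting a server or worker container."""
    if role not in PORTS:
        raise ValueError("role must be 'server' or 'worker', got %r" % role)
    port = PORTS[role]
    # Absolute host path so Docker creates a bind mount, not a named volume.
    host_config = Path(config_dir).resolve()
    return [
        "docker", "run",
        "-p", "%d:%d" % (port, port),
        "-v", "%s:/home/superset/config/" % host_config,
        "%s:%s" % (IMAGE, version_tag),
        "cluster", role, metadata_db_url, redis_url,
    ]


if __name__ == "__main__":
    # Placeholder URLs; replace them with your own database and Redis hosts.
    args = docker_run_args(
        "server",
        "latest",
        "mysql://user:password@db-host:3306/superset",
        "redis://redis-host:6379/0",
    )
    print(" ".join(shlex.quote(a) for a in args))
```

The returned list can be passed directly to subprocess.run, which avoids shell-quoting problems with URLs that contain special characters.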

Extending Superset Docker image

  • No changes to the image are required to add new environment variables. For example, GOOGLE_APPLICATION_CREDENTIALS, which is needed for a BigQuery connection with Superset, can be easily provided through the docker-compose.yml file or passed on the command line.
  • Also, changes made to the superset_config.py file are easily reflected in the container by mounting the config file into the container.
  • For any further changes or bugs, please contact me or contribute to the repository.
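
When an extra variable such as GOOGLE_APPLICATION_CREDENTIALS is passed into the container, it can help to fail early if it is missing or points to a file that was not mounted. The check below is a minimal sketch that could live in a mounted superset_config.py; the function name google_credentials_path is hypothetical and not part of Superset or of this image.

```python
import os


def google_credentials_path(environ=None):
    """Return the credentials file path, or raise if it is unusable."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS is not set in the container"
        )
    if not os.path.isfile(path):
        # The variable is set, but the key file was probably not mounted.
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS points to a missing file: %s" % path
        )
    return path
```

Calling google_credentials_path() at the top of the config file surfaces a clear error at container start instead of a failed BigQuery connection later.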

Happy Superset Exploration!!!

@Mellorison

Alright. But is there a way to access the superset assets?

@putrevusandeep

this download path is no longer existing : https://github.com/apache/incubator-superset/archive/
