abhioncbr/Apache_Superset.md

## Apache_Superset.md

      
Apache Superset in the production environment

Visualising data helps build a much deeper understanding of it and speeds up the analysis built on top of it. There are several mature paid products on the market. Recently, I explored an open-source product named Apache Superset, which I found to be a very promising entrant in this space. Some prominent features of Superset are:

A rich set of data visualisations
An easy-to-use interface for exploring and visualising data
Creation and sharing of dashboards

After reading about Superset, I wanted to try it. Since Superset is a Python project, it can be installed easily with pip, but I decided to set it up as a Docker container instead. The Apache Superset GitHub repo contains code for building and running Superset as a container. Since I wanted to run Superset in a fully distributed manner, and in my opinion the existing code allowed little customisation, I decided to modify it so that it could run in several different modes.
Below is a list of the specific changes and enhancements made to the code:

Different versions of the Superset image can be built from the same code.
The Superset configuration can be edited and mounted into the container without rebuilding the image.
Queries run asynchronously through a Celery-based executor, which can be managed through the Flower UI.
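
To make the second and third points concrete, here is a minimal sketch of what a mounted config file such as superset-config.py might contain to enable Celery-backed async queries. It is not the file shipped with the image. `SQLALCHEMY_DATABASE_URI` and `CELERY_CONFIG` are Superset setting names and the lowercase Celery keys follow the current Superset documentation (older releases may expect different key names). The environment variable names `REDIS_URL` and `SUPERSET_DB_URL` and their default values are assumptions made for illustration.

```python
# Sketch of a Superset config file that is mounted into the container
# instead of being baked into the image.
import os

# Hypothetical environment variable names and placeholder defaults;
# adapt them to your own deployment.
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
DB_URL = os.environ.get("SUPERSET_DB_URL", "sqlite:////tmp/superset.db")

# Metadata database shared by every server and worker container.
SQLALCHEMY_DATABASE_URI = DB_URL


class CeleryConfig:
    # Redis acts as both the task queue and the task-state store.
    broker_url = REDIS_URL
    result_backend = REDIS_URL
    # Module that holds Superset's SQL Lab Celery tasks.
    imports = ("superset.sql_lab",)


# Superset reads the Celery settings from this name.
CELERY_CONFIG = CeleryConfig
```

Because the file is read at container start, changing a URL here only requires restarting the container, not rebuilding the image.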

Exploration made easy

Development mode is an excellent choice for exploring a project. However, it is even better if the initial exploration covers all the features; in the case of Superset, that means running queries in async mode and storing the results in a cache. You can explore Superset this way with the commands below.

First, pull the docker-superset image from Docker Hub:

docker pull abhioncbr/docker-superset:<tag>

Next, get docker-compose.yml and superset-config.py from the code base, keeping the same directory structure.
Lastly, start the Superset image as a container in local or prod mode using docker-compose:

cd docker-files/ && SUPERSET_ENV=<local | prod> SUPERSET_VERSION=<tag> docker-compose up -d
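
If you start the stack from a script rather than by hand, the same command can be wrapped in a little Python. This is only a sketch under assumptions: it takes `local` and `prod` as the only valid modes, as in the command above, and the helper names `compose_env` and `start_superset` are made up for illustration.

```python
import os
import subprocess

VALID_MODES = {"local", "prod"}


def compose_env(mode, tag, base_env=None):
    """Return a copy of the environment with the two variables set in the command above."""
    if mode not in VALID_MODES:
        raise ValueError(f"SUPERSET_ENV must be one of {sorted(VALID_MODES)}, got {mode!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["SUPERSET_ENV"] = mode
    env["SUPERSET_VERSION"] = tag
    return env


def start_superset(mode, tag, compose_dir="docker-files"):
    """Run `docker-compose up -d` inside compose_dir; raises if the command fails."""
    subprocess.run(
        ["docker-compose", "up", "-d"],
        cwd=compose_dir,
        env=compose_env(mode, tag),
        check=True,
    )
```

Calling `start_superset("local", "<tag>")` then has the same effect as the shell one-liner, with the added check that a mistyped mode fails before Docker is invoked.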

Running Superset in a completely distributed mode

As I understand it, a Superset deployment that serves thousands of end users in production should be distributed and easy to scale as requirements grow. The image below depicts such a setup.

The published Docker image of Superset can be used to build the setup depicted above:

A load balancer in front, routing each client request to one of the server containers.
Multiple containers in server mode, serving the Superset UI. A server container can be started with docker run as follows:

docker run -p 8088:8088 -v config:/home/superset/config/ abhioncbr/docker-superset:<tag> cluster server <db_url> <redis_url>

Multiple containers in worker mode, executing SQL queries asynchronously through the Celery executor. A worker container can be started with docker run as follows:

docker run -p 5555:5555 -v config:/home/superset/config/ abhioncbr/docker-superset:<tag> cluster worker <db_url> <redis_url>

A centralised Redis container or Redis cluster, serving as the cache layer and as the Celery task queue for the workers.
A centralised Superset metadata database.
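
The server and worker commands above can be generated for several containers at once. The sketch below assumes all containers run on a single Docker host, so each one needs its own host port. The helper names, the `-d` flag and the container counts are additions made for illustration. It also passes an absolute path to `-v`, because Docker treats a bare name such as `config` as a named volume rather than a local directory.

```python
import os
import shlex

IMAGE = "abhioncbr/docker-superset"
CONFIG_MOUNT = "/home/superset/config/"


def run_command(role, tag, db_url, redis_url, host_port, config_dir="config"):
    """Build the `docker run` argument list for one server or worker container."""
    # Container ports as published in the commands above.
    container_port = {"server": 8088, "worker": 5555}[role]
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:{container_port}",
        "-v", f"{os.path.abspath(config_dir)}:{CONFIG_MOUNT}",
        f"{IMAGE}:{tag}",
        "cluster", role, db_url, redis_url,
    ]


def cluster_commands(tag, db_url, redis_url, servers=2, workers=2):
    """Return one command per container, each bound to a distinct host port."""
    cmds = [run_command("server", tag, db_url, redis_url, 8088 + i) for i in range(servers)]
    cmds += [run_command("worker", tag, db_url, redis_url, 5555 + i) for i in range(workers)]
    return cmds


# Print the commands instead of running them, so they can be reviewed first.
for cmd in cluster_commands("<tag>", "<db_url>", "<redis_url>"):
    print(shlex.join(cmd))
```

The load balancer would then be pointed at the host ports of the server containers, while every container receives the same database and Redis URLs.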

I found that setting up Superset as a Docker container is quite easy, and that the same image can be used across different environments. You can explore Superset in the same way.