abhioncbr / CircleCi.md
Last active November 29, 2018 01:20
Manage CI/CD efficiently with CircleCI

How to manage CI/CD efficiently with CircleCI?

Continuous integration and deployment (CI/CD) is one of the standard practices of modern-day software development. In a fiercely competitive market, businesses rely on frequent feature releases to reach their customers. Recently, I came across CircleCI, an excellent tool for achieving CI/CD efficiently. In my experience, the following CircleCI tagline is entirely apt:

Automate your development process quickly, safely, and at scale.

In this post, I will share how to quickly build and deploy Docker images using CircleCI.

Introduction to a CircleCI config file

First, enable the CircleCI webhook for your public GitHub repository. CircleCI expects a config.yml file in the .circleci sub-folder of the project root directory. The config file should follow the rules specified here. A CircleCI config file consists of three basic definitions:

  • version
  • jobs
  • workflows
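
To make these definitions concrete, below is a minimal sketch of a .circleci/config.yml that builds and pushes a Docker image. It uses CircleCI 2.1 syntax; the job name, image name, and DOCKERHUB_* environment variables are illustrative placeholders, not taken from the original gist.

# .circleci/config.yml (illustrative sketch)
version: 2.1

jobs:
  build-and-push:
    docker:
      - image: cimg/base:stable        # executor image for the job
    steps:
      - checkout
      - setup_remote_docker            # enables docker build inside the job
      - run:
          name: Build image
          command: docker build -t abhioncbr/example-app:$CIRCLE_SHA1 .
      - run:
          name: Push image
          command: |
            echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USER" --password-stdin
            docker push abhioncbr/example-app:$CIRCLE_SHA1

workflows:
  build-deploy:
    jobs:
      - build-and-push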
abhioncbr / Apache_Superset.md
Last active November 11, 2023 09:53
Apache Superset in the production environment

Visualising data helps in building a much deeper understanding of the data and speeds up analytics around it. There are several mature paid products available in the market. Recently, I explored an open-source product named Apache Superset, which I found to be a very upbeat product in this space. Some prominent features of Superset are:

  • A rich set of data visualisations
  • An easy-to-use interface for exploring and visualising data
  • Create and share dashboards

After reading about Superset, I wanted to try it. As Superset is a Python-based project, it can easily be installed using pip, but I decided to set it up as a Docker container instead. The Apache-Superset GitHub repo contains code for building and running Superset as a container. Since I wan…

# Pull the prebuilt image (replace <tag> with a released version)
docker pull abhioncbr/docker-superset:<tag>

# Run the Superset server in cluster mode (UI on port 8088)
docker run -p 8088:8088 \
  -v config:/home/superset/config/ \
  abhioncbr/docker-superset:<tag> \
  cluster server <db_url> <redis_url>

# Run a Superset worker in cluster mode
docker run -p 5555:5555 \
  -v config:/home/superset/config/ \
  abhioncbr/docker-superset:<tag> \
  cluster worker <db_url> <redis_url>

# Or bring everything up with docker-compose
cd docker-files/ && SUPERSET_ENV=<local | prod> \
  SUPERSET_VERSION=<tag> docker-compose up -d
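
In the commands above, the server container exposes the Superset UI on port 8088, while the worker container exposes port 5555, the default port of Flower, Celery's monitoring UI. For comparison, the pip-based installation mentioned earlier looks roughly like the sketch below; it assumes a recent Superset release (older releases shipped under the plain superset package name with a fabmanager CLI).

# Illustrative pip-based setup (assumes a recent apache-superset release)
pip install apache-superset
superset db upgrade          # create the metadata database tables
superset fab create-admin    # create an admin user (interactive prompts)
superset init                # load default roles and permissions
superset run -p 8088         # start the development web server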
abhioncbr / Druid.md
Last active March 7, 2019 13:52
Making S3A Hadoop connector workable with Druid

Apache Druid is a high-performance real-time analytics database. Druid is a unique type of database that combines ideas from OLAP/analytic databases, time-series databases, and search systems to enable new use cases in real-time architectures. To build a framework for time-series trend analysis, prediction models, and anomaly detection, I decided to use Druid. As per the requirements, apart from real-time data ingestion, Druid also needed to support batch-based data ingestion. After reading several blogs and articles on production setups of Druid clusters handling petabytes of data, I decided to follow the architecture below:

  • 2
"tuningConfig": {
"type": "hadoop",
"jobProperties": {
"fs.s3a.endpoint": "s3.ca-central-1.amazonaws.com",
"fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
"io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
}
}
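
These jobProperties route both the s3:// and s3a:// schemes through Hadoop's S3AFileSystem, point the connector at the ca-central-1 regional endpoint, and make the common compression codecs available to the indexing job. Even with this configuration in place, the indexing task can still fail with an error like the one below: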
Caused by: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287) ~[?:?]
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) ~[?:?]
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) ~[?:?]
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) ~[?:?]
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) ~[?:?]
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) ~[?:?]
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) ~[?:?]
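
This NoSuchMethodError is the classic symptom of a version mismatch between hadoop-aws and the aws-java-sdk jar on the classpath: hadoop-aws 2.7.x was compiled against aws-java-sdk 1.7.4, and the TransferManager constructor it expects no longer exists in newer SDK releases. A sketch of one fix, assuming Druid's standard hadoop-dependencies layout (the path and versions below are illustrative, not from the original gist):

# Put an aws-java-sdk that matches hadoop-aws on the indexing classpath
# (directory layout and versions are illustrative assumptions)
cd ${DRUID_HOME}/hadoop-dependencies/hadoop-client/2.7.3/
curl -O https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar
curl -O https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

With matching jars in place, the S3A connector initialises cleanly, and the ingestion spec's ioConfig can reference s3a:// paths directly: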
"ioConfig" : {
"type" : "hadoop",
"inputSpec" : {
"type" : "static",
"paths" : "s3a://experiment-druid/input_data/wikiticker-2015-09-12-sampled.json.gz"
},
"metadataUpdateSpec" : null,
"segmentOutputPath" : "s3n://experiment-druid/deepstorage"
},
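
Note the mixed schemes here: the input paths use s3a:// so that reads go through the S3A connector configured above, while segmentOutputPath keeps the s3n:// scheme that Druid's S3 deep-storage extension expected at the time.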