@rajkrrsingh
Last active July 1, 2018 18:23
Quick-start guide to ingesting data into Druid in batch mode on the HDP platform.

source : http://druid.io/docs/latest/tutorials/tutorial-batch.html
ENV : HDP-2.6.4

pageview.json

{"time": "2015-09-01T00:00:00Z", "url": "/foo/bar", "user": "alice", "latencyMs": 32}
{"time": "2015-09-01T01:00:00Z", "url": "/", "user": "bob", "latencyMs": 11}
{"time": "2015-09-01T01:30:00Z", "url": "/foo/bar", "user": "bob", "latencyMs": 45}

index task json

cp /usr/hdp/2.6.4.0-91/druid/quickstart/wikiticker-index.json /tmp/sampledata-index.json

modified index task json

cat /tmp/sampledata-index.json
{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/tmp/pageview.json"
      }
    },
    "dataSchema" : {
      "dataSource" : "pageviews",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-01/2015-09-02"]
      },
      "parser" : {
        "type" : "hadoopyString",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : ["url","user"]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "views",
          "type" : "count"
        },
        {
          "name" : "latencyMs",
          "type" : "doubleSum",
          "fieldName" : "latencyMs"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {}
    }
  }
}
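A malformed spec is only rejected after submission, so it can save a round-trip to syntax-check the file locally first. A minimal sketch using Python's built-in json.tool, assuming the spec was saved to /tmp/sampledata-index.json as above:

```shell
# quick local syntax check of the index-task spec before POSTing it;
# json.tool prints a parse error and exits non-zero on malformed JSON
python -m json.tool /tmp/sampledata-index.json > /dev/null && echo "spec is valid JSON"
```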

copy pageview.json to the HDFS /tmp directory
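The paths entry in the inputSpec refers to HDFS, not the local filesystem, so the sample file has to land there before the task runs; for example (run as a user with write access to HDFS /tmp):

```shell
# upload the sample data to the HDFS path named in the inputSpec
hdfs dfs -put pageview.json /tmp/pageview.json
# confirm the file is in place
hdfs dfs -ls /tmp/pageview.json
```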

submit index task to overlord.

curl -X 'POST' -H 'Content-Type:application/json' -d @sampledata-index.json  `hostname`:8090/druid/indexer/v1/task

look for the completed/running task in the overlord console UI.
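If you prefer the command line to the console UI, the overlord also exposes a task-status endpoint; <taskId> below is a placeholder for the id returned by the POST above:

```shell
# poll the status of the submitted index task
curl -s "http://$(hostname):8090/druid/indexer/v1/task/<taskId>/status"
```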

check the coordinator UI for a new datasource named pageviews

Real-time ingestion into Druid using Tranquility on HDP

ENV HDP-2.6.4

download tranquility

curl -O http://static.druid.io/tranquility/releases/tranquility-distribution-0.8.0.tgz
tar -xzf tranquility-distribution-0.8.0.tgz
cd tranquility-distribution-0.8.0

start tranquility

[druid@c215-node2 tranquility-distribution-0.8.0]$ pwd
/home/druid/tranquility-distribution-0.8.0
bin/tranquility server -configFile /usr/hdp/current/druid-broker/conf-quickstart/tranquility/server.json 
...
2018-07-01 18:12:05,620 [main] INFO  org.eclipse.jetty.server.Server - Started @5026ms

send data into Druid

[root@c215-node2 bin]# pwd
/usr/hdp/2.6.4.0-91/druid/bin
[root@c215-node2 bin]# python generate-example-metrics -c 1000 | curl -XPOST -H'Content-Type: application/json' --data-binary @- http://localhost:8200/v1/post/metrics
{"result":{"received":1000,"sent":1000}}

create a Hive table on top of the Druid datasource

CREATE EXTERNAL TABLE metrics_hive
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "metrics");
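With the external table in place, ordinary Hive queries run against the live Druid datasource; a minimal smoke test, assuming the external table is named metrics_hive:

```sql
-- simple smoke test: row count over the Druid-backed table
-- (the count grows as tranquility keeps ingesting events)
SELECT COUNT(*) FROM metrics_hive;
```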

now you can query real-time data through Hive.
