@tilakpatidar
tilakpatidar / basic.sql
Last active March 15, 2018 09:58
Common SQL Server queries
-- Find rows sharing a key but differing in some value
SELECT * FROM A a, B b WHERE a.id = b.id AND a.name <> b.name;

-- Returns one row for each CHECK, UNIQUE, PRIMARY KEY, and/or FOREIGN KEY constraint
SELECT *
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE CONSTRAINT_NAME = 'XYZ';

-- Returns one row for each FOREIGN KEY constraint
SELECT *
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS;
@tilakpatidar
tilakpatidar / postgres_to_kafka.sh
Last active August 27, 2020 22:31
Postgres to Kafka streaming using Debezium
# Run postgres instance
docker run --name postgres -p 5000:5432 debezium/postgres
# Run zookeeper instance
docker run -it --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 debezium/zookeeper
# Run kafka instance
docker run -it --name kafka -p 9092:9092 --link zookeeper:zookeeper debezium/kafka
# Run kafka connect (invocation per the Debezium tutorial; adjust topic names and group id as needed)
docker run -it --name connect -p 8083:8083 -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=connect_configs -e OFFSET_STORAGE_TOPIC=connect_offsets --link zookeeper:zookeeper --link kafka:kafka --link postgres:postgres debezium/connect
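Once a connector is registered through Kafka Connect's REST API (port 8083), Debezium publishes change events to topics named after the server, schema, and table. A minimal consumer sketch in Scala that prints those events, assuming kafka-clients is on the classpath and a hypothetical topic dbserver1.public.customers (Debezium's default serverName.schema.table pattern):

import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import scala.jdk.CollectionConverters._  // Scala 2.13; on 2.12 use scala.collection.JavaConverters

object ChangeEventPrinter extends App {
  val props = new Properties()
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "debezium-demo")
  props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
  props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
    "org.apache.kafka.common.serialization.StringDeserializer")
  props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
    "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](props)
  // Topic name is hypothetical; Debezium derives it from the connector config.
  consumer.subscribe(Collections.singletonList("dbserver1.public.customers"))
  while (true) {
    val records = consumer.poll(Duration.ofSeconds(1))
    records.asScala.foreach(r => println(s"${r.key} -> ${r.value}"))
  }
}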
@tilakpatidar
tilakpatidar / README.md
Last active October 30, 2018 19:05
Install HUE on HDP

Install HUE on HDP

Prerequisites

Install required packages

yum install -y ant gcc gcc-c++ krb5-devel mysql-devel
yum install -y openssl-devel cyrus-sasl-devel cyrus-sasl-gssapi
yum install -y sqlite-devel libtidy libxml2-devel libxslt-devel libffi-devel
yum install -y maven openldap-devel python-devel python-simplejson python-setuptools
@tilakpatidar
tilakpatidar / README.md
Last active January 25, 2018 21:43
Uninstall Cloudera and install HDP 2.6

Install HDP 2.6 and remove Cloudera 5.12

Uninstall Cloudera

Follow the guide to uninstall (link). Also, remove the existing folders and users.

# Remove conf and logs
rm -rf /etc/hadoop
rm -rf /etc/hbase
rm -rf /etc/hive
import org.apache.spark.sql.functions.udf

// Load the unified 2017 sales export (header row included)
val csv = spark.read.format("csv").option("header", true).load("/Users/tilak/Downloads/Pam/SalesAnalysis/data/store_sales_unified_2017.csv")
// Synthetic bill id: one receipt on one register at one time on one date
val uniqueKey: (String, String, String, String) => String = (x, y, z, v) => x + "_" + y + "_" + z + "_" + v
val someFn = udf(uniqueKey)
val newData = csv.withColumn("unique", someFn(csv.col("receipt_id"), csv.col("cash_register_id"), csv.col("sale_time"), csv.col("date")))
// Count occurrences of each article per bill
val countArticles = newData.groupBy("unique", "article_id").count()
// Pair distinct articles appearing on the same bill (self cross join matched on the bill key)
val sameBill = countArticles.crossJoin(countArticles).filter(x => x.getString(0) == x.getString(3) && x.getString(1) != x.getString(4))
// Disambiguate the duplicated column names produced by the self join
val newNames = sameBill.columns.toList.zipWithIndex.map((x) => x._1 + "_" + x._2)
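The gist stops after computing the new column names; a plausible continuation (an assumption, not part of the original) applies them with toDF and keeps each unordered article pair once:

// Apply the disambiguated names: unique_0, article_id_1, count_2, unique_3, article_id_4, count_5
val renamed = sameBill.toDF(newNames: _*)
// Keep each unordered pair once by ordering the two article ids
val pairs = renamed.filter(renamed.col("article_id_1") < renamed.col("article_id_4"))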
import org.apache.spark.sql.functions.udf
import spark.sessionState.conf

// Raise the pivot column limit so pivoting over all article ids does not fail
conf.setConfString("spark.sql.pivotMaxValues", "" + Int.MaxValue)
val csv = spark.read.format("csv").option("header", true).load("/Users/tilak/Downloads/Pam/SalesAnalysis/data/store_sales_unified_2017.csv")
// Same synthetic bill id as above
val uniqueKey: (String, String, String, String) => String = (x, y, z, v) => x + "_" + y + "_" + z + "_" + v
val someFn = udf(uniqueKey)
val newData = csv.withColumn("unique", someFn(csv.col("receipt_id"), csv.col("cash_register_id"), csv.col("sale_time"), csv.col("date")))
// Per-bill article counts, then the distinct article ids to pivot on
val countArticles = newData.groupBy("unique", "article_id").count()
val articles = countArticles.select("article_id").distinct()
val articleIds = articles.collect.map(x => x(0))
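The snippet is cut off after collecting the article ids; given the raised spark.sql.pivotMaxValues, a pivot presumably follows. A minimal sketch of that step, assuming the goal is a bill-by-article count matrix:

// One row per bill, one column per article id, cells holding occurrence counts
val billMatrix = newData.groupBy("unique").pivot("article_id", articleIds).count()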
@tilakpatidar
tilakpatidar / gobblin-source-schema.md
Last active December 19, 2017 10:20
Gobblin Converters schema documentation

Source Schema and Converters

Source schema

A source schema has to be declared before extracting data from the source. To define it, the source.schema property is available, which takes a JSON value describing the source schema. This schema is used by Converters to perform data type or data format conversions. The Java class representation of a source schema can be found in Schema.java.
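For illustration only, a source.schema value might follow the Avro-style JSON used in Gobblin's example jobs (the record and field names here are hypothetical, and the exact layout depends on the extractor in use):

source.schema={"namespace":"example.sales","type":"record","name":"Sale","fields":[{"name":"receipt_id","type":"string"},{"name":"amount","type":"double"}]}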

Converters

In the Gobblin library, a Converter is an interface for classes that implement data transformations, e.g. data type conversions, schema projections, data manipulations, and data filtering. This interface is responsible for converting both schema and data records. Classes implementing this interface are composable and can be chained together to achieve more complex data transformations.

A converter basically needs four type parameters, as sketched below:

  • Input schema type
  • Output schema type
  • Input record type
  • Output record type
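A minimal sketch of a custom converter in Scala, assuming Gobblin's pre-Apache package layout (gobblin.converter; newer releases use org.apache.gobblin.converter), with an identity schema conversion and an uppercasing record transform:

import gobblin.configuration.WorkUnitState
import gobblin.converter.{Converter, SingleRecordIterable}

// Converter[SI, SO, DI, DO]: input/output schema types, input/output record types.
class UppercaseConverter extends Converter[String, String, String, String] {
  // Schema passes through unchanged (identity projection)
  override def convertSchema(inputSchema: String, workUnit: WorkUnitState): String =
    inputSchema

  // Each input record may yield zero or more output records; here exactly one
  override def convertRecord(outputSchema: String, inputRecord: String,
                             workUnit: WorkUnitState): java.lang.Iterable[String] =
    new SingleRecordIterable(inputRecord.toUpperCase)
}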
@tilakpatidar
tilakpatidar / keybase.md
Created August 10, 2017 10:01
My keybase declaration

Keybase proof

I hereby claim:

  • I am tilakpatidar on github.
  • I am tilakpatidar (https://keybase.io/tilakpatidar) on keybase.
  • I have a public key ASBrc8-ucimp_8n0hPOuAsj1mFBpAf84XYHuuGuTavTTewo

To claim this, I am signing this object:

@tilakpatidar
tilakpatidar / monit_http_monitor.conf
Created April 19, 2017 08:47
Monit: monitor a process without a pidfile by using an HTTP request instead
check host appsrv1 with address 127.0.0.1
  start program = "/sbin/start myapp"
  stop program = "/sbin/stop myapp"
  alert alerts@example.com on {timeout, connection}
  if failed port 9009 protocol HTTP
    request /
    with timeout 3 seconds
  then restart
  if 10 restarts within 10 cycles then timeout
  if 10 restarts within 10 cycles then exec "/usr/bin/monit start aws-dns-healthcheck"
@tilakpatidar
tilakpatidar / lazy_val.scala
Created April 15, 2017 07:53
From http://stackoverflow.com/questions/7484928/what-does-a-lazy-val-do. Example of how lazy val does memoization: the value is evaluated on first access only.
val x = { println("x"); 15 }
//x                <- the block runs immediately for a strict val
//x: Int = 15
lazy val y = { println("y"); 13 }
//y: Int = <lazy>  <- nothing evaluated yet
x
//res2: Int = 15   <- no second "x": the value was computed once, at definition
y
//y                <- evaluated now, on first access
//res3: Int = 13
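Accessing y again shows the memoization: the initializer does not re-run, so no second "y" is printed (a continuation sketch, assuming the same REPL session):

y
//res4: Int = 13   <- cached on first access; the println side effect does not repeat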