@tilakpatidar
tilakpatidar / basic.sql
Last active March 15, 2018 09:58
Common SQL Server queries
-- Find rows sharing a key but differing in some value
SELECT * FROM A a, B b WHERE a.id = b.id AND a.name <> b.name;

-- Returns one row for each CHECK, UNIQUE, PRIMARY KEY, and/or FOREIGN KEY constraint
SELECT *
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE CONSTRAINT_NAME = 'XYZ';

-- Returns one row for each FOREIGN KEY constraint
SELECT *
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS;
@tilakpatidar
tilakpatidar / postgres_to_kafka.sh
Last active August 27, 2020 22:31
Postgres to Kafka streaming using Debezium
# Run postgres instance
docker run --name postgres -p 5000:5432 debezium/postgres
# Run zookeeper instance
docker run -it --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 debezium/zookeeper
# Run kafka instance
docker run -it --name kafka -p 9092:9092 --link zookeeper:zookeeper debezium/kafka
# Run kafka connect (invocation per the Debezium tutorial; adjust topic names and group id as needed)
docker run -it --name connect -p 8083:8083 -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=connect_configs -e OFFSET_STORAGE_TOPIC=connect_offsets --link zookeeper:zookeeper --link kafka:kafka --link postgres:postgres debezium/connect
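Once a connector is registered through Kafka Connect's REST API (port 8083), Debezium publishes change events to topics named after the server, schema, and table. A minimal consumer sketch in Scala that prints those events, assuming kafka-clients is on the classpath and a hypothetical topic dbserver1.public.customers (Debezium's default serverName.schema.table pattern):

import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import scala.jdk.CollectionConverters._  // Scala 2.13; on 2.12 use scala.collection.JavaConverters

object ChangeEventPrinter extends App {
  val props = new Properties()
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "debezium-demo")
  props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
  props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
    "org.apache.kafka.common.serialization.StringDeserializer")
  props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
    "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](props)
  // Topic name is hypothetical; Debezium derives it from the connector config.
  consumer.subscribe(Collections.singletonList("dbserver1.public.customers"))
  while (true) {
    val records = consumer.poll(Duration.ofSeconds(1))
    records.asScala.foreach(r => println(s"${r.key} -> ${r.value}"))
  }
}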
@tilakpatidar
tilakpatidar / README.md
Last active October 30, 2018 19:05
Install HUE on HDP

Install HUE on HDP

Prerequisites

Install required packages

yum install -y ant gcc gcc-c++ krb5-devel mysql-devel
yum install -y openssl-devel cyrus-sasl-devel cyrus-sasl-gssapi
yum install -y sqlite-devel libtidy libxml2-devel libxslt-devel libffi-devel
yum install -y maven openldap-devel python-devel python-simplejson python-setuptools
@tilakpatidar
tilakpatidar / README.md
Last active January 25, 2018 21:43
Uninstall Cloudera and install HDP 2.6

Install HDP 2.6 and remove Cloudera 5.12

Uninstall Cloudera

Follow the guide to uninstall (link). Also, remove the existing folders and users.

# Remove conf and logs
rm -rf /etc/hadoop
rm -rf /etc/hbase
rm -rf /etc/hive
import org.apache.spark.sql.functions.udf

// Load the unified 2017 sales export (header row included)
val csv = spark.read.format("csv").option("header", true).load("/Users/tilak/Downloads/Pam/SalesAnalysis/data/store_sales_unified_2017.csv")
// Synthetic bill id: one receipt on one register at one time on one date
val uniqueKey: (String, String, String, String) => String = (x, y, z, v) => x + "_" + y + "_" + z + "_" + v
val someFn = udf(uniqueKey)
val newData = csv.withColumn("unique", someFn(csv.col("receipt_id"), csv.col("cash_register_id"), csv.col("sale_time"), csv.col("date")))
// Count occurrences of each article per bill
val countArticles = newData.groupBy("unique", "article_id").count()
// Pair distinct articles appearing on the same bill (self cross join matched on the bill key)
val sameBill = countArticles.crossJoin(countArticles).filter(x => x.getString(0) == x.getString(3) && x.getString(1) != x.getString(4))
// Disambiguate the duplicated column names produced by the self join
val newNames = sameBill.columns.toList.zipWithIndex.map((x) => x._1 + "_" + x._2)
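The gist stops after computing the new column names; a plausible continuation (an assumption, not part of the original) applies them with toDF and keeps each unordered article pair once:

// Apply the disambiguated names: unique_0, article_id_1, count_2, unique_3, article_id_4, count_5
val renamed = sameBill.toDF(newNames: _*)
// Keep each unordered pair once by ordering the two article ids
val pairs = renamed.filter(renamed.col("article_id_1") < renamed.col("article_id_4"))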
import org.apache.spark.sql.functions.udf
import spark.sessionState.conf

// Raise the pivot column limit so pivoting over all article ids does not fail
conf.setConfString("spark.sql.pivotMaxValues", "" + Int.MaxValue)
val csv = spark.read.format("csv").option("header", true).load("/Users/tilak/Downloads/Pam/SalesAnalysis/data/store_sales_unified_2017.csv")
// Same synthetic bill id as above
val uniqueKey: (String, String, String, String) => String = (x, y, z, v) => x + "_" + y + "_" + z + "_" + v
val someFn = udf(uniqueKey)
val newData = csv.withColumn("unique", someFn(csv.col("receipt_id"), csv.col("cash_register_id"), csv.col("sale_time"), csv.col("date")))
// Per-bill article counts, then the distinct article ids to pivot on
val countArticles = newData.groupBy("unique", "article_id").count()
val articles = countArticles.select("article_id").distinct()
val articleIds = articles.collect.map(x => x(0))
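The snippet is cut off after collecting the article ids; given the raised spark.sql.pivotMaxValues, a pivot presumably follows. A minimal sketch of that step, assuming the goal is a bill-by-article count matrix:

// One row per bill, one column per article id, cells holding occurrence counts
val billMatrix = newData.groupBy("unique").pivot("article_id", articleIds).count()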
@tilakpatidar
tilakpatidar / gobblin-source-schema.md
Last active December 19, 2017 10:20
Gobblin Converters schema documentation

Source Schema and Converters

Source schema

A source schema has to be declared before extracting data from the source. To define it, the source.schema property is available, which takes a JSON value describing the source schema. This schema is used by Converters to perform data type or data format conversions. The Java class representation of a source schema can be found in Schema.java.
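For illustration only, a source.schema value might follow the Avro-style JSON used in Gobblin's example jobs (the record and field names here are hypothetical, and the exact layout depends on the extractor in use):

source.schema={"namespace":"example.sales","type":"record","name":"Sale","fields":[{"name":"receipt_id","type":"string"},{"name":"amount","type":"double"}]}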

Converters

In the Gobblin library, a Converter is an interface for classes that implement data transformations, e.g. data type conversions, schema projections, data manipulations, and data filtering. This interface is responsible for converting both schema and data records. Classes implementing this interface are composable and can be chained together to achieve more complex data transformations.

A converter basically needs four type parameters, as sketched below:

  • Input schema type
  • Output schema type
  • Input record type
  • Output record type
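A minimal sketch of a custom converter in Scala, assuming Gobblin's pre-Apache package layout (gobblin.converter; newer releases use org.apache.gobblin.converter), with an identity schema conversion and an uppercasing record transform:

import gobblin.configuration.WorkUnitState
import gobblin.converter.{Converter, SingleRecordIterable}

// Converter[SI, SO, DI, DO]: input/output schema types, input/output record types.
class UppercaseConverter extends Converter[String, String, String, String] {
  // Schema passes through unchanged (identity projection)
  override def convertSchema(inputSchema: String, workUnit: WorkUnitState): String =
    inputSchema

  // Each input record may yield zero or more output records; here exactly one
  override def convertRecord(outputSchema: String, inputRecord: String,
                             workUnit: WorkUnitState): java.lang.Iterable[String] =
    new SingleRecordIterable(inputRecord.toUpperCase)
}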
@tilakpatidar
tilakpatidar / keybase.md
Created August 10, 2017 10:01
My keybase declaration

Keybase proof

I hereby claim:

  • I am tilakpatidar on github.
  • I am tilakpatidar (https://keybase.io/tilakpatidar) on keybase.
  • I have a public key ASBrc8-ucimp_8n0hPOuAsj1mFBpAf84XYHuuGuTavTTewo

To claim this, I am signing this object:

@tilakpatidar
tilakpatidar / monit_http_monitor.conf
Created April 19, 2017 08:47
Monit: monitor a process without a pidfile by using an HTTP request instead
check host appsrv1 with address 127.0.0.1
  start program = "/sbin/start myapp"
  stop program = "/sbin/stop myapp"
  alert alerts@example.com on {timeout, connection}
  if failed port 9009 protocol HTTP
    request /
    with timeout 3 seconds
  then restart
  if 10 restarts within 10 cycles then timeout
  if 10 restarts within 10 cycles then exec "/usr/bin/monit start aws-dns-healthcheck"
@tilakpatidar
tilakpatidar / lazy_val.scala
Created April 15, 2017 07:53
From http://stackoverflow.com/questions/7484928/what-does-a-lazy-val-do. Example of how lazy val does memoization: the value is evaluated on first access only.
val x = { println("x"); 15 }
//x                <- the block runs immediately for a strict val
//x: Int = 15
lazy val y = { println("y"); 13 }
//y: Int = <lazy>  <- nothing evaluated yet
x
//res2: Int = 15   <- no second "x": the value was computed once, at definition
y
//y                <- evaluated now, on first access
//res3: Int = 13
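Accessing y again shows the memoization: the initializer does not re-run, so no second "y" is printed (a continuation sketch, assuming the same REPL session):

y
//res4: Int = 13   <- cached on first access; the println side effect does not repeat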