Shubham Gupta y2k-shubham

@y2k-shubham
y2k-shubham / CsvToParquet.scala
Created December 15, 2017 10:17
CSV to Parquet using Spark (Scala)
import com.mypackage.SparkSessionBuilder
import org.apache.log4j.{Level, LogManager}
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.joda.time.DateTime
import scala.util.control.Breaks._
object DataMover {
def main(args: Array[String]): Unit = {
@y2k-shubham
y2k-shubham / mkdirsForSBT.sh
Last active December 22, 2017 04:55
Bash script to create directory structure for SBT project
#!/bin/bash
# create directories
mkdir -p src/{main,test}/{resources,scala/com/company}
mkdir -p {nonsvn,project,target}
# create build.sbt file
echo '
name := "ProjectName"
version := "1.0"
@y2k-shubham
y2k-shubham / git-rcheckout.sh
Created December 26, 2017 08:42
Git recursive checkout (branch switch) on project with submodules
#!/bin/bash
((!$#)) && echo No branch name, command ignored! && exit 1
git checkout "$1" && git submodule foreach --recursive git checkout "$1"
@y2k-shubham
y2k-shubham / file0.txt
Created January 10, 2018 07:35 — forked from giwa/file0.txt
Install hive on Mac with Homebrew ref: http://qiita.com/giwa/items/dabf0bb21ae242532423
$ brew update
$ brew install hive
\w -> alphabet / underscore / digit (word-characters)
\d -> digit
Sample input
Set(List((a,2), (b,1), (b,2)), List((a,1)), List((b,1)), List((b,2)), List((a,1), (b,1)), List((a,1), (a,2), (b,1)), List(), List((a,1), (b,1), (b,2)), List((a,1), (a,2), (b,2)), List((a,1), (a,2)), List((a,2), (b,1)), List((a,1), (a,2), (b,1), (b,2)), List((a,2)), List((a,1), (b,2)), List((a,2), (b,2)), List((b,1), (b,2)))
Set(List((a,1)), List((b,1)), List((b,2)), List((a,1), (b,1)), List(), List((a,2), (b,1)), List((a,2)), List((a,1), (b,2)), List((a,2), (b,2)))
Find expression
List(\((\(\w,\d\)(, )*)*\))(, )*
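To see what the find expression picks up, here is a minimal Java sketch that applies it to a shortened version of the sample input. The class name `ListRegexDemo` and the helpers `countLists` and `stripLists` are hypothetical names introduced only for this illustration; the pattern is the one from the note, with backslashes doubled for a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListRegexDemo {
    // The find expression from the note, escaped for a Java string literal.
    // It matches one "List(...)" entry plus any ", " separator that follows it.
    static final Pattern LIST =
            Pattern.compile("List(\\((\\(\\w,\\d\\)(, )*)*\\))(, )*");

    // Counts how many List(...) entries the expression finds in the input.
    static int countLists(String input) {
        Matcher m = LIST.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Removes every match, leaving only the text around the List(...) entries.
    static String stripLists(String input) {
        return LIST.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Shortened sample in the same shape as the Set(...) lines above.
        String sample = "Set(List((a,1)), List((b,1)), List(), List((a,2), (b,1)))";
        System.out.println(countLists(sample)); // 4
        System.out.println(stripLists(sample)); // Set()
    }
}
```

Each match covers a whole `List(...)` entry, including the empty `List()`, so the same expression can be pasted into an editor's find-and-replace to step through or delete entries one at a time.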
@y2k-shubham
y2k-shubham / gist:66e939ff2f3c10f62c0136032199faca
Created April 10, 2018 13:19
Table creation scripts for MetaData-DB
USE `meta`;
DROP TABLE IF EXISTS `dbs`;
CREATE TABLE `dbs` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL,
`parallelism` INT NOT NULL DEFAULT 32,
`status` BIT(1) NOT NULL DEFAULT b'1',
@y2k-shubham
y2k-shubham / gist:fec4cb51b5809faf8709b05782dd6e02
Created April 13, 2018 10:19 — forked from sebsto/gist:19b99f1fa1f32cae5d00
Install Maven with Yum on Amazon Linux
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
sudo yum install -y apache-maven
mvn --version
@y2k-shubham
y2k-shubham / table_size.sql
Created April 26, 2018 12:28
Amount of data in table [excluding index]
SELECT
(data_length) / POWER(1024, 3) AS tablesize_gb
FROM
information_schema.tables
WHERE
table_schema = 'db_name' AND
table_name = 'table_name'
@y2k-shubham
y2k-shubham / ExtendedHashFunctions.java
Last active August 17, 2018 17:13
Presto hash function (SHA1) UDF(s)
package com.company.udfs.scalar;
import com.facebook.presto.spi.function.Description;
import com.facebook.presto.spi.function.ScalarFunction;
import com.facebook.presto.spi.function.SqlType;
import com.facebook.presto.spi.type.StandardTypes;
import com.google.common.hash.Hashing;
import com.google.common.io.BaseEncoding;
import io.airlift.slice.Slice;
import io.airlift.slice.Slices;
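The preview stops at the imports, so the UDF body itself is not shown here. As a rough, dependency-free sketch of the kind of SHA-1 hex digest such a function might return, the following uses the JDK's `MessageDigest` in place of Guava's `Hashing` and Presto's `Slice`; that swap, the class name `Sha1Sketch`, the method `sha1Hex`, and the lowercase-hex output format are all assumptions made for illustration, not the gist's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Sketch {
    // Hypothetical helper: returns the SHA-1 digest of the UTF-8 bytes of
    // the input as a 40-character lowercase hex string.
    static String sha1Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required JDK algorithm, so this should not happen.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("abc"));
        // a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```

In a real Presto plugin the same digest would be wrapped in a method annotated with `@ScalarFunction` and `@SqlType`, taking and returning a `Slice`, as the imports above suggest.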
@y2k-shubham
y2k-shubham / An integration test for Presto (UDF) plugin
Last active August 21, 2018 14:28
Presto UDFs integration test
Reference files for integration test of a Presto plugin containing hashing-related UDFs