thanooj kalathuru thanoojgithub

@thanoojgithub
thanoojgithub / wc.java
Created November 28, 2015 03:36
Word count mapreduce
package wc;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
@thanoojgithub
thanoojgithub / WordCount.java
Last active November 28, 2015 19:53
Word Count
package com.corejava;
import java.util.Map;
import java.util.HashMap;
public class WordCount
{
    public static void main (String[] args)
    {
        String str = "Today is a Holiday Day Today is a Working Day";
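The preview above is cut off before the counting logic. As a rough sketch of the idea, written in Python rather than the gist's Java, the words of the same sentence can be tallied in a dictionary. `count_words` is a hypothetical helper name, and splitting on whitespace with case-sensitive matching is an assumption, not something the preview confirms.

```python
def count_words(text):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = {}
    for word in text.split():
        # get() supplies 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts


# Same sentence as in the gist preview
sentence = "Today is a Holiday Day Today is a Working Day"
word_counts = count_words(sentence)

for word, n in word_counts.items():
    print(word, n)
```

A dictionary keyed by word plays the same role here as the `HashMap` imported in the Java preview.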
@thanoojgithub
thanoojgithub / UDFGender.java
Last active December 11, 2015 10:19
Hive UDF - for Gender function
package com.mapr.hive;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
public class UDFGender extends UDF {
    private Text result = new Text();
    private static final String male = "Mr.";
    private static final String femaleM = "Mrs.";
@thanoojgithub
thanoojgithub / HelloWorld.py
Created December 15, 2015 09:46
Python - HelloWorld program
class HW():
    'Hello World'
    'Modules Are Like Dictionaries'
    helloM = {'hello': "Hello!"}
    'init - constructs HW instance with assigned values'
    def __init__(self, hello):
        self.hello = hello
@thanoojgithub
thanoojgithub / Employee.py
Last active December 15, 2015 09:54
Employee class using python
class Employee():
    'Common base class for all employees'
    empCount = 0
    def __init__(self, eid, name, salary, did):
        self.eid = eid
        self.name = name
        self.salary = salary
        self.did = did
        Employee.empCount += 1
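A short usage sketch may help show what the class-level counter does: `empCount` lives on the class, so every constructor call increments one shared value. The class body repeats the preview above; the argument values and the variable names `first` and `second` are made up for illustration.

```python
class Employee:
    'Common base class for all employees'
    empCount = 0  # class attribute, shared by every instance

    def __init__(self, eid, name, salary, did):
        self.eid = eid
        self.name = name
        self.salary = salary
        self.did = did
        # Increment the shared counter on the class, not on self
        Employee.empCount += 1


# Illustrative values only; they are not taken from the gist
first = Employee(1, "alice", 30000, "d001")
second = Employee(2, "bob", 25000, "d002")

print(Employee.empCount)  # one shared counter for all instances
print(first.empCount)     # instances read the same class attribute
```

Writing `Employee.empCount += 1` rather than `self.empCount += 1` matters: the latter would create a per-instance attribute and leave the class-level count unchanged.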
@thanoojgithub
thanoojgithub / FileParsing.py
Last active December 17, 2015 10:38
file parsing using pickle in python
import pickle
class Employee():
    'Common base class for all employees'
    empCount = 0
    '11001,sriram,M,married,1989-09-12,30000,tl,d003'
    def __init__(self, eid, name, gender, mstatus, dob, salary, role, did):
        self.eid = eid
        self.name = name
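The preview stops before any parsing or pickling happens. A minimal sketch of how the sample record could be parsed and round-tripped through `pickle` follows. `parse_employee`, the in-memory `io.BytesIO` buffer standing in for a file, and the choice to pickle the attribute dictionary rather than the instance itself are all assumptions, not the gist's actual approach.

```python
import io
import pickle


class Employee:
    'Common base class for all employees'
    empCount = 0

    def __init__(self, eid, name, gender, mstatus, dob, salary, role, did):
        self.eid = eid
        self.name = name
        self.gender = gender
        self.mstatus = mstatus
        self.dob = dob
        self.salary = salary
        self.role = role
        self.did = did
        Employee.empCount += 1


def parse_employee(line):
    """Build an Employee from one comma-separated record (fields stay strings)."""
    return Employee(*line.strip().split(","))


# Sample record copied from the gist preview
record = '11001,sriram,M,married,1989-09-12,30000,tl,d003'
emp = parse_employee(record)

# Serialize the attribute dict into an in-memory buffer (stand-in for a file)
buffer = io.BytesIO()
pickle.dump(vars(emp), buffer)

# Rewind and load the fields back, then rebuild an Employee from them
buffer.seek(0)
fields = pickle.load(buffer)
restored = Employee(**fields)

print(restored.name, restored.dob, restored.did)
```

Pickling the plain attribute dictionary keeps the payload independent of where the class is defined; pickling the instance directly also works when the class is importable at load time.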
hive> CREATE TABLE thanooj.docs (line STRING);
OK
Time taken: 0.06 seconds
hive> LOAD DATA LOCAL INPATH '/home/ubuntu/input/abc.txt' OVERWRITE INTO TABLE THANOOJ.docs;
Loading data to table thanooj.docs
Table thanooj.docs stats: [numFiles=1, numRows=0, totalSize=57, rawDataSize=0]
OK
Time taken: 0.161 seconds
hive> select * from thanooj.docs;
@thanoojgithub
thanoojgithub / simpleSerde.sql
Last active February 15, 2016 09:21
JSON file into Hive table using SerDe
hive> LIST jars;
hive-hcatalog-core-1.2.1.jar
hive> DELETE JAR hive-hcatalog-core-1.2.1.jar;
Deleted [hive-hcatalog-core-1.2.1.jar] from class path
hive> ADD JAR /home/ubuntu/hive-1.2.1/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar;
Added [/home/ubuntu/hive-1.2.1/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar] to class path
Added resources: [/home/ubuntu/hive-1.2.1/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar]
## First, add the webupd8team Java PPA repository to your system, then install Oracle Java 8 with the following commands.
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
## Note: the last command can take several minutes, depending on your internet speed.
## Verify the installed Java version
@thanoojgithub
thanoojgithub / PARTITION_CLUSTERED_HIVE.sql
Last active May 29, 2018 08:02
PARTITION and CLUSTERED/BUCKETING in HiveQL
hive> show schemas;
OK
default
thanooj
Time taken: 0.251 seconds, Fetched: 2 row(s)
hive> use thanooj;
OK
Time taken: 0.011 seconds
hive> show tables;
OK