Milind Jagre milindjagre

💭
❤️ DATA ❤️
View GitHub Profile
@milindjagre
milindjagre / post31.sql
Created June 21, 2017 21:22
This SQL file creates the Hive partitioned table post31.
create table post31 (
id int,
name string,
gender string
)
partitioned by (year int, month int)
row format delimited
fields terminated by ','
stored as textfile;
milindjagre / post31.csv
Created June 21, 2017 21:10
This file is loaded into the partitioned Hive table.
1,tina,F,2000,1
2,jerry,M,1997,5
3,tom,M,1996,5
4,wonder woman,F,1586,10
5,scoobydoo,M,1991,3
6,donald duck,M,1995,6
7,pink panther,F,1997,8
8,oggy,M,2010,10
9,shinchan,M,2005,1
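The rows above end with a year and a month, which match the partition columns declared for post31. As a rough illustration of what partitioning by (year, month) means for this data, the Python sketch below groups the rows by those two values and prints the `key=value` directory name Hive conventionally uses for each partition. The field order (id, name, gender, year, month), the comma delimiter, and the helper names `group_by_partition` and `partition_dir` are assumptions made for this example; it does not talk to Hive and is not the author's loading procedure.

```python
import csv
import io
from collections import defaultdict

# Rows copied from post31.csv, assuming comma-separated fields in the order
# id, name, gender, year, month (an assumption for this sketch).
POST31_CSV = """\
1,tina,F,2000,1
2,jerry,M,1997,5
3,tom,M,1996,5
4,wonder woman,F,1586,10
5,scoobydoo,M,1991,3
6,donald duck,M,1995,6
7,pink panther,F,1997,8
8,oggy,M,2010,10
9,shinchan,M,2005,1
"""


def group_by_partition(csv_text):
    """Map each (year, month) pair to the non-partition columns of its rows."""
    partitions = defaultdict(list)
    for row_id, name, gender, year, month in csv.reader(io.StringIO(csv_text)):
        partitions[(int(year), int(month))].append((int(row_id), name, gender))
    return dict(partitions)


def partition_dir(table, year, month):
    """Relative directory name in Hive's key=value partition layout."""
    return f"{table}/year={year}/month={month}"


partitions = group_by_partition(POST31_CSV)
for (year, month), rows in sorted(partitions.items()):
    print(partition_dir("post31", year, month), rows)
```

In this sample every row has a distinct (year, month) pair, so each row would end up in its own partition.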
milindjagre / post30.csv
Created June 20, 2017 16:41
This input file is loaded into the Hive external table.
1,tina,F
2,jerry,M
3,tom,M
4,wonder woman,F
5,scoobydoo,M
6,donald duck,M
7,pink panther,F
8,oggy,M
9,shinchan,M
milindjagre / post29.sql
Created June 10, 2017 22:33
This SQL file creates a Hive managed table.
create table post29(
id int,
name string,
gender string
)
row format delimited
fields terminated by ','
stored as textfile;
milindjagre / post28.sql
Created June 10, 2017 21:45
This file is used to demonstrate how to execute a Hive command.
select * from categories;
milindjagre / Data.csv
Created June 6, 2017 18:51
This Data.csv file is used to demonstrate the Data Preprocessing part of the Machine Learning Tutorials
Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
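Two rows in this dataset have a missing value: one Germany row has no Salary and one Spain row has no Age. A common preprocessing step for gaps like these is to fill each one with the mean of the observed values in its column. The sketch below shows that idea using only the Python standard library. Mean imputation is an assumption here, since the preview of the preprocessing scripts does not show how missing values are handled, and the helper names `load_rows` and `impute_mean` are hypothetical.

```python
import csv
import io
from statistics import mean

# Data.csv contents, assuming comma-separated fields and that an empty
# field marks a missing value (an assumption for this sketch).
DATA_CSV = """\
Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
"""


def load_rows(csv_text):
    """Parse the CSV text, turning empty numeric fields into None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "Country": rec["Country"],
            "Age": float(rec["Age"]) if rec["Age"] else None,
            "Salary": float(rec["Salary"]) if rec["Salary"] else None,
            "Purchased": rec["Purchased"],
        })
    return rows


def impute_mean(rows, column):
    """Return new rows with None in `column` replaced by the column mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


rows = load_rows(DATA_CSV)
rows = impute_mean(rows, "Age")
rows = impute_mean(rows, "Salary")
for row in rows:
    print(row)
```

The mean is computed from the eight observed values in each column, so the filled rows do not influence the value used to fill them.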
milindjagre / data_preprocessing.R
Created May 29, 2017 11:46
This R script is used for performing the data preprocessing operations in Machine Learning
# DATA PREPROCESSING
# setting the working directory
setwd("C:\\Users\\User\\Desktop\\blog\\3 ML Data Preprocessing\\R")
# IMPORTING THE DATASET
dataset = read.csv("Data.csv")
# viewing the imported dataset
View(dataset)
milindjagre / data_preprocessing.py
Created May 29, 2017 11:45
This Python file performs the data preprocessing operations in Machine Learning
# -*- coding: utf-8 -*-
"""
This python file demonstrates the concepts
we are going to cover in Data Preprocessing
"""
# IMPORTING THE LIBRARIES
# numpy contains mathematical operations
import numpy as np
milindjagre / post27.pig
Last active May 18, 2017 19:43
This Pig script invokes the UPPER UDF through an alias in Apache Pig.
-- this Pig script demonstrates how to invoke a UDF in Apache Pig
-- the LOAD command loads data from post27.csv into the input_data Pig relation
input_data = LOAD '/hdpcd/input/post27/post27.csv' USING PigStorage(',');
-- we need to register the jar file first with REGISTER command
REGISTER /usr/hdp/2.3.0.0-2557/pig/piggybank.jar;
-- defining an alias for the fully qualified class name UPPER
DEFINE upper org.apache.pig.piggybank.evaluation.string.UPPER;
milindjagre / post27.csv
Created May 18, 2017 18:47
This CSV file is used to demonstrate UDF invocation in Apache Pig.
1,Richard,Hernandez,XXXXXXXXX,XXXXXXXXX,6303 Heather Plaza,Brownsville,TX,78521
526,Kimberly,Barrett,XXXXXXXXX,XXXXXXXXX,7988 High Jetty,Brownsville,TX,78521