Milind Jagre milindjagre

💭
❤️ DATA ❤️
View GitHub Profile
@milindjagre
milindjagre / getStopWords.txt
Created December 17, 2018 19:08
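This method reads the stop-word file and returns a list with all the stop words, one entry per line.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Reads the stop-word file line by line and returns one list entry per line.
public static List<String> getStopWords() throws IOException {
    List<String> outputList = new ArrayList<>();
    // try-with-resources closes the reader even if readLine() throws
    try (BufferedReader br = new BufferedReader(new FileReader(
            "C:\\nlp_en_stop_words.txt"))) {
        String line;
        while ((line = br.readLine()) != null) {
            outputList.add(line);
        }
    }
    return outputList;
}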
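A list of stop words is typically used to drop common words from tokenized text. The following is a minimal sketch of that use, not part of the original gist: the class `StopWordFilter` and the method `removeStopWords` are hypothetical names, and the sketch assumes that matching should be case-insensitive and that the stop-word file holds one word per line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the original gist.
final class StopWordFilter {

    // Returns the tokens that are not stop words, preserving their order.
    // Assumption: comparison ignores case and surrounding whitespace in the stop-word list.
    static List<String> removeStopWords(List<String> tokens, Collection<String> stopWords) {
        // Normalize the stop words once into a set so each lookup is cheap.
        Set<String> stopSet = new HashSet<>();
        for (String word : stopWords) {
            stopSet.add(word.trim().toLowerCase(Locale.ROOT));
        }
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopSet.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // In practice the stop words would come from a loader such as getStopWords().
        List<String> stopWords = Arrays.asList("the", "a", "is");
        List<String> tokens = Arrays.asList("The", "sky", "is", "blue");
        System.out.println(removeStopWords(tokens, stopWords)); // prints [sky, blue]
    }
}
```

Copying the stop words into a `HashSet` first avoids scanning the whole list for every token, which matters once the input text is large.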
@milindjagre
milindjagre / post50.sql
Created September 13, 2017 10:53
This SQL file is used for creating a Hive table for performing the ORDER BY operation
create table post50 (
order_id int,
order_date string,
order_amt int,
order_status string
)
row format delimited
fields terminated by ','
stored as textfile;
@milindjagre
milindjagre / post50.csv
Created September 11, 2017 01:16
This file is used to demonstrate how to perform the sorting operation across multiple reducers.
1,2013-07-25 00:00:00.0,11599,CLOSED
2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT
3,2013-07-25 00:00:00.0,12111,COMPLETE
4,2013-07-25 00:00:00.0,8827,CLOSED
5,2013-07-25 00:00:00.0,11318,COMPLETE
6,2013-07-25 00:00:00.0,7130,COMPLETE
7,2013-07-25 00:00:00.0,4530,COMPLETE
8,2013-07-25 00:00:00.0,2911,PROCESSING
9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT
10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT
@milindjagre
milindjagre / post49.sql
Created September 10, 2017 18:08
This Hive table is used as one of the tables in the subquery.
create table post49 (
order_id int,
order_date string,
order_amt int,
order_status string
)
row format delimited
fields terminated by ','
stored as textfile;
@milindjagre
milindjagre / post49.csv
Created September 10, 2017 17:50
This file is loaded into a Hive table used in one of the subqueries.
2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT
3,2013-07-25 00:00:00.0,12111,COMPLETE
4,2013-07-25 00:00:00.0,8827,CLOSED
5,2013-07-25 00:00:00.0,11318,COMPLETE
6,2013-07-25 00:00:00.0,7130,COMPLETE
7,2013-07-25 00:00:00.0,4530,COMPLETE
8,2013-07-25 00:00:00.0,2911,PROCESSING
9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT
10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT
@milindjagre
milindjagre / post45_b.sql
Created August 24, 2017 17:07
This is one of the Hive tables which is used for performing the JOIN operation
create table post45_b (
order_id int,
order_date string,
order_amt int,
order_status string
)
row format delimited
fields terminated by ','
stored as textfile;
@milindjagre
milindjagre / post45_a.sql
Created August 24, 2017 17:04
This is one of the Hive tables which is used for performing the JOIN operation
create table post45_a (
id int,
name string,
gender string
)
row format delimited
fields terminated by ','
stored as textfile;
@milindjagre
milindjagre / post45_b.csv
Created August 24, 2017 16:55
This CSV file is used for loading into one of the Hive tables for performing the JOIN operation
2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT
3,2013-07-25 00:00:00.0,12111,COMPLETE
4,2013-07-25 00:00:00.0,8827,CLOSED
5,2013-07-25 00:00:00.0,11318,COMPLETE
6,2013-07-25 00:00:00.0,7130,COMPLETE
7,2013-07-25 00:00:00.0,4530,COMPLETE
8,2013-07-25 00:00:00.0,2911,PROCESSING
9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT
10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT
@milindjagre
milindjagre / post45_a.csv
Created August 24, 2017 16:49
This CSV file is used for loading into one of the Hive tables for performing the JOIN operation
1,tina,F
2,jerry,M
3,tom,M
4,wonder woman,F
5,scoobydoo,M
6,donald duck,M
7,pink panther,F
8,oggy,M
9,shinchan,M
@milindjagre
milindjagre / post41.sql
Created August 7, 2017 12:47
This SQL file is used for creating the Hive table into which the compressed data is loaded.
create table post41 (
id int,
name string,
gender string
)
row format delimited
fields terminated by ','
stored as textfile;