Koushik M.L.N koushikmln

View GitHub Profile
@koushikmln
koushikmln / Hadoop.sh
Created July 6, 2018 13:00
Hadoop Fs Commands
# Copy a local file into HDFS
hadoop fs -put /data/retail_db/order_items/part-00000 /user/koushikmln/retail_db_order_items.csv
# Set block size (67108864 bytes = 64 MB); -f overwrites the copy made by the previous command
hadoop fs -D dfs.blocksize=67108864 -put -f /data/retail_db/order_items/part-00000 /user/koushikmln/retail_db_order_items.csv
# Set replication factor and block size
hadoop fs -D dfs.blocksize=67108864 -D dfs.replication=1 -put -f /data/retail_db/order_items/part-00000 /user/koushikmln/retail_db_order_items.csv
# Get file metadata (block count, replication, health)
hdfs fsck /user/koushikmln/retail_db_order_items.csv
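
The block size and replication settings above determine how many blocks a file occupies and how much raw disk it consumes. The following is a minimal Python sketch of that arithmetic, not anything Hadoop itself runs. The 200 MB file size is a made-up example, and the helper names `block_count` and `raw_storage_bytes` are hypothetical.

```python
# 64 MB, the same value passed as dfs.blocksize in the commands above.
BLOCK_SIZE = 64 * 1024 * 1024  # 67108864 bytes


def block_count(file_size_bytes, block_size=BLOCK_SIZE):
    """Number of blocks needed for a file: ceiling division, 0 for an empty file."""
    return (file_size_bytes + block_size - 1) // block_size


def raw_storage_bytes(file_size_bytes, replication):
    """Approximate raw bytes stored across the cluster: one copy per replica."""
    return file_size_bytes * replication


# Hypothetical 200 MB file, chosen only for illustration.
example_size = 200 * 1024 * 1024
blocks = block_count(example_size)            # three full blocks plus one partial block
stored = raw_storage_bytes(example_size, 1)   # dfs.replication=1 keeps a single copy
print(blocks, stored)
```

The last block of a file is usually smaller than the configured block size, so the count reported by `hdfs fsck` should match the ceiling division rather than a plain quotient.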
koushikmln / OrderItemsSpark.py
Created July 7, 2018 19:25
Process Order Items Using Spark to get Order Id, Sub-Total Tuples, Total Amount by Order Id and Revenue Per Order Collection
# Use map to create an RDD of (order_id, sub_total) tuples.
# `sc` is the SparkContext that the pyspark shell provides.
rdd = sc.textFile("/public/retail_db/order_items/part-00000")
orderItemTuple = rdd.map(lambda x: x.split(",")).map(lambda f: (int(f[1]), float(f[4])))
orderItemTuple.take(10)
# Get the total for one order_id (here 2); the result is an (order_id, total) tuple
orderItemTuple.filter(lambda x: x[0] == 2).reduce(lambda x, y: (x[0], x[1] + y[1]))
# Get (order_id, total) tuples for every order
orderItemTuple.reduceByKey(lambda x, y: x + y).take(10)
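
To check the logic without a Spark shell, the same map and reduce-by-key steps can be sketched in plain Python. This is only an illustration of the aggregation, not how Spark executes it. The sample lines are assumed to follow the order_items layout used above (order id in field 1, sub-total in field 4) and are illustrative values, and `parse` and `reduce_by_key` are hypothetical helper names.

```python
from collections import defaultdict

# Illustrative rows: item id, order id, product id, quantity, sub-total, price.
sample_lines = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,2,502,5,250.0,50.0",
    "4,2,403,1,129.99,129.99",
]


def parse(line):
    """Mirror of the map step: turn one CSV line into (order_id, sub_total)."""
    fields = line.split(",")
    return int(fields[1]), float(fields[4])


def reduce_by_key(pairs):
    """Mirror of reduceByKey with addition: sum the values for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


totals = reduce_by_key(parse(line) for line in sample_lines)
for order_id in sorted(totals):
    print(order_id, round(totals[order_id], 2))
```

Because the sub-totals are floats, sums may carry small rounding error, which is why the printout rounds to two decimals.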
koushikmln / logstash.repo
Created July 16, 2018 16:55
Logstash Repository for CentOS
[logstash-6.x]
name=Elastic repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md