Dedunu Dhananjaya dedunumax

@dedunumax
dedunumax / Increased-Vagrantfile
Created May 18, 2015 12:59
This Vagrantfile increases the memory of the Vagrant nodes.
Vagrant.configure("2") do |config|
  config.vm.define "master" do |master|
    master.vm.box = "ubuntu/trusty64"
    master.vm.hostname = "master.local"
    master.vm.network "private_network", ip: "192.168.2.2"
  end
  config.vm.provider "virtualbox" do |v|
    v.memory = 8192
    v.cpus = 2
  end
end
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.dedunu.datascience</groupId>
<artifactId>sample</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
dedunumax / Map.java
Last active May 3, 2017 21:47
This code shows how to implement a mapper-only Hadoop job.
package org.dedunumax.mapperOnly;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class Map extends Mapper<LongWritable, Text, Text, Text> {
dedunumax / AirlineInputFormat.java
Created May 21, 2015 09:49
Sample Custom InputFormat class for Hadoop.
package org.dedunu.hadoop.muiltiinputsample;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import java.io.IOException;
IAB1 Arts & Entertainment
IAB1-1 Books & Literature
IAB1-2 Celebrity Fan/Gossip
IAB1-3 Fine Art
IAB1-4 Humor
IAB1-5 Movies
IAB1-6 Music
IAB1-7 Television
IAB2 Automotive
IAB2-1 Auto Parts
package org.dedunu.datascience.sample
import org.apache.spark.{SparkContext, SparkConf}
object Driver {
def main(args: Array[String]): Unit = {
val sparkConf = new SparkConf().setAppName("Sample Job Name")
val sparkContext = new SparkContext(sparkConf)
val textFile = sparkContext.textFile("file://" + args(0) + "/*")
val counts = textFile.flatMap(line => line.split(" "))
import boto3
__author__ = 'dedunu'
connection = boto3.client(
'emr',
region_name='us-west-1',
aws_access_key_id='<Your AWS Access Key>',
aws_secret_access_key='<Your AWS Secret Key>',
)
import boto.emr
import boto.exception
from boto.emr.instance_group import InstanceGroup
__author__ = 'dedunu'
connection = boto.emr.connect_to_region(
region_name='us-east-1',
aws_access_key_id='<Your AWS Access Key>',
aws_secret_access_key='<Your AWS Secret Key>',
dedunumax / DateMinusOne.java
Last active January 4, 2016 13:49
How to get the day prior to a particular date.
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
/**
* Date -1
* Get the date prior to a specific date.
*/
public class DateMinusOne {
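
The preview above stops at the class declaration, so the body of the gist is not visible here. As a rough sketch of the idea in the description (getting the day before a given date), the following uses the same `SimpleDateFormat` and `Calendar` classes that the gist imports. The class name `DateMinusOneSketch`, the method `dayBefore`, and the `yyyy-MM-dd` pattern are assumptions made for illustration and are not taken from the original gist.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

/**
 * Sketch only: returns the day before a given date.
 * Names and the date pattern are hypothetical, not from the original gist.
 */
class DateMinusOneSketch {
    // Assumed input/output pattern; the gist's actual format is not shown.
    private static final String PATTERN = "yyyy-MM-dd";

    /** Returns the calendar day before the given date string. */
    static String dayBefore(String date) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN);
        format.setLenient(false); // reject impossible dates such as 2016-02-30
        Date parsed;
        try {
            parsed = format.parse(date);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Expected " + PATTERN + ": " + date, e);
        }
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(parsed);
        // add() adjusts larger fields, so month and year roll back when needed.
        calendar.add(Calendar.DAY_OF_MONTH, -1);
        return format.format(calendar.getTime());
    }

    public static void main(String[] args) {
        System.out.println(dayBefore("2016-01-01")); // 2015-12-31
        System.out.println(dayBefore("2016-03-01")); // 2016-02-29 (leap year)
    }
}
```

Using `Calendar.add` rather than subtracting 24 hours in milliseconds keeps the result on the previous calendar day even across daylight-saving transitions.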