lydiachang2017

## mapreduce.md

      
Created April 25, 2017 03:31, forked from ethen8181/mapreduce.md

1. `brew install mvnvm` (just to install Maven on a Mac).
2. Make an Eclipse Maven project locally (File -> New -> Project -> Maven Project). During the setup just click Next until you reach the screen that prompts you for the group id and artifact id; set group id = com.javamakeuse.hadoop.poc and artifact id = Homeworkx (the name is whatever you want, e.g. Homework1).
3. Copy the pom.xml from wolf and replace the local pom.xml with it (you'll see it on the left in Eclipse).
4. Go to src/main/java and start a new class (e.g. Exercise1) to do your coding.
5. After you're done coding, navigate to where the Maven project is stored (e.g. mine is stored under /Users/ethen/Documents/workspace/Homework1) and type `mvn package` to create the jar file.
6. After that, copy the mr-app-1.0-SNAPSHOT.jar inside the target folder to wolf.
7. Then ssh to wolf and run the job there using `hadoop jar`, e.g. for the wordcount example I had a folder called wordcount on hadoop and I want the output
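
For the coding step, it may help to see what the wordcount example computes before wiring it into Hadoop. The following is only a rough sketch in plain Java with no Hadoop dependency: the class name `WordCountSketch` and its methods are made up for illustration, and a real job would instead extend Hadoop's `Mapper` and `Reducer` classes (which the pom.xml from wolf presumably pulls in).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: a plain-Java stand-in for a Hadoop wordcount job.
class WordCountSketch {

    // "Map" phase: emit a (word, 1) pair for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // "Shuffle" and "reduce" phases: group the pairs by word and sum their counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Run both phases over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(Arrays.asList("the quick fox", "the lazy dog"));
        // Print one "word<TAB>count" line per distinct word.
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

In an actual Hadoop job the grouping by key happens in the framework between the mapper and the reducer, so the reducer only has to sum the values it receives for each word.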


## hi.r
library(yaImpute)
library(caret)

heart <- read.csv("heart.csv")
heart <- heart[-1]  #drop the first column
heart$cost <- log10(heart$cost)  #work with log10 of cost

CVInd <- function(n,K) {  #n is sample size; K is number of parts; returns K-length list of indices for each part
  m<-floor(n/K)  #approximate size of each part
  r<-n-m*K