@AzimUddin
Last active August 29, 2015 13:56
Hadoop job configurations with HDInsight PowerShell
# MapReduce example with Hadoop job configurations
$clusterName = "YourClusterName"
# Hadoop configuration values set for this job: gzip-compress the job output
$jobConfig = @{
    "mapred.output.compress"          = "true"
    "mapred.output.compression.codec" = "org.apache.hadoop.io.compress.GzipCodec"
}
$myWordCountJob = New-AzureHDInsightMapReduceJobDefinition -JarFile "/example/jars/hadoop-examples.jar" -ClassName "wordcount" -JobName "WordCountJob" -StatusFolder "/MyMRJobs/WordCountJobStatus" -Defines $jobConfig
# wordcount arguments: input file first, then output folder
$myWordCountJob.Arguments.Add("/example/data/gutenberg/davinci.txt")
$myWordCountJob.Arguments.Add("MyMRJobs/WordCountOutput")
# Submit the job; the call returns a job object without waiting for completion
$MyMRJob = Start-AzureHDInsightJob -Cluster $clusterName -JobDefinition $myWordCountJob
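# A possible follow-up, not part of the original script: block until the submitted job
# finishes, then print its stderr log, where MapReduce writes its progress messages.
# Wait-AzureHDInsightJob and Get-AzureHDInsightJobOutput come from the same Azure
# PowerShell module; the 3600-second timeout is an arbitrary choice made here.
Wait-AzureHDInsightJob -Job $MyMRJob -WaitTimeoutInSeconds 3600
Get-AzureHDInsightJobOutput -Cluster $clusterName -JobId $MyMRJob.JobId -StandardError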
# Hive job example with Hadoop job configurations
$clusterName = "YourClusterName"
$queryString = "SELECT querytime, market, deviceplatform, devicemake, devicemodel, state, country FROM hivesampletable WHERE ClientId < 100 LIMIT 10;"
# Select the cluster that Invoke-AzureHDInsightHiveJob will run against
Use-AzureHDInsightCluster -Name $clusterName
# Run the query, requesting two reduce tasks; -Defines takes Hadoop properties as a hashtable
Invoke-AzureHDInsightHiveJob -Query $queryString -Defines @{ "mapred.reduce.tasks"="2" } -StatusFolder "MyHiveJobs/HiveJob1Status"