Tanzir Musabbir (tmusabbir)
@tmusabbir
tmusabbir / create-spark-cluster.sh
Created March 26, 2018 03:19
AWS CLI command to create an EMR cluster with a default auto-scaling task group
aws emr create-cluster \
  --auto-scaling-role EMR_AutoScaling_DefaultRole \
  --termination-protected \
  --applications Name=Hadoop Name=Hive Name=Spark \
  --ebs-root-volume-size 10 \
  --ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"subnet-xxxx","EmrManagedSlaveSecurityGroup":"sg-xxxxx","EmrManagedMasterSecurityGroup":"sg-xxxxx"}' \
  --service-role EMR_DefaultRole \
  --enable-debugging \
  --release-label emr-5.12.0 \
  --log-uri 's3n://aws-logs-xxxx/elasticmapreduce/' \
  --name 'spark-cluster' \
  --instance-groups '[{"InstanceCount":2,"BidPrice":"0.30","AutoScalingPolicy":{"Constraints":{"MinCapacity":0,"MaxCapacity":20},"Rules":[{"Action":{"SimpleScalingPolicyConfiguration":{"ScalingAdjustment":2,"CoolDown":300,"AdjustmentType":"CHANGE_IN_CAPACITY"}},"Description":"","Trigger":{"CloudWatchAlarmDefinition":{"MetricName":"YARNMemoryAvailablePercentage","ComparisonOperator":"LESS_THAN","Statistic":"AVERAGE","Period":300,"Dimensions":[{"Value":"${emr.clusterId}","Key":"JobFlowId"}],"EvaluationPeriods":1,"Unit":"PERCENT","Na
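The task group above starts with InstanceCount 2, and its rule adds 2 instances per activity (CHANGE_IN_CAPACITY, ScalingAdjustment 2) with a 300-second CoolDown, up to MaxCapacity 20. AWS documents CoolDown as the time after a scaling activity completes before another trigger-related activity can start, so it is worth estimating how long a full ramp-up takes. The helper below is a hypothetical back-of-the-envelope sketch (`scale_out_estimate` is not an AWS tool); it assumes the alarm fires on every evaluation and ignores provisioning time, alarm evaluation delays and Spot availability, so it only gives a lower bound.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the scale-out activities a simple scaling rule needs to
# grow a group from its current size to MaxCapacity, plus the minimum time the
# cooldowns alone impose. Provisioning and alarm evaluation time are ignored.
scale_out_estimate() {
  local start="$1" max="$2" step="$3" cooldown="$4" steps
  # Number of +step adjustments needed, rounded up.
  steps=$(( (max - start + step - 1) / step ))
  if (( steps <= 0 )); then
    echo "0 0"
    return 0
  fi
  # A cooldown separates consecutive activities, so n activities span at least n - 1 cooldowns.
  echo "$steps $(( (steps - 1) * cooldown ))"
}
```

For the values in the command above, `scale_out_estimate 2 20 2 300` prints `9 2400`: nine scale-out activities and at least 40 minutes spent in cooldowns before the group can reach 20 instances.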
tmusabbir / a:Oozie Example: Hive Actions
Last active January 4, 2021 11:52
Oozie Example: Hive Actions
tmusabbir / appPath
Last active October 7, 2018 02:24
Oozie Example: Sqoop Actions
[ambari-qa@ip-10-0-0-51 sqoop-oracle]$ hadoop fs -lsr /user/ambari-qa/example/sqoop-oracle
-rwx------ 3 ambari-qa hdfs 335 2013-05-08 20:41 /user/ambari-qa/example/sqoop-oracle/option.par
-rwx------ 3 ambari-qa hdfs 1770 2013-05-08 20:39 /user/ambari-qa/example/sqoop-oracle/workflow.xml
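The listing above is the workflow's application directory in HDFS: `workflow.xml` plus the `option.par` file. An Oozie workflow stored like this is normally started with a `job.properties` file that points `oozie.wf.application.path` at that directory. The sketch below writes such a file; `write_job_properties` is a hypothetical helper, and `nameNode` / `jobTracker` are assumed parameter names that `workflow.xml` would reference as `${nameNode}` and `${jobTracker}`, not Oozie built-ins.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a job.properties for an Oozie workflow stored in HDFS.
# Arguments: output file, NameNode URI, JobTracker/ResourceManager address, HDFS app path.
write_job_properties() {
  local out="$1" name_node="$2" job_tracker="$3" app_path="$4"
  # Unquoted heredoc: ${name_node} etc. expand now, while \${nameNode} is kept
  # literally so Oozie substitutes it when the job is submitted.
  cat > "$out" <<EOF
nameNode=${name_node}
jobTracker=${job_tracker}
# Sqoop actions need the Oozie sharelib on the classpath.
oozie.use.system.libpath=true
# HDFS directory that contains workflow.xml.
oozie.wf.application.path=\${nameNode}${app_path}
EOF
}
```

With the file written, the workflow would be submitted with the Oozie CLI, for example `oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run`, where `<oozie-host>` is a placeholder and 11000 is Oozie's default port.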
tmusabbir / a:Setup a Storm cluster on Amazon EC2
Last active September 11, 2018 08:28
Setup a Storm cluster on Amazon EC2
tmusabbir / livy-example.sh
Created March 27, 2018 22:41
Sample commands for spark-submit using Apache Livy
# The usual spark-submit command for the SparkPi sample application:
spark-submit --class org.apache.spark.examples.SparkPi /usr/lib/spark/examples/jars/spark-examples.jar
# Submit the same job through Livy from the EMR master node (assumes the jar file has been copied to the /test folder in HDFS):
curl -X POST --data '{"file": "/test/spark-examples.jar", "className": "org.apache.spark.examples.SparkPi"}' -H "Content-Type: application/json" localhost:8998/batches
# The previous example targets localhost because it runs on the master node itself; to submit from a remote machine, use the master node's DNS name:
curl -X POST --data '{"file": "/test/spark-examples.jar", "className": "org.apache.spark.examples.SparkPi"}' -H "Content-Type: application/json" <<your-emr-master-dns>>:8998/batches
# If the jar file is stored in S3 instead, submit it as follows:
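The curl calls above use Livy's `POST /batches` endpoint, which returns a JSON description of the new batch including its numeric `id`; `GET /batches/<id>/state` then reports the batch state. A minimal sketch that wraps submission and polling, assuming `curl` is installed and a hypothetical `LIVY_URL` variable points at the Livy server (defaulting to `localhost:8998` as above). The function names are illustrative, and the JSON handling uses `grep` instead of a real parser, so it assumes well-formed Livy responses and paths without quotes.

```bash
#!/usr/bin/env bash
# Minimal Livy batch sketch: build the request, submit it, read the batch id, poll the state.
LIVY_URL="${LIVY_URL:-http://localhost:8998}"

# JSON body for POST /batches; values are inserted verbatim (no escaping).
livy_batch_payload() {
  printf '{"file": "%s", "className": "%s"}' "$1" "$2"
}

# Submit a batch; Livy replies with a JSON description of the new batch.
livy_submit() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data "$(livy_batch_payload "$1" "$2")" "$LIVY_URL/batches"
}

# Read a Livy JSON response on stdin and print the first "id" value.
livy_batch_id() {
  grep -oE '"id": *[0-9]+' | head -n 1 | grep -oE '[0-9]+'
}

# Poll GET /batches/<id>/state every 5 seconds until a terminal state, then print it.
livy_wait() {
  local id="$1" state
  while true; do
    state=$(curl -s "$LIVY_URL/batches/$id/state" \
      | grep -oE '"state": *"[a-z_]+"' | grep -oE '"[a-z_]+"$' | tr -d '"')
    case "$state" in
      success|dead|killed|error) echo "$state"; return 0 ;;
      "") echo "could not read state for batch $id" >&2; return 1 ;;
    esac
    sleep 5
  done
}
```

Usage: `id=$(livy_submit /test/spark-examples.jar org.apache.spark.examples.SparkPi | livy_batch_id)` followed by `livy_wait "$id"`.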
tmusabbir / capacity-scheduler.json
Created March 26, 2018 04:19
Sample YARN Capacity Scheduler config
{
  "Classification": "capacity-scheduler",
  "Properties": {
    "yarn.scheduler.capacity.root.queues": "default,dev,qa",
    "yarn.scheduler.capacity.root.default.capacity": "20",
    "yarn.scheduler.capacity.root.default.maximum-capacity": "50",
    "yarn.scheduler.capacity.root.dev.capacity": "40",
    "yarn.scheduler.capacity.root.dev.maximum-capacity": "100",
    "yarn.scheduler.capacity.root.qa.capacity": "40",
    "yarn.scheduler.capacity.root.qa.maximum-capacity": "80"
  }
}
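The queue settings above follow two Capacity Scheduler rules: the capacities of sibling queues must add up to 100 (here 20 + 40 + 40), and each queue's maximum-capacity must be at least its capacity, since maximum-capacity caps how far a queue may grow into idle resources (for example, dev can use the whole cluster when default and qa are idle). A small pre-deployment check, sketched under the assumption of integer percentages and a hypothetical `name:capacity:maximum-capacity` argument format:

```bash
#!/usr/bin/env bash
# Hypothetical check for sibling Capacity Scheduler queues.
# One "name:capacity:maximum-capacity" argument per queue, integer percentages only
# (YARN itself also accepts decimal values).
check_queues() {
  local total=0 spec name cap max
  for spec in "$@"; do
    IFS=: read -r name cap max <<< "$spec"
    # maximum-capacity may not be lower than the guaranteed capacity.
    if (( max < cap )); then
      echo "queue $name: maximum-capacity $max is below capacity $cap" >&2
      return 1
    fi
    total=$(( total + cap ))
  done
  # Guaranteed capacities of sibling queues must add up to exactly 100 percent.
  if (( total != 100 )); then
    echo "capacities sum to $total, expected 100" >&2
    return 1
  fi
  echo "ok: capacities sum to 100"
}
```

For the config above, `check_queues default:20:50 dev:40:100 qa:40:80` prints `ok: capacities sum to 100`. To apply the classification when creating an EMR cluster, a file containing a JSON array of such objects can be passed to `aws emr create-cluster` with `--configurations file://<path>`.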
tmusabbir / a:Chunk data import or Incremental Import in Sqoop
Last active November 7, 2017 22:20
Chunk data import / Incremental Import in Sqoop
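One way to import a large table in chunks is to run one `sqoop import` per key range, restricting each run with `--where`; for ongoing loads, Sqoop's `--incremental append` mode with `--check-column` and `--last-value` imports only rows whose check column is greater than the last imported value. The sketch below is a dry run that only prints the commands. The connection string, table, key column and `/data/...` target directories are hypothetical, credentials (`--username`/`--password`) are omitted, and each chunk gets its own target directory because Sqoop refuses to import into an existing one by default.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print one sqoop import per ID range [lo, hi] of the key column.
# Arguments: JDBC URL, table, key column, min id, max id, chunk size (all hypothetical).
sqoop_chunk_commands() {
  local connect="$1" table="$2" col="$3" min_id="$4" max_id="$5" chunk="$6" lo hi
  for (( lo = min_id; lo <= max_id; lo += chunk )); do
    hi=$(( lo + chunk - 1 ))
    if (( hi > max_id )); then hi=$max_id; fi
    printf 'sqoop import --connect %s --table %s --where "%s >= %d AND %s <= %d" --target-dir /data/%s/chunk_%d_%d\n' \
      "$connect" "$table" "$col" "$lo" "$col" "$hi" "$table" "$lo" "$hi"
  done
}

# Dry-run sketch: print an incremental append import of rows with col > last_value.
sqoop_incremental_command() {
  local connect="$1" table="$2" col="$3" last_value="$4"
  printf 'sqoop import --connect %s --table %s --incremental append --check-column %s --last-value %s --target-dir /data/%s\n' \
    "$connect" "$table" "$col" "$last_value" "$table"
}
```

For example, `sqoop_chunk_commands jdbc:oracle:thin:@<db-host>:1521:<sid> orders id 1 250 100` prints three commands covering IDs 1-100, 101-200 and 201-250.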
tmusabbir / a:Install Kafka in CentOS
Last active July 8, 2016 06:21
Install Kafka in CentOS
tmusabbir / a#Cassandra Performance Tuning
Last active January 3, 2016 00:19
Cassandra Performance Tuning
tmusabbir / a#Cassandra Stress Test
Last active January 2, 2016 23:49
Cassandra Stress Test