Jeongho Park jeongho

  • Deception Island, Antarctica
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<!-- Replace the group ID with your group ID -->
<groupId>com.mycompany.hadoopproject</groupId>
<!-- Replace the artifact ID with the name of your project -->
<artifactId>my-hadoop-project</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
@jeongho
jeongho / ec2
Last active December 25, 2015 17:39
#
# Copyright 2013 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
jeongho / hadoop-benchmark
Last active July 18, 2016 06:47
Hadoop benchmark
http://answers.oreilly.com/topic/460-how-to-benchmark-a-hadoop-cluster/
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
## MR pi
https://gist.github.com/jeongho/371aaed47ab462d79851
## Terasort
https://gist.github.com/jeongho/3b8c028f5e8409c3a10a
## TestDFSIO
jeongho / kerberos_kadmin_hack.txt
Last active February 25, 2016 17:56
modify kdc db max_renewable_life
-----
for p in `kadmin.local -q listprincs` ; do kadmin.local -q "modprinc -maxrenewlife 1000days $p" ; done
-----
kadmin.local -q "getprincs" > principals.txt
vi principals.txt
Remove the non-Hadoop principals from principals.txt, then run this small script to update the remaining principals:
for princ in `cat principals.txt`; do kadmin.local -q "modprinc -maxrenewlife 7day $princ"; done;
service krb5kdc restart
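
The manual edit of principals.txt can also be approximated with a filter. The following is only a sketch: the service names in `hadoop_services` and the example principals are assumptions (they are not in the original note), and the commands are printed as a dry run rather than executed, so nothing touches the KDC database until the output has been reviewed.

```bash
#!/usr/bin/env bash
# Hypothetical list of Hadoop service names; adjust it for your cluster.
hadoop_services='hdfs|yarn|mapred|HTTP|hive|impala'

# Keep only principals whose primary component is one of the services above.
filter_hadoop_principals() {
  grep -E "^(${hadoop_services})/"
}

# Dry run: print one kadmin.local command per principal read from stdin.
print_modprinc_commands() {
  local life=$1
  while IFS= read -r princ; do
    printf 'kadmin.local -q "modprinc -maxrenewlife %s %s"\n' "$life" "$princ"
  done
}

# Example with made-up principals; only the hdfs one survives the filter.
printf '%s\n' 'hdfs/node1.example.com@EXAMPLE.COM' 'kadmin/admin@EXAMPLE.COM' \
  | filter_hadoop_principals \
  | print_modprinc_commands 7days
```

To use it against a real list, feed the file in instead: `filter_hadoop_principals < principals.txt | print_modprinc_commands 7days`, then run the printed commands once they look right.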
jeongho / impala_start.sh
Last active February 25, 2016 17:51
impala start - CM API example
#!/usr/bin/env bash
# To enable debugging, change debug to 1. In debug mode the temporary hosts file is not deleted.
debug=0
## User defined arguments
user="admin"
pass="admin"
## Hostname of the CM instance here:
scm="http://test-1.wonderland.com:7180/api/v6"
## Cluster name here. Replace spaces w/ %20 so the name is valid inside a URL
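
The space replacement mentioned above can be done with bash parameter expansion instead of by hand. This is a minimal sketch: the function name `encode_spaces` and the cluster name "Cluster 1 - CDH4" are made up for illustration, and it only handles spaces, not other characters that would need percent-encoding.

```bash
#!/usr/bin/env bash
# Replace every space in a cluster name with %20 so it can be used in a URL path.
encode_spaces() {
  local name=$1
  printf '%s\n' "${name// /%20}"
}

# Hypothetical cluster name, for illustration only.
cluster=$(encode_spaces "Cluster 1 - CDH4")
echo "$cluster"   # prints Cluster%201%20-%20CDH4
```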
jeongho / pom.xml
Created June 4, 2015 23:22
maven shade plugin example
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
jeongho / empty_avro_from_schema.sh
Last active August 9, 2021 12:49
Create an empty avro file from avro schema - Pig doesn't like an empty directory
#1. create a sample avro schema
cat > example.avsc << EOF
{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
jeongho / local_ntp_setup.txt
Last active February 4, 2017 17:26
Local NTP server setup
ntp ref:
------------------------------
http://serverfault.com/questions/204082/using-ntp-to-sync-a-group-of-linux-servers-to-a-common-time-source/204138#204138
http://www.ntp.org/ntpfaq/NTP-s-config-adv.htm
http://askubuntu.com/questions/14558/how-do-i-setup-a-local-ntp-server
http://www.thegeekstuff.com/2014/06/linux-ntp-server-client/
http://www.linuxsolutions.org/faqs/generic/ntpserver
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s1-Understanding_the_ntpd_Configuration_File.html
------------------------------
jeongho / run_pi_job.sh
Created February 4, 2016 18:03
Hadoop benchmark 1. run pi job
#!/bin/bash
# mapreduce pi calculation to validate hadoop cluster setup
#
# command to run with nohup
# nohup bash ./run_pi_job.sh > pi_job.out 2>&1 &
# sudo -u hdfs nohup bash /tmp/run_pi_job.sh > /tmp/pi_job.out 2>&1 &
#parcel
hadoop_jar=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
jeongho / run_terasort.sh
Created February 4, 2016 18:04
Hadoop benchmark 2. run terasort
#!/bin/bash
# terasort benchmark
# Usage: hadoop jar hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>
#
# command to run nohup
# nohup bash ./run_terasort.sh > terasort.out 2>&1 &
# sudo -u hdfs nohup bash /tmp/run_terasort.sh > /tmp/terasort.out 2>&1 &
hadoop_jar=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
# TeraGen: 1 TB = 1,000,000,000,000 bytes = 1e12 bytes = 100 bytes * 1e10 rows
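
The row count that teragen expects follows directly from the 100-byte row size. A small sketch of that arithmetic, assuming a helper named `teragen_rows` (not part of the original script) and decimal units as in the comment above:

```bash
#!/bin/bash
# Each teragen row is 100 bytes, so rows = total bytes / 100.
teragen_rows() {
  local bytes=$1
  echo $(( bytes / 100 ))
}

# 1 TB (1e12 bytes) needs 1e10 rows.
teragen_rows 1000000000000   # prints 10000000000
```

The result is the value to pass as the `<number of 100-byte rows>` argument in the teragen usage line; smaller test runs only need a smaller byte count.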