Skip to content

Instantly share code, notes, and snippets.

View sankars's full-sized avatar

Sankar sankars

  • Dubai
View GitHub Profile
@sankars
sankars / run-kafka-container.md
Created December 2, 2017 15:14 — forked from abacaphiliac/run-kafka-container.md
Run Kafka Container

Start Kafka service

The following commands will start a container with Kafka and Zookeeper running on mapped ports 2181 (Zookeeper) and 9092 (Kafka).

docker pull spotify/kafka
docker run -d -p 2181:2181 -p 9092:9092 --env ADVERTISED_HOST=kafka --env ADVERTISED_PORT=9092 --name kafka spotify/kafka

Why Spotify?

ADVERTISTED_HOST was set to kafka, which will allow other containers to be able to run Producers and Consumers.

@sankars
sankars / region_counts.rb
Last active April 29, 2017 18:21 — forked from robdaemon/region_counts.rb
Print tables having more than one region
include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.util.Bytes
conf = HBaseConfiguration.create()
admin = HBaseAdmin.new(conf)
tables = admin.listTables()
@sankars
sankars / install-gradle-centos.sh
Created November 29, 2016 11:30 — forked from parzonka/install-gradle-centos.sh
Install gradle on redhat/centos linux
# installs to /opt/gradle
# existing versions are not overwritten/deleted
# seamless upgrades/downgrades
# $GRADLE_HOME points to latest *installed* (not released)
gradle_version=2.9
wget -N https://services.gradle.org/distributions/gradle-${gradle_version}-all.zip
sudo unzip -foq gradle-${gradle_version}-all.zip -d /opt/gradle
sudo ln -sfn gradle-${gradle_version} /opt/gradle/latest
sudo printf "export GRADLE_HOME=/opt/gradle/latest\nexport PATH=\$PATH:\$GRADLE_HOME/bin" > /etc/profile.d/gradle.sh
. /etc/profile.d/gradle.sh
@sankars
sankars / iptables.sh
Created June 29, 2016 11:56 — forked from thomasfr/iptables.sh
iptable rules to allow outgoing DNS lookups, outgoing icmp (ping) requests, outgoing connections to configured package servers, outgoing connections to all ips on port 22, all incoming connections to port 22, 80 and 443 and everything on localhost
#!/bin/bash
IPT="/sbin/iptables"
# Server IP
SERVER_IP="$(ip addr show eth0 | grep 'inet ' | cut -f2 | awk '{ print $2}')"
# Your DNS servers you use: cat /etc/resolv.conf
DNS_SERVER="8.8.4.4 8.8.8.8"
# Allow connections to this package servers
=================================================================
SETTING UP SSHD AS A SERVICE FOR RUNNING HADOOP DAEMONS ON WINDOWS 7
=================================================================
Steps:
1. Download 'setup.exe' from Cygwin website
2. Right-click on 'setup.exe'
3. Leave settings as they are, click through until you come to the plugin selection window
3.1 - Make sure that the installation directory is 'C:\cygwin'
##########
# For verification, you can display the OS release.
##########
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.10
DISTRIB_CODENAME=oneiric
DISTRIB_DESCRIPTION="Ubuntu 11.10"
##########
Install Java JDK 6.0 update 31 on Ubuntu 12.04 LTS
Introduction
The first question is why are we installing an old JDK. The answer is that Oracle JDK 6.0 update 31 is the JDK recommended by Cloudera when installing CDH4 (Cloudera Distribution Hadoop v4).
This is an update to an older version of this post. Mainly I have changed the JDK from 1.6.0_26 to 1.6.0_31 as this is the recommended JDK for CDH4 .
Install Java
I have a 64 bit version of Ubuntu 12.04 LTS installed, so the instructions below only apply to this OS.
@sankars
sankars / notes.txt
Created August 9, 2013 07:15 — forked from nipra/notes.txt
# Installing CDH4 on a Single Linux Node in Pseudo-distributed Mode
# https://ccp.cloudera.com/display/CDH4DOC/Installing+CDH4+on+a+Single+Linux+Node+in+Pseudo-distributed+Mode
# Installing CDH4 with MRv1 on a Single Linux Node in Pseudo-distributed mode
# On Ubuntu and other Debian systems
nipra@lambda:Downloads$ wget -cv http://archive.cloudera.com/cdh4/one-click-install/precise/amd64/cdh4-repository_1.0_all.deb
nipra@lambda:Downloads$ sudo dpkg -i cdh4-repository_1.0_all.deb # Adds /etc/apt/sources.list.d/cloudera-cdh4.list ??
nipra@lambda:Downloads$ dpkg -L cdh4-repository # To view the files on Ubuntu systems
# Install CDH4
-- This is a Hive program. Hive is an SQL-like language that compiles
-- into Hadoop Map/Reduce jobs. It's very popular among analysts at
-- Facebook, because it allows them to query enormous Hadoop data
-- stores using a language much like SQL.
-- Our logs are stored on the Hadoop Distributed File System, in the
-- directory /logs/randomhacks.net/access. They're ordinary Apache
-- logs in *.gz format.
--
-- We want to pretend that these gzipped log files are a database table,
@sankars
sankars / gist:5064830
Last active December 14, 2015 09:28 — forked from jorisbontje/gist:5056544
Hive with Avro
0) Download avro-tools jar file from avro.apache.org
1) Extract Avro schema using avro-tools.jar
java -jar avro-tools*.jar getschema file.avro > file.avsc
2) Upload Avro schema to hdfs
hadoop fs -cp file.avsc /use/training/file.avsc