Sankar sankars

## run-kafka-container.md

      
Created December 2, 2017 15:14, forked from abacaphiliac/run-kafka-container.md

Run Kafka Container

Start Kafka service

The following commands start a container with Kafka and Zookeeper running on mapped ports 2181 (Zookeeper) and 9092 (Kafka):

    docker pull spotify/kafka
    docker run -d -p 2181:2181 -p 9092:9092 --env ADVERTISED_HOST=kafka --env ADVERTISED_PORT=9092 --name kafka spotify/kafka
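
Once the container is started, it can be useful to confirm that both mapped ports accept connections before pointing clients at them. The following is a minimal sketch, not part of the original gist: `wait_for_port` is a hypothetical helper name, it assumes bash is installed (it relies on bash's `/dev/tcp` pseudo-device), and an accepted TCP connection only shows that something is listening, not that the broker has finished starting.

```shell
# Hypothetical helper: return 0 once host:port accepts a TCP connection,
# or 1 if it still refuses after the given number of attempts (default 30).
wait_for_port() {
  host="$1"; port="$2"; attempts="${3:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # bash opens a TCP socket when /dev/tcp/<host>/<port> is used in a redirect.
    if bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Assumed usage on the Docker host, where the ports are published on localhost:
#   wait_for_port localhost 2181 && wait_for_port localhost 9092 && echo "ports are open"
```

Polling with a bounded number of attempts keeps a setup script from hanging indefinitely if the container fails to start.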

Why Spotify?
ADVERTISED_HOST is set to kafka, so the broker advertises itself under that hostname. Other containers that can resolve the name kafka can then run Producers and Consumers against it.

  
## region_counts.rb
include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.util.Bytes

conf = HBaseConfiguration.create()
admin = HBaseAdmin.new(conf)

tables = admin.listTables()

## install-gradle-centos.sh
# installs to /opt/gradle
# existing versions are not overwritten/deleted
# seamless upgrades/downgrades
# $GRADLE_HOME points to latest *installed* (not released)
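gradle_version=2.9
wget -N https://services.gradle.org/distributions/gradle-${gradle_version}-all.zip
# -o overwrites without prompting; -f would only freshen files already on disk and extract nothing on a first install
sudo unzip -oq gradle-${gradle_version}-all.zip -d /opt/gradle
sudo ln -sfn gradle-${gradle_version} /opt/gradle/latest
# the redirect must run as root, so pipe through sudo tee instead of "sudo printf ... >"
printf 'export GRADLE_HOME=/opt/gradle/latest\nexport PATH=$PATH:$GRADLE_HOME/bin\n' | sudo tee /etc/profile.d/gradle.sh > /dev/null
. /etc/profile.d/gradle.sh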
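
The "seamless upgrades/downgrades" in the comments come from keeping each version in its own directory and repointing the `latest` symlink. The sketch below illustrates that idea in a scratch directory rather than `/opt/gradle`; `switch_gradle` and `demo_root` are hypothetical names introduced here, not part of the original script.

```shell
# Hypothetical helper: repoint <root>/latest at an already-unpacked version.
switch_gradle() {
  root="$1"; version="$2"
  if [ ! -d "${root}/gradle-${version}" ]; then
    echo "gradle-${version} is not installed under ${root}" >&2
    return 1
  fi
  # -s: symbolic link, -f: replace an existing link,
  # -n: treat an existing link to a directory as the link itself
  ln -sfn "gradle-${version}" "${root}/latest"
}

# Demonstration with two empty stand-in version directories.
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/gradle-2.8" "${demo_root}/gradle-2.9"

switch_gradle "${demo_root}" 2.9
readlink "${demo_root}/latest"   # prints gradle-2.9

switch_gradle "${demo_root}" 2.8
readlink "${demo_root}/latest"   # prints gradle-2.8
```

Because `$GRADLE_HOME` points at the symlink rather than a versioned directory, switching versions needs no change to `/etc/profile.d/gradle.sh`.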

## iptables.sh
#!/bin/bash
IPT="/sbin/iptables"

# Server IP (the field after "inet", which includes the /prefix length)
SERVER_IP="$(ip addr show eth0 | awk '/inet / { print $2 }')"

# Your DNS servers you use: cat /etc/resolv.conf
DNS_SERVER="8.8.4.4 8.8.8.8"

# Allow connections to these package servers
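
The preview stops before the rules themselves, so how the script uses `DNS_SERVER` is not shown. One plausible pattern, sketched here under that assumption, is a loop that adds an outbound rule per resolver. `allow_dns` is a hypothetical helper; passing `echo iptables` as the command gives a dry run that prints the rules instead of applying them, and a real rule set would also need matching rules for the replies.

```shell
# Hypothetical helper: add outbound DNS allow rules for each resolver given.
# $1 is the iptables command; pass "echo iptables" for a dry run.
allow_dns() {
  ipt="$1"; shift
  for ip in "$@"; do
    # DNS uses port 53 over both UDP and TCP.
    $ipt -A OUTPUT -p udp -d "$ip" --dport 53 -j ACCEPT
    $ipt -A OUTPUT -p tcp -d "$ip" --dport 53 -j ACCEPT
  done
}

DNS_SERVER="8.8.4.4 8.8.8.8"
# Left unquoted on purpose so the space-separated list splits into one argument per address.
allow_dns "echo iptables" $DNS_SERVER
```

To apply the rules for real, the first argument would be the `$IPT` path defined at the top of the script, run as root.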

## instructions and how-to
=================================================================
SETTING UP SSHD AS A SERVICE FOR RUNNING HADOOP DAEMONS ON WINDOWS 7
=================================================================

Steps:

1. Download 'setup.exe' from Cygwin website
2. Right-click on 'setup.exe'
3. Leave settings as they are, click through until you come to the package selection window
	3.1 - Make sure that the installation directory is 'C:\cygwin'

## CDH3 setup
##########
# For verification, you can display the OS release.
##########
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.10
DISTRIB_CODENAME=oneiric
DISTRIB_DESCRIPTION="Ubuntu 11.10"
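
If the verification needs to happen inside a script rather than by eye, the same file can be read field by field. This is a small sketch under stated assumptions: `lsb_field` is a hypothetical helper, and the demonstration reads a scratch copy of the output shown above instead of the real `/etc/lsb-release`.

```shell
# Hypothetical helper: print one field from an lsb-release style KEY=value file.
lsb_field() {
  file="$1"; key="$2"
  # Print what follows "KEY=" on the matching line, then drop any double quotes.
  sed -n "s/^${key}=//p" "$file" | tr -d '"'
}

# Scratch copy of the sample output, so the demonstration runs on any system.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.10
DISTRIB_CODENAME=oneiric
DISTRIB_DESCRIPTION="Ubuntu 11.10"
EOF

lsb_field "$sample" DISTRIB_ID            # prints Ubuntu
lsb_field "$sample" DISTRIB_DESCRIPTION   # prints Ubuntu 11.10
```

On a real machine the first argument would be `/etc/lsb-release`.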

##########

## JDK 1.6 Installation in Ubuntu
Install Java JDK 6.0 update 31 on Ubuntu 12.04 LTS

Introduction
The first question is why we are installing an old JDK. The answer is that Oracle JDK 6.0 update 31 is the JDK Cloudera recommends when installing CDH4 (Cloudera Distribution Hadoop v4).

This is an update to an older version of this post. The main change is the JDK version, from 1.6.0_26 to 1.6.0_31, as this is the recommended JDK for CDH4.

Install Java
I have a 64-bit version of Ubuntu 12.04 LTS installed, so the instructions below apply only to this OS.

## notes.txt
# Installing CDH4 on a Single Linux Node in Pseudo-distributed Mode
# https://ccp.cloudera.com/display/CDH4DOC/Installing+CDH4+on+a+Single+Linux+Node+in+Pseudo-distributed+Mode

# Installing CDH4 with MRv1 on a Single Linux Node in Pseudo-distributed mode
# On Ubuntu and other Debian systems
nipra@lambda:Downloads$ wget -cv http://archive.cloudera.com/cdh4/one-click-install/precise/amd64/cdh4-repository_1.0_all.deb
nipra@lambda:Downloads$ sudo dpkg -i cdh4-repository_1.0_all.deb # Adds /etc/apt/sources.list.d/cloudera-cdh4.list ??
nipra@lambda:Downloads$ dpkg -L cdh4-repository                  # To view the files on Ubuntu systems

# Install CDH4

## apache-logs-hive.sql
-- This is a Hive program. Hive is an SQL-like language that compiles
-- into Hadoop Map/Reduce jobs. It's very popular among analysts at
-- Facebook, because it allows them to query enormous Hadoop data
-- stores using a language much like SQL.

-- Our logs are stored on the Hadoop Distributed File System, in the
-- directory /logs/randomhacks.net/access.  They're ordinary Apache
-- logs in *.gz format.
--
-- We want to pretend that these gzipped log files are a database table,

## gist:5064830
0) Download avro-tools jar file from avro.apache.org

1) Extract Avro schema using avro-tools.jar

java -jar avro-tools*.jar getschema file.avro > file.avsc

2) Upload Avro schema to hdfs

hadoop fs -put file.avsc /user/training/file.avsc