nsabharwal / gist:600bef5a0454e0738a93
Created April 29, 2015 06:12
Ingesting into Kafka via the Flume syslog source
yum -y install syslog-ng
# /etc/flume/conf/flume.conf
agent.sources=syslogsource-1
agent.channels=mem-channel-1
agent.sinks=kafka-sink-1
# syslog TCP source listening on port 13073, wired to the memory channel
agent.sources.syslogsource-1.type=syslogtcp
agent.sources.syslogsource-1.port=13073
agent.sources.syslogsource-1.channels=mem-channel-1
agent.channels.mem-channel-1.type=memory
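The gist preview stops before the sink definition. A minimal sketch of the missing piece, assuming the Flume 1.6 built-in Kafka sink and a broker at localhost:6667 (the broker address and topic name are assumptions):
# Kafka sink: events arriving on the memory channel are written to the syslog topic
agent.sinks.kafka-sink-1.type=org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafka-sink-1.brokerList=localhost:6667
agent.sinks.kafka-sink-1.topic=syslog
agent.sinks.kafka-sink-1.channel=mem-channel-1
Once the agent is up, a quick end-to-end check is to send a TCP syslog message at the source port, e.g. logger -n 127.0.0.1 -T -P 13073 "flume test" (requires a logger build with network options).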
Please see the following details on Apache Phoenix, a "SQL skin" for HBase.
Phoenix
The following details are based on a test done in one of my lab environments. You can see that Phoenix lets us run SQL, create secondary indexes, view explain plans, and perform data loads and bulk loads.
Table definition
drop table if exists crime;
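The gist preview truncates here; the full table definition appears further down. To illustrate the secondary-index and explain-plan steps mentioned above, a sketch against the crime table (the index name and the literal value are made up):
-- secondary index on the description column
create index crime_desc_idx on crime (description);
-- show the plan Phoenix picks for a query the index can serve
explain select caseid from crime where description = 'THEFT';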
1. Check out the source code from https://github.com/apache/incubator-zeppelin
2. Build it against Spark 1.3 and the matching Hadoop version:
mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
3. Place the following JARs on the Spark interpreter classpath by putting them in $ZEPPELIN_HOME/interpreter/spark (see the sketch after this list):
a. hbase-client.jar
b. hbase-protocol.jar
c. hbase-common.jar
d. phoenix-4.4.x-client-without-hbase.jar
4. Start Zeppelin
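A command-level sketch of steps 3 and 4, assuming an HDP-style layout under /usr/hdp/current (the JAR paths are assumptions; match them to your install):
# copy the HBase and Phoenix client JARs onto the Spark interpreter classpath
cp /usr/hdp/current/hbase-client/lib/hbase-client*.jar $ZEPPELIN_HOME/interpreter/spark/
cp /usr/hdp/current/hbase-client/lib/hbase-protocol*.jar $ZEPPELIN_HOME/interpreter/spark/
cp /usr/hdp/current/hbase-client/lib/hbase-common*.jar $ZEPPELIN_HOME/interpreter/spark/
cp /usr/hdp/current/phoenix-client/phoenix-4.4.*-client-without-hbase.jar $ZEPPELIN_HOME/interpreter/spark/
# start the Zeppelin daemon
$ZEPPELIN_HOME/bin/zeppelin-daemon.sh start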
nsabharwal / Big Data product list and short description
Last active October 13, 2015 13:42
Sqoop: a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Spark: a fast, general engine for large-scale data processing. Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing.
Couchbase: an open-source, distributed NoSQL document-oriented database. It exposes a fast key-value store with managed cache for sub-millisecond data operations, purpose-built indexers for fast queries, and a query engine for executing SQL queries.
Jupyter: a web application that lets you create and share documents containing live code, equations, visualizations, and explanatory text. Use case: data cleaning, transformation, numerical simulation, statistical modeling, ML, and more.
H2O: an open-source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning. Use case: ad optimization, fraud detection, predictive modeling, customer intelligence.
Tachyon: a memory-centric distributed storage system that enables reliable data sharing at memory speed across cluster frameworks such as Spark and MapReduce.
drop table if exists crime;
create table crime (
    caseid varchar,
    Date varchar,
    block varchar,
    description varchar,
    sdesc varchar,
    ldesc varchar,
    arrest char(2),
    domestic char(2),
    -- the remaining columns are cut off in the gist preview; Phoenix requires a
    -- primary key, so the DDL must close with something like (an assumption):
    constraint pk primary key (caseid)
);
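To exercise the data-load path mentioned above, Phoenix ships a psql.py loader. A sketch, assuming a CSV file at /tmp/crime.csv and ZooKeeper on localhost (both assumptions):
# load the CSV into the CRIME table
/usr/hdp/current/phoenix-client/bin/psql.py -t CRIME localhost /tmp/crime.csv
For larger files, the MapReduce-based CsvBulkLoadTool covers the bulk-load case.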
yum -y install expect
#!/usr/bin/expect
# answer the ambari-server sync-ldap credential prompts non-interactively
spawn ambari-server sync-ldap --existing
expect "Enter Ambari Admin login:"
send "admin\r"
expect "Enter Ambari Admin password:"
send "admin\r"
expect eof
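To run it, save the expect script to a file (the name here is a stand-in) and execute it:
chmod +x /tmp/ambari-ldap-sync.exp
/tmp/ambari-ldap-sync.exp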
read -p "enter HS2 hostname: " HS2
read -p "enter username: " username
echo "enter password"
read -s passwd
read -p "enter filename: " filename
beeline -u jdbc:hive2://$HS2:10000/default -n $username -p $passwd -f $filename
# generate a SHOW CREATE TABLE statement for every table registered in the Hive
# metastore (MySQL); -N drops the header row so the output is valid HiveQL
mysql -N -u hive -p -e "select concat('show create table ', TBL_NAME, ';') from TBLS" hive > /tmp/file.sql
hive -f /tmp/file.sql
HDFS test
Make sure the Google Cloud Storage connector is on the Hadoop CLASSPATH, as described in the blog; a sketch follows.
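A minimal sketch of that classpath entry, assuming the connector JAR sits in /usr/lib/hadoop/lib (the path and version are assumptions; match your install):
# e.g. in hadoop-env.sh
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hadoop/lib/gcs-connector-latest-hadoop2.jar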
[hdfs@hdpgcp-1-1435537523061 ~]$ hdfs dfs -ls gs://hivetest/
15/06/28 21:15:32 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.0-hadoop2
15/06/28 21:15:33 WARN gcs.GoogleHadoopFileSystemBase: No working directory configured, using default: 'gs://hivetest/'
nsabharwal / control.sh
Last active January 19, 2016 19:01 — forked from randerzander/control.sh
Ambari Service Start/Stop script
# Ambari credentials, cluster name, and API endpoint
USER='admin'
PASS='admin'
CLUSTER='dev'
HOST=$(hostname -f):8080

# start a service by PUTting its desired state as STARTED
function start(){
curl -u $USER:$PASS -i -H 'X-Requested-By: ambari' -X PUT -d \
'{"RequestInfo": {"context" :"Start '"$1"' via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' \
http://$HOST/api/v1/clusters/$CLUSTER/services/$1
}
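The preview ends before the matching stop() function. Ambari stops a service by setting its state back to INSTALLED, so by symmetry with start() a sketch looks like:
# stop a service by PUTting its desired state as INSTALLED
function stop(){
curl -u $USER:$PASS -i -H 'X-Requested-By: ambari' -X PUT -d \
'{"RequestInfo": {"context" :"Stop '"$1"' via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' \
http://$HOST/api/v1/clusters/$CLUSTER/services/$1
}
Usage: start HDFS or stop HDFS (service names as Ambari knows them, uppercase).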