Kafka installation on Ubuntu 16.04
# referencing:
# https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04
# https://chongyaorobin.wordpress.com/2015/07/08/step-by-step-of-install-apache-kafka-on-ubuntu-standalone-mode/
1. Add 'kafka' user::
$ sudo useradd kafka -m
2. Install Java::
$ sudo apt-get update
$ sudo apt-get install default-jre
3. Install zookeeper::
$ sudo apt-get install zookeeperd
.. note::
After the installation completes, ZooKeeper will be started as a daemon automatically. By default, it will listen on port 2181.
4. Confirm zookeeper is running on expected port::
$ telnet localhost 2181
Trying ::1...
Connected to localhost.
Escape character is '^]'.
ruok <-- Type at empty prompt!
imokConnection closed by foreign host.
.. note::
After typing 'ruok' at the empty prompt, ZooKeeper responds with 'imok' and closes the session.
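Telnet is interactive; the same four-letter-word check can also be scripted. A minimal Python sketch of the 'ruok'/'imok' exchange (the embedded responder below is a stand-in so the snippet is self-contained; against a live install you would connect to localhost:2181 instead):

```python
import socket
import threading

# Stand-in responder mimicking ZooKeeper's four-letter-word protocol
# (assumption: the real daemon listens on localhost:2181).
def fake_zookeeper(server_sock):
    conn, _ = server_sock.accept()
    if conn.recv(4) == b"ruok":
        conn.sendall(b"imok")
    conn.close()  # ZooKeeper closes the session after replying

server = socket.socket()
server.bind(("127.0.0.1", 0))      # ephemeral port for the demo
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=fake_zookeeper, args=(server,), daemon=True).start()

# The actual health check: send 'ruok', expect 'imok' back.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"ruok")
reply = client.recv(4).decode()
client.close()
print(reply)  # imok
```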
5. Download kafka from http://kafka.apache.org/downloads.html::
# with cntlm proxy installed and running if necessary
$ export http_proxy=http://127.0.0.1:8009
$ export https_proxy=http://127.0.0.1:8009
# grab latest stable
$ wget http://ftp.jaist.ac.jp/pub/apache/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
6. Untar and move binaries to /usr/local/kafka::
$ tar xvf kafka_2.11-0.10.0.0.tgz
$ sudo mv kafka_2.11-0.10.0.0 /usr/local/kafka
7. Configure Kafka Server::
# turn on topic delete
$ vi /usr/local/kafka/config/server.properties
# At the end of the file, add:
delete.topic.enable = true
# save and quit
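To sanity-check that the edit landed, the java-properties format is simple enough to parse in a few lines. A sketch using a temporary stand-in file (the real file is /usr/local/kafka/config/server.properties):

```python
import tempfile

# Minimal parser for 'key = value' properties lines
# (comments and blank lines skipped; real files may also use ':').
def read_props(path):
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

# Stand-in for server.properties, written to a temp file for the demo:
with tempfile.NamedTemporaryFile("w", suffix=".properties",
                                 delete=False) as f:
    f.write("broker.id=0\n# a comment\ndelete.topic.enable = true\n")
    path = f.name

print(read_props(path)["delete.topic.enable"])  # true
```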
8. Test Server::
$ /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
...
[2016-08-06 01:22:00,000] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
.. note::
This only starts the server temporarily for initial testing; the service should be registered properly later.
9. With the Kafka server running, open another session and create a topic::
$ /usr/local/kafka/bin/kafka-topics.sh --create --topic topic-test --zookeeper localhost:2181 --partitions 1 --replication-factor 1
Created topic "topic-test".
10. List available topics::
$ /usr/local/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
topic-test
.. note::
You should see the created, 'topic-test' topic listed.
11. Send a message to the topic as a producer via 'kafka-console-producer.sh'::
$ echo "hello world" | /usr/local/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-test
12. *Consume* the sent message::
$ /usr/local/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topic-test --from-beginning
.. note::
The '--from-beginning' flag starts the consumer from the earliest message present in the log, rather than the latest. (See the */usr/local/kafka/bin/kafka-console-consumer.sh* help output for more option details.)
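The earliest-vs-latest distinction can be illustrated with a toy model of a partition log in plain Python (this mirrors the offset behaviour, not the Kafka API):

```python
# Toy model: a partition is an append-only list of messages,
# and a consumer starts from some offset into it.
log = ["hello world", "second message", "third message"]

def consume(log, from_beginning):
    # from_beginning=True starts at offset 0 (the earliest message);
    # otherwise the consumer starts at the current end of the log,
    # so only messages appended after it joins would be seen.
    start = 0 if from_beginning else len(log)
    return log[start:]

print(consume(log, from_beginning=True))   # all three messages
print(consume(log, from_beginning=False))  # nothing yet
```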
----
# sbt installation reference: http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html
Spark installation
1. Install Scala Build Tool (sbt) [Make sure https_proxy is set if needed]::
1.1 Get the Scala Build Tool Ubuntu repository package::
wget https://dl.bintray.com/sbt/debian/sbt-0.13.11.deb
1.2 Install the sbt repository package::
sudo dpkg -i sbt-0.13.11.deb
1.3 Update repository info and install 'sbt'::
sudo apt-get update
sudo apt-get install sbt
2. Download the Spark binary (grab the latest stable from http://spark.apache.org/downloads.html)::
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz
3. Untar and move::
tar xvf spark-2.0.0-bin-hadoop2.7.tgz
sudo mv spark-2.0.0-bin-hadoop2.7 /usr/local/spark
4. Add the Spark configuration to your profile (or appropriate ENV configuration)::
vi ~/.profile
(Add the following to .profile)
# set PATH so it includes user's private bin directories
PATH="/usr/local/spark/bin:$HOME/bin:$HOME/.local/bin:$PATH"
export PYSPARK_PYTHON=python3
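The effect of that PATH line can be sketched in Python for illustration (it mirrors the shell assignment above, with HOME taken from the environment):

```python
import os

# Simulate the PATH line added to ~/.profile and confirm the Spark
# bin directory ends up first in the search order.
home = os.path.expanduser("~")
old_path = os.environ.get("PATH", "/usr/bin:/bin")
new_path = f"/usr/local/spark/bin:{home}/bin:{home}/.local/bin:{old_path}"
print(new_path.split(":")[0])  # /usr/local/spark/bin
```

Because the shell searches PATH left to right, placing /usr/local/spark/bin first means `pyspark` resolves to the freshly installed copy.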
5. Apply to current ENV::
source ~/.profile
6. Test configuration::
pyspark
--> Should open the pyspark console