
@monkut
Last active February 6, 2023 07:06
Kafka installation on Ubuntu 16.04

# referencing:
# https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04
# https://chongyaorobin.wordpress.com/2015/07/08/step-by-step-of-install-apache-kafka-on-ubuntu-standalone-mode/

  1. Add 'kafka' user:

$ sudo useradd kafka -m

  2. Install Java:

$ sudo apt-get update
$ sudo apt-get install default-jre

  3. Install zookeeper:

$ sudo apt-get install zookeeperd

Note

After the installation completes, ZooKeeper will be started as a daemon automatically. By default, it will listen on port 2181.

  4. Confirm zookeeper is running on expected port:

$ telnet localhost 2181
Trying ::1...
Connected to localhost.
Escape character is '^]'.
ruok    <-- Type at empty prompt!
imokConnection closed by foreign host.

Note

After typing 'ruok' once connected to 'localhost', ZooKeeper will respond with 'imok' and close the session.

  5. Download kafka from http://kafka.apache.org/downloads.html:

# with cntlm proxy installed and running, if necessary
$ export http_proxy=http://127.0.0.1:8009
$ export https_proxy=http://127.0.0.1:8009

# grab latest stable
$ wget http://ftp.jaist.ac.jp/pub/apache/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz

  6. Untar and move binaries to /usr/local/kafka:

$ tar xvf kafka_2.11-0.10.0.0.tgz
$ sudo mv kafka_2.11-0.10.0.0 /usr/local/kafka

  7. Configure Kafka Server:

# turn on topic delete
$ vi /usr/local/kafka/config/server.properties

#>> At end of file add:
delete.topic.enable = true

# save and quit

  8. Test Server:

$ /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
...
[2016-08-06 01:22:00,000] INFO [Kafka Server 0], started (kafka.server.KafkaServer)

Note

This only starts the server temporarily, for initial testing; the service should be registered properly later...
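One way to register it later is a systemd unit (Ubuntu 16.04 is systemd-based). A sketch only; the unit name, the dependency on the zookeeper unit, and the Restart policy are assumptions, while the user and paths follow the steps above:

```
# /etc/systemd/system/kafka.service (hypothetical unit)
[Unit]
Description=Apache Kafka broker
Requires=zookeeper.service
After=zookeeper.service

[Service]
User=kafka
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
```

Once in place: sudo systemctl daemon-reload && sudo systemctl enable --now kafka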

  9. With the Kafka server running, open another session, and create a topic:

$ /usr/local/kafka/bin/kafka-topics.sh --create --topic topic-test --zookeeper localhost:2181 --partitions 1 --replication-factor 1
Created topic "topic-test".

  10. List available topics:

$ /usr/local/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
topic-test

Note

You should see the created 'topic-test' topic listed.

  11. Send a message to the topic as a producer via 'kafka-console-producer.sh':

$ echo "hello world" | /usr/local/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-test

  12. Consume the sent message:

$ /usr/local/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topic-test --from-beginning

Note

The '--from-beginning' flag is given to start the consumer from the earliest message present in the log, rather than the latest. (See the /usr/local/kafka/bin/kafka-console-consumer.sh help for more option details.)

----

Spark installation

# for install of scala (sbt): http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html

  1. Install Scala Build Tool (sbt) [Make sure https_proxy is set if needed]:

1.1 Get Scala Build Tool Ubuntu repository info:

wget https://dl.bintray.com/sbt/debian/sbt-0.13.11.deb

1.2 Install sbt repository info:

sudo dpkg -i sbt-0.13.11.deb

1.3 Update repository info and install 'sbt':

sudo apt-get update
sudo apt-get install sbt
  2. Download spark binary (grab latest stable from: http://spark.apache.org/downloads.html):

wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz

  3. Untar and move:

tar xvf spark-2.0.0-bin-hadoop2.7.tgz
sudo mv spark-2.0.0-bin-hadoop2.7 /usr/local/spark

  4. Add spark configuration to your profile (or appropriate ENV configuration):

vi ~/.profile

(Add the following to .profile)

# set PATH so it includes user's private bin directories
PATH="/usr/local/spark/bin:$HOME/bin:$HOME/.local/bin:$PATH"
export PYSPARK_PYTHON=python3

  5. Apply to current ENV:

source ~/.profile

  6. Test configuration:

pyspark

--> Should open the pyspark console

@cabecada

cabecada commented Jun 28, 2017

this be /usr/local/kafka/config/server.properties i guess :)

@Antobiotics

thanks!

@yaroslav-serhiichuk

thanks!

@RARain

RARain commented Nov 23, 2018

Thanks!
