Kafka installation on Ubuntu 16.04
# referencing:
# https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04
# https://chongyaorobin.wordpress.com/2015/07/08/step-by-step-of-install-apache-kafka-on-ubuntu-standalone-mode/
1. Add 'kafka' user::
$ sudo useradd kafka -m
2. Install Java::
$ sudo apt-get update
$ sudo apt-get install default-jre
3. Install zookeeper::
$ sudo apt-get install zookeeperd
.. note::
After the installation completes, ZooKeeper will be started as a daemon automatically. By default, it will listen on port 2181.
4. Confirm zookeeper is running on expected port::
$ telnet localhost 2181
Trying ::1...
Connected to localhost.
Escape character is '^]'.
ruok <-- Type at empty prompt!
imokConnection closed by foreign host.
.. note::
After typing 'ruok' at the empty prompt, ZooKeeper responds with 'imok' and closes the session.
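Telnet is interactive; the same four-letter-word check can also be scripted. A minimal Python sketch of the 'ruok'/'imok' exchange (the embedded responder below is a stand-in so the snippet is self-contained; against a live install you would connect to localhost:2181 instead):

```python
import socket
import threading

# Stand-in responder mimicking ZooKeeper's four-letter-word protocol
# (assumption: the real daemon listens on localhost:2181).
def fake_zookeeper(server_sock):
    conn, _ = server_sock.accept()
    if conn.recv(4) == b"ruok":
        conn.sendall(b"imok")
    conn.close()  # ZooKeeper closes the session after replying

server = socket.socket()
server.bind(("127.0.0.1", 0))      # ephemeral port for the demo
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=fake_zookeeper, args=(server,), daemon=True).start()

# The actual health check: send 'ruok', expect 'imok' back.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"ruok")
reply = client.recv(4).decode()
client.close()
print(reply)  # imok
```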
5. Download kafka from http://kafka.apache.org/downloads.html::
# with cntlm proxy installed and running if necessary
$ export http_proxy=http://127.0.0.1:8009
$ export https_proxy=http://127.0.0.1:8009
# grab latest stable
$ wget http://ftp.jaist.ac.jp/pub/apache/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
6. Untar and move binaries to /usr/local/kafka::
$ tar xvf kafka_2.11-0.10.0.0.tgz
$ sudo mv kafka_2.11-0.10.0.0 /usr/local/kafka
7. Configure Kafka Server::
# turn on topic delete
$ vi /usr/local/kafka/config/server.properties
# At the end of the file, add:
delete.topic.enable = true
# save and quit
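To sanity-check that the edit landed, the java-properties format is simple enough to parse in a few lines. A sketch using a temporary stand-in file (the real file is /usr/local/kafka/config/server.properties):

```python
import tempfile

# Minimal parser for 'key = value' properties lines
# (comments and blank lines skipped; real files may also use ':').
def read_props(path):
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

# Stand-in for server.properties, written to a temp file for the demo:
with tempfile.NamedTemporaryFile("w", suffix=".properties",
                                 delete=False) as f:
    f.write("broker.id=0\n# a comment\ndelete.topic.enable = true\n")
    path = f.name

print(read_props(path)["delete.topic.enable"])  # true
```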
8. Test Server::
$ /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
...
[2016-08-06 01:22:00,000] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
.. note::
This only starts the server temporarily for initial testing; the service should be registered properly later.
9. With the Kafka server running, open another session and create a topic::
$ /usr/local/kafka/bin/kafka-topics.sh --create --topic topic-test --zookeeper localhost:2181 --partitions 1 --replication-factor 1
Created topic "topic-test".
10. List available topics::
$ /usr/local/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
topic-test
.. note::
You should see the created, 'topic-test' topic listed.
11. Send a message to the topic as a producer via 'kafka-console-producer.sh'::
$ echo "hello world" | /usr/local/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-test
12. *Consume* the sent message::
$ /usr/local/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topic-test --from-beginning
.. note::
The '--from-beginning' flag starts the consumer from the earliest message present in the log, rather than the latest. (See the */usr/local/kafka/bin/kafka-console-consumer.sh* help output for more option details.)
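The earliest-vs-latest distinction can be illustrated with a toy model of a partition log in plain Python (this mirrors the offset behaviour, not the Kafka API):

```python
# Toy model: a partition is an append-only list of messages,
# and a consumer starts from some offset into it.
log = ["hello world", "second message", "third message"]

def consume(log, from_beginning):
    # from_beginning=True starts at offset 0 (the earliest message);
    # otherwise the consumer starts at the current end of the log,
    # so only messages appended after it joins would be seen.
    start = 0 if from_beginning else len(log)
    return log[start:]

print(consume(log, from_beginning=True))   # all three messages
print(consume(log, from_beginning=False))  # nothing yet
```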
----
# sbt installation reference: http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html
Spark installation
1. Install Scala Build Tool (sbt) [Make sure https_proxy is set if needed]::
1.1 Get the Scala Build Tool Ubuntu repository package::
wget https://dl.bintray.com/sbt/debian/sbt-0.13.11.deb
1.2 Install the sbt repository package::
sudo dpkg -i sbt-0.13.11.deb
1.3 Update repository info and install 'sbt'::
sudo apt-get update
sudo apt-get install sbt
2. Download the Spark binary (grab the latest stable from http://spark.apache.org/downloads.html)::
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz
3. Untar and move::
tar xvf spark-2.0.0-bin-hadoop2.7.tgz
sudo mv spark-2.0.0-bin-hadoop2.7 /usr/local/spark
4. Add the Spark configuration to your profile (or appropriate ENV configuration)::
vi ~/.profile
(Add the following to .profile)
# set PATH so it includes user's private bin directories
PATH="/usr/local/spark/bin:$HOME/bin:$HOME/.local/bin:$PATH"
export PYSPARK_PYTHON=python3
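The effect of that PATH line can be sketched in Python for illustration (it mirrors the shell assignment above, with HOME taken from the environment):

```python
import os

# Simulate the PATH line added to ~/.profile and confirm the Spark
# bin directory ends up first in the search order.
home = os.path.expanduser("~")
old_path = os.environ.get("PATH", "/usr/bin:/bin")
new_path = f"/usr/local/spark/bin:{home}/bin:{home}/.local/bin:{old_path}"
print(new_path.split(":")[0])  # /usr/local/spark/bin
```

Because the shell searches PATH left to right, placing /usr/local/spark/bin first means `pyspark` resolves to the freshly installed copy.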
5. Apply to current ENV::
source ~/.profile
6. Test configuration::
pyspark
--> Should open the pyspark console