# referencing:
# https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04
# https://chongyaorobin.wordpress.com/2015/07/08/step-by-step-of-install-apache-kafka-on-ubuntu-standalone-mode/
- Add 'kafka' user:
$ sudo useradd kafka -m
- Install Java:
$ sudo apt-get update
$ sudo apt-get install default-jre
- Install zookeeper:
$ sudo apt-get install zookeeperd
Note
After the installation completes, ZooKeeper will be started as a daemon automatically. By default, it will listen on port 2181.
- Confirm zookeeper is running on expected port:
$ telnet localhost 2181
Trying ::1...
Connected to localhost.
Escape character is '^]'.
ruok    <-- Type at empty prompt!
imokConnection closed by foreign host.
Note
After typing 'ruok' at the empty prompt, ZooKeeper will respond with 'imok' and close the session.
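If telnet is unavailable, the same four-letter-word check can be scripted with netcat; a minimal sketch (the `zk_ok` helper name is hypothetical, and `nc` is assumed to be installed):

```shell
# Sketch: scripted ZooKeeper health check (helper name 'zk_ok' is hypothetical;
# assumes 'nc' (netcat) is available).
zk_ok() {
    # Send the 'ruok' four-letter word; a healthy ZooKeeper answers 'imok'.
    local resp
    resp=$(echo ruok | nc -w 2 "${1:-localhost}" "${2:-2181}" 2>/dev/null)
    [ "$resp" = "imok" ]
}
```

Usage: `zk_ok && echo "zookeeper healthy"`.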
- Download kafka from http://kafka.apache.org/downloads.html:
# with cntlm proxy installed and running if necessary
$ export http_proxy=http://127.0.0.1:8009
$ export https_proxy=http://127.0.0.1:8009
# grab latest stable
$ wget http://ftp.jaist.ac.jp/pub/apache/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
- untar and move binaries to /usr/local/kafka:
$ tar xvf kafka_2.11-0.10.0.0.tgz
$ sudo mv kafka_2.11-0.10.0.0 /usr/local/kafka
- Configure Kafka Server:
# turn on topic delete
$ vi /usr/local/kafka/config/server.properties
#>> At end of file add:
delete.topic.enable = true
# save and quit
- Test Server:
$ /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
...
[2016-08-06 01:22:00,000] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
Note
This only starts the server temporarily for initial testing; the service should be registered later...
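For registering the service later, one option on systemd-based Ubuntu releases (15.04+; 14.04 itself uses upstart) is a unit file along the following lines. This is a sketch only: the unit name, user, and restart policy are assumptions, with paths taken from the steps above.

```ini
# /etc/systemd/system/kafka.service -- sketch; adjust paths/user as needed
[Unit]
Description=Apache Kafka broker
Requires=network.target
After=network.target

[Service]
User=kafka
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
```

Enable and start with `sudo systemctl enable kafka && sudo systemctl start kafka`.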
- With the kafka server running, open another session, and create a topic:
$ /usr/local/kafka/bin/kafka-topics.sh --create --topic topic-test --zookeeper localhost:2181 --partitions 1 --replication-factor 1
Created topic "topic-test".
- List available topics:
$ /usr/local/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
topic-test
Note
You should see the created 'topic-test' topic listed.
- Send a message to the topic as a producer via 'kafka-console-producer.sh':
$ echo "hello world" | /usr/local/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-test
- Consume the sent message:
$ /usr/local/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topic-test --from-beginning
Note
The '--from-beginning' flag starts the consumer at the earliest message present in the log, rather than the latest. (Run /usr/local/kafka/bin/kafka-console-consumer.sh with no arguments for more option details)
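The producer/consumer pair above can be wrapped into small helpers for quick round-trip checks; a sketch assuming the broker and ZooKeeper addresses from the previous steps (the helper names are hypothetical). `--max-messages 1` makes the console consumer exit after one message instead of blocking:

```shell
# Sketch: round-trip helpers for the console producer/consumer above.
# Helper names are hypothetical; assumes Kafka at localhost:9092 and
# ZooKeeper at localhost:2181 as set up in the previous steps.
KAFKA_BIN=/usr/local/kafka/bin

kafka_send() {
    # $1 = topic, $2 = message
    echo "$2" | "$KAFKA_BIN/kafka-console-producer.sh" \
        --broker-list localhost:9092 --topic "$1"
}

kafka_recv_one() {
    # $1 = topic; exit after the first message instead of blocking forever
    "$KAFKA_BIN/kafka-console-consumer.sh" --zookeeper localhost:2181 \
        --topic "$1" --from-beginning --max-messages 1
}
```

Usage: `kafka_send topic-test "hello world"`, then `kafka_recv_one topic-test` should print the message.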
----
Spark installation
# referencing (for install of scala (sbt)): http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html
- Install Scala Build Tool (sbt) [Make sure https_proxy is set if needed]:
1.1 Get the sbt Debian package with the Ubuntu repository info:
wget https://dl.bintray.com/sbt/debian/sbt-0.13.11.deb
1.2 Install the sbt package and repository info:
sudo dpkg -i sbt-0.13.11.deb
1.3 Update repository info and install 'sbt':
sudo apt-get update
sudo apt-get install sbt
- download spark binary (Grab latest stable from: http://spark.apache.org/downloads.html):
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz
- untar and move:
tar xvf spark-2.0.0-bin-hadoop2.7.tgz
sudo mv spark-2.0.0-bin-hadoop2.7 /usr/local/spark
- Add spark configuration to your profile (or appropriate ENV configuration):
vi ~/.profile
(Add the following to .profile)
# set PATH so it includes user's private bin directories
PATH="/usr/local/spark/bin:$HOME/bin:$HOME/.local/bin:$PATH"
export PYSPARK_PYTHON=python3
- Apply to current ENV:
source ~/.profile
- Test configuration:
pyspark
--> Should open the pyspark console
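Beyond opening the interactive console, a quick non-interactive check can go through spark-submit; a sketch assuming /usr/local/spark/bin is on PATH per the .profile change above (the helper name and /tmp scratch path are hypothetical):

```shell
# Sketch: non-interactive Spark smoke test. Helper name and /tmp path are
# hypothetical; assumes spark-submit is on PATH per the .profile change above.
spark_smoke() {
    local job=/tmp/spark_smoke.py
    cat > "$job" <<'EOF'
from pyspark import SparkContext

sc = SparkContext(appName="smoke-test")
# trivial job that exercises the local executor: sum of 0..9
print("sum:", sc.parallelize(range(10)).sum())
sc.stop()
EOF
    spark-submit "$job"
}
```

Running `spark_smoke` should print the sum (45) amid Spark's log output.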