@rmoff
Created July 20, 2016 16:01
Flume config: Kafka -> HDFS
Make sure the HDFS libraries are available on the Flume agent's CLASSPATH; otherwise the HDFS sink won't work.
Run the agent with:
flume-ng agent --name target_agent --conf /opt/apache-flume-1.6.0-bin/conf/ --conf-file /home/rmoff/config/flume-kafka-twitter.conf
To also log at DEBUG level to the console:
flume-ng agent --name target_agent --conf /opt/apache-flume-1.6.0-bin/conf/ --conf-file /home/rmoff/config/flume-kafka-twitter.conf -Dflume.root.logger=DEBUG,console
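The `--name target_agent` argument tells flume-ng which agent to start: only properties whose keys begin with `target_agent.` are used from the `--conf-file`, so the name must match the prefix used throughout the configuration. As a rough illustration, here is a Python sketch of a hypothetical helper (`agent_components`, not part of Flume, and ignoring Flume's other parsing rules) that lists the sources, channels and sinks a properties file declares for a given agent name:

```python
def agent_components(conf_text, agent_name):
    """Return the sources, channels and sinks declared for agent_name.

    Hypothetical helper: a simplified reading of a Flume properties file that
    skips blank lines, full-line '#' comments and lines without '='.
    """
    components = {"sources": [], "channels": [], "sinks": []}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # "target_agent.sources" -> agent "target_agent", kind "sources";
        # longer keys such as "target_agent.sources.kafkaSource.type" are skipped.
        agent, _, kind = key.partition(".")
        if agent == agent_name and kind in components:
            components[kind] = value.split()  # component lists are space-separated
    return components


if __name__ == "__main__":
    conf = """
target_agent.sources = kafkaSource
target_agent.channels = memoryChannel
target_agent.sinks = hdfsSink
target_agent.sources.kafkaSource.topic = twitter
"""
    print(agent_components(conf, "target_agent"))
    # {'sources': ['kafkaSource'], 'channels': ['memoryChannel'], 'sinks': ['hdfsSink']}
    print(agent_components(conf, "wrong_agent"))
    # {'sources': [], 'channels': [], 'sinks': []}
```

An empty result for the name passed to `--name` usually points to a typo in either the command line or the property prefix.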
target_agent.sources = kafkaSource
target_agent.channels = memoryChannel
target_agent.sinks = hdfsSink
## Read from Kafka
target_agent.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
target_agent.sources.kafkaSource.zookeeperConnect = confluent-02:2181
target_agent.sources.kafkaSource.topic = twitter
target_agent.sources.kafkaSource.channels = memoryChannel
target_agent.sources.kafkaSource.groupId = confluent-02-flume-01
## Buffer events in memory
# http://flume.apache.org/FlumeUserGuide.html#memory-channel
target_agent.channels.memoryChannel.type = memory
# Maximum number of events held in the channel
target_agent.channels.memoryChannel.capacity = 100
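## Write to HDFS
# http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
target_agent.sinks.hdfsSink.type = hdfs
target_agent.sinks.hdfsSink.channel = memoryChannel
# %Y/%m/%d are replaced with the date taken from each event's timestamp header
target_agent.sinks.hdfsSink.hdfs.path = /user/rmoff/incoming/twitter/%Y/%m/%d/
# DataStream writes the events as-is instead of wrapping them in a SequenceFile
target_agent.sinks.hdfsSink.hdfs.fileType = DataStream
target_agent.sinks.hdfsSink.hdfs.writeFormat = Text
# Roll to a new file once 1024 bytes have been written; 0 disables
# time-based (rollInterval) and event-count-based (rollCount) rolling
target_agent.sinks.hdfsSink.hdfs.rollSize = 1024
target_agent.sinks.hdfsSink.hdfs.rollInterval = 0
target_agent.sinks.hdfsSink.hdfs.rollCount = 0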
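With `rollInterval` and `rollCount` set to 0, the sink rolls files on size alone, and each `%Y/%m/%d` directory gets its own sequence of files. The Python sketch below is a simplified model of that behaviour, not Flume's code: it assumes the thresholds are checked before each write (so a file can overshoot 1024 bytes by up to one event), counts only event body bytes, and uses UTC where Flume would default to the agent's local time zone. `bucket_path` and `simulate` are hypothetical names.

```python
from datetime import datetime, timezone

# Values copied from the hdfsSink section of the config above.
HDFS_PATH = "/user/rmoff/incoming/twitter/%Y/%m/%d/"
ROLL_SIZE = 1024  # hdfs.rollSize: body bytes per file (0 = never roll on size)
ROLL_COUNT = 0    # hdfs.rollCount: events per file (0 = never roll on count)
# hdfs.rollInterval = 0 disables time-based rolling, so it is not modelled.


def bucket_path(timestamp_ms, tz=timezone.utc):
    """Expand %Y/%m/%d from an event timestamp in epoch milliseconds.

    Assumption: UTC for reproducibility; Flume defaults to local time.
    """
    when = datetime.fromtimestamp(timestamp_ms / 1000, tz=tz)
    return when.strftime(HDFS_PATH)


def simulate(events, roll_size=ROLL_SIZE, roll_count=ROLL_COUNT):
    """Return {directory: [body bytes in each file]} for (timestamp_ms, body) events."""
    buckets = {}
    for timestamp_ms, body in events:
        files = buckets.setdefault(bucket_path(timestamp_ms), [])
        if not files:
            files.append([0, 0])  # [bytes, events] of the open file
        size, count = files[-1]
        # Checked before writing: a file is only closed once it has
        # already reached (or passed) a limit.
        if (roll_size > 0 and size >= roll_size) or (roll_count > 0 and count >= roll_count):
            files.append([0, 0])
        files[-1][0] += len(body)
        files[-1][1] += 1
    return {path: [size for size, _ in files] for path, files in buckets.items()}


if __name__ == "__main__":
    ts = 1469030460000  # 2016-07-20 16:01 UTC
    print(simulate([(ts, b"x" * 600)] * 3))
    # {'/user/rmoff/incoming/twitter/2016/07/20/': [1200, 600]}
```

Under these assumptions, 600-byte events end up two to a file (roughly 1200 bytes) rather than stopping at exactly 1024, and on-disk sizes may differ slightly, for example if the serializer appends a newline per event.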