Last active Oct 18, 2017
Running Kudu with MapReduce framework (Lightning talk in Cloudera World Tokyo)


What's Kudu?

  • From the Kudu project's description:
    • "Kudu completes Hadoop's storage layer to enable fast analytics on fast data."
    • A distributed, insertable/updatable columnar store.
    • Schema on write.
    • Complements Hadoop/HDFS and HBase rather than replacing them.


Build from source

$ sudo apt-get -y install git autoconf automake libboost-thread-dev curl gcc g++ \
  libssl-dev libsasl2-dev libtool ntp
$ sudo apt-get -y install asciidoctor xsltproc
$ git clone
$ cd kudu
$ thirdparty/
$ thirdparty/installed/bin/cmake . -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=/hadoop1/build/opt/kudu
$ make -j4
$ make DESTDIR=/hadoop1/build/opt/kudu install
$ make docs

Installing Kudu from deb packages

$ sudo wget -O /etc/apt/sources.list.d/cloudera.list
$ sudo apt-get update
$ sudo apt-get install kudu                     # Base Kudu files
$ sudo apt-get install kudu-master              # Service scripts for managing kudu-master
$ sudo apt-get install kudu-tserver             # Service scripts for managing kudu-tserver
$ sudo apt-get install libkuduclient0           # Kudu C++ client shared library
$ sudo apt-get install libkuduclient-dev        # Kudu C++ client SDK

Running Kudu daemons

$ sudo service kudu-master start
$ sudo service kudu-tserver start
$ sudo ps aux | grep kudu
kudu     11348  0.1  0.1 455092 15744 ?        Sl   Nov09   0:22 /usr/lib/kudu/sbin/kudu-master --flagfile=/etc/kudu/conf/master.gflagfile
kudu     11424  0.1  0.0 1016388 6828 ?        Sl   Nov09   0:22 /usr/lib/kudu/sbin/kudu-tserver --flagfile=/etc/kudu/conf/tserver.gflagfile
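
The daemons read their startup flags from the gflagfiles shown in the process list above. A minimal master.gflagfile might look like the following sketch; `--fs_wal_dir`, `--fs_data_dirs`, and `--log_dir` are real Kudu flags, but the directory paths here are illustrative assumptions:

```
# /etc/kudu/conf/master.gflagfile (sketch; paths are assumptions)
--fs_wal_dir=/var/lib/kudu/master
--fs_data_dirs=/var/lib/kudu/master
--log_dir=/var/log/kudu
```

The tserver.gflagfile takes the same shape with its own directories.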

Writing client applications

The Java client and the MapReduce integration live under java/ in the source tree:

|-- kudu-client
|-- kudu-client-tools
|-- kudu-csd
|-- kudu-mapreduce
$ cd java
$ mvn package -DskipTests
$ cp kudu-client-tools/target/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar $UNDER_HADOOP_CLASSPATH
$ hadoop jar share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar
ERROR: Wrong number of arguments: 0
Usage: importcsv <colAa,colB,colC> <table.name> <input.dir>

Imports the given input directory of CSV data into the specified table.

The column names of the CSV data must be specified in the form of comma-separated column names.
Other options that may be specified with -D include:
  -Dimportcsv.skip.bad.lines=false - fail if encountering an invalid line
  '-Dimportcsv.separator=|' - eg separate on pipes instead of tabs
  -Dimportcsv.job.name=jobName - use the specified mapreduce job name for the import.

Additionally, the following options are available:
  -Dkudu.operation.timeout.ms=TIME - timeout for read and write operations, defaults to 10000
  -Dkudu.admin.operation.timeout.ms=TIME - timeout for admin operations, defaults to 10000
  -Dkudu.socket.read.timeout.ms=TIME - timeout for socket reads, defaults to 5000
  -Dkudu.master.addresses=ADDRESSES - addresses to reach the Masters, defaults to which is usually wrong.
  -Dkudu.num.replicas=NUM - number of replicas to use when configuring a new table, defaults to 3
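
Before submitting the job, importcsv needs input files in a directory. A minimal sketch of preparing local input (file name and values are made up; the tool splits on tabs unless -Dimportcsv.separator says otherwise):

```shell
# Sketch: create a small tab-separated input file for importcsv.
# Paths and values are illustrative assumptions.
mkdir -p /tmp/importcsv-input
printf '1\tfoo\t10\n2\tbar\t20\n' > /tmp/importcsv-input/part-0
cat /tmp/importcsv-input/part-0
```

The directory would then be copied to HDFS (e.g. with hdfs dfs -put) and its HDFS path passed as the <input.dir> argument.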
$ hadoop jar share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar org.kududb.mapreduce.tools.ImportCsv "1,2,3" test1 hdfs://
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ubuntu/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/hadoop/share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/tez/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/11/10 03:22:41 INFO impl.TimelineClientImpl: Timeline service address:
15/11/10 03:22:41 INFO client.RMProxy: Connecting to ResourceManager at /
15/11/10 03:22:41 INFO client.AHSProxy: Connecting to Application History server at /
15/11/10 03:22:41 INFO client.AsyncKuduClient: Discovered tablet Kudu Master for table Kudu Master with partition ["", "")
Exception in thread "main" java.lang.RuntimeException: Could not obtain the table from the master, is the master running and is this table created? tablename=test1 and master address=
	at org.kududb.mapreduce.KuduTableOutputFormat.setConf(
	at org.apache.hadoop.util.ReflectionUtils.setConf(
	at org.apache.hadoop.util.ReflectionUtils.newInstance(
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(
	at org.apache.hadoop.mapreduce.Job$
	at org.apache.hadoop.mapreduce.Job$
	at Method)
	at org.apache.hadoop.mapreduce.Job.submit(
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at org.apache.hadoop.util.RunJar.main(
  • How can we create a table?

  • Currently, CREATE TABLE can be done via impala-shell.

  • This requires the special impala-kudu build of Impala.

  • Even in the "Running from Spark" example, tables are still created with Impala's CREATE TABLE.

  • The example in the docs is a good one.

  • We can add it ourselves, since it's open source!
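
For reference, an impala-kudu-flavored CREATE TABLE looks roughly like this. The storage_handler and kudu.* table properties follow the impala-kudu integration of that era, but the schema, table name, and master address here are illustrative assumptions:

```shell
# Sketch: write out the DDL that impala-shell would run.
# Column names, master address, and key column are assumptions.
cat > /tmp/create_test1.sql <<'EOF'
CREATE TABLE test1 (
  id INT,
  name STRING,
  value INT
)
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'test1',
  'kudu.master_addresses' = '127.0.0.1:7051',
  'kudu.key_columns' = 'id'
);
EOF
cat /tmp/create_test1.sql
# Then run it with: impala-shell -f /tmp/create_test1.sql
```

Once the table exists on the master, the ImportCsv job above should be able to look it up instead of failing with "Could not obtain the table from the master".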

