Running Kudu with MapReduce framework (Lightning talk in Cloudera World Tokyo)

Last active October 18, 2017 12:51


What's Kudu?

  • From the project's introduction:
    • "Kudu completes Hadoop's storage layer to enable fast analytics on fast data."
  • Distributed insertable/updatable columnar store.
  • Schema on write.
  • Complements Hadoop/HDFS and HBase.


Build from source

$ sudo apt-get -y install git autoconf automake libboost-thread-dev curl gcc g++ \
  libssl-dev libsasl2-dev libtool ntp
$ sudo apt-get -y install asciidoctor xsltproc
$ git clone
$ cd kudu
$ thirdparty/
$ thirdparty/installed/bin/cmake . -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=/hadoop1/build/opt/kudu
$ make -j4
$ make DESTDIR=/hadoop1/build/opt/kudu install
$ make docs

Installing Kudu from a deb package

$ sudo wget -O /etc/apt/sources.list.d/cloudera.list
$ sudo apt-get update
$ sudo apt-get install kudu                     # Base Kudu files
$ sudo apt-get install kudu-master              # Service scripts for managing kudu-master
$ sudo apt-get install kudu-tserver             # Service scripts for managing kudu-tserver
$ sudo apt-get install libkuduclient0           # Kudu C++ client shared library
$ sudo apt-get install libkuduclient-dev        # Kudu C++ client SDK

Running Kudu daemons

$ sudo service kudu-master start
$ sudo service kudu-tserver start
$ sudo ps aux | grep kudu
kudu     11348  0.1  0.1 455092 15744 ?        Sl   Nov09   0:22 /usr/lib/kudu/sbin/kudu-master --flagfile=/etc/kudu/conf/master.gflagfile
kudu     11424  0.1  0.0 1016388 6828 ?        Sl   Nov09   0:22 /usr/lib/kudu/sbin/kudu-tserver --flagfile=/etc/kudu/conf/tserver.gflagfile
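
Once both daemons are running, a quick way to sanity-check them is to probe their embedded web UIs. A minimal sketch, assuming the stock default web UI ports (8051 for the master, 8050 for the tablet server; check the gflagfiles if they were overridden):

```shell
# Probe the daemons' embedded web UIs.
# Ports are an assumption: master 8051, tserver 8050 (stock defaults).
for port in 8051 8050; do
  if curl -sf "http://localhost:${port}/" > /dev/null; then
    echo "port ${port}: up"
  else
    echo "port ${port}: not reachable"
  fi
done
```

If a port reports "not reachable", check the daemon's logs under /var/log/kudu before going further.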

Writing client applications

The java/ directory of the Kudu source tree contains these modules:

|-- kudu-client
|-- kudu-client-tools
|-- kudu-csd
|-- kudu-mapreduce
$ cd java
$ mvn package -DskipTests
$ cp kudu-client-tools/target/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar $UNDER_HADOOP_CLASSPATH
$ hadoop jar share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar
ERROR: Wrong number of arguments: 0
Usage: importcsv <colAa,colB,colC> <table.name> <input.dir>

Imports the given input directory of CSV data into the specified table.

The column names of the CSV data must be specified in the form of comma-separated column names.
Other options that may be specified with -D include:
  -Dimportcsv.skip.bad.lines=false - fail if encountering an invalid line
  '-Dimportcsv.separator=|' - eg separate on pipes instead of tabs
  -Dimportcsv.job.name=jobName - use the specified mapreduce job name for the import.

Additionally, the following options are available:
  -Dkudu.operation.timeout.ms=TIME - timeout for read and write operations, defaults to 10000
  -Dkudu.admin.operation.timeout.ms=TIME - timeout for admin operations, defaults to 10000
  -Dkudu.socket.read.timeout.ms=TIME - timeout for socket reads, defaults to 5000
  -Dkudu.master.addresses=ADDRESSES - addresses to reach the Masters, defaults to 127.0.0.1:7051, which is usually wrong.
  -Dkudu.num.replicas=NUM - number of replicas to use when configuring a new table, defaults to 3
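
Before running the job, the input has to land in HDFS as separator-delimited text (tab by default, per the options above). A sketch of preparing a three-column input matching the "1,2,3" column list used below; the file name, HDFS directory, and cell values are all illustrative:

```shell
# importcsv splits on tabs by default; give each line three fields
# to match the column list "1,2,3" passed to the job.
printf '1\ta\tx\n2\tb\ty\n3\tc\tz\n' > sample.tsv

# Sanity check: every line should have exactly 3 tab-separated fields.
awk -F'\t' '{print NF}' sample.tsv   # prints 3 for each of the 3 lines

# Then stage it on the cluster (shown for reference; paths are hypothetical):
#   hdfs dfs -mkdir -p /tmp/importcsv
#   hdfs dfs -put sample.tsv /tmp/importcsv/
```
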
$ hadoop jar share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar org.kududb.mapreduce.tools.ImportCsv "1,2,3" test1 hdfs://
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ubuntu/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/hadoop/share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/tez/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/11/10 03:22:41 INFO impl.TimelineClientImpl: Timeline service address:
15/11/10 03:22:41 INFO client.RMProxy: Connecting to ResourceManager at /
15/11/10 03:22:41 INFO client.AHSProxy: Connecting to Application History server at /
15/11/10 03:22:41 INFO client.AsyncKuduClient: Discovered tablet Kudu Master for table Kudu Master with partition ["", "")
Exception in thread "main" java.lang.RuntimeException: Could not obtain the table from the master, is the master running and is this table created? tablename=test1 and master address=
	at org.kududb.mapreduce.KuduTableOutputFormat.setConf(
	at org.apache.hadoop.util.ReflectionUtils.setConf(
	at org.apache.hadoop.util.ReflectionUtils.newInstance(
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(
	at org.apache.hadoop.mapreduce.Job$
	at org.apache.hadoop.mapreduce.Job$
	at Method)
	at org.apache.hadoop.mapreduce.Job.submit(
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at org.apache.hadoop.util.RunJar.main(
  • How can we create a table?

  • Currently, CREATE TABLE can only be done via impala-shell

  • impala-kudu (a Kudu-enabled build of Impala) is needed for this

  • Even the "Running from Spark" example still does CREATE TABLE with Impala commands

  • The doc in the examples is a good one.

  • We can add it ourselves, since it's open source!
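
As a sketch of that impala-shell path: the table name test1 matches the failed run above, but the columns, master address, and key column are assumptions, as is the exact property set expected by the impala-kudu builds of this era:

```shell
# DDL for a Kudu-backed table via the impala-kudu build.
# Column names/types, master address, and key column are illustrative.
cat > create_test1.sql <<'EOF'
CREATE TABLE test1 (
  id INT,
  col1 STRING,
  col2 STRING
)
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'test1',
  'kudu.master_addresses' = '127.0.0.1:7051',
  'kudu.key_columns' = 'id'
);
EOF

# Run it against an impala-kudu daemon:
#   impala-shell -f create_test1.sql
grep -c KuduStorageHandler create_test1.sql   # prints 1
```

Once the table exists on the master, the importcsv job above should get past the "Could not obtain the table from the master" error.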

