Running Kudu with the MapReduce framework (lightning talk at Cloudera World Tokyo)

Kudu

What's Kudu?

  • From http://getkudu.io/
    • Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
    • Distributed Insertable/Updatable columnar store.
    • Schema on write.
    • Complementing Hadoop/HDFS and HBase.

Community

Build from source

$ sudo apt-get -y install git autoconf automake libboost-thread-dev curl gcc g++ \
  libssl-dev libsasl2-dev libtool ntp
$ sudo apt-get -y install asciidoctor xsltproc   # only needed for "make docs"
$ git clone http://github.com/cloudera/kudu
$ cd kudu
$ thirdparty/build-if-necessary.sh               # build bundled third-party dependencies
$ thirdparty/installed/bin/cmake . -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=/hadoop1/build/opt/kudu
$ make -j4
$ make DESTDIR=/hadoop1/build/opt/kudu install
$ make docs
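
To smoke-test the build before setting up packages, the freshly built master can be started by hand. A sketch: the binary path below is an assumption (it depends on how DESTDIR and CMAKE_INSTALL_PREFIX combine on your machine), and --fs_wal_dir/--fs_data_dirs just point at scratch directories:

$ mkdir -p /tmp/kudu-smoke
$ /hadoop1/build/opt/kudu/sbin/kudu-master --fs_wal_dir=/tmp/kudu-smoke --fs_data_dirs=/tmp/kudu-smoke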

Installing Kudu from deb packages

$ sudo wget http://archive.cloudera.com/beta/kudu/ubuntu/trusty/amd64/kudu/cloudera.list -O /etc/apt/sources.list.d/cloudera.list
$ sudo apt-get update
$ sudo apt-get install kudu                     # Base Kudu files
$ sudo apt-get install kudu-master              # Service scripts for managing kudu-master
$ sudo apt-get install kudu-tserver             # Service scripts for managing kudu-tserver
$ sudo apt-get install libkuduclient0           # Kudu C++ client shared library
$ sudo apt-get install libkuduclient-dev        # Kudu C++ client SDK
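
The packages wire both daemons to gflagfiles under /etc/kudu/conf (visible in the ps output below). On a multi-node cluster the tablet servers must be pointed at the master; a sketch, assuming a master reachable at master-host on the default RPC port 7051:

$ echo '--tserver_master_addrs=master-host:7051' | sudo tee -a /etc/kudu/conf/tserver.gflagfile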

Running Kudu daemons

$ sudo service kudu-master start
$ sudo service kudu-tserver start
$ sudo ps aux | grep kudu
kudu     11348  0.1  0.1 455092 15744 ?        Sl   Nov09   0:22 /usr/lib/kudu/sbin/kudu-master --flagfile=/etc/kudu/conf/master.gflagfile
kudu     11424  0.1  0.0 1016388 6828 ?        Sl   Nov09   0:22 /usr/lib/kudu/sbin/kudu-tserver --flagfile=/etc/kudu/conf/tserver.gflagfile
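
Each daemon also serves a web UI that is handy for a quick health check. The ports below are Kudu's defaults (8051 for the master, 8050 for the tablet server), assuming the gflagfiles don't override them:

$ curl -s http://127.0.0.1:8051/ | head    # master web UI
$ curl -s http://127.0.0.1:8050/ | head    # tablet server web UI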

Writing client applications

The java/ directory of the Kudu source tree contains the client and integration modules:

.
|-- kudu-client
|-- kudu-client-tools
|-- kudu-csd
|-- kudu-mapreduce
$ cd java
$ mvn package -DskipTests
$ cp kudu-client-tools/target/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar $UNDER_HADOOP_CLASSPATH   # e.g. share/hadoop/mapreduce/ under the Hadoop install
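
Before running the import, the CSV file needs to be in HDFS. A minimal sketch, assuming the file sits in the local working directory (the paths match the run further below):

$ hdfs dfs -mkdir -p /user/ubuntu/csvdata
$ hdfs dfs -put MonthlyPassengerData_200507_to_201506.csv /user/ubuntu/csvdata/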
$ hadoop jar share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar org.kududb.mapreduce.tools.ImportCsv
ERROR: Wrong number of arguments: 0
Usage: importcsv <colAa,colB,colC> <table.name> <input.dir>

Imports the given input directory of CSV data into the specified table.

The column names of the CSV data must be specified in the form of comma-separated column names.
Other options that may be specified with -D include:
  -Dimportcsv.skip.bad.lines=false - fail if encountering an invalid line
  '-Dimportcsv.separator=|' - eg separate on pipes instead of tabs
  -Dimportcsv.job.name=jobName - use the specified mapreduce job name for the import.

Additionally, the following options are available:
  -Dkudu.operation.timeout.ms=TIME - timeout for read and write operations, defaults to 10000
  -Dkudu.admin.operation.timeout.ms=TIME - timeout for admin operations, defaults to 10000
  -Dkudu.socket.read.timeout.ms=TIME - timeout for socket reads, defaults to 5000
  -Dkudu.master.addresses=ADDRESSES - addresses to reach the Masters, defaults to 127.0.0.1 which is usually wrong.
  -Dkudu.num.replicas=NUM - number of replicas to use when configuring a new table, defaults to 3
$ hadoop jar share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar org.kududb.mapreduce.tools.ImportCsv "1,2,3" test1 hdfs://127.0.0.1:50070//user/ubuntu/csvdata/MonthlyPassengerData_200507_to_201506.csv
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ubuntu/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/hadoop/share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/tez/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/11/10 03:22:41 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
15/11/10 03:22:41 INFO client.RMProxy: Connecting to ResourceManager at /172.31.15.42:8081
15/11/10 03:22:41 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
15/11/10 03:22:41 INFO client.AsyncKuduClient: Discovered tablet Kudu Master for table Kudu Master with partition ["", "")
Exception in thread "main" java.lang.RuntimeException: Could not obtain the table from the master, is the master running and is this table created? tablename=test1 and master address= 127.0.0.1
	at org.kududb.mapreduce.KuduTableOutputFormat.setConf(KuduTableOutputFormat.java:114)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:559)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1324)
	at org.kududb.mapreduce.tools.ImportCsv.run(ImportCsv.java:110)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.kududb.mapreduce.tools.ImportCsv.main(ImportCsv.java:114)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
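
The failure matches the warning in the usage text: kudu.master.addresses defaulted to 127.0.0.1, and the table test1 had not been created yet, so the job could not resolve it on the master. Once the table exists, a re-run with explicit options might look like the sketch below; the column names, separator, and master port are assumptions for illustration:

$ hadoop jar share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar \
    org.kududb.mapreduce.tools.ImportCsv \
    -Dkudu.master.addresses=127.0.0.1:7051 \
    '-Dimportcsv.separator=,' \
    key,col2,col3 test1 /user/ubuntu/csvdata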
  • How can we create a table?

  • Currently, CREATE TABLE can be done via impala-shell, using the Kudu-enabled impala-kudu build; a hedged sketch follows this list.

  • Even the "Running from Spark" example still creates its table with Impala's CREATE TABLE.

  • The docs that ship with the examples are a good reference.

  • We can add it ourselves, since it's open source!
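
A hedged sketch of that impala-shell route, using the beta Kudu storage handler syntax; the column names, types, and master address are assumptions, and newer impala-kudu builds may also require a DISTRIBUTE BY clause:

$ impala-shell -q "
  CREATE TABLE test1 (
    key INT,
    col2 STRING,
    col3 STRING
  )
  TBLPROPERTIES(
    'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
    'kudu.table_name' = 'test1',
    'kudu.master_addresses' = '127.0.0.1:7051',
    'kudu.key_columns' = 'key'
  );"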

Links
