- From http://getkudu.io/
- Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
- Distributed Insertable/Updatable columnar store.
- Schema on write.
- Complementing Hadoop/HDFS and HBase.
- Several Hadoop developers have joined the development.
- Weekly status: https://groups.google.com/forum/#!topic/kudu-user/c6Q8RyNwY8A
- Follow @getkudu and @tlipcon to get the latest information!
- Documentation
- It works!
$ sudo apt-get -y install git autoconf automake libboost-thread-dev curl gcc g++ \
libssl-dev libsasl2-dev libtool ntp
$ sudo apt-get -y install asciidoctor xsltproc  # needed for building the docs
$ git clone http://github.com/cloudera/kudu
$ cd kudu
$ thirdparty/build-if-necessary.sh  # builds the bundled third-party dependencies
$ thirdparty/installed/bin/cmake . -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=/hadoop1/build/opt/kudu
$ make -j4
$ make DESTDIR=/hadoop1/build/opt/kudu install
$ make docs
- http://getkudu.io/docs/installation.html#_build_from_source
- Prebuilt packages are also available: RPM via yum, or DEB via apt as shown below
$ sudo wget http://archive.cloudera.com/beta/kudu/ubuntu/trusty/amd64/kudu/cloudera.list -O /etc/apt/sources.list.d/cloudera.list
$ sudo apt-get update
$ sudo apt-get install kudu # Base Kudu files
$ sudo apt-get install kudu-master # Service scripts for managing kudu-master
$ sudo apt-get install kudu-tserver # Service scripts for managing kudu-tserver
$ sudo apt-get install libkuduclient0 # Kudu C++ client shared library
$ sudo apt-get install libkuduclient-dev # Kudu C++ client SDK
$ sudo service kudu-master start
$ sudo service kudu-tserver start
$ sudo ps aux | grep kudu
kudu 11348 0.1 0.1 455092 15744 ? Sl Nov09 0:22 /usr/lib/kudu/sbin/kudu-master --flagfile=/etc/kudu/conf/master.gflagfile
kudu 11424 0.1 0.0 1016388 6828 ? Sl Nov09 0:22 /usr/lib/kudu/sbin/kudu-tserver --flagfile=/etc/kudu/conf/tserver.gflagfile
- Result
- Checking the Web UI (by default the master's web UI listens on port 8051, the tablet server's on port 8050)
- C++ client
- Java client <- for Hadoop and Spark
- Directory structure of the Java client (a small usage sketch follows the tree)
.
|-- kudu-client
|-- kudu-client-tools
|-- kudu-csd
|-- kudu-mapreduce
- Loading data into HDFS and running MapReduce jobs: http://getkudu.io/docs/quickstart.html
- How to use the MR job examples?
$ cd java
$ mvn package -DskipTests
$ cp kudu-client-tools/target/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar $UNDER_HADOOP_CLASSPATH  # a directory on the Hadoop classpath
$ hadoop jar share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar org.kududb.mapreduce.tools.ImportCsv
ERROR: Wrong number of arguments: 0
Usage: importcsv <colAa,colB,colC> <table.name> <input.dir>
Imports the given input directory of CSV data into the specified table.
The column names of the CSV data must be specified in the form of comma-separated column names.
Other options that may be specified with -D include:
-Dimportcsv.skip.bad.lines=false - fail if encountering an invalid line
'-Dimportcsv.separator=|' - eg separate on pipes instead of tabs
-Dimportcsv.job.name=jobName - use the specified mapreduce job name for the import.
Additionally, the following options are available:
-Dkudu.operation.timeout.ms=TIME - timeout for read and write operations, defaults to 10000
-Dkudu.admin.operation.timeout.ms=TIME - timeout for admin operations , defaults to 10000
-Dkudu.socket.read.timeout.ms=TIME - timeout for socket reads , defaults to 5000
-Dkudu.master.addresses=ADDRESSES - addresses to reach the Masters, defaults to 127.0.0.1 which is usually wrong.
-D kudu.num.replicas=NUM - number of replicas to use when configuring a new table, defaults to 3
$ hadoop jar share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar org.kududb.mapreduce.tools.ImportCsv "1,2,3" test1 hdfs://127.0.0.1:50070//user/ubuntu/csvdata/MonthlyPassengerData_200507_to_201506.csv
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ubuntu/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/hadoop/share/hadoop/mapreduce/kudu-client-tools-0.6.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/tez/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/11/10 03:22:41 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
15/11/10 03:22:41 INFO client.RMProxy: Connecting to ResourceManager at /172.31.15.42:8081
15/11/10 03:22:41 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
15/11/10 03:22:41 INFO client.AsyncKuduClient: Discovered tablet Kudu Master for table Kudu Master with partition ["", "")
Exception in thread "main" java.lang.RuntimeException: Could not obtain the table from the master, is the master running and is this table created? tablename=test1 and master address= 127.0.0.1
at org.kududb.mapreduce.KuduTableOutputFormat.setConf(KuduTableOutputFormat.java:114)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:559)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1324)
at org.kududb.mapreduce.tools.ImportCsv.run(ImportCsv.java:110)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.kududb.mapreduce.tools.ImportCsv.main(ImportCsv.java:114)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
- The job fails because the target table test1 does not exist yet. So how can we create a table?
- Currently, CREATE TABLE can be done via impala-shell
- Even the "Running from Spark" example still creates its table with Impala's CREATE TABLE
- An example in the docs would be a good addition
- We can add it since it's open source! (a rough Java sketch follows this list)
- Opened KUDU-1258