How to set up a dev environment for the Hudi connector

Set up a Hadoop cluster with Hudi tables

Follow the doc https://hudi.apache.org/docs/docker_demo. It is sufficient to finish Step 3 (Sync with Hive); you can then continue with the next sections of this guide.
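
In short, that demo checks out the Hudi repository and brings up the cluster with a setup script; a minimal sketch, assuming the script location described in the linked doc:

git clone https://github.com/apache/hudi.git
cd hudi/docker
# Brings up the demo containers (namenode, datanode1, hivemetastore, adhoc-1, ...) via Docker Compose
./setup_demo.sh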

Build and launch a local Presto server

Follow the doc https://github.com/prestodb/presto/blob/master/README.md to build and launch your local Presto server.
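
For reference, that README builds the project with the bundled Maven wrapper and runs the server from the presto-main module; a minimal sketch (flags and paths as described in the README):

# Build, skipping tests to save time
./mvnw clean install -DskipTests
# Then run the server from your IDE: main class com.facebook.presto.server.PrestoServer,
# working directory presto-main, VM option -Dconfig=etc/config.properties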

Configure the Presto server to access Hudi tables

First, copy the Hadoop configuration files from the containers to your local machine, e.g.:

mkdir -p /tmp/hadoop
docker cp adhoc-1:/etc/hadoop/core-site.xml /tmp/hadoop/core-site.xml
docker cp adhoc-1:/etc/hadoop/hdfs-site.xml /tmp/hadoop/hdfs-site.xml
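
For a quick sanity check, you can confirm the copied core-site.xml points at the demo's namenode (the exact hostname depends on the demo's Docker setup):

grep -A1 fs.defaultFS /tmp/hadoop/core-site.xml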

Second, update the Hudi connector configuration (presto-main/etc/catalog/hudi.properties), e.g.:

# Use the Hudi connector for this catalog
connector.name=hudi
# Hadoop config files copied from the containers above
hive.config.resources=/tmp/hadoop/core-site.xml,/tmp/hadoop/hdfs-site.xml
# Thrift endpoint of the demo's Hive metastore
hive.metastore.uri=thrift://localhost:9083

Then relaunch your local Presto server so it picks up the new catalog. Now you can use presto-cli to access the Hudi tables:

$ presto-cli/target/presto-cli-*-executable.jar --catalog hudi --schema default
presto:default> show tables;
...
presto:default> select * from stock_ticks_cow;
...
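
For example, the demo's stock_ticks_cow table can be queried like any Hive table; this query (assuming the schema from the Hudi docker demo) returns the latest tick timestamp per symbol:

presto:default> select symbol, max(ts) from stock_ticks_cow group by symbol;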

Tips

Use a remote machine

You can deploy the Hadoop cluster on a remote machine if your local machine does not have enough memory. If so, update your local /etc/hosts so the container hostnames returned by the namenode resolve locally, by adding

127.0.0.1   namenode
127.0.0.1   datanode1
127.0.0.1   hivemetastore

and then create an SSH tunnel that forwards the Hive metastore (9083), HDFS namenode (8020), and datanode (50010) ports (suppose your remote machine's IP is 10.20.30.40):

ssh -N -L 9083:10.20.30.40:9083 -L 8020:10.20.30.40:8020 -L 50010:10.20.30.40:50010 10.20.30.40
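
To verify the tunnel is up, you can check that the forwarded metastore port answers locally, e.g. with netcat:

nc -z localhost 9083 && echo "metastore reachable"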

Fix readDirect error

If you encounter an error like readDirect unsupported in RemoteBlockReader, add the property below to your local /tmp/hadoop/hdfs-site.xml (the legacy block reader does not support readDirect, so this switches the client to the newer implementation):

<property>
  <name>dfs.client.use.legacy.blockreader</name>
  <value>false</value>
</property>