@yadudoc
Last active August 29, 2015 13:56
Swift on Hadoop

The purpose of this guide is to help you set up Swift on a Hadoop cluster. This lets you execute SwiftScripts on the cluster, allowing you to go beyond the limitations of MapReduce-style workflows.

In the current configuration, Swift runs in persistent coasters mode on Hadoop. This means that two separate Swift entities are required to execute a Swift run on a Hadoop cluster: the Swift-coaster-service, which manages workers on the Hadoop cluster, and the Swift runtime, which executes SwiftScripts via the Swift-coaster-service.

Get started.

  1. TODO: Set up a git repo with the KLab app as an example.
  2. TODO: Move fake data generation into a script.
  3. TODO: Make start_hadoop_workers.sh check for swift in the PATH and report failure.
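
The PATH check mentioned in the last TODO could look roughly like the sketch below. This is only an illustration: `require_cmd` is a hypothetical helper name and is not part of the existing start_hadoop_workers.sh. It relies only on the POSIX `command -v` builtin.

```shell
#!/bin/sh
# Hypothetical helper (not taken from start_hadoop_workers.sh).
# Returns 0 if the command named in $1 is found on PATH;
# otherwise prints an error to stderr and returns 1.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "ERROR: '$1' not found in PATH" >&2
        return 1
    fi
}
```

Placed near the top of the script, a line such as `require_cmd swift || exit 1` would stop the run early with a clear message instead of failing later while starting workers.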

Troubleshooting

Here's what would be helpful in debugging:

  1. The stdout from executing the swiftscript.
  2. The latest your-app-name-TIMESTAMP.log generated by Swift in the folder from which you execute run-swift.sh.
  3. The stdout from the start_hadoop_workers.sh script.
  4. The coaster service logs and cps-TIMESTAMP.log in the hadoop_coasters folder.
  5. The start_hadoop_workers.sh script itself, for the settings used.
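
To find the most recent log files listed above, a small helper along these lines may be handy. It is a minimal sketch: `latest_log` is a hypothetical function name, and the directory arguments in the usage note assume you run it from the folder where you execute run-swift.sh and that hadoop_coasters sits alongside it.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified file in
# directory $1 whose name matches the glob pattern $2.
# Prints nothing if no file matches.
latest_log() {
    # $2 is left unquoted on purpose so the shell expands the glob.
    ls -t "$1"/$2 2>/dev/null | head -n 1
}
```

For example, `latest_log . 'your-app-name-*.log'` and `latest_log hadoop_coasters 'cps-*.log'` would print the newest Swift run log and the newest cps log, assuming those paths match your layout. Quote the pattern at the call site so it is expanded inside the function rather than by your interactive shell.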

NOTE: Remember that the workers are at the mercy of Hadoop, and we currently do not get logs back from the workers.

If none of these help, mail swift-user@ci.uchicago.edu.