yadudoc/swift_hadoop_guide.md

## swift_hadoop_guide.md

      
Swift on Hadoop

The purpose of this guide is to help you set up Swift on a Hadoop cluster. This lets you execute swiftscripts on the cluster, allowing you to go beyond the limitations of MapReduce-style workflows.
In the current configuration, Swift runs in persistent coasters mode on Hadoop. This means that two separate Swift entities are required to execute a Swift run on a Hadoop cluster: the Swift-coaster-service, which manages workers on the Hadoop cluster, and the Swift runtime, which executes swiftscripts via the Swift-coaster-service.
Get started.


TODO: Set up a git repo with the KLab app as an example.
TODO: Move fake-data generation into a script.
TODO: Make start_hadoop_workers.sh check that swift is in the PATH and report a failure if it is not.

Troubleshooting

The following are helpful when debugging:

The stdout from executing the swiftscript.
The latest your-app-name-TIMESTAMP.log generated by Swift in the folder from which you execute run-swift.sh.
The stdout from the start-hadoop-workers.sh script.
The coaster service logs and cps-TIMESTAMP.log in the hadoop_coasters folder.
The start_hadoop_workers.sh script itself, for the settings used.
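
The items above can be gathered into a single archive before asking for help. The following is only a minimal sketch, not part of the Swift distribution. The function name `collect_debug_info` is hypothetical, and it assumes that `hadoop_coasters` and `start_hadoop_workers.sh` sit in the folder where you ran run-swift.sh, that the coaster service logs end in `.log`, and that you saved the stdout of the two scripts by hand to files named `swift.out` and `workers.out` (for example by piping through `tee`). Adjust the paths to match your setup.

```shell
#!/bin/sh
# Hypothetical helper: bundle the troubleshooting files into one tarball.
# Assumptions (not stated in the guide): hadoop_coasters/ and
# start_hadoop_workers.sh live in the run folder, coaster logs end in .log,
# and script stdout was saved manually as swift.out and workers.out.

collect_debug_info() {
    run_dir=${1:-.}                    # folder where run-swift.sh was executed
    out=${2:-swift-debug.tar.gz}       # archive to create
    staging=$(mktemp -d) || return 1   # temporary folder for the copies

    # Newest Swift run log (your-app-name-TIMESTAMP.log) in the run folder.
    latest_log=$(ls -t "$run_dir"/*.log 2>/dev/null | head -n 1)
    if [ -n "$latest_log" ]; then
        cp "$latest_log" "$staging/"
    fi

    # Coaster service logs, including cps-TIMESTAMP.log.
    if [ -d "$run_dir/hadoop_coasters" ]; then
        mkdir "$staging/hadoop_coasters"
        cp "$run_dir"/hadoop_coasters/*.log "$staging/hadoop_coasters/" 2>/dev/null
    fi

    # Worker start script (for its settings) and any saved stdout captures.
    for f in start_hadoop_workers.sh swift.out workers.out; do
        if [ -f "$run_dir/$f" ]; then
            cp "$run_dir/$f" "$staging/"
        fi
    done

    # Pack everything, then clean up the staging folder.
    tar -czf "$out" -C "$staging" .
    status=$?
    rm -rf "$staging"
    return $status
}
```

After sourcing the file, calling `collect_debug_info . swift-debug.tar.gz` from the run folder produces an archive you can attach to a mail. Files that do not exist are skipped, so the archive may be incomplete if your layout differs from the assumptions above.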

NOTE: Remember that the workers are at the mercy of Hadoop, and we currently do not get logs back from the workers.
If none of this helps, mail swift-user@ci.uchicago.edu.