@gphat
Created October 11, 2011 13:10
# Storm Multi-Language Support
## The ShellBolt
Support for multiple languages is implemented via the ShellBolt class. This
class implements the IBolt interface and contains the facilities for
executing a script or program via the shell using Java's ProcessBuilder class.
## The Wrapper Class
You'll need to create a Java class that wraps your script and declares the fields
involved. You can learn more about this at https://github.com/nathanmarz/storm/wiki/Concepts
## Protocol Preamble
A simple protocol is implemented via the STDIN and STDOUT of the executed
script or program. A mix of simple strings and JSON-encoded data is exchanged
with the process, which makes support possible for pretty much any language.
## Packaging Your Stuff
To run a ShellBolt on a cluster, the scripts that are shelled out to must be
in the resources directory within the jar submitted to the master.
During development or testing on a local machine, however, the resources
directory just needs to be on the classpath. It does not need to be contained
in the jar you create.
## The Protocol
Notes:
* Both ends of this protocol use a line-reading mechanism, so be sure to
trim off newlines from the input and to append them to your output.
* All inputs will be terminated by a single line containing "end".
* The bullet points below are written from the perspective of the script writer's
STDIN and STDOUT.
* Your script will be executed by the Bolt.
* STDIN: A string representing a path. This is a PID directory.
Your script should create an empty file named with its PID in this directory. For example,
if the PID is 1234, an empty file named 1234 is created in the directory. This
file lets the supervisor know the PID; the PID returned via this protocol is used only for logging.
* STDOUT: Your PID. This is not JSON encoded, just a string.
* STDIN: (JSON) The Storm configuration. Various settings and properties.
* STDIN: (JSON) The Topology context
* STDIN: A tuple! This is a JSON-encoded structure like this:

      {
        // The tuple's id
        "id": -6955786537413359385,
        // The id of the component that created this tuple
        "comp": 1,
        // The id of the stream this tuple was emitted to
        "stream": 1,
        // The id of the task that created this tuple
        "task": 9,
        // All the values in this tuple
        "tuple": ["snow white and the seven dwarfs"]
      }

* STDOUT: The results of your bolt. XXX Is this JSON encoded?
* STDOUT: sync or end XXX which one and why?
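
The startup steps in the list above could look roughly like this in Python. This is a minimal sketch, not Storm's own client library: the helper names `read_message` and `handshake` are made up for illustration, the streams are passed in as parameters so the code can be exercised without a real parent process, and it assumes that the PID is written as a bare line with no trailing "end" (the notes above do not say either way).

```python
import json
import os
import sys


def read_message(stream=sys.stdin):
    """Collect lines until a line containing only "end"; return them joined."""
    lines = []
    while True:
        line = stream.readline()
        if not line:  # EOF: the parent closed the pipe
            raise EOFError("input closed before 'end' was seen")
        line = line.rstrip("\n")  # trim the newline, as the notes advise
        if line == "end":
            return "\n".join(lines)
        lines.append(line)


def handshake(stdin=sys.stdin, stdout=sys.stdout):
    """Perform the startup exchange and return (conf, context)."""
    pid_dir = read_message(stdin)  # plain string, not JSON
    pid = os.getpid()
    # An empty file named after the PID lets the supervisor learn the PID.
    open(os.path.join(pid_dir, str(pid)), "w").close()
    # Assumption: the PID goes back as a bare line, without "end".
    stdout.write("%d\n" % pid)
    stdout.flush()
    conf = json.loads(read_message(stdin))     # Storm configuration
    context = json.loads(read_message(stdin))  # topology context
    return conf, context
```

After the handshake, each further call to `read_message` followed by `json.loads` would yield one tuple with the `id`, `comp`, `stream`, `task`, and `tuple` keys shown above.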
### sync
Note: This command is not JSON encoded; it is sent as a simple string.
This lets the parent bolt know that the script has finished processing and
is ready for another tuple.
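
As a rough illustration of where `sync` fits, the sketch below handles a batch of already-decoded tuples and reports readiness after each one. The names `send_sync` and `process_all` are hypothetical, and the sketch assumes that `sync` is written as a bare line with no "end" after it, which this document does not state explicitly.

```python
import sys


def send_sync(stdout=sys.stdout):
    """Tell the parent bolt that we are ready for the next tuple."""
    stdout.write("sync\n")  # bare string, not JSON
    stdout.flush()          # flush so the parent is not left waiting on a buffer


def process_all(tuples, handle, stdout=sys.stdout):
    """Run `handle` on each decoded tuple, sending sync after each one."""
    for tup in tuples:
        handle(tup)
        send_sync(stdout)
```

Flushing after every write matters here because both ends read line by line; output stuck in a buffer would look like a hung script to the parent.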
### end
Note: This command is not JSON encoded; it is sent as a simple string.
This should be sent after any of the commands below, as it delimits messages.
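
A small helper can keep the delimiter from being forgotten. The sketch below writes one JSON-encoded message followed by the `end` line; the function name `send_message` is hypothetical and the output stream is a parameter only to make the helper easy to test.

```python
import json
import sys


def send_message(payload, stdout=sys.stdout):
    """Write one JSON-encoded message followed by the "end" delimiter."""
    # json.dumps emits a single line by default, which suits a line-based reader.
    stdout.write(json.dumps(payload) + "\n")
    stdout.write("end\n")
    stdout.flush()
```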
## Commands
Commands are JSON encoded instructions sent back from the script.
### ack
Acknowledge a tuple. Acking a tuple lets Storm know that you have processed it.
### emit
Emit a tuple.
### fail
Fail a tuple. Failing a tuple will cause Storm to consider it unprocessed.
### log
This command allows you to send information back to Storm for logging. It
has nothing to do with tuple processing and is purely informational.
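
This document does not spell out the JSON shape of each command, so the sketch below should be read as an assumption: the field names (`command`, `id`, `tuple`, `anchors`, `msg`) follow the Storm multilang protocol as commonly documented and should be checked against the Storm version in use. The builder function names are hypothetical.

```python
import json


def ack_command(tuple_id):
    """Build an ack for the tuple with the given id (assumed field names)."""
    return {"command": "ack", "id": tuple_id}


def fail_command(tuple_id):
    """Build a fail for the tuple with the given id (assumed field names)."""
    return {"command": "fail", "id": tuple_id}


def emit_command(values, anchors=None):
    """Build an emit carrying `values`, optionally anchored to input tuple ids."""
    command = {"command": "emit", "tuple": list(values)}
    if anchors:
        command["anchors"] = list(anchors)
    return command


def log_command(message):
    """Build a log command carrying a message for Storm to log."""
    return {"command": "log", "msg": message}


def encode(command):
    """Serialize a command as one JSON line plus the "end" delimiter."""
    return json.dumps(command) + "\nend\n"
```

For example, a word-splitting bolt might write `encode(emit_command(["snow"], anchors=[tup["id"]]))` for each word and then `encode(ack_command(tup["id"]))` once the input tuple `tup` is fully handled.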