@patrickwmcgee
Last active August 29, 2015 14:25
Cascalog Workflow example
(defn -main
  [arg]
  (workflow ["/tmp/workflow"]
            read-data ([:tmp-dirs [data-path]]
                       (import-data path1 path2))
            work-step ([:deps :all]
                       (let [data (hfs-seqfile data-path)]
                         (?- (hfs-textline output-path-1 :sinkmode :replace) (query1 data)
                             (hfs-textline output-path-2 :sinkmode :replace) (query2 data))))))
@patrickwmcgee
Author

In Cascalog, this is how you describe a workflow. To create a temp directory or a specific checkpoint, you write something along the lines of [:tmp-dirs [data-path]]. Under the hood, I'm assuming this uses some Hadoop API to create the file locally and manage it. This works seamlessly both locally and in production. I'm wondering what the equivalent would be for doing the same in Cascading.

String checkpointPath = ???; // what should go here?

Tap checkpointTap = new Hfs( new TextDelimited( true, "\t" ), checkpointPath, SinkMode.REPLACE);

Checkpoint checkpoint = new Checkpoint("checkpoint", aPreviouslyDeclaredPipe);

Then, within the FlowDef of the job, add .addCheckpoint(checkpoint, checkpointTap).
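
One possible way to fill in the path, sketched with only the Java standard library: build a unique path string under a base directory and hand it to the Hfs tap. This is an assumption about what might work, not Cascading's own mechanism and not necessarily what Cascalog's :tmp-dirs does. The class CheckpointPaths and the method uniqueCheckpointPath are hypothetical names introduced here for illustration.

```java
import java.util.UUID;

// Hypothetical helper, not part of Cascading: builds a unique path string
// for a checkpoint tap under a caller-supplied base directory.
final class CheckpointPaths {
    private CheckpointPaths() {}

    static String uniqueCheckpointPath(String baseDir, String stepName) {
        // Drop any trailing slashes so the joined path has single separators.
        String base = baseDir.replaceAll("/+$", "");
        // A random UUID suffix keeps concurrent or repeated runs from colliding.
        return base + "/" + stepName + "-" + UUID.randomUUID();
    }
}
```

With this, the placeholder could become String checkpointPath = CheckpointPaths.uniqueCheckpointPath("/tmp/workflow", "checkpoint"); where "/tmp/workflow" simply mirrors the directory used in the Cascalog example. The helper only produces a string, so deleting the directory after the flow finishes would still be up to the caller.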
