@Tarrasch
Created August 29, 2014 08:33
Currently, our skeleton doesn't instantiate everything
import datetime

import luigi

from spotify.luigi.crunch import ScrubJobTask, load_avsc
from spotify.luigi import HdfsTarget
from spotify.luigi.external_shrek_anonym import CreateEndSongCleaned


class SampleEndSongSubset(luigi.ExternalTask):

    def output(self):
        return HdfsTarget("/user/spotify-analytics-data/examples/data_pipeline_crunch/stream_count_anonym")


class Example1StreamCountJob(ScrubJobTask):
    """
    You can run this example from maven artifact:
    > greaserun --runner luigi com.spotify.data:spotify-data-crunch:LATEST --module stream_count --task Example1StreamCountJob
    or using your local build (uploaded to your edgenode):
    > greaserun --runner luigi myartifaaaaaact-0.1.2.3.4.5-jar-with-dependencies.jar --module stream_count --task Example1StreamCountJob
    """

    def main_class(self):
        return "mygrooooooooooooooooup.pipeline.Example1StreamCountJob"

    def requires(self):
        return {
            "input": SampleEndSongSubset()
        }

    def output(self):
        return HdfsTarget('stream_count', schema=load_avsc("ExamplePlaysByCountry.avsc"))
@Tarrasch

So this is the file that gets instantiated. I picked extra silly artifact and group ids to make it clear :)
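
For anyone unsure what "instantiated" means here, a rough sketch of the idea, assuming the skeleton uses `${groupId}` / `${artifactId}` / `${version}` style placeholders (those names, and the `instantiate` helper, are made up for illustration and are not taken from the real skeleton). It only shows how the silly ids would end up in the generated strings, and what a placeholder that is not instantiated looks like afterwards:

```python
from string import Template

# Hypothetical skeleton fragments. The placeholder names are assumptions,
# not the ones the real skeleton uses.
SKELETON_MAIN_CLASS = Template("${groupId}.pipeline.Example1StreamCountJob")
SKELETON_JAR = Template("${artifactId}-${version}-jar-with-dependencies.jar")


def instantiate(template, **ids):
    """Fill in the given placeholders; unknown ones are left untouched."""
    return template.safe_substitute(**ids)


main_class = instantiate(SKELETON_MAIN_CLASS, groupId="mygrooooooooooooooooup")
jar_name = instantiate(
    SKELETON_JAR, artifactId="myartifaaaaaact", version="0.1.2.3.4.5"
)

# A placeholder with no value supplied survives verbatim, which is roughly
# what "doesn't instantiate everything" would look like in the output.
partial = instantiate(SKELETON_JAR, artifactId="myartifaaaaaact")

print(main_class)
print(jar_name)
print(partial)
```

Grepping the generated project for leftover `${` would be one cheap way to spot the spots the skeleton missed, if the placeholders really look like this.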
