@wrgoldstein
Created September 19, 2015 21:22
A pseudo-summary, in Ruby, of a Hadley Wickham talk on pipeable data in R. I've been generally interested in finding clearer patterns for describing data pipelines.
# Say we want to tell a story like the following:
#   "The bunny Foofoo went to the forest and ate some grass"
# we build up the pieces to tell the story:
def the_bunny(name)
  "The bunny #{name}"
end

def went_to_the_forest(object)
  "#{object} went to the forest"
end

def and_ate_some_grass(object)
  "#{object} and ate some grass"
end
# and then what? some choices.
# use nested function calls:
story = and_ate_some_grass(went_to_the_forest(the_bunny('Foofoo')))
# this is hard to read: the calls nest inside out, so the story reads
# backwards. what if we broke it out?
# use separate variables for each state:
the_named_bunny = the_bunny('Foofoo')
with_subject = went_to_the_forest(the_named_bunny)
story = and_ate_some_grass(with_subject)
# not much better. the variable names are either redundant with the
# method names or non-descriptive.
# use one variable to hold the story as it builds:
story = the_bunny('Foofoo')
story = went_to_the_forest(story)
story = and_ate_some_grass(story)
# better, but contrived-looking, with 'story' repeated everywhere.
# and what if we want to tell the same story several times with a
# different name? we'd have to copy and paste all three lines.
# so, obviously, make a method:
def tell_the_story(name)
  story = the_bunny(name)
  story = went_to_the_forest(story)
  story = and_ate_some_grass(story)
  story
end
tell_the_story('Foofoo')
tell_the_story('Booboo')
# that's great, but what if you want the option to
# use just a piece of your story?
def partial_story(name)
  story = the_bunny(name)
  story = went_to_the_forest(story)
  story
end

def full_story(name)
  story = partial_story(name)
  story = and_ate_some_grass(story)
  story
end
partial_story('Foofoo')
full_story('Booboo')
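# an aside, as a minimal sketch assuming Ruby 2.6+ (where Proc#>> and
# Method#>> exist): f >> g returns a lambda that runs f, then feeds its
# result to g, so each sub-story can be composed from the pieces instead
# of getting its own method. the module ComposedStory and its constants
# are hypothetical; they restate the three steps as lambdas so the sketch
# runs on its own. with the methods above,
# method(:the_bunny) >> method(:went_to_the_forest) composes the same way.
module ComposedStory
  # the three story steps, as lambdas
  THE_BUNNY          = ->(name) { "The bunny #{name}" }
  WENT_TO_THE_FOREST = ->(object) { "#{object} went to the forest" }
  AND_ATE_SOME_GRASS = ->(object) { "#{object} and ate some grass" }

  # sub-stories are just compositions; FULL extends PARTIAL by one step
  PARTIAL = THE_BUNNY >> WENT_TO_THE_FOREST
  FULL    = PARTIAL >> AND_ATE_SOME_GRASS
end

ComposedStory::PARTIAL.call('Foofoo')
ComposedStory::FULL.call('Booboo')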
# ugh. what if there are many possible sub-stories?
# use a pipeline of callable Method objects (lambda-like):
storyline = [
  :the_bunny,
  :went_to_the_forest,
  :and_ate_some_grass
].map(&method(:method)) # same as .map { |name| method(name) }: symbols -> Method objects
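# m.(v) is shorthand for m.call(v); inject feeds each result to the next step
storyline[0..1].inject('Foofoo') { |v, m| m.(v) } # first two steps only
storyline.inject('Booboo') { |v, m| m.(v) }       # all three steps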
# nice, but Ruby syntax starts getting in the way.
# we can at least hide it away:
def tell_the_story(storyline, name)
  storyline.inject(name) { |v, m| m.(v) }
end
tell_the_story(storyline[0..1], 'Foofoo')
tell_the_story(storyline, 'Booboo')
# this is still sort of all over the place.
# we can wrap it up in a class:
class Bunny
  def initialize(name)
    @story = "The bunny #{name}"
  end

  def went_to_the_forest
    @story += ' went to the forest'
    self
  end

  def and_ate_some_grass
    @story += ' and ate some grass'
    self
  end

  def the_end
    @story
  end
end
Bunny.new('Foofoo')
  .went_to_the_forest
  .and_ate_some_grass
  .the_end

Bunny.new('Booboo')
  .went_to_the_forest
  .the_end
# which is actually pretty great in terms of readability. but
# what if there's another ending, which this class doesn't know about?
def new_ending(story)
  "#{story} and then got eaten by a fox!"
end

new_ending(Bunny.new('Booboo')
  .went_to_the_forest
  .the_end)
# the nice readability of our story in code is gone, especially
# if there's more than one of these building blocks.
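# an aside, as a minimal sketch assuming Ruby 2.6+ (Object#then; Ruby 2.5
# calls it yield_self): then hands its receiver to a block and returns the
# block's result, so an ending the class doesn't know about can still go
# at the end of the chain. FluentBunny and eaten_by_a_fox are hypothetical
# stand-ins for Bunny and new_ending, trimmed so the sketch runs on its own.
class FluentBunny
  def initialize(name)
    @story = "The bunny #{name}"
  end

  def went_to_the_forest
    @story += ' went to the forest'
    self
  end

  def the_end
    @story
  end
end

# the outside ending, as a lambda the class never has to know about
eaten_by_a_fox = ->(story) { "#{story} and then got eaten by a fox!" }

fox_story = FluentBunny.new('Booboo')
  .went_to_the_forest
  .the_end
  .then(&eaten_by_a_fox)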
# but suppose we skip all this superstructure and use our original
# methods plus a small glue method in the data class?
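class String
  # a | b passes exactly one operand, so the step is just a method name.
  # top-level methods are private methods of Object, so any String can
  # look them up with #method. note: this reopens String program-wide.
  def |(fun)
    method(fun).(self)
  end
end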
# voila
'Foofoo' |
  :the_bunny |
  :went_to_the_forest |
  :and_ate_some_grass
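# one way to let a step take extra arguments, which the *args in the
# original | seems to be reaching for. a minimal sketch: since a | b only
# ever passes a single operand, a step and its arguments travel together
# as an array. bunny_named, went_to and and_ate are hypothetical steps,
# and this reopens String globally (a refinement would be a safer home).
def bunny_named(name)
  "The bunny #{name}"
end

def went_to(story, place)
  "#{story} went to #{place}"
end

def and_ate(story, food)
  "#{story} and ate #{food}"
end

class String
  # step is a method name (:bunny_named) or an array of a method name
  # followed by its extra arguments ([:and_ate, 'a mouse'])
  def |(step)
    name, *args = step
    method(name).call(self, *args)
  end
end

mouse_story = 'Foofoo' |
  :bunny_named |
  [:went_to, 'the forest'] |
  [:and_ate, 'a mouse']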