@lbradstreet
Created January 13, 2016 10:30
building dynamic tasks/workflows for users
I've tried various things to keep my tasks as flexible as possible,
but there's always going to be a lot of configuration by the user to
set up their tasks/workflows unless they are re-running something or
using something written by another user. Anyway, mostly I use the
pass-the-key(s) approach while proxying to a more plain Clojure function
that can be used elsewhere. I messed around a bit with Prismatic
Schema as well to try to recognize signatures in maps that have
different keys but similar structures. It works, but I think precision
is more important than convenience when dealing with lots of different
inputs, so I'm trying to avoid any kind of auto-magic that can
cause unintended consequences.
Most of the time I am trying to consume from and output to some mix of
Kafka and/or Datomic. As such, I'm trying to at least make sure that what
comes in and out has some sort of registered schema (easy in Datomic)
that I can check despite all the different inputs and outputs I am facing.
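A minimal Python sketch of the pass-the-keys idea combined with a schema check at both ends. It assumes segments are plain maps (dicts here) and that a registered schema can be reduced to required keys with expected types. `REGISTRY`, `problems`, `make_task`, and `to_fahrenheit` are hypothetical names for illustration, not Onyx, Datomic, or Prismatic Schema APIs.

```python
from typing import Any, Callable, Dict, List, Type

Segment = Dict[str, Any]

# Hypothetical registry standing in for registered schemas (e.g. Datomic attributes):
# schema name -> required key -> expected Python type.
REGISTRY: Dict[str, Dict[str, Type]] = {
    "reading": {"sensor": str, "temp_c": float},
    "reading-f": {"sensor": str, "temp_c": float, "temp_f": float},
}


def problems(schema: str, segment: Segment) -> List[str]:
    """List every way `segment` fails `schema`; an empty list means it conforms."""
    found = []
    for key, expected in REGISTRY[schema].items():
        if key not in segment:
            found.append(f"missing {key!r}")
        elif not isinstance(segment[key], expected):
            found.append(f"{key!r} should be {expected.__name__}, "
                         f"got {type(segment[key]).__name__}")
    return found


def to_fahrenheit(celsius: float) -> float:
    """Plain function: reusable anywhere, knows nothing about segments."""
    return celsius * 9 / 5 + 32


def make_task(fn: Callable[..., Any], param_keys: Dict[str, str], out_key: str,
              in_schema: str, out_schema: str) -> Callable[[Segment], Segment]:
    """Proxy a segment into `fn` through explicitly named keys, checking both ends.

    param_keys maps each parameter of fn to the segment key that supplies it.
    Nothing is inferred: a non-conforming segment raises instead of being guessed at.
    """
    def task(segment: Segment) -> Segment:
        bad = problems(in_schema, segment)
        if bad:
            raise ValueError(f"input fails {in_schema}: {bad}")
        kwargs = {param: segment[key] for param, key in param_keys.items()}
        result = {**segment, out_key: fn(**kwargs)}  # new map; input untouched
        bad = problems(out_schema, result)
        if bad:
            raise ValueError(f"output fails {out_schema}: {bad}")
        return result
    return task


convert = make_task(to_fahrenheit, {"celsius": "temp_c"}, "temp_f", "reading", "reading-f")
print(convert({"sensor": "s1", "temp_c": 100.0}))
# {'sensor': 's1', 'temp_c': 100.0, 'temp_f': 212.0}
```

The plain function stays reusable outside any workflow, and the explicit key mapping favors precision over auto-magic. Extra keys in a segment are tolerated; only the registered keys are checked.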
gardnervickers
21:47
@ytraverse If you come up with some nice ways of doing this, consider
contributing them
to https://github.com/onyx-platform/lib-onyx/tree/master. I am in the
process of updating it right now and I believe many people will run
into the issues you're having.
ytraverse
21:52
Will do, thanks
If anything, though, it will be a while. I am focusing on a lot of UI work
at the moment for generating these tasks, hence my concern with reusing
tasks and minimizing the amount of user input (extra steps such as
mapping fields to function params) needed to produce them
gardnervickers
21:55
Yes this is also on my mind at the moment too. Sounds very interesting
and the perfect use case for Onyx.
ytraverse
21:58
Yes, it is exactly the reason I picked Onyx - dynamic assembly of
tasks. Professionally and in earlier iterations of what I am doing, I
was using various things like Spark, Storm, and messing around with
Samza. I've formally made the switch for my prototype, which I decided
to work on full-time starting January 1.
Another reason I am using Onyx is that I enjoy the Clojure ecosystem and
it's a good fit for my use cases anyway. Onyx seems more suited to my
overall approach than, let's say, Storm (despite Storm's Clojure roots).
I think for me, flexibility is more important than processing speed.
The hard part has been trying to wrap my head around a way that isn't
too nonsensical, crushing, or confusing for the user to create
workflows in the UI. I expect most users will just reuse what other
people have built in my case, but for those that choose to create or
modify workflows, I want a good experience. I'm actually not building
a tool that is a UI wrapper around Onyx; rather, it is more of a
streaming data analysis and stream exploration environment. As such, a
great majority of things are simply sent on to Onyx, while certain
things are calculated by other frameworks/plain Clojure functions or
even client-side if the operation is suited to that.
gardnervickers
22:09
nice! I’m excited to see what you come up with in this space
lbradstreet
22:18
Me too. Some experimentation is definitely needed especially where
user job generation is concerned. I've got ideas but haven't settled
on anything yet.
ytraverse
22:20
I thought of going way overboard with some kind of compiler and
language type approach, but mostly it's not needed for an initial
effort. You guys should be happy that you can do quite a bit that is
simple. I will have to generate some things from what I am doing to
map them to Onyx, but nothing too complicated. Perhaps I will use Onyx
to do that. Eventually I may have some sort of command-entry/REPL
type option for doing things, but I am trying not to waste time on
that. Visual candy is a bit more important to the people I need to
fund my efforts.
At the same time, I am trying to avoid something that is too literal,
like Yahoo Pipes.
I was a big graph theory person in the past and it makes me want to
murder myself every time I see a graph rendered in some js
visualization. While it can be useful and cool for some very specific
cases, mostly it doesn't work, scale, or prove too useful for most
things people actually want to do. The minute you have a few thousand
vertices for example, it breaks. A similar problem exists for creating
workflows - while you can create some very literal interfaces, they
break down and get confusing once you need real-world complexity.
The old Workflow engines in Visual Studio are a good example of task
flows done wrong. Using the UI was beyond worthless and to make matters
even worse, they would code-gen stuff in the past when you used the
UI. If you've never encountered a visual workflow editor, consider
yourself lucky.
lbradstreet
22:27
Oh god. I used the newer version of the workflow engine in Visual
Studio (by force). The workflows were unmaintainable
Bad memories
The idea was that our customers would use them but they were so
complicated that programmers found them insanely tricky
ytraverse
22:35
Yeah the state management alone was enough to break most workflows.
There was a time I made a good chunk of change for an old consulting
company I was working with just going around and fixing these things
among many other abominations of nature.
gardnervickers
22:50
Yikes that looks hairy
manderson202
23:10
Great discussion here. I'm working on almost exactly the same thing
right now: providing a business-user interface into composable workflow
tasks using onyx as the engine. @ytraverse interested to hear more
about your results. @lbradstreet onyx-blocks looks cool and interested
to see more.
My initial implementation involves specifying documentation and schema
details for the parameters available for each task
This means that for each function you need to also specify doc details
that can be returned to a UI to display
My first pass is quick and dirty, but I learned a lot. Hope to improve as I go.
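One way to picture "doc and schema details for each task's parameters" is a small spec object serialized for the UI, plus a check that every parameter of the underlying function is documented. This is a Python sketch under assumptions: `TaskSpec`, `ParamSpec`, `ui_payload`, `undocumented`, and the task id format are hypothetical, not part of Onyx.

```python
import inspect
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, List


@dataclass
class ParamSpec:
    name: str
    kind: str  # e.g. "string" or "number"; a UI can choose an input widget from this
    doc: str


@dataclass
class TaskSpec:
    id: str
    doc: str
    params: List[ParamSpec]
    tags: List[str] = field(default_factory=list)


def split_sentence(sentence: str) -> List[str]:
    """The plain function the spec describes."""
    return sentence.split()


SPLIT = TaskSpec(
    id="text/split-sentence",
    doc="Split a sentence into words on whitespace.",
    params=[ParamSpec("sentence", "string", "The text to split.")],
    tags=["string"],
)


def ui_payload(spec: TaskSpec) -> str:
    """Serialize the docs and parameter details a UI needs to render a form."""
    return json.dumps(asdict(spec))


def undocumented(fn: Callable, spec: TaskSpec) -> List[str]:
    """Parameters of `fn` that the spec does not describe."""
    documented = {p.name for p in spec.params}
    return [name for name in inspect.signature(fn).parameters if name not in documented]


print(ui_payload(SPLIT))
print(undocumented(split_sentence, SPLIT))  # []
```

Checking the spec against the real signature catches the common drift where a function gains a parameter but its UI docs do not.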
ytraverse
23:27
@manderson202 I am also specifying docs to be used in the UI
manderson202
23:28
Sounds like we're taking a very similar approach. I'm also making
tasks reusable via parameters.
ytraverse
23:28
@manderson202 My design has a pre-defined list of tasks like you might
find in Excel. It is searchable with various amounts of metadata and
such on it. I am storing all the information in Datomic, taking
advantage of the caching/history there, and sending it on to the
client.
Since there are so many tasks in general, I am trying to reduce the
number by keeping things general but there's a fine line between
configurability and making the task setup an arduous affair
Generally I am thinking of registering the tasks to user/groups/other
primitives, meaning that the visibility of tasks will be only what you
need
So in the unix sense, it would be like installing only what you need
from the package manager
manderson202
23:30
Yep, same thing here. I've been toying with the idea of a DSL for
defining a workflow. No matter how many tasks you give users and how
much configurability, there is always something you missed.
ytraverse
23:30
And you can of course search everything, but that's quite a bit
different than the use case where you're working with math functions
and you don't care about the 20000 other things you can do
manderson202
23:31
yep. I added in the concept of tags or labels so you can sort by the
type of function you're looking for, like math or string functions,
etc...
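Registering tasks to groups ("install only what you need") and filtering by tags can be combined into one catalog query. This Python sketch assumes each task carries a set of tags and a set of groups it is registered for. The catalog contents, `Task`, and `visible_tasks` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Task:
    id: str
    tags: FrozenSet[str]    # labels such as "math" or "string"
    groups: FrozenSet[str]  # groups the task is registered ("installed") for


CATALOG = [
    Task("math/increment", frozenset({"math"}), frozenset({"analysts"})),
    Task("math/mean", frozenset({"math", "stats"}), frozenset({"analysts", "ops"})),
    Task("string/split", frozenset({"string"}), frozenset({"ops"})),
]


def visible_tasks(catalog: Iterable[Task], user_groups: Iterable[str],
                  tag: Optional[str] = None) -> List[str]:
    """Only tasks registered to one of the user's groups, optionally narrowed by tag."""
    groups = set(user_groups)
    return [t.id for t in catalog
            if t.groups & groups and (tag is None or tag in t.tags)]


print(visible_tasks(CATALOG, ["analysts"], tag="math"))
# ['math/increment', 'math/mean']
```

Because visibility is applied before anything reaches the client, this query also serves the goal of restricting usage. It also keeps the payload small as the task library grows.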
ytraverse
23:31
Part of the idea is to actually allow people to restrict what can be
used, and another part is usability. Then there's of course
performance - as the task library grows, I don't want to send the
entire payload to the client at once.
Searching everything is fine because you're only showing the results
though. I'm also a believer in not blowing up things.
manderson202
23:32
always a balance between flexibility and ease of use.
ytraverse
23:32
I know it's a contentious topic, but I'm one of those people that
doesn't believe in paging at all.
That is to say, if you have 10000 results as your search, you failed.
Obviously sometimes that matters, but usually what someone wanted was
the count, not the actual results in those cases. Getting back to
tasks though, it just means I'm trying to push the minimal amount of
tasks to choose from into the UI at any one time.
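The "no paging" stance boils down to a search that returns the total count plus only a handful of the best matches. This Python sketch assumes a simple ranking (prefix matches first, then shorter names). The ranking rule, `search`, and the task names are illustrative assumptions.

```python
from typing import List, Tuple


def search(names: List[str], query: str, limit: int = 5) -> Tuple[int, List[str]]:
    """Return the total number of matches plus at most `limit` of them, never the full list."""
    q = query.lower()
    matches = [n for n in names if q in n.lower()]
    # Prefix matches first, then shorter names: the few results shown should be the likeliest picks.
    matches.sort(key=lambda n: (not n.lower().startswith(q), len(n), n))
    return len(matches), matches[:limit]


TASKS = ["mean", "median", "geometric-mean", "harmonic-mean", "mean-abs-dev", "split"]
print(search(TASKS, "mean", limit=3))
# (4, ['mean', 'mean-abs-dev', 'harmonic-mean'])
```

If the count is large, the UI can say so and invite a narrower query instead of offering page 2.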
manderson202
23:33
makes sense. nobody can process 1000's of tasks anyway. work to give
the user what they want.
process mentally...
ytraverse
23:34
The real part I am struggling with, and I'd be curious to hear thoughts
on, is assigning parameters, as I mentioned before. With numbers as input
and a single increment task, it's pretty easy....
When you start getting chains of tasks or more complex input,
assigning parameters is laborious
For instance, if you have a sentence split function with a param
sentence, and the previous task loaded a tweet with the field "tweet"
as the tweet text, you'd need to map tweet->sentence
In code that's super easy, but imagine doing that over and over for
lots of parameters in a UI
Yahoo Pipes I think solves it by drawing lines between fields in different pipes
I've seen other approaches like "property sheets" à la Visual Studio
To some degree you can know what the fields from the previous task
are, then use a UI element like a drop-down to at least assign them that way
from a minimal set. But I am not sure you can always know exactly the
fields unless you explicitly require a schema up-front. That is, you
must know the schema of the input to the workflow so you can
understand how things get transformed.
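If the workflow's input schema is required up front, the drop-down for a parameter can be computed by propagating that schema through the upstream tasks. This Python sketch assumes each task declares which fields it adds and removes; those declarations and all function names are hypothetical. It then checks a `sentence -> tweet` mapping against the offered fields.

```python
from typing import Dict, List, Set

# Hypothetical declarations: the fields each task adds to or removes from a segment.
TASK_FIELDS: Dict[str, Dict[str, Set[str]]] = {
    "load-tweet": {"adds": {"tweet", "user"}, "removes": set()},
    "drop-user": {"adds": set(), "removes": {"user"}},
}


def fields_after(input_schema: Set[str], chain: List[str]) -> Set[str]:
    """Propagate the workflow's input schema through the upstream tasks."""
    fields = set(input_schema)
    for name in chain:
        fields = (fields - TASK_FIELDS[name]["removes"]) | TASK_FIELDS[name]["adds"]
    return fields


def dropdown_choices(input_schema: Set[str], chain: List[str]) -> List[str]:
    """The minimal set of fields a UI could offer for a parameter."""
    return sorted(fields_after(input_schema, chain))


def unknown_fields(mapping: Dict[str, str], available: Set[str]) -> List[str]:
    """Fields named in a param -> field mapping that do not exist upstream."""
    return sorted(name for name in mapping.values() if name not in available)


choices = dropdown_choices({"id"}, ["load-tweet", "drop-user"])
print(choices)                                              # ['id', 'tweet']
print(unknown_fields({"sentence": "tweet"}, set(choices)))  # []
print(unknown_fields({"sentence": "text"}, set(choices)))   # ['text']
```

Without declared inputs and per-task field changes, this propagation is impossible. That gap is exactly why an explicit up-front schema matters here.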
manderson202
23:38
right, that's a good point. I haven't gotten that far yet, but the
more it can resemble unix pipes the better where the transition
between tasks is as dumb as possible.
ytraverse
23:38
I agree with you completely about the unix pipe approach
That is what I am going for, but unix has some advantages and
disadvantages for pipes
Onyx/Clojure is generally working with data structures, while pipes
are flexible due to raw text as input
manderson202
23:39
Right. one way would be to define some sort of canonical segment that
your tasks know about with business data and meta-data in defined
places in the map. This can work, but can also get unwieldy pretty
quickly.
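A canonical segment might look like an envelope with business data and metadata in fixed places. In this Python sketch, tasks only ever see `data`, while a wrapper records lineage in `meta`. The envelope layout and the names `envelope`, `lift`, and `shout` are assumptions for illustration.

```python
from typing import Any, Callable, Dict

Data = Dict[str, Any]
Envelope = Dict[str, Any]  # {"data": {...business fields...}, "meta": {...}}


def envelope(data: Data, source: str) -> Envelope:
    return {"data": data, "meta": {"source": source, "lineage": []}}


def lift(name: str, fn: Callable[[Data], Data]) -> Callable[[Envelope], Envelope]:
    """Tasks see only the business data; the wrapper records lineage in meta."""
    def task(seg: Envelope) -> Envelope:
        meta = {**seg["meta"], "lineage": seg["meta"]["lineage"] + [name]}
        return {"data": fn(seg["data"]), "meta": meta}
    return task


def shout(data: Data) -> Data:
    return {**data, "text": data["text"].upper()}


upper = lift("upper", shout)
print(upper(envelope({"text": "hi"}, source="kafka")))
# {'data': {'text': 'HI'}, 'meta': {'source': 'kafka', 'lineage': ['upper']}}
```

The unwieldy part is visible even here. Every producer and consumer outside the workflow must now agree on the envelope shape, not just on the business fields.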
ytraverse
23:39
but the raw text comes at huge costs - optimization, validity of
input, unpredictable results, etc.
I spent a while looking at some of the people that tried to make
structured unix pipes. They all failed :(
Everything from protobuf to byte streams
Of course in the case of Onyx, there isn't the cruft that is
specifically tied to the existing ecosystem to worry about
Nonetheless, I'm for sure trying to make something where you can build
an Onyx workflow as easily as a pipe
If you have any ideas on the transitions, I would love to hear them
manderson202
23:42
yeah and likewise. I'm happy to share thoughts here as I go along
ytraverse
23:43
There's also the notion of command-line-like syntax. Unix generally
deals with flat flows between pipes. More advanced users and later
iterations of the shell have allowed a lot more. There's obviously "tee" for
starters, and then the ability to do things like
cat file.txt | tee >(pbcopy) >(do_stuff) >(do_more_stuff) | grep errors
Thankfully, we don't need to worry about process substitution, forking,
etc. in the same ways as far as any kind of shell-like syntax goes,
but there is the need to express a complex DAG with params
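The `tee` pipeline above can be written as a list of [from, to] edges with params kept beside the graph. The shape is similar to Onyx's own workflow vectors of [from to] pairs. In this Python sketch, fan-out needs no `tee` because it is just several edges leaving one node. The node names and `PARAMS` layout are illustrative, and `graphlib` requires Python 3.9+.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple

# cat file.txt | tee >(pbcopy) >(do_stuff) >(do_more_stuff) | grep errors
# as [from, to] edges; fan-out is simply several edges leaving "cat".
EDGES: List[Tuple[str, str]] = [
    ("cat", "pbcopy"),
    ("cat", "do_stuff"),
    ("cat", "do_more_stuff"),
    ("cat", "grep"),
]
# Params live beside the graph instead of being baked into node names.
PARAMS: Dict[str, Dict[str, str]] = {
    "cat": {"path": "file.txt"},
    "grep": {"pattern": "errors"},
}


def predecessors(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Convert an edge list into node -> set of upstream nodes."""
    preds: Dict[str, Set[str]] = {}
    for src, dst in edges:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    return preds


def execution_order(edges: List[Tuple[str, str]]) -> List[str]:
    """A valid run order; raises graphlib.CycleError if the edges are not a DAG."""
    return list(TopologicalSorter(predecessors(edges)).static_order())


for node in execution_order(EDGES):
    print(node, PARAMS.get(node, {}))
```

Keeping structure (edges) separate from configuration (params) lets a UI or DSL edit either one without touching the other.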
manderson202
23:45
yeah, this is similar to my thinking for a DSL for defining the workflows
ytraverse
23:45
S-expressions are already better at this in some ways, but then you get
into the argument that users hate s-expressions. Then again, what normal
users are using the shell?
manderson202
23:46
i think you have to have multiple levels of abstraction for users.
many times you can knock out 80% of use cases with a simple, user
friendly UI that favors usability over flexibility. Then provide some
lower level hooks to "power users" that need the extra flexibility.
ytraverse
23:46
agreed
I'm not sure if this helps anyone, but actually my project originally
started out as a tool for graph traversals
I decided at some point stream processing was a target I needed to hit
anyway and trying to get people to stuff things into graphs is already
a pain. I can build a graph in stream processors as it is and do the
more interesting things there.
Anyway, I mention it because it's worth a look at Apache TinkerPop
http://tinkerpop.apache.org/docs/3.1.0-incubating/
manderson202
23:49
I'm familiar with it. Used it a bit in an early prototype for another
project. May end up being used more, but haven't had time to get back
to it.
ytraverse
23:49
I've been a user for years and built some of my original prototypes
off Titan, but the company behind it got acquired by DataStax, which made
me a bit nervous. I also wanted to be more Clojure-centric anyway, and the
Clojure tooling was lagging behind at the time (still is). Long story
short, I changed things to be more centered around Onyx, but there's
perhaps some inspiration here in the dataflow, REPL, etc.
My original UI was sort of this immutable flow of traversal steps,
which I've thought about adopting for building something with
Onyx
As you move forward in the traversal, if you didn't have any action
selected, you were essentially prompted for what's next....like out
edges, or in the case of onyx, it would be what task is next like
increment
and so it just kept adding UI elements in a vertical list as you went,
prompting for configuration
Hard to explain, but I haven't really seen much like it. Rather simple
in the sense of more or less being a graphical, iterative REPL
manderson202
23:53
sounds cool. i like the approach.
ytraverse
23:55
So the point was, again, to avoid something that was just lots of mouse
dragging, layers of windows, gimmicks, etc.
Unfortunately, I feared it would get too confusing despite really just
showing you a full history of what you were doing and letting you do
cool things like query itself and so on
i.e., a programmer would probably think it was awesome, but even some
ordinary power users might be lost
May still pursue that angle, but I am thinking now about finding a
good way of deconstructing things as Onyx does itself. Split up as
much as you can in the UI as well to keep the focus small. The
unintended consequence though is then you have a lot of things that
are related in different places and you need to ensure the user can
get a good picture of what the heck is actually going on. Trying to
think of ways to mitigate that, like simulating the input through the
workflow in a live, client-side way.
manderson202
23:58
yep, I've thought about that too. User needs to visualize what is happening.
ytraverse
23:58
In other words, let the user run things with a small amount of data to
see if it's actually doing what it is supposed to. But you inevitably
hit limitations client-side, and for some things you must go to the
server no matter what.
Still, I think most remote calls can run without the overhead of Onyx
itself, just passing things through regular old function chains with
channels, transducers, and the like.
Then the real workflow will run in Onyx
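A local preview can push a small sample through a chain of plain functions in-process, with no cluster involved. This Python sketch assumes each step maps one segment to zero or more segments, roughly what a `mapcat`/`filter` transducer chain would do in Clojure. `preview`, `split_words`, and `keep_long` are hypothetical names.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, List

Segment = Dict[str, Any]
# Each step may emit zero or more segments for every input segment.
Step = Callable[[Segment], List[Segment]]


def preview(steps: List[Step], sample: Iterable[Segment], limit: int = 10) -> List[Segment]:
    """Push at most `limit` sample segments through the steps, in order, in-process."""
    segments = list(islice(sample, limit))
    for step in steps:
        segments = [out for seg in segments for out in step(seg)]
    return segments


def split_words(seg: Segment) -> List[Segment]:
    return [{"word": w} for w in seg["text"].split()]


def keep_long(seg: Segment) -> List[Segment]:
    return [seg] if len(seg["word"]) > 3 else []


print(preview([split_words, keep_long], [{"text": "an onyx workflow preview"}]))
# [{'word': 'onyx'}, {'word': 'workflow'}, {'word': 'preview'}]
```

This runs each step over the whole sample before the next step. That is a different execution model from a distributed run, so ordering and side effects may not match what Onyx produces.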
But I don't want to create something where you will get vastly
different results because of the execution models. When you factor in
things like output mediums, side effects, etc. it gets complicated
In TinkerPop pipes, you kind of face the same issue
manderson202
00:03
it's a hard problem for sure. gotta run now, but it would be great to hear more
as you go through the process.
ytraverse
00:03
There are 2 execution models: OLAP and OLTP. In OLTP, things run
depth-first, and in OLAP breadth-first. While that is not exactly the same
difference as running the same functions inside and outside Onyx, the
point is that even though largely the same program specification can
be used, you really need to be careful because your results may not
end up the same. It's always a concern if you are doing anything in
parallel vs. serial at the most basic level.
Likewise, message me/mention me anytime. Thanks for your thoughts.
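To see why the same program can give different results under the two models, consider depth-first and breadth-first walks of one small graph. Both visit the same nodes in a different order, so any order-sensitive step (take the first N, append to a log) diverges. This Python sketch assumes a tree-shaped graph, so no visited set is needed; the graph itself is made up.

```python
from collections import deque
from typing import Dict, List

# A small tree; with cycles you would also need a visited set.
GRAPH: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["e"],
    "d": [],
    "e": [],
}


def depth_first(start: str) -> List[str]:
    order, stack = [], [start]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(GRAPH[node]))  # reversed so children are visited left to right
    return order


def breadth_first(start: str) -> List[str]:
    order, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(GRAPH[node])
    return order


print(depth_first("a"))    # ['a', 'b', 'd', 'c', 'e']
print(breadth_first("a"))  # ['a', 'b', 'c', 'd', 'e']
# Same set of nodes, but an order-sensitive "take 3" disagrees:
print(depth_first("a")[:3], breadth_first("a")[:3])  # ['a', 'b', 'd'] ['a', 'b', 'c']
```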
manderson202
00:04
absolutely, nice chatting.