Created
January 13, 2016 10:30
building dynamic tasks/workflows for users
I've tried various things to keep my tasks as flexible as possible, | |
but there's always going to be a lot of configuration by the user to | |
set up their tasks/workflows unless they are re-running something or
using something written by another user. Anyway, mostly I use the
pass-the-key(s) approach while proxying to a more plain Clojure function
that can be used elsewhere. I messed around a bit with prismatic | |
schema as well to try to recognize signatures in certain maps with | |
different keys but similar structures. It works but I think precision | |
is more important than convenience when dealing with lots of different | |
inputs, thus trying to avoid any kind of auto-magic stuff that can | |
cause unintended consequences. | |
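
A minimal sketch of the "pass the key(s)" idea described above, written in Python only for illustration (the discussion itself is about Clojure and Onyx). The names `make_task`, `arg_keys`, `out_key`, and `split_sentence` are hypothetical and not part of any library. The wrapper is told explicitly which segment fields to pass, and the plain function stays reusable elsewhere:

```python
def split_sentence(sentence):
    """Plain function: knows nothing about segments or workflows."""
    return sentence.split()


def make_task(fn, arg_keys, out_key):
    """Wrap a plain function as a segment -> segment task.

    arg_keys names the segment fields passed to fn, in order;
    out_key names the field that receives the result.
    """
    def task(segment):
        missing = [k for k in arg_keys if k not in segment]
        if missing:
            # Fail loudly instead of guessing which field was meant.
            raise KeyError(f"segment is missing keys: {missing}")
        args = [segment[k] for k in arg_keys]
        return {**segment, out_key: fn(*args)}
    return task


split_tweet = make_task(split_sentence, arg_keys=["tweet"], out_key="words")
result = split_tweet({"id": 1, "tweet": "dynamic tasks in onyx"})
```

Naming the keys explicitly trades some convenience for precision: nothing is inferred from the shape of the map.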
Most of the time I am trying to consume from and output to some mix of | |
kafka and/or datomic. As such I'm trying to at least make sure what | |
comes in and out has some sort of registered schema (easy in datomic) | |
I can check despite all the different inputs and outputs I am facing. | |
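
One possible way to check segments against a registered schema at the input and output boundaries, again sketched in Python. The registry `SCHEMAS`, the schema name `tweet/v1`, and the `validate` helper are assumptions made for this example. They do not reflect how Datomic or Kafka actually store schemas:

```python
# Hypothetical registry: schema name -> {field name: expected Python type}.
SCHEMAS = {
    "tweet/v1": {"id": int, "tweet": str},
}


def validate(schema_name, segment):
    """Return a list of problems; an empty list means the segment conforms."""
    schema = SCHEMAS[schema_name]
    problems = []
    for field, expected in schema.items():
        if field not in segment:
            problems.append(f"missing field: {field}")
        elif not isinstance(segment[field], expected):
            actual = type(segment[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Report fields the schema does not know about, in a stable order.
    extra = sorted(set(segment) - set(schema))
    problems.extend(f"unexpected field: {name}" for name in extra)
    return problems
```

Returning every problem at once, rather than raising on the first, makes it easier to show the user what is wrong with an input.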
gardnervickers | |
21:47 | |
@ytraverse If you come up with some nice ways of doing this, consider | |
contributing them | |
to https://github.com/onyx-platform/lib-onyx/tree/master. I am in the
process of updating it right now and I believe many people will run
into the issues you're having.
ytraverse | |
21:52 | |
Will do, thanks | |
If anything though it will be a bit. I am focusing on a lot of UI work | |
at the moment for generating these tasks, hence my concern for reusing | |
and minimizing the amount of user input (extra steps that produce | |
results like mapping fields to function params) to produce the tasks | |
gardnervickers | |
21:55 | |
Yes this is also on my mind at the moment too. Sounds very interesting | |
and the perfect use case for Onyx. | |
ytraverse | |
21:58 | |
Yes, it is exactly the reason I picked Onyx - dynamic assembly of | |
tasks. Professionally and in earlier iterations of what I am doing, I | |
was using various things like Spark, Storm, and messing around with | |
Samza. I've formally made the switch for my prototype which I decided | |
to work on full-time starting January 1. | |
Another reason I am using Onyx is I enjoy the Clojure ecosystem and | |
it's a good fit for my use-cases anyway. Onyx seems more suited to my | |
overall approach than, let's say, Storm (despite Clojure). I think for
me, flexibility is more important than processing speed. | |
The hard part has been trying to wrap my head around a way that isn't | |
too nonsensical, crushing, or confusing for the user to create | |
workflows in the UI. I expect most users will just reuse what other | |
people have built in my case, but for those that choose to create or | |
modify workflows, I want a good experience. I'm actually not building | |
a tool that is a UI wrapper around Onyx, rather it is more of a | |
streaming data analysis and stream exploration environment. As such, a | |
great majority of things are simply sent on to Onyx, while certain | |
things are calculated by other frameworks/plain clojure functions or | |
even client-side if the operation is suited that way. | |
gardnervickers | |
22:09 | |
nice! I’m excited to see what you come up with in this space | |
lbradstreet | |
22:18 | |
Me too. Some experimentation is definitely needed especially where | |
user job generation is concerned. I've got ideas but haven't settled | |
on anything yet. | |
ytraverse | |
22:20 | |
I thought of going way overboard with some kind of compiler and | |
language type approach, but mostly it's not needed for an initial | |
effort. You guys should be happy that you can do quite a bit that is | |
simple. I will have to generate some things from what I am doing to | |
map them to onyx, but nothing too complicated. Perhaps I will use Onyx | |
to do that. Eventually I may have some sort of command-entry/repl
type option for doing things, but I am trying not to waste time on | |
that. Visual candy is a bit more important to the people I need to | |
fund my efforts. | |
At the same time, I am trying to avoid something that is too literal | |
like Yahoo Pipes | |
I was a big graph theory person in the past and it makes me want to | |
murder myself every time I see a graph rendered in some js | |
visualization. While it can be useful and cool for some very specific | |
cases, mostly it doesn't work, scale, or prove too useful for most | |
things people actually want to do. The minute you have a few thousand | |
vertices for example, it breaks. A similar problem exists for creating | |
workflows - while you can create some very literal interfaces, they | |
break down and get confusing once you need real-world complexity. | |
A good example of task flows done wrong is the old Workflow engines in | |
Visual Studio. Using the UI was beyond worthless and to make matters | |
even worse, they would code-gen stuff in the past when you used the | |
UI. If you've never encountered a visual workflow editor, consider | |
yourself lucky. | |
lbradstreet | |
22:27 | |
Oh god. I used the newer version of the workflow engine in visual | |
studio (by force). The workflows were unmaintainable | |
Bad memories | |
The idea was that our customers would use them but they were so | |
complicated that programmers found them insanely tricky | |
ytraverse | |
22:35 | |
Yeah the state management alone was enough to break most workflows. | |
There was a time I made a good chunk of change for an old consulting | |
company I was working with just going around and fixing these things | |
among many other abominations of nature. | |
gardnervickers | |
22:50 | |
Yikes that looks hairy | |
manderson202 | |
23:10 | |
Great discussion here. I'm working on almost exactly the same thing | |
right now: providing a business-user interface into composable workflow
tasks using onyx as the engine. @ytraverse interested to hear more | |
about your results. @lbradstreet onyx-blocks looks cool and interested | |
to see more. | |
My initial implementation involves specifying documentation and schema | |
details for the parameters available for each task | |
This means that for each function you need to also specify doc details | |
that can be returned to a UI to display | |
My first pass is quick and dirty, but I learned a lot. Hope to improve as I go. | |
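
As a rough illustration of keeping documentation and parameter schema next to each task so a UI can render them, here is a Python sketch. The registry layout (`TASKS`, and the `doc`, `tags`, `params` keys) and the `describe_task` helper are hypothetical and not taken from either participant's implementation:

```python
import json


def split_sentence(sentence, delimiter=" "):
    return sentence.split(delimiter)


# Hypothetical registry: each task carries its function plus UI-facing metadata.
TASKS = {
    "split-sentence": {
        "fn": split_sentence,
        "doc": "Split a sentence into words.",
        "tags": ["string"],
        "params": [
            {"name": "sentence", "type": "string", "required": True,
             "doc": "Text to split."},
            {"name": "delimiter", "type": "string", "required": False,
             "default": " ", "doc": "Separator between words."},
        ],
    },
}


def describe_task(name):
    """Return the JSON-safe part of a task entry (everything but the function)."""
    entry = TASKS[name]
    return {"name": name, **{k: v for k, v in entry.items() if k != "fn"}}


# What a server might send to the client for display.
payload = json.dumps(describe_task("split-sentence"))
```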
ytraverse | |
23:27 | |
@manderson202 I am also specifying docs to be used in the UI | |
manderson202 | |
23:28 | |
Sounds like we're taking a very similar approach. I'm also making | |
tasks reusable via parameters. | |
ytraverse | |
23:28 | |
@manderson202 My design has a pre-defined list of tasks like you might | |
find in Excel. It is searchable with various amounts of metadata and | |
such on it. I am storing all the information in datomic, taking | |
advantage of the caching/history there, and sending it on to the | |
client. | |
Since there are so many tasks in general, I am trying to reduce the | |
number by keeping things general but there's a fine line between | |
configurability and making the task setup an arduous affair | |
Generally I am thinking of registering the tasks to user/groups/other | |
primitives, meaning that the visibility of tasks will be only what you | |
need | |
So in the unix sense, it would be like installing only what you need | |
from the package manager | |
manderson202 | |
23:30 | |
Yep, same thing here. I've been toying with the idea of a DSL for | |
defining a workflow. No matter how many tasks you give users and how
much configurability there is, there is always something you missed.
ytraverse | |
23:30 | |
And you can of course search everything, but that's quite a bit | |
different than the use case where you're working with math functions | |
and you don't care about the 20000 other things you can do | |
manderson202 | |
23:31 | |
yep. I added in the concept of tags or labels so you can sort by the | |
type of function you're looking for, like math or string functions,
etc... | |
ytraverse | |
23:31 | |
Part of the idea is to actually allow people to restrict what can be | |
used, and another part is usability. Then there's of course | |
performance - as the task library grows, I don't want to send the | |
entire payload to the client at once. | |
Searching everything is fine because you're only showing the results | |
though. I'm also a believer in not blowing up things. | |
manderson202 | |
23:32 | |
always a balance between flexibility and ease of use. | |
ytraverse | |
23:32 | |
I know it's a contentious topic, but I'm one of those people that | |
doesn't believe in paging at all. | |
That is to say, if you have 10000 results as your search, you failed. | |
Obviously sometimes that matters, but usually what someone wanted was | |
the count, not the actual results in those cases. Getting back to | |
tasks though, it just means I'm trying to push the minimal amount of | |
tasks to choose from into the UI at any one time. | |
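
The ideas above (tags, per-group visibility, and returning a count plus a small result set instead of pages) could be combined roughly as follows. This is a Python sketch under assumed data: `TASK_INDEX`, the group names, and `visible_tasks` are invented for the example:

```python
# Hypothetical task index: name, tags, and the groups allowed to see the task.
TASK_INDEX = [
    {"name": "inc", "tags": {"math"}, "groups": {"analysts", "engineers"}},
    {"name": "sqrt", "tags": {"math"}, "groups": {"engineers"}},
    {"name": "split-sentence", "tags": {"string"}, "groups": {"analysts"}},
    {"name": "upper-case", "tags": {"string"}, "groups": {"analysts", "engineers"}},
]


def visible_tasks(group, tag=None, text="", limit=50):
    """Return (total_matches, first `limit` task names) for one group."""
    matches = sorted(
        t["name"] for t in TASK_INDEX
        if group in t["groups"]                 # visibility restriction
        and (tag is None or tag in t["tags"])   # optional tag filter
        and text in t["name"]                   # naive substring search
    )
    # The count tells the user to narrow the search; only a capped
    # slice of names is ever sent to the client.
    return len(matches), matches[:limit]
```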
manderson202 | |
23:33 | |
makes sense. nobody can process 1000's of tasks anyway. work to give | |
the user what they want. | |
process mentally... | |
ytraverse | |
23:34 | |
The real part I am struggling with and I'd be curious to hear thoughts | |
is assigning parameters as I mentioned before. With input as numbers | |
and then 1 increment task, pretty easy.... | |
When you start getting chains of tasks or more complex input, | |
assigning parameters is laborious | |
For instance, if you have a sentence split function with a param | |
sentence, and the previous task loaded a tweet with the field "tweet" | |
as the tweet text, you'd need to map tweet->sentence | |
In code that's super easy, but imagine doing that over and over for | |
lots of parameters in a UI | |
Yahoo Pipes I think solves it by drawing lines between fields in different pipes | |
I've seen other approaches like "property sheets" ala Visual Studio | |
To some degree you can know what the fields from the previous task | |
are, then use UI like a drop down to at least assign them that way | |
from a minimal set. But I am not sure you can always know exactly the | |
fields unless you explicitly require a schema up-front. That is, you | |
must know the schema of the input to the workflow so you can | |
understand how things get transformed. | |
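
The tweet->sentence mapping and the drop-down idea can be sketched as below, in Python and with hypothetical helpers (`candidate_fields`, `apply_mapping`). It assumes the upstream schema is known up front, which is exactly the requirement raised above:

```python
def candidate_fields(upstream_schema, param_type):
    """Fields from the previous task whose declared type fits the parameter."""
    return sorted(f for f, t in upstream_schema.items() if t == param_type)


def apply_mapping(segment, mapping):
    """Build keyword arguments from a {param_name: segment_field} mapping."""
    return {param: segment[field] for param, field in mapping.items()}


def split_sentence(sentence):
    return sentence.split()


# Assumed schema of the previous task's output.
tweet_schema = {"id": "long", "tweet": "string", "user": "string"}

options = candidate_fields(tweet_schema, "string")  # what a drop-down could offer
mapping = {"sentence": "tweet"}                      # the user's choice
segment = {"id": 7, "tweet": "hello onyx", "user": "yt"}
words = split_sentence(**apply_mapping(segment, mapping))
```

Filtering candidates by type keeps the drop-down small, but it only works when every task declares the schema of what it emits.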
manderson202 | |
23:38 | |
right, that's a good point. I haven't gotten that far yet, but the | |
more it can resemble unix pipes the better where the transition | |
between tasks is as dumb as possible. | |
ytraverse | |
23:38 | |
I agree with you completely about the unix pipe approach | |
That is what I am going for, but unix has some advantages and | |
disadvantages for pipes | |
Onyx/Clojure is generally working with data structures, while pipes | |
are flexible due to raw text as input | |
manderson202 | |
23:39 | |
Right. one way would be to define some sort of canonical segment that | |
your tasks know about with business data and meta-data in defined | |
places in the map. This can work, but can also get unwieldy pretty | |
quickly. | |
ytraverse | |
23:39 | |
but the raw text comes at huge costs - optimization, validity of | |
input, unpredictable results, etc. | |
I spent awhile looking at some of the people that tried to make | |
structured unix pipes. They all failed :( | |
Everything from protobuf to byte streams | |
Of course in the case of Onyx, there isn't the cruft that is | |
specifically tied to the existing ecosystem to worry about | |
Nonetheless, I'm for sure trying to make something where you can build | |
an onyx workflow as easy as a pipe | |
If you have any ideas on the transitions, I would love to hear them | |
manderson202 | |
23:42 | |
yeah and likewise. I'm happy to share thoughts here as I go along | |
ytraverse | |
23:43 | |
There's also the notion of command-line like syntax. Unix generally | |
deals with flat flows between pipes. More complex users and later | |
iterations have allowed a lot more. There's obviously "tee" for | |
starters, and then the ability to do things like | |
cat file.txt | tee >(pbcopy) >(do_stuff) >(do_more_stuff) | grep errors | |
thankfully we don't need to worry about process substitution, forking, | |
etc. in the same ways as far as any kind of shell-like syntax goes, | |
but there is the need to express a complex dag with params | |
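
One way to express a DAG with params as plain data, loosely mirroring the `tee` example above: an edge list plus a separate params map. This is a Python sketch using the standard library's `graphlib`. The task names and the `execution_order` and `unknown_param_targets` helpers are hypothetical:

```python
from graphlib import CycleError, TopologicalSorter

# Edge list: (upstream task, downstream task).
edges = [
    ("read-file", "copy"),
    ("read-file", "do-stuff"),
    ("read-file", "grep"),
]

# Params live beside the graph, keyed by task name.
params = {"read-file": {"path": "file.txt"}, "grep": {"pattern": "errors"}}


def execution_order(edges):
    """Return task names so that every task appears after its upstream tasks.

    Raises graphlib.CycleError if the edges do not form a DAG.
    """
    sorter = TopologicalSorter()
    for src, dst in edges:
        sorter.add(dst, src)  # dst depends on src
    return list(sorter.static_order())


def unknown_param_targets(edges, params):
    """Names in params that do not correspond to any task in the graph."""
    known = {name for edge in edges for name in edge}
    return sorted(set(params) - known)
```

Keeping the structure and the parameters in separate maps means a textual DSL or a UI can edit either one without touching the other.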
manderson202 | |
23:45 | |
yeah, this is similar to my thinking for a DSL for defining the workflows | |
ytraverse | |
23:45 | |
s-expressions are already better at this in some ways but then you get | |
into the argument that users hate s-expressions, but what normal users | |
are using the shell | |
manderson202 | |
23:46 | |
i think you have to have multiple levels of abstraction for users. | |
many times you can knock out 80% of use cases with a simple, user | |
friendly UI that favors usability over flexibility. Then provide some | |
lower level hooks to "power users" that need the extra flexibility. | |
ytraverse | |
23:46 | |
agreed | |
I'm not sure if this helps anyone, but actually my project originally | |
started out as a tool for graph traversals | |
I decided at some point stream processing was a target I needed to hit | |
anyway and trying to get people to stuff things into graphs is already | |
a pain. I can build a graph in stream processors as it is and do the | |
more interesting things there. | |
Anyway, I mention it because it's worth a look at Apache Tinkerpop | |
http://tinkerpop.apache.org/docs/3.1.0-incubating/ | |
manderson202 | |
23:49 | |
I'm familiar with it. Used it a bit in an early prototype for another | |
project. May end up being used more, but haven't had time to get back | |
to it. | |
ytraverse | |
23:49 | |
I've been a user for years and built some of my original prototypes | |
off Titan, but it got acquired by Datastax which made me a bit | |
nervous. I also wanted to be more clojure-centric anyway and the | |
clojure tooling was lagging behind at the time (still is). Long story | |
short, I changed things to be more centered around Onyx, but there's | |
some inspiration perhaps here in some of the dataflow, repl, etc. | |
my original UI was sort of this immutable flow of traversal steps, | |
which I've thought about adopting into using for making something with | |
Onyx | |
As you move forward in the traversal, if you didn't have any action | |
selected, you were essentially prompted for what's next....like out | |
edges, or in the case of onyx, it would be what task is next like | |
increment | |
and so just kept adding ui elements in a vertical list as you go, and | |
prompting for configuration | |
Hard to explain, but I haven't really seen much like it. Rather simple | |
in a sense of more or less being a graphical, iterative repl | |
manderson202 | |
23:53 | |
sounds cool. i like the approach. | |
ytraverse | |
23:55 | |
so the point was to again avoid something that was just lots of mouse | |
dragging, layers of windows, gimmicks, etc. | |
unfortunately I feared it would get too confusing despite really just | |
showing you a full history of what you were doing and letting you do | |
cool things like query itself and so on | |
i.e. programmers would probably think it was awesome, but even some
power or ordinary users might be lost
May still pursue that angle, but I am thinking now about finding a | |
good way of deconstructing things as Onyx does itself. Split up as | |
much as you can in the UI as well to keep the focus small. The | |
unintended consequence though is then you have a lot of things that | |
are related in different places and you need to ensure the user can | |
get a good picture of what the heck is actually going on. Trying to | |
think of ways to mitigate that, like simulating the input through the | |
workflow in a live, client-side way. | |
manderson202 | |
23:58 | |
yep, I've thought about that too. User needs to visualize what is happening. | |
ytraverse | |
23:58 | |
In other words, let the user run things with a small amount of data to | |
see if it's actually doing what it is supposed to. But you inevitably
hit limitations client-side, and for some things you must go to the | |
server no matter what. | |
still, I think most remote calls can run without the overhead of onyx | |
itself. Just passing things in regular old function chains with | |
channels, transducers, and the like. | |
then the real workflow will run in onyx | |
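
A small Python sketch of the "preview with a small amount of data" idea: the same step functions are chained as ordinary functions and run over only the first few segments. `compose_steps` and `preview` are hypothetical names, and the sketch says nothing about how the real job would be submitted to Onyx:

```python
from functools import reduce
from itertools import islice


def compose_steps(steps):
    """Chain segment -> segment functions into one function, left to right."""
    return lambda segment: reduce(lambda acc, step: step(acc), steps, segment)


def preview(steps, segments, sample_size=5):
    """Run only the first `sample_size` segments through the chain, lazily."""
    run = compose_steps(steps)
    return [run(s) for s in islice(segments, sample_size)]


steps = [
    lambda s: {**s, "n": s["n"] + 1},          # increment
    lambda s: {**s, "even": s["n"] % 2 == 0},  # annotate
]

# The input is a generator, so the other segments are never produced.
sample = preview(steps, ({"n": i} for i in range(1_000_000)), sample_size=3)
```

This only previews pure steps. Steps with side effects or external outputs would still need the care discussed in the surrounding messages.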
But I don't want to create something where you will get vastly | |
different results because of the execution models. When you factor in | |
things like output mediums, side effects, etc. it gets complicated | |
In tinkerpop pipes, you kind of face the same issue
manderson202 | |
00:03 | |
it's a hard problem for sure. gotta run now, but be great to hear more | |
as you go through the process. | |
ytraverse | |
00:03 | |
There are 2 execution models - OLAP and OLTP. In OLTP, things run
depth-first and in OLAP breadth-first. While not exactly the same
difference of running the same functions inside and outside onyx, the | |
point is that even though largely the same program specification can | |
be used, you really need to be careful because your results may not | |
end up the same. It's always a concern if you are doing anything in | |
parallel vs. serial at the most basic level. | |
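
To make the point concrete, here is a toy Python sketch (not TinkerPop code; `GRAPH` and both functions are made up for the example). It walks the same small graph depth-first and breadth-first. The set of visited nodes is identical, but the order differs, so anything that depends on order, such as "take the first three", gives different answers:

```python
from collections import deque

# A small tree, so no visited-set is needed in either traversal.
GRAPH = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f"], "d": [], "e": [], "f": []}


def depth_first(start):
    """Visit nodes by following each branch to the end before backtracking."""
    stack, seen = [start], []
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(reversed(GRAPH[node]))  # keep left-to-right child order
    return seen


def breadth_first(start):
    """Visit nodes level by level."""
    queue, seen = deque([start]), []
    while queue:
        node = queue.popleft()
        seen.append(node)
        queue.extend(GRAPH[node])
    return seen


first_three_dfs = depth_first("a")[:3]    # ['a', 'b', 'd']
first_three_bfs = breadth_first("a")[:3]  # ['a', 'b', 'c']
```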
Likewise, message me/mention me anytime. Thanks for your thoughts.
manderson202 | |
00:04 | |
absolutely, nice chatting. |