You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
// Schemas are defined in a ui, not in text, they are too rich// States are defined in a ui, not in text, they are too richcalculateAverages(cells: Cells){// This function has no context (this/self)// It is pure, no side-effects// It cannot run tasks// It can't modify tables, or any data really// It can be given table cells as arguments to read, but again can't modify them// It must be performant// The compiler checks that it is branch complete// In the UI when you make a new function, the first thing it does is ask you for examples// Functions can be run in cells with =, like in a spreadsheet "=calculateAverages(#name)"total=sum(cells)returntotal/cells.length}getAnomalousCells(cells: Cells)->[Cell]{// Return type optional like in TypeScript// Functions can be 'chained' like in unix, with the pipe `|`returncalculateAverages(cells)|filter(cell,avg)=> avg >5.0|forEach(cell,avg)=>cell}tasknotifyUser(user){// Tasks have a context 'self'// Tasks do not return a value// Tasks can send messages to other services/actors in the system// Tasks can read/write tablesawaitnotifications.send('anomalous',user)}tasknotifyAnomalousUsers(){// For every user cell, schedule the notifyUser task in 10 seconds, then sleep .1 seconds// to do it gradually.// 'self' here is table this task is defined on. self.users == the users columgetAnomalousCells(self.users)|forEach(user)=>schedulenotifyUser(user)10s// duration literal, 10 seconds|sleep(.1s)}onmouseDown(event){// This trigger can only be running once, automatically debouncessuper.send(event)awaitnotifyAnomalousUsers()gotosent// Move to sent state}oneverykeyDown(event){// This trigger will run in parallel every time the event happensrunconsole.log("keyDown: {}",event.keyName)}oninit=>loaded{// When we go from init state to loaded staterunnotifyAnomalousUsers()return{
loaded =true}}on"reload"init=>loaded{// Named trigger, has a guard which will only run the trigger if the guard is satisfiedguardhasBeenLoadedreturn{
reloaded =true}}guardhasBeenLoaded{
return context.loaded}guardhasTitle{returncontext.__title}
functions
functional style
all pure + (global/higher/self context?)
have local variables, all immutable by default
lazy evaluated
no side-effects
cannot schedule tasks
cannot read/write to tables
must be performant (!!!)
graphed / reactive
measured for completeness (ooh)
tasks (aka coroutines / routines / procedures)
imperative style
can read/write to tables
allows side-effects
always async
can be scheduled
can be awaited
can call functions
can call other tasks
can be build like: task = notifications.send.create(name='anomalous', user=user)
can be used like await task, run task, defer task, errdefer task, schedule task <when>
defer, like in zig schedules it for the end of scope to be run then
errdefer similarly runs only when an exception occurs
can be called directly run notifications.send('anomalous', user)
triggers
trigger when sent a message
imperative style
just like tasks
given exactly one argument (the message)
state triggers
trigger when enter a state from another
returns a state context update
redefine these as spreadsheet tables
sets
lists
maps
state machine ???
= table[name=bob].age + 1
tables/columns can have various run-time constraints
uniqueness
sorted
selection / state
indexes
cells
can be functions with '=' (above)
are strongly typed
state-machine
actor / service
maps states to functions
services
tracked by the routines that call them as part of the definition
data-oriented
sorta is
could we reframe tasks or functions as "systems"? Or maybe make a new type.
Hold that thought: "agents exist in an environment and react to changes".
Agents have logic (functions) and local variables (incl. pointers to persistent data). Agents subscribe to sources (instead of async/await). Agents react to updates from sources. Agents post to subscribers.
All you need now is a performant model of agency. Here is one model that I know very well: https://github.com/root-11/maslite but whether it is "performant" @270m msgs/sec is up to you to judge.
Something like this could work (if I interpret your spec right):
# magic.py # sorry. Couldn't find a better name.fromgraphimportGraphclassEnv(object):
def__init__(self):
self.graph=Graph()
def__call__(self, *args, **kwargs):
# lots of inspection magic using https://github.com/gaogaotiantian/watchpoints# registers id(var) + type(function) in graph.
Let's start in a simple way:
importmagicenv=magic.Env()
@enva=2
The @env decorator inspects the decorated value in the graph and watches the type and id.
If we check the type of a, we get something new:
>>>type(a)
env.a.int
A namespace name with a and it's type.
Let's create the first var-chain:
@envb=a+1
The @env decorator inspects the decorated value in the graph and watches the type and id. However as the right side of the function is a non-c-type the call to add will reveal that the integer of a is registered. @env therefore means that:
b.value=b.get(a.__add__, 1)
where a.add fetches as value that was set to 2.
If we change it:
a=3
the watcher detects the change of object id and cascades the change along the graphs vector 'a'--> 'b'
Let's add some more cases: What shall we do with Container types? The decorator must inspect the decorated and hijacks all signature functions (class level) as this type is a container and registers this to the graph.
@envdata= [1, 2, a]
This means that type will return a pwned list and not the cpython list:
>>>type(data)
env.data.list
Metaprogramming helps us to simplify the class management. If we didn't do this the code would NOT be performant (in my view).
Let's mix it with something that isn't registered: Here's a plain callable:
deff(x):
returnx*x
I now want to use it in conjunction with a registered function:
@envnew=map(f,data)
Again we use the decorator to inspects the decorated which is type callable and subscribe to updates to *args & **kwargs in the env.graph
At this point I've set up the equivalent of a reactive spreadsheet, which the env.graph knows as:
As we pwned all signature functions on the list when creating data, this will also work:
>>>data.append(4)
>>>new
[1, 4, 9, 16]
That's what you wanted right?
Hold on! I said performant!
As of python3.10 we can keep all of the graph's branches in shared_memory, which allows us to exploit multiprocessing. All we need to do is to tell the each subproc which branches to evaluate and for "how many steps": Most spreadsheets have a split-merge pattern and can safely be evaluated in a single subproc for the computational workload between each split and merge. Loops and nested loops can be flattened at the cost of maintaining the graph. For example
forvindata:
v+=1
can be translated to a map function where len(data) can be split between subprocs. I find that 1.0e6 is good number for splitting tasks; e.g. a million evaluations per subproc before coming back to the graph.
Ok. I can do this.
Question: Is it a good idea?
Answer: Not for analytics.
The biggest ugliest challenge is to make a user of this system understand that when their graph contains 1.0bn evaluations that merge into a single registered variable, it will required 1.0bn evaluations when 1 input is updated.
This means that stopping or pausing the cascade when doing a number of updates is actually more important that computing everything really quickly.
So somebody will at some point suggest to add env.pause = True whilst the user is making some mass-update. And after a while you'll discover that this will bring you back to square 1 of why nobody does this for analytics: The graph becomes the overhead to a program that would run simpler and faster as a classic procedural program.
but! .... sigh ...
If you're working with a pipeline of data where tiny updates trickle through the system it makes some sense. Practical use cases are monitoring systems that detect anomalies or multiplayer online games like eve online. These are better built as pure agent based systems centered around a persistent messaging system, so that modules that evaluate the data can be updated asynchronously to the flow of the data. And if you walk further down this road you'll probably find a small distributed system will be more maintainable.
data sources ---> "hot" msg queue ---> message store
v v
live analytics slow analytics
Hold that thought: "agents exist in an environment and react to changes".
Agents have logic (functions) and local variables (incl. pointers to persistent data). Agents subscribe to sources (instead of async/await). Agents react to updates from sources. Agents post to subscribers.
All you need now is a performant model of agency. Here is one model that I know very well: https://github.com/root-11/maslite but whether it is "performant" @270m msgs/sec is up to you to judge.
Something like this could work (if I interpret your spec right):
Let's start in a simple way:
The @env decorator inspects the decorated value in the graph and watches the type and id.
If we check the type of a, we get something new:
A namespace name with
a
and it's type.Let's create the first var-chain:
The @env decorator inspects the decorated value in the graph and watches the type and id. However as the right side of the function is a non-c-type the call to add will reveal that the integer of
a
is registered. @env therefore means that:where a.add fetches
a
s value that was set to 2.If we change it:
the watcher detects the change of object id and cascades the change along the graphs vector 'a'--> 'b'
Let's add some more cases: What shall we do with Container types? The decorator must inspect the decorated and hijacks all signature functions (class level) as this type is a container and registers this to the graph.
This means that
type
will return a pwned list and not thecpython
list:Metaprogramming helps us to simplify the class management. If we didn't do this the code would NOT be performant (in my view).
Let's mix it with something that isn't registered: Here's a plain callable:
I now want to use it in conjunction with a registered function:
Again we use the decorator to inspects the decorated which is type
callable
and subscribe to updates to *args & **kwargs in the env.graphAt this point I've set up the equivalent of a reactive spreadsheet, which the
env.graph
knows as:Now I want to it to react:
As the var
a
is registered in @env and watched, @env can now cascade the change using the graph as a map of operations. Here's what it can look like at scale: https://github.com/root-11/tablite/blob/master/images/incremental_dataprocessing.svg(I've been doing this in production at 1.0e12 rows of data per month).
Evaluating
data
will now show:and evaluating
new
will now show:As we pwned all signature functions on the list when creating
data
, this will also work:That's what you wanted right?
Hold on! I said performant!
As of python3.10 we can keep all of the graph's branches in shared_memory, which allows us to exploit multiprocessing. All we need to do is to tell the each subproc which branches to evaluate and for "how many steps": Most spreadsheets have a split-merge pattern and can safely be evaluated in a single subproc for the computational workload between each split and merge. Loops and nested loops can be flattened at the cost of maintaining the graph. For example
can be translated to a
map
function wherelen(data)
can be split between subprocs. I find that 1.0e6 is good number for splitting tasks; e.g. a million evaluations per subproc before coming back to the graph.The biggest ugliest challenge is to make a user of this system understand that when their graph contains 1.0bn evaluations that merge into a single registered variable, it will required 1.0bn evaluations when 1 input is updated.
This means that stopping or pausing the cascade when doing a number of updates is actually more important that computing everything really quickly.
So somebody will at some point suggest to add
env.pause = True
whilst the user is making some mass-update. And after a while you'll discover that this will bring you back to square 1 of why nobody does this for analytics: The graph becomes the overhead to a program that would run simpler and faster as a classic procedural program.If you're working with a pipeline of data where tiny updates trickle through the system it makes some sense. Practical use cases are monitoring systems that detect anomalies or multiplayer online games like eve online. These are better built as pure agent based systems centered around a persistent messaging system, so that modules that evaluate the data can be updated asynchronously to the flow of the data. And if you walk further down this road you'll probably find a small distributed system will be more maintainable.