// Schemas are defined in a UI, not in text; they are too rich
// States are defined in a UI, not in text; they are too rich

calculateAverages(cells: Cells) {
    // This function has no context (this/self)
    // It is pure, no side-effects
    // It cannot run tasks
    // It can't modify tables, or any data really
    // It can be given table cells as arguments to read, but again can't modify them
    // It must be performant
    // The compiler checks that it is branch complete
    // In the UI when you make a new function, the first thing it does is ask you for examples
    // Functions can be run in cells with =, like in a spreadsheet "=calculateAverages(#name)"
    total = sum(cells)
    return total / cells.length
}

getAnomalousCells(cells: Cells) -> [Cell] {
    // Return type optional like in TypeScript
    // Functions can be 'chained' like in unix, with the pipe `|`
    return calculateAverages(cells)
           | filter (cell, avg) => avg > 5.0
           | forEach (cell, avg) => cell
}

task notifyUser(user) {
    // Tasks have a context 'self'
    // Tasks do not return a value
    // Tasks can send messages to other services/actors in the system
    // Tasks can read/write tables
    await notifications.send('anomalous', user)
}

task notifyAnomalousUsers() {
    // For every user cell, schedule the notifyUser task in 10 seconds, then sleep .1 seconds
    // to do it gradually.
    // 'self' here is the table this task is defined on; self.users == the users column
    getAnomalousCells(self.users)
    | forEach (user) => schedule notifyUser(user) 10s   // duration literal, 10 seconds
    | sleep(.1s)
}

on mouseDown(event) {
    // This trigger can only be running once at a time; it automatically debounces
    super.send(event)
    await notifyAnomalousUsers() 
    goto sent // Move to sent state
}

on every keyDown(event) {
    // This trigger will run in parallel every time the event happens
    run console.log("keyDown: {}", event.keyName)
}

on init => loaded {
    // When we go from init state to loaded state
    run notifyAnomalousUsers()

    return {
      loaded = true
    }
}

on "reload" init => loaded {
    // Named trigger, has a guard which will only run the trigger if the guard is satisfied
    guard hasBeenLoaded

    return {
        reloaded = true
    }
}

guard hasBeenLoaded {
    return context.loaded
}

guard hasTitle {
    return context.__title
}

functions

  • functional style
  • all pure + (global/higher/self context?)
  • have local variables, all immutable by default
  • lazy evaluated
  • no side-effects
  • cannot schedule tasks
  • cannot read/write to tables
  • must be performant (!!!)
  • graphed / reactive
  • measured for completeness (ooh); see the Python sketch after this list
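
A minimal Python sketch of the constraints above, assuming cells arrive as an immutable tuple snapshot (all names here are illustrative, not part of the spec):

def calculate_averages(cells: tuple[float, ...]) -> float:
    # Pure: the result depends only on the argument; there is no table,
    # task, or 'self' in scope to produce a side-effect with.
    total = sum(cells)             # local binding, never reassigned
    return total / len(cells)

assert calculate_averages((2.0, 4.0, 6.0)) == 4.0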

tasks (aka coroutines / routines / procedures)

  • imperative style
  • can read/write to tables
  • allows side-effects
  • always async
  • can be scheduled
  • can be awaited
  • can call functions
  • can call other tasks
  • can be built like: task = notifications.send.create(name='anomalous', user=user)
  • can be used like await task, run task, defer task, errdefer task, schedule task <when>
    • defer, like in Zig, schedules it to run at the end of the enclosing scope
    • errdefer similarly runs only when an exception occurs
  • can be called directly: run notifications.send('anomalous', user) (see the asyncio sketch after this list)
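
A hedged asyncio sketch of the task verbs above (names are illustrative, not the spec's API): schedule maps onto a delayed fire-and-forget task, sleep(.1s) onto asyncio.sleep(0.1), and defer onto a finally block that runs at the end of the scope.

import asyncio

async def notify_user(user):
    # stand-in for: await notifications.send('anomalous', user)
    print("notified", user)

def schedule(coro_fn, delay, *args):
    # 'schedule notifyUser(user) 10s': fire-and-forget after a delay
    async def runner():
        await asyncio.sleep(delay)
        await coro_fn(*args)
    return asyncio.create_task(runner())

async def notify_anomalous_users(users):
    scheduled = []
    try:
        for user in users:
            scheduled.append(schedule(notify_user, 10, user))
            await asyncio.sleep(0.1)       # 'sleep(.1s)', to spread the load
    finally:
        await asyncio.gather(*scheduled)   # 'defer': runs at the end of scope

# asyncio.run(notify_anomalous_users(['bob', 'alice']))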

triggers

  • trigger when sent a message
  • imperative style
  • just like tasks
  • given exactly one argument (the message); see the asyncio sketch after this list
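
One way the two trigger modes could look in asyncio; a sketch, not a prescription (event payloads and names are assumed):

import asyncio

class Widget:
    def __init__(self):
        self._mouse_busy = False

    async def on_mouse_down(self, event):
        # 'on mouseDown': at most one instance runs; extra events are dropped
        if self._mouse_busy:
            return
        self._mouse_busy = True
        try:
            await asyncio.sleep(0.1)       # stand-in for notifyAnomalousUsers()
        finally:
            self._mouse_busy = False

    def on_key_down(self, event):
        # 'on every keyDown': spawn a parallel task per event
        asyncio.get_running_loop().create_task(self._log(event))

    async def _log(self, event):
        print("keyDown:", event)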

state triggers

  • trigger when entering a state from another
  • returns a state context update (see the sketch after this list)
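
A small illustrative Python model of a state trigger, assuming the returned dict is merged into the machine's context on transition:

class Machine:
    def __init__(self, state="init"):
        self.state = state
        self.context = {}
        self.transitions = {}              # (src, dst) -> trigger function

    def on(self, src, dst):
        def register(fn):
            self.transitions[(src, dst)] = fn
            return fn
        return register

    def goto(self, dst):
        trigger = self.transitions.get((self.state, dst))
        if trigger:
            self.context.update(trigger(self.context))   # merge the update
        self.state = dst

machine = Machine()

@machine.on("init", "loaded")
def loaded(context):
    return {"loaded": True}

machine.goto("loaded")   # runs the trigger; context is now {'loaded': True}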

redefine these as spreadsheet tables

  • sets
  • lists
  • maps
  • state machine ???
  • = table[name=bob].age + 1 (see the lookup sketch after this list)
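
A hedged Python reading of the = table[name=bob].age + 1 formula, treating a table as rows selected by a column value (the linear scan here is what the index constraints below would replace):

table = [
    {"name": "bob", "age": 41},
    {"name": "alice", "age": 39},
]

def lookup(rows, **match):
    # naive selection; a real table would consult a unique index instead
    return next(r for r in rows if all(r[k] == v for k, v in match.items()))

cell_value = lookup(table, name="bob")["age"] + 1   # == 42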

tables/columns can have various run-time constraints

  • uniqueness
  • sorted
  • selection / state
  • indexes (a uniqueness sketch follows this list)
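
As a sketch, a run-time uniqueness constraint could be enforced on insert via a backing index, along these lines (illustrative names, not a real API):

class UniqueColumnError(ValueError):
    pass

class Table:
    def __init__(self, unique):
        self.rows = []
        self.unique = unique
        self._index = set()        # the 'indexes' bullet: backs the constraint

    def insert(self, row):
        key = row[self.unique]
        if key in self._index:
            raise UniqueColumnError(f"duplicate {self.unique}: {key!r}")
        self._index.add(key)
        self.rows.append(row)

users = Table(unique="name")
users.insert({"name": "bob", "age": 41})
# users.insert({"name": "bob", "age": 12})   # -> UniqueColumnError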

cells

  • can be functions with '=' (above)
  • are strongly typed

state-machine

  • actor / service
  • maps states to functions

services

  • tracked by the routines that call them as part of the definition

data-oriented

  • sorta is
  • could we reframe tasks or functions as "systems"? Or maybe make a new type.
root-11 commented Aug 16, 2022

Hold that thought: "agents exist in an environment and react to changes".

Agents have logic (functions) and local variables (incl. pointers to persistent data). Agents subscribe to sources (instead of async/await). Agents react to updates from sources. Agents post to subscribers.

All you need now is a performant model of agency. Here is one model that I know very well: https://github.com/root-11/maslite but whether it is "performant" at 270m msgs/sec is up to you to judge.

Something like this could work (if I interpret your spec right):

# magic.py  # sorry. Couldn't find a better name.

from graph import Graph

class Env(object):
    def __init__(self):
        self.graph = Graph()

    def __call__(self, *args, **kwargs):
        # lots of inspection magic using https://github.com/gaogaotiantian/watchpoints
        # registers id(var) + type(function) in graph.
        ...

Let's start in a simple way:

import magic  
env = magic.Env()

@env
a = 2

The @env decorator inspects the decorated value, registers it in the graph, and watches its type and id.

If we check the type of a, we get something new:

>>> type(a)
env.a.int

A namespaced name with a and its type.

Let's create the first var-chain:

@env
b = a+1

The @env decorator inspects the decorated value, registers it in the graph, and watches its type and id. However, as the right side of the assignment is not a plain C type, the call to __add__ will reveal that the integer a is registered. @env therefore means that:

b.value = b.get(a.__add__, 1)

where a.__add__ fetches the value that was set to 2.

If we change it:

a = 3

the watcher detects the change of object id and cascades the change along the graph's edge 'a' --> 'b'.
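
Without the watchpoints magic, the cascade can be sketched with explicit reactive cells; this is a toy model of the same idea, not the watchpoints or maslite API:

class Cell:
    def __init__(self, value=None, formula=None, deps=()):
        self.formula, self.deps, self.subs = formula, deps, []
        for d in deps:
            d.subs.append(self)    # register the graph edge dep --> self
        self.value = value if formula is None else formula(*[d.value for d in deps])

    def set(self, value):
        self.value = value
        self._cascade()

    def _cascade(self):
        for s in self.subs:        # walk the edge 'a' --> 'b'
            s.value = s.formula(*[d.value for d in s.deps])
            s._cascade()

a = Cell(2)
b = Cell(formula=lambda x: x + 1, deps=(a,))
a.set(3)
print(b.value)   # 4: the change cascaded along a --> b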

Let's add some more cases: what shall we do with container types? The decorator must inspect the decorated value and, since this type is a container, hijack all its signature functions (at class level) and register them in the graph.

@env  
data = [1, 2, a] 

This means that type will return a pwned list and not the CPython list:

>>> type(data)
env.data.list

Metaprogramming helps us simplify the class management. If we didn't do this, the code would NOT be performant (in my view).

Let's mix it with something that isn't registered: Here's a plain callable:

def f(x):  
    return x*x

I now want to use it in conjunction with a registered function:

@env  
new = map(f,data)

Again we use the decorator to inspect the decorated value, which is a callable, and subscribe to updates to *args & **kwargs in the env.graph.

At this point I've set up the equivalent of a reactive spreadsheet, which the env.graph knows as:

a --> data --> new
 \
  --> b

Now I want it to react:

a = -3

As the var a is registered in @env and watched, @env can now cascade the change using the graph as a map of operations. Here's what it can look like at scale: https://github.com/root-11/tablite/blob/master/images/incremental_dataprocessing.svg
(I've been doing this in production at 1.0e12 rows of data per month).

Evaluating data will now show:

>>> data
[1, 2, -3]

and evaluating new will now show:

>>> new
[1, 4, 9]

As we pwned all signature functions on the list when creating data, this will also work:

>>> data.append(4)
>>> new
[1, 4, 9, 16]

That's what you wanted, right?
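
One hedged way to get that behaviour is to override ("pwn") the mutating list methods so they notify subscribers; a toy stand-in for the metaprogrammed container described above:

class WatchedList(list):
    def __init__(self, iterable=(), subscribers=None):
        super().__init__(iterable)
        self.subscribers = subscribers or []

    def append(self, item):        # a real version would hijack all mutators
        super().append(item)
        for fn in self.subscribers:
            fn(self)

def f(x):
    return x * x

def recompute(d):
    new[:] = [f(x) for x in d]     # refresh the dependent list in place

data = WatchedList([1, 2, -3], subscribers=[recompute])
new = [f(x) for x in data]         # initial evaluation: [1, 4, 9]
data.append(4)                     # triggers recompute
print(new)                         # [1, 4, 9, 16]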

Hold on! I said performant!

As of Python 3.10 we can keep all of the graph's branches in shared_memory, which allows us to exploit multiprocessing. All we need to do is tell each subproc which branches to evaluate and for "how many steps": most spreadsheets have a split-merge pattern and can safely be evaluated in a single subproc for the computational workload between each split and merge. Loops and nested loops can be flattened at the cost of maintaining the graph. For example

for v in data:
    v+=1

can be translated to a map function where len(data) can be split between subprocs. I find that 1.0e6 is a good number for splitting tasks; e.g. a million evaluations per subproc before coming back to the graph.
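
A rough multiprocessing sketch of that flattening, with chunk_size standing in for the ~1e6 evaluations per subproc (shared_memory is omitted for brevity):

from multiprocessing import Pool

def bump(chunk):
    return [v + 1 for v in chunk]

def parallel_increment(data, chunk_size=1_000_000):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with Pool() as pool:                        # one chunk per worker call
        return [v for part in pool.map(bump, chunks) for v in part]

if __name__ == "__main__":                      # required for multiprocessing
    print(parallel_increment(list(range(5))))   # [1, 2, 3, 4, 5]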

Ok. I can do this.
Question: Is it a good idea?
Answer: Not for analytics.

The biggest, ugliest challenge is to make a user of this system understand that when their graph contains 1.0bn evaluations that merge into a single registered variable, it will require 1.0bn evaluations when 1 input is updated.

This means that stopping or pausing the cascade when doing a number of updates is actually more important than computing everything really quickly.

So somebody will at some point suggest adding env.pause = True whilst the user is making some mass-update. And after a while you'll discover that this brings you back to square one of why nobody does this for analytics: the graph becomes the overhead to a program that would run simpler and faster as a classic procedural program.

but! .... sigh ...

If you're working with a pipeline of data where tiny updates trickle through the system, it makes some sense. Practical use cases are monitoring systems that detect anomalies, or multiplayer online games like EVE Online. These are better built as pure agent-based systems centered around a persistent messaging system, so that the modules that evaluate the data can be updated asynchronously to the flow of the data. And if you walk further down this road you'll probably find that a small distributed system is more maintainable.

data sources ---> "hot" msg queue ---> message store
                         |                    |
                         v                    v
                   live analytics       slow analytics
