ianb/thunks.md

## thunks.md

      
    Thunks

This is an attempt to imagine an API for understanding other programs. These programs, either large or small, are called "thunks" here. This isn't exactly what a CS thunk means, but it's also not not what it means, so it seems like an OK word.
This is also related to dataflow programming, where the containing environment manages these thunks and the flow of data between them.
Thunk static API

A thunk is typically some source code (though it could be something other than a code string), but its internals are specific to that kind of code. From the outside there's a more normal representation:
Data model:

For thinking purposes, let's imagine the data exchanged between thunks is all JSON-encodable. That is, concrete, no pointers, no loops, no types.
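
As a rough sketch of what "JSON-encodable" could mean in practice, the check below round-trips a value through Python's `json` module. The function name `is_json_encodable` is made up for illustration, and treating "survives a round trip unchanged" as the rule is an assumption, not something these notes specify.

```python
import json


def is_json_encodable(value):
    """Return True if `value` survives a round trip through JSON unchanged."""
    try:
        # json.dumps raises TypeError for unsupported types and ValueError
        # for circular references (the "no loops" rule) or NaN/Infinity.
        encoded = json.dumps(value, allow_nan=False)
    except (TypeError, ValueError):
        return False
    # Tuples and non-string dict keys encode fine but come back different,
    # so comparing against the decoded value rejects them too.
    return json.loads(encoded) == value


print(is_json_encodable({"pos": {"x": 1, "y": [1, 2]}}))  # True
print(is_json_encodable({"pos": (1, 2)}))                 # False
```

The round-trip comparison is stricter than just asking whether `json.dumps` succeeds, which seems closer to the "no types" idea.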
Free variables:

These are variables used in the thunk that are not defined in the thunk. So if the thunk is some Python `math.fabs(x)` then `x` is the only free variable (for now we'll assume imports and namespace resolution are handled internal to the thunk, and `math.fabs` is not a free variable).
Free variables in turn have a bunch of properties:

- Name: The probably-global name, like `x` in the above example.
- Type: In some languages we can know the type. But this is more advisory than anything, given the inconsistency of type systems.
- Properties: You can imagine some code like `math.fabs(player.pos.x)`: `player` is still a free variable, but knowing that we want `.pos.x` is helpful.
- Methods: Another case might be `math.fabs(player.position().x)`: now we're calling a method on a free variable. This means the thunk's execution is interleaved with calls back into the environment, so the value can't be prefetched.
- Property assignments: For instance `player.pos.x += 10`. Reassigning a variable is handled in declarations. It's unclear what the JS `player.pos[dimension] += 10` would show.
- Required: In the case of `math.fabs(x)` we know that the thunk is useless without `x`. But in `10 if y else x` we don't know if `x` is needed (though we know `y` is needed). This doesn't have to be perfect, but we know there's no point in even trying to execute a thunk without its required variables. This seems superfluous, but there are cases where the code is doing feature detection and then falling back to other APIs, and so there are effectively dead code paths that cause some noise. It might be worth specifically knowing if null is an "expected" value that might be substituted. This might mean looking for explicit null checks: if the code contains `if x is None: ...` then we know that `None` is an expected input value. (It's worth considering if other types might suggest other "null" values, like `[]` for lists.)
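
For Python thunks, much of this can be guessed with the standard `ast` module. The sketch below is only an approximation: `FreeVariableFinder` and `free_variables` are hypothetical names, scoping is ignored (any name assigned or imported anywhere counts as defined), function parameters are not handled, and the "required" and null-check ideas are left out entirely.

```python
import ast
import builtins


def _chain(node):
    """Return (base_name, [attrs]) for a dotted expression like a.b.c, else None."""
    attrs = []
    while isinstance(node, ast.Attribute):
        attrs.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        return node.id, attrs[::-1]
    return None


class FreeVariableFinder(ast.NodeVisitor):
    def __init__(self):
        self.bound = set(dir(builtins))  # builtins are never free variables
        self.used = {}  # name -> {"properties", "methods", "assigned"} sets

    def _use(self, name):
        empty = {"properties": set(), "methods": set(), "assigned": set()}
        return self.used.setdefault(name, empty)

    def visit_Import(self, node):
        # Imports are resolved inside the thunk, so the names they bind are not free.
        for alias in node.names:
            self.bound.add((alias.asname or alias.name).split(".")[0])

    visit_ImportFrom = visit_Import

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store):
            self.bound.add(node.id)
        else:
            self._use(node.id)

    def visit_Attribute(self, node):
        chain = _chain(node)
        if chain is None:
            self.generic_visit(node)  # e.g. player.position().x: keep walking down
            return
        name, attrs = chain
        kind = "assigned" if isinstance(node.ctx, ast.Store) else "properties"
        self._use(name)[kind].add(".".join(attrs))

    def visit_Call(self, node):
        chain = _chain(node.func) if isinstance(node.func, ast.Attribute) else None
        if chain is None:
            self.generic_visit(node)
            return
        name, attrs = chain
        self._use(name)["methods"].add(".".join(attrs))
        for argument in node.args + [kw.value for kw in node.keywords]:
            self.visit(argument)


def free_variables(source):
    finder = FreeVariableFinder()
    finder.visit(ast.parse(source))
    return {n: info for n, info in finder.used.items() if n not in finder.bound}


thunk = "import math\nmath.fabs(player.position().x)\nplayer.pos.x += 10\n"
print(free_variables(thunk))  # only `player` is reported; `math` is bound by the import
```

Because the import lives inside the thunk source, `math` never shows up as free, which matches the assumption that namespace resolution is the thunk's own business.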
Declarations

These are all the variables or functions that are defined in the thunk.

- Name: It's unclear to me if there's some default global scope for all thunks. Of course a thunk could be evaluated in an isolated way, but is sharing done through a shared global namespace by default? Doesn't seem right...
- Static: If the value is fixed, with no contingency on free variables or other dynamic inputs.
- Type: If known. The most common known and fixed value is a function definition. I'm not sure treating functions the same as other declarations is a good idea.
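
A minimal sketch of collecting declarations from a Python thunk, again with `ast`. The function name `declarations` and the returned dictionary shape are invented here; "static" is approximated as "the assigned value is a plain literal", and function or class definitions are treated as static even though they might close over free variables.

```python
import ast


def declarations(source):
    """List top-level names a thunk defines, with a guess at type and staticness."""
    found = {}
    for stmt in ast.parse(source).body:
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            found[stmt.name] = {"type": type(stmt).__name__, "static": True}
        elif isinstance(stmt, ast.Assign):
            try:
                # literal_eval only accepts literals, so success means the
                # value does not depend on free variables or other inputs.
                value = ast.literal_eval(stmt.value)
                info = {"type": type(value).__name__, "static": True}
            except (ValueError, TypeError):
                info = {"type": None, "static": False}
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    found[target.id] = info
    return found


print(declarations("limit = 10\nscaled = limit * factor\n"))
# {'limit': {'type': 'int', 'static': True}, 'scaled': {'type': None, 'static': False}}
```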
Return value

Is there a return value? What type is it, if known?
Exceptions

Are there exceptions that are thrown?
Some exceptions we may see specifically referenced in the code. These are very clearly "intended" exceptions.
Some exceptions come through transitively from library code. These might only be determined empirically.
Some exceptions come from the nature of the thunk container. Out of memory errors are a common example.
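
The first category, exceptions referenced in the code itself, is the easy one to find statically. Here is a sketch for Python source; `referenced_exceptions` is a hypothetical name, and it only recognizes exceptions named directly (not ones reached through attributes or variables). Transitive and container-level exceptions are out of its reach.

```python
import ast


def referenced_exceptions(source):
    """Collect exception names that the source explicitly raises or catches."""
    raised, caught = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Raise) and node.exc is not None:
            # `raise KeyError("x")` is a Call; `raise KeyError` is a bare Name.
            exc = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            if isinstance(exc, ast.Name):
                raised.add(exc.id)
        elif isinstance(node, ast.ExceptHandler) and node.type is not None:
            types = node.type.elts if isinstance(node.type, ast.Tuple) else [node.type]
            caught.update(t.id for t in types if isinstance(t, ast.Name))
    return {"raised": raised, "caught": caught}


source = "try:\n    value = int(text)\nexcept ValueError:\n    raise KeyError('bad input')\n"
print(referenced_exceptions(source))  # {'raised': {'KeyError'}, 'caught': {'ValueError'}}
```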
Triggers

Is this code triggered by something? Typically we "run" a piece of code, but code like the contents of `onclick=""` is run based on something else.
There might be a two-phase aspect to code that has triggers: one phase where the triggers are declared (like `addEventListener`) and another where the triggers are processed.
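
One way to picture the two phases is a small registry that the thunk talks to while it runs, and that the environment fires later. `TriggerRegistry` and its method names are invented for this sketch; nothing here is tied to a real event system.

```python
class TriggerRegistry:
    """Phase one collects handlers; phase two fires them with event data."""

    def __init__(self):
        self.handlers = {}  # event name -> list of handler callables

    def on(self, event_name, handler):
        # Declaration phase: the thunk says what it wants to be triggered by.
        self.handlers.setdefault(event_name, []).append(handler)

    def declared(self):
        # What the environment can learn after the declaration phase.
        return sorted(self.handlers)

    def fire(self, event_name, payload):
        # Processing phase: the environment decides when handlers actually run.
        return [handler(payload) for handler in self.handlers.get(event_name, [])]


registry = TriggerRegistry()
registry.on("click", lambda event: event["x"] + 1)
print(registry.declared())              # ['click']
print(registry.fire("click", {"x": 1}))  # [2]
```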
Duration

Does this code just return a value? Does it sit around and do things? Can it run indefinitely? Can it be cancelled cleanly from the outside?
Determinism

Is this thunk deterministic? That is, given the same inputs will it always return the same outputs?
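
Determinism can't be proven by running code, but it can be disproven. A tiny empirical check, assuming the thunk is exposed as a plain Python callable taking keyword arguments (the name `observed_deterministic` is made up):

```python
def observed_deterministic(thunk, inputs, runs=3):
    """Run `thunk` several times on the same inputs and compare the results.

    False means nondeterminism was observed; True only means that no
    difference showed up in this handful of runs.
    """
    first = thunk(**inputs)
    return all(thunk(**inputs) == first for _ in range(runs - 1))


print(observed_deterministic(lambda x: abs(x), {"x": -3}))  # True
```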
Required features

This is somewhat vague, and not always easy to tell, but what machine capabilities does this thunk require? Examples:

- Random number generator
- Current time
- Network access
  - HTTP(S) access
  - Ongoing socket/websocket access
- Access to files that aren't part of its environment (e.g., Python modules would not be on this list)
- Access to particular resources, like a database
- Access to other APIs
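
A crude static guess at required features is to look at what a Python thunk imports. The module-to-feature table below is an illustrative assumption, not a complete or authoritative list, and importing a module only suggests a feature might be used.

```python
import ast

# Hypothetical mapping from top-level stdlib modules to capability names.
FEATURES_BY_MODULE = {
    "random": "random",
    "secrets": "random",
    "time": "current-time",
    "datetime": "current-time",
    "socket": "network/socket",
    "urllib": "network/http",
    "http": "network/http",
    "sqlite3": "database",
}


def required_features(source):
    """Guess machine capabilities from the modules a thunk imports."""
    features = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for module in modules:
            feature = FEATURES_BY_MODULE.get(module.split(".")[0])
            if feature:
                features.add(feature)
    return features


print(required_features("import math\nimport random\n"))  # {'random'}
```

This over-approximates (importing `datetime` doesn't mean the current time is read) and under-approximates (features reached through third-party libraries are invisible), so it fits the "best guess" spirit rather than a guarantee.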

Side effects

Does this have side effects, like writing to files? Some of the other effects to the runtime environment are covered in free variables and declarations.
Logs

This is output that doesn't formally affect the "result" of the computation, but does get emitted. This might be line-based logs, but really any visualization of the progress fits here too. Some of these might be automatically generated by the programming environment.
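
For a Python thunk that logs with `print`, the environment could separate logs from results by redirecting standard output while the thunk runs. `run_with_logs` is a hypothetical helper, and it uses `exec`, so it assumes the thunk source is trusted.

```python
import contextlib
import io


def run_with_logs(source, variables):
    """Execute thunk source; return its namespace and the lines it printed."""
    namespace = dict(variables)  # copy so the caller's dict is not mutated
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)  # assumption: the source is trusted
    return namespace, buffer.getvalue().splitlines()


namespace, logs = run_with_logs("print('doubling')\nresult = x * 2\n", {"x": 21})
print(namespace["result"], logs)  # 42 ['doubling']
```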
Remote vs local execution

Does this thunk execute locally, or does some part execute remotely?
Some remote execution is "simple", like a SQL query. It does happen elsewhere, but it's probably more-or-less within the same application mental model.
Some is not so simple, specifically client-side code. Can this model cover code execution in a user's browser?
Some things like feature detection are not available remotely. Maybe they can be declared, or maybe they should be expected to be true?
Dynamic information

Much of the information we collect (sometimes speculatively) in the static API can also be updated when the thunk runs. The types of variables become clear, even if we might not know that they will always be the same type in the future. We might not know the features the thunk requires, but if it tries to use those features we can detect that.
Execution information

A thunk should know how many times it has been run, whether it's running right now (maybe more than once), and the status of each of those runs.
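
A sketch of the bookkeeping this implies, using a context manager so that overlapping runs are counted correctly. `ExecutionInfo` and its attribute names are assumptions; a real version would probably keep a record per run rather than just the last status.

```python
import threading
from contextlib import contextmanager


class ExecutionInfo:
    """Tracks how often a thunk has run and how many runs are in flight."""

    def __init__(self):
        self._lock = threading.Lock()
        self.run_count = 0       # total runs started
        self.active = 0          # runs currently executing
        self.last_status = None  # "ok", "error", or None if nothing finished yet

    @contextmanager
    def running(self):
        with self._lock:
            self.run_count += 1
            self.active += 1
        try:
            yield
        except BaseException:
            status = "error"
            raise
        else:
            status = "ok"
        finally:
            with self._lock:
                self.active -= 1
                self.last_status = status


info = ExecutionInfo()
with info.running():
    pass  # the thunk body would execute here
print(info.run_count, info.active, info.last_status)  # 1 0 ok
```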
Information representation

A theme throughout this is that many bits of information are things we learn or guess from code but cannot guarantee. Representing this level of knowledge consistently will be... helpful.
We will expect that each bit of information will be represented in its own value, e.g., .is_deterministic
- `.value`: our best guess value (most recent or most confident)
- `.confident`: if true, then we are certain and any deviation from the value would be an error
- `.confidence`: some confidence level. I think of at least:
  - `KNOWN`
  - `NO_DETECTION_AVAILABLE`: the system can't even try
  - `UNKNOWN`: nothing detected; it tried but learned nothing
  - `OBSERVED`: some level of dynamic detection; might require multiple levels
  - `IN_ABSENCE_OF_MAGIC`: if certain language features are not used, like messing around with module or using `with` in JavaScript and so on, this should be true
  - `IF_STATIC_PROPS`: if there are no getters/setters
  - maybe others...
- `.dynamic...`, `.static...`: specifically dynamically-determined or statically-determined information. Otherwise the values are a best-effort merge of the two.
- `.knowledge`: a list of the reasons a property has been determined. Further knowledge is appended, though dynamic knowledge collection will be (mostly) cut off as long as it stays consistent. What's in here is somewhat ad hoc. For things like feature detection it might refer to specific URLs that a thunk tries to access, and so forth. Static analysis might create several bits of knowledge, not just `.knowledge[0]`
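
To make the shape concrete, here is one possible sketch as Python dataclasses. The `Evidence` and `Info` class names, the `learn` method, and the merge rule ("the most recent piece of knowledge wins") are all assumptions made for illustration; the notes above leave the merge policy open.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List


class Confidence(Enum):
    KNOWN = "known"
    NO_DETECTION_AVAILABLE = "no_detection_available"
    UNKNOWN = "unknown"
    OBSERVED = "observed"
    IN_ABSENCE_OF_MAGIC = "in_absence_of_magic"
    IF_STATIC_PROPS = "if_static_props"


@dataclass
class Evidence:
    source: str  # "static" or "dynamic"
    value: Any
    confidence: Confidence
    note: str = ""  # ad hoc explanation, e.g. a URL the thunk tried to fetch


@dataclass
class Info:
    knowledge: List[Evidence] = field(default_factory=list)

    def learn(self, source, value, confidence, note=""):
        self.knowledge.append(Evidence(source, value, confidence, note))

    def _latest(self, source=None):
        for evidence in reversed(self.knowledge):
            if source is None or evidence.source == source:
                return evidence
        return None

    def _value_of(self, evidence):
        return evidence.value if evidence else None

    @property
    def value(self):
        return self._value_of(self._latest())  # assumed merge: most recent wins

    @property
    def static(self):
        return self._value_of(self._latest("static"))

    @property
    def dynamic(self):
        return self._value_of(self._latest("dynamic"))

    @property
    def confidence(self):
        latest = self._latest()
        return latest.confidence if latest else Confidence.UNKNOWN

    @property
    def confident(self):
        return self.confidence is Confidence.KNOWN


is_deterministic = Info()
is_deterministic.learn("static", True, Confidence.IN_ABSENCE_OF_MAGIC, "no random or time imports")
is_deterministic.learn("dynamic", False, Confidence.OBSERVED, "two runs gave different outputs")
print(is_deterministic.value, is_deterministic.static, is_deterministic.confident)  # False True False
```

Keeping every piece of evidence in `.knowledge` and deriving the other fields from it means the static and dynamic views never drift apart from the merged value.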