chexxor/Feature_Architecture.md

## Feature_Architecture.md

      
The following is how I would like to define feature development. I hope that by exploring the nature of feature development and its concerns, we can choose more productive constraints to impose on it, and design features that are easy to evolve and that minimize disruption to other features.

Feature Architecture

The following are attributes I would like features to have:
A feature in a data management system should be a simple workflow which provides value to a client.
A feature's interface should never break active clients, if it has them.
A feature can be deployed alongside other features in the same executable, but should be able to be trivially moved out and into a separate executable.
A feature should be sufficiently minimal that changes to one feature don't require changes to another feature.
A feature should expect to cooperate with an executor and delegate its context-dependent effects/coeffects to it. (Maybe?)

Pure FP principles and their mapping to feature development:

Immutable data == immutable, append-only message format for feature interface
Invariants enforced by granular and refined types == Invariants and laws described in API documentation
Invariants enforced by granular and refined types == Invariants enforced by API clients having granular types (code-generated clients?)
Maximally reusable data types == Maximally reusable message format
Maximally reusable functions == Maximally reusable APIs
Composable data types == Composable messages
Composable functions == Composable APIs
Type classes with algebraic properties == Group of API specifications returning the same type with algebraic properties
Avoid shared mutable state == Design persistent data to mitigate mutations
Referential transparency == API responses can be memoized, and a sequence of API interactions can be replaced with an equivalent one without changing the desired effects

Feature Design

It's important to consider ways to limit the scope of a feature and the essential aspects that should inform its definition, responsibilities, and evolution.
For example:

Should a single feature do both reading and writing of a record of data?
How and where is a feature persisting data?
What does a feature do if it can't communicate with its dependencies?
Should a feature make itself available for triggering other features to execute?
Should a feature support an "undo" operation?
What does "transaction" mean for this feature?
Can this feature handle bulk data operations?
How will this feature be monitored, audited, and logged?
What are the performance requirements of a feature? How should it interact with caches?
What are its runtime characteristics? Is it CPU-, memory-, network-, or persistence-bound?
How well does this feature compose as a part of other features?
How should a feature handle duplicate messages? Should it mark them as responded? Should messages be required to be idempotent?
How should a feature perform under heavy load from requests? Should it maintain a message queue to protect it from spikes in traffic?
Should this feature be monitoring for data to pull or waiting for it to be pushed to it?
Which parts of a feature can be asynchronous and which must be synchronous?

Interface as a Language

A grammar is a set of rules that describe how to form strings that are valid according to a language's syntax. The meaning of those strings is defined by a semantics, which describes the processes a computer follows when executing a program in the language.
Wikipedia: Formal language
Wikipedia: Formal grammar
Wikipedia: Semantics (computer science)
A normal application can be viewed as the set of concrete data types it manages and the verbs it defines.
In the context of application software interface design, the strings in a language can be the application's domain types and the operations it can perform on them. Commands for the application can be encoded as primitives in the language, and an instantiation of a command primitive can be a function whose arguments are the command's arguments.
It's said that a good language is one that its users can help grow with new data types and commands. However, it's interesting to note that a user of an application can't unilaterally define new nouns and verbs for the application to store, manage, and execute, but must instead ask the application maintainer to do so. Likewise, in an application's software interface, it's not possible to define entirely new nouns and verbs. There's no reason, however, for the interface not to support defining new nouns and operations that are simple compositions of the application's primitives. Even if our language doesn't explicitly support it, people can reflect our interface language into a general-purpose language, then grow their reflection of our language.
It's common for visitors to use a programming language maliciously to damage the system that interprets it. Damage can be mitigated or prevented by using a restrictive, simple language. In addition, by using a simple type system, invalid requests can be rejected before the application attempts to execute them.
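As a minimal sketch of the ideas above, the following Haskell models commands as primitives of a small language, lets a client compose those primitives into a new "verb", and rejects invalid requests before anything is executed. All names here (`Username`, `registerAndReset`, `validateProgram`) and the alphanumeric username rule are assumptions made for illustration, not part of any existing interface.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical domain noun.
newtype Username = Username String
  deriving (Eq, Show)

-- Hypothetical command primitives: the "verbs" of the interface language.
data Command
  = RegisterAccount Username
  | ResetPassword Username
  deriving (Eq, Show)

-- A "sentence" in the language is a list of commands.
type Program = [Command]

-- A client-side composition of primitives. The application never needs to
-- learn this new verb, because it only ever receives the primitives.
registerAndReset :: Username -> Program
registerAndReset user = [RegisterAccount user, ResetPassword user]

-- Check a single command against an assumed rule:
-- usernames must be non-empty and alphanumeric.
validate :: Command -> Either String Command
validate cmd
  | validName name = Right cmd
  | otherwise = Left ("invalid username: " ++ show name)
  where
    name = case cmd of
      RegisterAccount (Username n) -> n
      ResetPassword (Username n) -> n
    validName n = not (null n) && all isAlphaNum n

-- Validate a whole program; the first invalid command rejects the request,
-- so nothing is executed for a partially valid request.
validateProgram :: Program -> Either String Program
validateProgram = traverse validate
```

Because `Command` is a closed type, a request naming an unknown verb cannot even be constructed, while `validateProgram` covers the rules that the types alone don't express.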

Encoding an Interface Language

How should our language be encoded? As a string of characters, like Lisp or SQL? Or as a set of remote functions?
Encoding it as a string of characters would make it quite flexible.
From user point of view:

A user can bundle multiple commands into a single network request.
The correct types of arguments can be included directly in the language library's source code, which is readable and guaranteed to be correct, whereas manually-written interface docs are often incorrect.
It's easy to craft a command offline, as the language could be locally type-checked.
Local type-checking helps to navigate breaking changes in command definitions; the application can define the latest interface types at a stable DNS address.
It's not complicated to communicate with the interface from any general purpose programming language.

From application point of view:

It's easy to host this on multiple interface mediums, such as HTTP, a streaming protocol, or an in-process function call in the same process as the public interface server.
It supports bulk operations by accepting a request containing a list of operations.
It's possible to optimize a request of multiple, similar operations.
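The last two application-side points can be sketched as follows: a request carries a list of operations, and the interpreter groups adjacent operations of the same kind so that each group could be executed as one batch. The `Operation` and `Batch` types and the `planBatches` function are hypothetical, and grouping only adjacent operations (to preserve ordering) is just one possible optimization.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Hypothetical single-record operations a client can send.
data Operation
  = InsertRecord String
  | DeleteRecord String
  deriving (Eq, Show)

-- Hypothetical batched form the application executes internally.
data Batch
  = InsertMany [String]
  | DeleteMany [String]
  deriving (Eq, Show)

-- One network request can carry many operations.
newtype Request = Request [Operation]

isInsert :: Operation -> Bool
isInsert (InsertRecord _) = True
isInsert (DeleteRecord _) = False

-- Group adjacent operations of the same kind into one batch each.
-- Only adjacent operations are merged, so the original order is preserved.
planBatches :: Request -> [Batch]
planBatches (Request ops) = map toBatch (groupBy ((==) `on` isInsert) ops)
  where
    -- groupBy never produces empty groups, so the head decides the kind.
    toBatch grp@(InsertRecord _ : _) = InsertMany [k | InsertRecord k <- grp]
    toBatch grp = DeleteMany [k | DeleteRecord k <- grp]
```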

There are downsides to everything, of course. Some downsides here:
From user point of view:

Need to learn a new language.

From application point of view:

Need to define types in two places -- the public interface language and the language interpreter in the application.

Example Interface Language

Let's consider what we would like an example program in an application interface language to look like. I'll use the Dhall language, because it seems to fit the requirements.
-- We'll need to declare some words which help us describe our domain
--   and the words and phrases we will want to say.
-- First the nouns (the argument types):
let RegisterAccountArg = { username : Text }

-- Assumed to have the same shape as RegisterAccountArg.
let ResetPasswordArg = { username : Text }

-- Then some verbs (further alternatives can be added to this union):
let Command =
      < RegisterAccount : RegisterAccountArg
      | ResetPassword : ResetPasswordArg
      >

let Commands = List Command

-- And then write a sentence:
in    [ Command.RegisterAccount { username = "JohnDoe" }
      , Command.ResetPassword { username = "JohnDoe" }
      ]
    : Commands

-- Then, the application interface would receive and interpret this sentence.

Feature Interface Design

To ensure a feature's interface never breaks active clients, it should plan to never remove arguments.
What does this mean when we are creating and modifying a feature's interface?
data FeatureArgs = FeatureArgs ArgA
-- can be changed to
data FeatureArgs = FeatureArgs ArgA ArgB
-- can be changed to
data FeatureArgs = FeatureArgs ArgA ArgB | FeatureArgs2 ArgA ArgC
-- can be changed to
data FeatureArgs = FeatureArgs ArgA (Either ArgB ArgC)
-- can *not* be changed to
data FeatureArgs = FeatureArgs ArgA ArgC -- removed ArgB
However, is `data FeatureArgs = FeatureArgs ArgA ArgB | FeatureArgs2 ArgA ArgC` really the same as `data FeatureArgs = FeatureArgs ArgA (Either ArgB ArgC)`? If it describes the abstract message, it holds the same meaning, but if it describes the concrete structure of the serialized message, it is not necessarily the same. Why could they be different? It depends entirely on how the serializer handles a coproduct/union type. If serializing to JSON, for example, it could serialize as `{ "tag": "FeatureArgs2", "value": [ "ArgA", "ArgC" ] }` or `[ "ArgA", { "tag": "Right", "value": "ArgC" } ]`.
It's desirable to reduce the problems with coproduct/union typed values. The most flexible solution is to say that the two data types discussed above are not the same when serialized using a coproduct codec like the one used above. Another solution is to encode the coproduct as a product in the serializer and enforce the coproduct's "OR" idea in the deserializer. In my experience, an interface message's contents are most commonly made up of products, not coproducts, so this latter solution seems to make sense. In addition, on reflection this constraint seems quite reasonable and has some desirable properties, so we'll presume we use it and continue the exploration.
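A minimal sketch of that latter solution, assuming a wire shape with one optional field per alternative: the serializer writes a plain product, and the deserializer enforces that exactly one alternative is present. The payload types given to `ArgA`, `ArgB`, and `ArgC`, along with the names `WireFeatureArgs`, `encode`, and `decode`, are hypothetical.

```haskell
-- Hypothetical argument payloads.
newtype ArgA = ArgA String deriving (Eq, Show)
newtype ArgB = ArgB Int deriving (Eq, Show)
newtype ArgC = ArgC Bool deriving (Eq, Show)

-- Wire shape: a plain product in which each alternative is an optional field.
data WireFeatureArgs = WireFeatureArgs
  { wireArgA :: ArgA
  , wireArgB :: Maybe ArgB
  , wireArgC :: Maybe ArgC
  }
  deriving (Eq, Show)

-- Internal shape: the coproduct is explicit.
data FeatureArgs = FeatureArgs ArgA (Either ArgB ArgC)
  deriving (Eq, Show)

-- Serializer side: the coproduct becomes a product with one field filled in.
encode :: FeatureArgs -> WireFeatureArgs
encode (FeatureArgs a (Left b)) = WireFeatureArgs a (Just b) Nothing
encode (FeatureArgs a (Right c)) = WireFeatureArgs a Nothing (Just c)

-- Deserializer side: enforce the "OR" by requiring exactly one alternative.
decode :: WireFeatureArgs -> Either String FeatureArgs
decode (WireFeatureArgs a mb mc) = case (mb, mc) of
  (Just b, Nothing) -> Right (FeatureArgs a (Left b))
  (Nothing, Just c) -> Right (FeatureArgs a (Right c))
  (Just _, Just _) -> Left "expected exactly one of ArgB or ArgC, got both"
  (Nothing, Nothing) -> Left "expected exactly one of ArgB or ArgC, got neither"
```

With this shape, adding a third alternative later only adds another optional field to the wire product, which older clients can ignore.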

Stable, Evolvable Message Format

To ensure a client and an interface can always communicate, they must use a shared format. Examples of shared message formats:

JSON-formatted bytestring (out-of-band schema, text in UTF-8 encoding)
Protocol Buffer-formatted bytestring (out-of-band schema, custom binary format)
Avro-formatted bytestring (in-band schema, custom binary format or JSON)
Cap'n Proto-formatted bytestring (out-of-band schema, custom binary format)
CSV-formatted bytestring (no schema, any text encoding)

The responsibility of this data type is to be a stable message type for clients to use to communicate with an application. If working with a flexible data type is desired, it's recommended to design and use a second, internal data type which is similar to the interface message type.
A shared message type can evolve in a backwards and forwards-compatible way by following some rules, inspired by the Protocol Buffers "Updating A Message Type" guide:

Use associative maps from field name to field value, as they can be extended, and something like field names is necessary to support multiple values of the same type in a single message.
Don't change any field names.
To support backwards-compatible field renaming, consider using incrementing numbers as field names. The number-name mapping can be included in the API documentation.
Only support a few primitive types and possibly other associative maps as values.
It may be desirable to support defining new types which are a composite of primitive and custom types.
Don't require any fields. Message validation and handling best belongs to an application, rather than a messaging protocol.
If removing a field, add the field to a list of retired field names and don't use it again.
Consider supporting extensions to a base message type. This enables one sender to send a common subset of fields and another sender to send additional fields for a more specialized invocation.
It's possible to change a field into a union type, if the encoding of a union type supports it. An encoding may embed its fields inside a parent message.
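Several of the rules above can be sketched together, assuming a message is a map from field number to a small set of value types. The field numbers, the retired-field list, and the helper names below are all hypothetical, and `Data.Map.Strict` comes from the `containers` package that ships with GHC.

```haskell
import qualified Data.Map.Strict as Map

-- Only a few primitive types, plus nested messages, are allowed as values.
data Value
  = VText String
  | VInt Integer
  | VBool Bool
  | VMessage Message
  deriving (Eq, Show)

-- Incrementing numbers act as field names, so a field can be renamed in the
-- documentation without changing what is sent on the wire.
type FieldNumber = Int

type Message = Map.Map FieldNumber Value

-- Hypothetical number-to-name mapping, as it might appear in the API docs.
usernameField, emailField :: FieldNumber
usernameField = 1
emailField = 3

-- Field 2 is assumed to have been removed; its number is never reused.
retiredFields :: [FieldNumber]
retiredFields = [2]

-- Drop retired fields but keep unknown ones, so that messages from newer
-- senders pass through older code untouched.
stripRetired :: Message -> Message
stripRetired = Map.filterWithKey (\field _ -> field `notElem` retiredFields)

-- No field is required by the format: a missing or mistyped field is simply
-- Nothing, and the application decides what that means.
lookupText :: FieldNumber -> Message -> Maybe String
lookupText field msg = case Map.lookup field msg of
  Just (VText t) -> Just t
  _ -> Nothing
```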

Feature Implementation

Stable Feature Implementation

As a client of an application interface, it's desirable that a program/command doesn't break. Many applications have changing requirements for their features, such as changing inputs, input validation, feature behavior, and resulting events. Is it possible to design a feature that doesn't break?
Consider the `1 + 1` program. Will the owner of this language ever decide that the result isn't 2? I hope not. Why? Because I presume the program obeys the laws of common integer arithmetic.
Consider the `ApplyForAccount { username: "JohnDoe" }` program. Will the owner of this language ever decide the result isn't a successful account creation? Sure -- perhaps the owner decides that an account's username must be formatted like an email address, "JohnDoe@somehost.com".
Why is it so hard to design a real world application's interface which promises to be forwards compatible? Comparing it to integer arithmetic, is it because a real world application doesn't have "laws" like areas of mathematics have? Not quite, because we could invent some laws for the ApplyForAccount command, like `GetAccountDetails . ApplyForAccount` should always return the argument. No, I think the reason it's so hard is that an application's "laws"/rules aren't well-defined up-front, unlike integer arithmetic, which has been studied and refined for centuries.
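To make the invented law concrete, here is a small sketch over an in-memory store. The store type, the function names, and the email-style rule in `applyForAccountV2` are assumptions for illustration; a real application would persist accounts somewhere else.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical command argument, mirroring ApplyForAccount { username }.
newtype AccountArgs = AccountArgs {username :: String}
  deriving (Eq, Show)

-- An in-memory stand-in for the application's account storage.
type Accounts = Map.Map String AccountArgs

applyForAccount :: AccountArgs -> Accounts -> Accounts
applyForAccount args = Map.insert (username args) args

getAccountDetails :: String -> Accounts -> Maybe AccountArgs
getAccountDetails = Map.lookup

-- The law: reading back an account right after applying for it
-- returns the original argument.
lawHolds :: AccountArgs -> Accounts -> Bool
lawHolds args store =
  getAccountDetails (username args) (applyForAccount args store) == Just args

-- A later rule change: usernames must now look like an email address.
-- The law above then only holds for arguments that pass this check.
applyForAccountV2 :: AccountArgs -> Accounts -> Either String Accounts
applyForAccountV2 args store
  | '@' `elem` username args = Right (applyForAccount args store)
  | otherwise = Left "username must look like an email address"
```

The law itself is easy to state; what changes over time is the set of arguments for which it holds, which is the part that wasn't well-defined up-front.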
We can, however, mitigate the damage caused by an application's changing rules. Rather than having one big feature which is liable to be adjusted by many interested owners, can we break it into many smaller features, each of which is less likely to need to change? If a person wants the end result of the workflow, does it really help to break the requirements of that workflow into its "more stable" components? It would be useful to a person who wants to execute one step in isolation, such as email ownership validation, which is possible. This indicates that a workflow's independently useful features should be independently available. Perhaps the thing to note, then, is that a workflow is liable to break, while the features it consists of can be more stable.
So, it seems that an application will unavoidably need to break the interface or behavior of its features or workflows. Rather than simply changing the implementation of a workflow, can we make a new workflow and keep supporting the old one? If the change is to prevent bad actors/data or close a security hole, for example, then it's undesirable to leave a version of a workflow available which has those problems. I can't think of a case in which a version of a feature or workflow we broke should be left available. So, we must expect to break our workflows and features repeatedly, until we find no more reasons to change them.
However, we can mitigate the pain we cause for people using our features. We can make it easy for them to upgrade to the new version of the feature when we break it...

Feature Dependencies

A feature should be explicitly decoupled from other peer features.
To simplify testing a feature, implementations of its feature dependencies, like other effects, should be explicitly provided to the feature.
One benefit is that it becomes relatively simple to change a dependency's implementation from an in-memory function call to a network function call.
-- Three alternative signatures for the same feature.

-- Option 1: injected as function arguments (or a record of arguments).
someFeature ::
  forall m. Monad m
  => (String -> m (List String)) -- depFeatureA
  -> (String -> m Integer) -- depFeatureB
  -> ArgA
  -> ArgB
  -> m FeatureResult

-- Option 2: provided by a reader monad.
someFeature ::
  forall m. Monad m
  => ArgA
  -> ArgB
  -> ReaderT { depFeatureA :: String -> m (List String), depFeatureB :: String -> m Integer } m FeatureResult

-- Option 3: maybe provided by the environment via extensible effects or coeffects?
someFeature ::
  ArgA
  -> ArgB
  -> Run (depFeatureA :: String -> m (List String), depFeatureB :: String -> m Integer) FeatureResult
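
As a runnable sketch of the first style, the following Haskell passes the dependencies as a record of functions and runs the feature against in-memory stand-ins. The `Deps` record, the body of `someFeature`, and `testDeps` are assumptions made for illustration, and the feature takes a single `String` argument here instead of `ArgA` and `ArgB` to keep the example short.

```haskell
import Data.Functor.Identity (Identity (..))

-- The feature's dependencies, abstracted over the monad they run in.
data Deps m = Deps
  { depFeatureA :: String -> m [String]
  , depFeatureB :: String -> m Integer
  }

newtype FeatureResult = FeatureResult Integer
  deriving (Eq, Show)

-- The feature only knows the shape of its dependencies, not whether they
-- are in-memory functions or network calls.
someFeature :: Monad m => Deps m -> String -> m FeatureResult
someFeature deps key = do
  items <- depFeatureA deps key
  counts <- traverse (depFeatureB deps) items
  pure (FeatureResult (sum counts))

-- In-memory stand-ins used for testing; a production wiring would supply
-- functions in IO that call the other features over the network.
testDeps :: Deps Identity
testDeps =
  Deps
    { depFeatureA = \key -> Identity [key, key ++ "!"]
    , depFeatureB = \item -> Identity (toInteger (length item))
    }
```

Swapping `testDeps` for a record whose functions run in `IO` changes nothing in `someFeature` itself, which is the decoupling described above.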