haf/gist:4423968

## gistfile1.txt
Inconsistency Robustness in Software Systems of the Future

It is a known, fundamental problem for people who program for a living that
programming languages do not match the nature of the universe in which they
compute. Computer languages are based on a rather romantic notion of sequential
processing that is not in line with how the real world operates: the real world
operates in a continuum of space-time, with multiple concurrent threads of
reality always ongoing, being acted upon by actors.

As such we must strive to model our information systems and programming
languages in the same shape.

Writing programs is dealing with information and information processing: algorithms
operating on data. That data, however, can come from external parties over the
network, over direct memory access with interrupts, or by some other means of
out-of-band delivery, and need not be present at the start of the computation.

This means that we as programmers not only need a programming language that can
handle the concurrent events of reality, but also need a way to reason about
the state, that is, the data that our algorithms process, while that data is
being concurrently updated.

If a programming language is not explicit about what time is, it will lead its
programmers into the pit of despair, because they won't be able to reason
about events arriving from the outside.

Yet most programming languages of today don't let programmers readily reason
about how time passes, nor about what data the other actors in the systems
they interact with have seen or created. And when hardware fails, there are
vague presumptions about having atomicity and consistency, as in ACID, on
that part of the system: presumptions that are hard to test and reason about.

Those of us who are language designers, software architects, framework builders
and plain old programmers need to provide ways to reason about the invariants of
our systems as they change over time, and therefore need a way to know when points
of known values occur. But there is no programming language out there that does it.

Instead the knowledge is embedded in the brains of whoever is architect or
lead developer at the moment, subject to office politics and plain old
human mistakes.

The actor model allows supervisor trees, actor linking and handling of environmental
chaos (failing hard drives, failing network cards, etc.); software transactional
memory allows us to reason about the program state when an actor or a tree of
actors fails; and reliance on fsync to disk allows us to reason about transaction
logs in crash-only systems.

I posit that we need a few features not already seen in programming languages:

 * A compare operator that acts on data's temporal nature, as in "given data a and b,
   a < b if b was based on the data in a" - similar to what interval tree clocks
   can give us
 * A feature that facilitates a logical fiber of computation that will only let out
   time-stamped data
 * The "regular actor framework" with guaranteed delivery on a message level,
   supervision trees, fibers and context switching, software transactional memory,
   'send my last will (containing this data)'
 * A core library that gives actors/objects a strong convergent-data-type flavor
   as seen in Bloom/Bud
 * A core library that gives us strong insight into the execution context with
   metrics: timers, counters, gauges and histograms in a distributed setting
 * A core library that gives us a way of expressing two "sorts" of monads:
   explicit monads, similar to the Haskell monads, that allow the compiler to reason
   about side effects and do magic -- and -- an implicit, ML-style IO monad that
   lets the programmers easily step outside to do some side effects (logging,
   metrics, plugin architectures and P/Invoke/FFI are obvious examples of this)
   -- while giving the programmer the power to switch between these modalities
   of writing code.
 * A profileable asynchronous language core, like F#'s async, that lets the
   programmer run the software with a profiler attached that also automatically
   correlates everything from before-trampoline (before the async call is made)
   to after-trampoline (when the IO completion port/epoll/kqueue signals,
   or an async exception happens), and that works on the same explicit message/unit
   of work identity that would be in a message between processes.
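
As a rough illustration of the first bullet, the sketch below uses plain vector
clocks, a simpler relative of interval tree clocks, to express "a < b if b was
based on the data in a". The names `tick`, `merge`, `happened_before` and
`concurrent` are made up for this example and do not come from any existing
library; it is a sketch under those assumptions, not the proposed operator itself.

```python
from typing import Dict

# Hypothetical representation: actor id -> number of events seen from that actor.
Clock = Dict[str, int]


def tick(clock: Clock, actor: str) -> Clock:
    """Return a new clock recording one more local event by `actor`."""
    updated = dict(clock)  # copy, so the input clock is left untouched
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def merge(a: Clock, b: Clock) -> Clock:
    """Combine two clocks by taking the per-actor maximum."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def _leq(a: Clock, b: Clock) -> bool:
    """True if every counter in `a` is <= the matching counter in `b`."""
    return all(count <= b.get(actor, 0) for actor, count in a.items())


def happened_before(a: Clock, b: Clock) -> bool:
    """The 'a < b' of the bullet above: b was (transitively) based on a."""
    return _leq(a, b) and not _leq(b, a)


def concurrent(a: Clock, b: Clock) -> bool:
    """Neither value was based on the other; they diverged."""
    return not _leq(a, b) and not _leq(b, a)
```

With `a = tick({}, "alice")`, `b = tick(a, "bob")` and `c = tick(a, "carol")`,
`happened_before(a, b)` holds, while `b` and `c` are `concurrent`; merging them
yields a clock that both `b` and `c` happened before. The order is only partial,
which is exactly what a language-level temporal compare operator would have to
expose.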

The paper will discuss these features in depth and suggest how they might overlap,
given examples from existing research and from production systems and languages.

***

Happy new year 2013!
Henrik


***

Post Scriptum:
These are some of the ideas floating around inside my head; I'd like some feedback
-- if anyone would like me to continue down this path, I'll spend some time gathering
references and improving the abstract.