
Strongly-normalising agents

Miëtek Bak

This is a draft of a type-theoretic formalism for self-modifying agents, written as a follow-up to “On the Löbian obstacle”, in response to work performed by the Machine Intelligence Research Institute.

Let L denote a dependently-typed λ-calculus that does not admit general recursion and which has the strong normalisation property. This means that every reduction sequence starting from any term of L terminates in a normal form.

Consequently, typechecking for L is decidable.

Additionally, it follows from the Curry-Howard isomorphism that the language L corresponds to a logic, which we also denote L. Since every term of L normalises and the empty type has no closed term in normal form, this logic is consistent.
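
To make the correspondence concrete, here is a small illustration in Haskell. The choice of Haskell is ours, and Haskell itself is not strongly normalising, but the propositions-as-types reading of these simple terms is the same as in L:

```haskell
-- Propositions as types, proofs as programs:
--   conjunction ~ pair type, implication ~ function type, disjunction ~ sum type.

-- A proof of  A ∧ B → A  is the first projection.
andElimLeft :: (a, b) -> a
andElimLeft (x, _) = x

-- A proof of  (A → B) → A → B  (modus ponens) is function application.
modusPonens :: (a -> b) -> a -> b
modusPonens f x = f x

-- A proof of  A → A ∨ B  is the left injection.
orIntroLeft :: a -> Either a b
orIntroLeft = Left
```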

1. Knowledge

In which we choose to say that an agent knows a proposition to be provable if and only if the agent knows a proof of the proposition. We save the notions of ‘trust’, ‘belief’, ‘actions’, and ‘goals’ for use at higher reasoning levels.

Let An represent an agent that is the n-th successor of the agent A0.

Let T(An) denote a set of terms of L. This represents the set of propositions which An knows to be provable: we will say An has accepted a proposition X whenever some term x : X is in T(An).

Note that by Curry-Howard, each term x : X of the language L corresponds to the proof x of the proposition X in the logic L.

Assume An is capable of self-modification by spawning a successor An+1 such that T(An) ⊆ T(An+1), and replacing itself with the successor.

Additionally, assume An is capable of generating terms of L, sending terms of L as messages, and receiving messages that may or may not be terms of L.
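
To fix intuitions, the following sketch models such an agent in Haskell. The language and all names (Term, Agent, spawnSuccessor) are our own illustrative choices; terms of L are represented only as opaque identifiers, with none of their typing structure modelled:

```haskell
import qualified Data.Set as Set

-- A stand-in for a closed, well-typed term x : X of L, modelled only as
-- an opaque identifier.
newtype Term = Term String
  deriving (Eq, Ord, Show)

-- T(An): the set of proof-carrying terms the agent has accepted.
newtype Agent = Agent
  { accepted :: Set.Set Term
  } deriving Show

-- Self-modification: the successor accepts everything the current agent
-- accepts, and possibly more, so T(An) ⊆ T(An+1) holds by construction.
spawnSuccessor :: Agent -> Set.Set Term -> Agent
spawnSuccessor agent newTerms =
  Agent { accepted = accepted agent `Set.union` newTerms }
```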

What about propositions which the agent knows to be not provable, or propositions which the agent has neither accepted nor rejected? See section 2.

1.1 Monotonicity

In which we show that an agent will never reject a proposition which it has already accepted, in the spirit of the 1st Yudkowsky-Herreshoff desideratum.

Assume an agent An, its successor An+1, and a term x : X of L such that x : X ∈ T(An). This means An has accepted X.

Since T(An) ⊆ T(An+1) by the definition of a successor, x : X ∈ T(An+1). Therefore, trivially, An+1 has accepted X.
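
In terms of the illustrative Haskell sketch above, monotonicity holds because spawnSuccessor only ever adds to the accepted set:

```haskell
-- Monotonicity: whatever the current agent has accepted remains accepted
-- by any successor, because Set.union never removes elements.
remainsAccepted :: Agent -> Set.Set Term -> Term -> Bool
remainsAccepted agent extra x =
  Set.member x (accepted (spawnSuccessor agent extra))

-- Whenever  Set.member x (accepted agent)  is True,
-- remainsAccepted agent extra x  is True as well.
```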

1.2 Coherence

In which we show that an agent is always capable of accepting any proposition which it knows another agent has already accepted, in the spirit of the 2nd desideratum.

Assume agents An, Bk and terms x : X, y : Y of L such that x : X ∈ T(An), where X is the proposition that y : Y ∈ T(Bk). This means Bk has accepted Y, and An has accepted the proposition that y : Y ∈ T(Bk).

There are two possibilities:

  1. If y : Y ∈ T(An), then by definition, An has accepted Y.

  2. Otherwise, if y : Y ∉ T(An), then An is capable of spawning a successor An+1 such that T(An) ∪ {y : Y} = T(An+1). By definition, An+1 will have accepted Y.

The careful reader will note that this implies our language is capable of treating types as first-class entities, which is true, as our language is dependently-typed.
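
In terms of the same illustrative sketch, the two cases collapse into a single step which, either way, leaves us with an agent that has accepted Y:

```haskell
-- Coherence: if the agent holds the witnessing term y : Y of another
-- agent's accepted proposition, it can always end up having accepted Y.
acceptKnown :: Agent -> Term -> Agent
acceptKnown agent y
  | Set.member y (accepted agent) = agent                  -- case 1
  | otherwise = spawnSuccessor agent (Set.singleton y)     -- case 2

-- In both cases,  Set.member y (accepted (acceptKnown agent y))  is True.
```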

1.3 Uniformity

In which we show that an agent is always capable of accepting any proposition for which it knows a proof, whether the proof was generated by the agent itself or not, in the spirit of the 4th desideratum.

Assume an agent An and a term x : X of L such that x : X ∉ T(An).

Consider two possibilities:

  1. An has generated x : X.

  2. An has received a message that is x : X.

Clearly, in each case An is capable of spawning a successor An+1 such that T(An) ∪ {x : X} = T(An+1). By definition, An+1 will have accepted X.
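
In terms of the illustrative sketch, both cases are handled by one and the same step; the predicate typechecks is our own stand-in for the decidable typechecker of L:

```haskell
-- Uniformity: whether the agent generated x : X itself or received it as
-- a message, the same step yields a successor that has accepted X,
-- provided the candidate really is a term of L.
acceptTerm :: (Term -> Bool) -> Agent -> Term -> Agent
acceptTerm typechecks agent x
  | typechecks x = spawnSuccessor agent (Set.singleton x)
  | otherwise    = agent   -- not a term of L; see section 2
```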

What about the possibility that the agent has received a message that is not a term of our language? See section 2.

2. Values

In which we require an agent to continue upholding its original values.

Let us extend the definition of an agent:

Let P(An) denote a term of L. This represents the program controlling the agent An.

Let V(An) denote a function from terms of L to natural numbers. We will say V(An) represents the values of agent An.

Assume An is capable of self-modification by spawning a successor An+1 such that V(P(An)) ≤ V(P(An+1)), where, crucially, V = V(An).
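
In terms of the illustrative Haskell sketch, the extended agent and its value-preserving self-modification might look as follows; all names remain our own, and Natural stands in for the natural numbers:

```haskell
import Numeric.Natural (Natural)   -- joins the imports of the earlier sketch

-- The extended agent: its accepted set, the program controlling it, P(An),
-- and its values, V(An), a function from terms to natural numbers.
data ValuedAgent = ValuedAgent
  { vAccepted :: Set.Set Term
  , vProgram  :: Term              -- P(An), itself a term of L
  , vValues   :: Term -> Natural   -- V(An)
  }

-- Value-preserving self-modification: the candidate successor is adopted
-- only when V(P(An)) ≤ V(P(An+1)), where, crucially, V = V(An), i.e. the
-- comparison is made under the current agent's values.
spawnIfValued :: ValuedAgent -> ValuedAgent -> Maybe ValuedAgent
spawnIfValued current candidate
  | vValues current (vProgram current) <= vValues current (vProgram candidate)
      = Just candidate
  | otherwise
      = Nothing
```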

2.1 Reflection

In which we say that an agent is always capable of reflecting upon its values, or the values of any other agent, in the spirit of the 3rd desideratum.


TODO: Explain the ‘Type : Type’ problem. Introduce cumulative universe hierarchies.
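
As a pointer to what that explanation will cover, the standard observation can be stated in Lean 4 notation (our choice of system; Lean's universes are stratified but not cumulative): allowing ‘Type : Type’ would make the theory inconsistent by Girard's paradox, so each universe must live in a strictly larger one.

```lean
-- A consistent, strongly-normalising theory cannot have `Type : Type`
-- (Girard's paradox); instead, each universe lives in the next one up.
#check (Prop   : Type)      -- Prop   : Type
#check (Type   : Type 1)    -- Type   : Type 1
#check (Type 1 : Type 2)    -- Type 1 : Type 2

-- Definitions are made polymorphic over universe levels rather than
-- collapsing the hierarchy.
universe u
def idAt (α : Type u) (a : α) : α := a
```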


References

  • Thorsten Altenkirch, Conor McBride, James McKinna, “Why dependent types matter”, in “Proceedings of the 33rd ACM SIGPLAN-SIGACT symposium on principles of programming languages”, p. 1, 2006
  • Robert Harper, Robert Pollack, “Type checking with universes”, in “Theoretical computer science”, vol. 89, pp. 107–136, 1991
  • Per Martin-Löf, “On the formalization and proof theoretical analysis of intuitionistic mathematics”, in H.E. Rose, J.C. Shepherdson, ed., “Logic colloquium ’73”, vol. 80, “Studies in logic and the foundations of mathematics”, pp. 73–118, 1975
  • David Turner, “Church’s thesis and functional programming”, in A. Olszewski, ed., “Church’s thesis after 70 years”, 2006
  • Philip Wadler, “Propositions as types”, submitted for publication, 2014
  • Eliezer Yudkowsky, Marcello Herreshoff, “Tiling agents for self-modifying AI, and the Löbian obstacle”, draft, 2013

Yudkowsky-Herreshoff desiderata

Quoted verbatim from Yudkowsky-Herreshoff, p. 12:

1. Indefinitely tiling trust

For any system T it should be possible to characterize an infinitely descending sequence of agents constructing agents (equivalently an indefinitely self-modifying agent) whose level of trusted mathematics never drops below T.

2. Reflectively coherent quantified belief

The agent will not encounter any context in which it believes ∀x : □agent⌈φ(x)⌉ yet cannot bring itself to believe ∀x : φ(x).

3. Reflective trust

An agent reflecting on its beliefs about the program P composing itself, should not find that it distrusts P’s actions or that it rejects P as an acceptable program; the agent’s beliefs about its own construction should seem to license the beliefs it possesses, or at least license the performance of the actions which P has selected.

4. Naturalistic trust

If an agent believes that an environmental system isomorphic to one of its own subroutines has yielded an output, this belief should license the same sort of actions as would be licensed by the corresponding output by the corresponding internal subroutine.
