
Strongly-normalising agents

Miëtek Bak

This is a draft of a type-theoretic formalism for self-modifying agents, written as a follow-up to “On the Löbian obstacle”, in response to work performed by the Machine Intelligence Research Institute.

Let L denote a dependently-typed λ-calculus that does not admit general recursion and which has the strong normalisation property. This means that every reduction sequence starting from any term of L terminates in a normal form.

Consequently, typechecking for L is decidable.

Additionally, it follows from the Curry-Howard isomorphism that the language L corresponds to a logic, which we also denote L. Since every term of L normalises and the empty type has no closed term in normal form, this logic is consistent.
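
To make the correspondence concrete, here is a small illustration in Haskell. The choice of Haskell is ours, and Haskell itself is not strongly normalising, but the propositions-as-types reading of these simple terms is the same as in L:

```haskell
-- Propositions as types, proofs as programs:
--   conjunction ~ pair type, implication ~ function type, disjunction ~ sum type.

-- A proof of  A ∧ B → A  is the first projection.
andElimLeft :: (a, b) -> a
andElimLeft (x, _) = x

-- A proof of  (A → B) → A → B  (modus ponens) is function application.
modusPonens :: (a -> b) -> a -> b
modusPonens f x = f x

-- A proof of  A → A ∨ B  is the left injection.
orIntroLeft :: a -> Either a b
orIntroLeft = Left
```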

1. Knowledge

In which we choose to say that an agent knows a proposition to be provable if and only if the agent knows a proof of the proposition. We save the notions of ‘trust’, ‘belief’, ‘actions’, and ‘goals’ for use at higher reasoning levels.

Let An represent an agent that is the n-th successor of the agent A0.

Let T(An) denote a set of terms of L. This represents the set of propositions which An knows to be provable: we will say An has accepted a proposition X whenever some term x : X is in T(An).

Note that by Curry-Howard, each term x : X of the language L corresponds to the proof x of the proposition X in the logic L.

Assume An is capable of self-modification by spawning a successor An+1 such that T(An) ⊆ T(An+1), and replacing itself with the successor.

Additionally, assume An is capable of generating terms of L, sending terms of L as messages, and receiving messages that may or may not be terms of L.
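
To fix intuitions, the following sketch models such an agent in Haskell. The language and all names (Term, Agent, spawnSuccessor) are our own illustrative choices; terms of L are represented only as opaque identifiers, with none of their typing structure modelled:

```haskell
import qualified Data.Set as Set

-- A stand-in for a closed, well-typed term x : X of L, modelled only as
-- an opaque identifier.
newtype Term = Term String
  deriving (Eq, Ord, Show)

-- T(An): the set of proof-carrying terms the agent has accepted.
newtype Agent = Agent
  { accepted :: Set.Set Term
  } deriving Show

-- Self-modification: the successor accepts everything the current agent
-- accepts, and possibly more, so T(An) ⊆ T(An+1) holds by construction.
spawnSuccessor :: Agent -> Set.Set Term -> Agent
spawnSuccessor agent newTerms =
  Agent { accepted = accepted agent `Set.union` newTerms }
```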

What about propositions which the agent knows to be not provable, or propositions which the agent has neither accepted nor rejected? See section 2.

1.1 Monotonicity

In which we show that an agent will never reject a proposition which it has already accepted, in the spirit of the 1st Yudkowsky-Herreshoff desideratum.

Assume an agent An, its successor An+1, and a term x : X of L such that x : X ∈ T(An). This means An has accepted X.

Since T(An) ⊆ T(An+1) by the definition of a successor, x : X ∈ T(An+1). Therefore, trivially, An+1 has accepted X.
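
In terms of the illustrative Haskell sketch above, monotonicity holds because spawnSuccessor only ever adds to the accepted set:

```haskell
-- Monotonicity: whatever the current agent has accepted remains accepted
-- by any successor, because Set.union never removes elements.
remainsAccepted :: Agent -> Set.Set Term -> Term -> Bool
remainsAccepted agent extra x =
  Set.member x (accepted (spawnSuccessor agent extra))

-- Whenever  Set.member x (accepted agent)  is True,
-- remainsAccepted agent extra x  is True as well.
```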

1.2 Coherence

In which we show that an agent is always capable of accepting any proposition which it knows another agent has already accepted, in the spirit of the 2nd desideratum.

Assume agents An, Bk and terms x : X, y : Y of L such that x : X ∈ T(An), where X is the proposition that y : Y ∈ T(Bk). This means Bk has accepted Y, and An has accepted the proposition that y : Y ∈ T(Bk).

There are two possibilities:

  1. If y : Y ∈ T(An), then by definition, An has accepted Y.

  2. Otherwise, if y : Y ∉ T(An), then An is capable of spawning a successor An+1 such that T(An) ∪ {y : Y} = T(An+1). By definition, An+1 will have accepted Y.

The careful reader will note that this implies our language is capable of treating types as first-class entities, which is true, as our language is dependently-typed.
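
In terms of the same illustrative sketch, the two cases collapse into a single step which, either way, leaves us with an agent that has accepted Y:

```haskell
-- Coherence: if the agent holds the witnessing term y : Y of another
-- agent's accepted proposition, it can always end up having accepted Y.
acceptKnown :: Agent -> Term -> Agent
acceptKnown agent y
  | Set.member y (accepted agent) = agent                  -- case 1
  | otherwise = spawnSuccessor agent (Set.singleton y)     -- case 2

-- In both cases,  Set.member y (accepted (acceptKnown agent y))  is True.
```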

1.3 Uniformity

In which we show that an agent is always capable of accepting any proposition for which it knows a proof, whether the proof was generated by the agent itself or not, in the spirit of the 4th desideratum.

Assume an agent An and a term x : X of L such that x : X ∉ T(An).

Consider two possibilities:

  1. An has generated x : X.

  2. An has received a message that is x : X.

Clearly, in each case An is capable of spawning a successor An+1 such that T(An) ∪ {x : X} = T(An+1). By definition, An+1 will have accepted X.
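
In terms of the illustrative sketch, both cases are handled by one and the same step; the predicate typechecks is our own stand-in for the decidable typechecker of L:

```haskell
-- Uniformity: whether the agent generated x : X itself or received it as
-- a message, the same step yields a successor that has accepted X,
-- provided the candidate really is a term of L.
acceptTerm :: (Term -> Bool) -> Agent -> Term -> Agent
acceptTerm typechecks agent x
  | typechecks x = spawnSuccessor agent (Set.singleton x)
  | otherwise    = agent   -- not a term of L; see section 2
```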

What about the possibility that the agent has received a message that is not a term of our language? See section 2.

2. Values

In which we require an agent to continue upholding its original values.

Let us extend the definition of an agent:

Let P(An) denote a term of L. This represents the program controlling the agent An.

Let V(An) denote a function from terms of L to natural numbers. We will say V(An) represents the values of agent An.

Assume An is capable of self-modification by spawning a successor An+1 such that V(P(An)) ≤ V(P(An+1)), where, crucially, V = V(An).
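
In terms of the illustrative Haskell sketch, the extended agent and its value-preserving self-modification might look as follows; all names remain our own, and Natural stands in for the natural numbers:

```haskell
import Numeric.Natural (Natural)   -- joins the imports of the earlier sketch

-- The extended agent: its accepted set, the program controlling it, P(An),
-- and its values, V(An), a function from terms to natural numbers.
data ValuedAgent = ValuedAgent
  { vAccepted :: Set.Set Term
  , vProgram  :: Term              -- P(An), itself a term of L
  , vValues   :: Term -> Natural   -- V(An)
  }

-- Value-preserving self-modification: the candidate successor is adopted
-- only when V(P(An)) ≤ V(P(An+1)), where, crucially, V = V(An), i.e. the
-- comparison is made under the current agent's values.
spawnIfValued :: ValuedAgent -> ValuedAgent -> Maybe ValuedAgent
spawnIfValued current candidate
  | vValues current (vProgram current) <= vValues current (vProgram candidate)
      = Just candidate
  | otherwise
      = Nothing
```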

2.1 Reflection

In which we say that an agent is always capable of reflecting upon its values, or the values of any other agent, in the spirit of the 3rd desideratum.


TODO: Explain the ‘Type : Type’ problem. Introduce cumulative universe hierarchies.
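
As a pointer to what that explanation will cover, the standard observation can be stated in Lean 4 notation (our choice of system; Lean's universes are stratified but not cumulative): allowing ‘Type : Type’ would make the theory inconsistent by Girard's paradox, so each universe must live in a strictly larger one.

```lean
-- A consistent, strongly-normalising theory cannot have `Type : Type`
-- (Girard's paradox); instead, each universe lives in the next one up.
#check (Prop   : Type)      -- Prop   : Type
#check (Type   : Type 1)    -- Type   : Type 1
#check (Type 1 : Type 2)    -- Type 1 : Type 2

-- Definitions are made polymorphic over universe levels rather than
-- collapsing the hierarchy.
universe u
def idAt (α : Type u) (a : α) : α := a
```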


References

  • Thorsten Altenkirch, Conor McBride, James McKinna, “Why dependent types matter”, in “Proceedings of the 33rd ACM SIGPLAN-SIGACT symposium on principles of programming languages”, p. 1, 2006
  • Robert Harper, Robert Pollack, “Type checking with universes”, in “Theoretical computer science”, vol. 89, pp. 107–136, 1991
  • Per Martin-Löf, “On the formalization and proof theoretical analysis of intuitionistic mathematics”, in H.E. Rose, J.C. Shepherdson, ed., “Logic colloquium ’73”, vol. 80, “Studies in logic and the foundations of mathematics”, pp. 73–118, 1975
  • David Turner, “Church’s thesis and functional programming”, in A. Olszewski, ed., “Church’s thesis after 70 years”, 2006
  • Philip Wadler, “Propositions as types”, submitted for publication, 2014
  • Eliezer Yudkowsky, Marcello Herreshoff, “Tiling agents for self-modifying AI, and the Löbian obstacle”, draft, 2013

Yudkowsky-Herreshoff desiderata

Quoted verbatim from Yudkowsky-Herreshoff, p. 12:

1. Indefinitely tiling trust

For any system T it should be possible to characterize an infinitely descending sequence of agents constructing agents (equivalently an indefinitely self-modifying agent) whose level of trusted mathematics never drops below T.

2. Reflectively coherent quantified belief

The agent will not encounter any context in which it believes ∀x : □agent⌈φ(x)⌉ yet cannot bring itself to believe ∀x : φ(x).

3. Reflective trust

An agent reflecting on its beliefs about the program P composing itself, should not find that it distrusts P’s actions or that it rejects P as an acceptable program; the agent’s beliefs about its own construction should seem to license the beliefs it possesses, or at least license the performance of the actions which P has selected.

4. Naturalistic trust

If an agent believes that an environmental system isomorphic to one of its own subroutines has yielded an output, this belief should license the same sort of actions as would be licensed by the corresponding output by the corresponding internal subroutine.
