jasonrute/lean-holist-todo.md Secret

## lean-holist-todo.md

      
    Raw
  

              lean-holist-todo.md
            
          
    Everything one needs to do to build a Lean (or Coq) version of HOList which can be used with DeepHOL.  (This is assuming the current DeepHOL/HOList API naturally extends to Lean.  It still needs to be checked that it does.  If not, it might be needed to build a new Lean-specific system.)

API:

Terms:

Figure out how terms and Prop terms (i.e. theorems) are stored internally in Lean.
Figure out the natural encode of them into S-expressions.


Tactics:

Enumerate a list of all the most common and useful tactics
Figure out what type of parameters they each need (terms, theorems, lists of theorems, flags, etc.)
Find a convenient way to represent these tactics in the S-expression format.
It is ok to not faithfully mirror the tactic language exactly.  For example, if a tactic has an optional parameter, that can be ignored if it isn’t that important.


Communication:

Information

Figure out how to communicate the information transferred in the commands ApplyTactic, RegisterProof, VerifyTheorem (found in the protobuf spec) between Lean and an outside process.
I’ve already mapped out the interface in this jupyter notebook
The interface doesn’t have to be very fast, so maybe try the simplest thing first.  For example, I think there are Python bindings to Lean.


gRPC server

Build a gRPC server (in Python, C++, Java, or Scala) which does the following:

Reads in the protobuf command from the client.
Interprets the s-expressions as needed.
Sends the appropriate information and commands to Lean, getting a response.
Converts this response to an appropriate s-expression.
Sends this s-expression to the client via gRPC/protobuf.


Training data:

Get a list of all Lean theorems in mathlib.  Maybe also in other projects as well.  (I think this is stored internally in lean in the “environment”.)
We also need an order of dependencies.  Either a flat list of theorems in order they are loaded or a dependency graph.  The point is that any premises come before any theorem which use that dependency.
Convert the theorems to s-expressions.
Store the theorem list in the appropriate protobufs format.

WE DON’T NEED PROOFS!  It would be better to have them, but there are many reasons why proofs would be difficult to get from Lean.  (have tactics, term mode, no current support for proof recording, etc)   Without proofs, it is still possible to do reinforcement learning.