Everything one needs to do to build a Lean (or Coq) version of HOList which can be used with DeepHOL. (This is assuming the current DeepHOL/HOList API naturally extends to Lean. It still needs to be checked that it does. If not, it might be needed to build a new Lean-specific system.)
- API:
- Terms:
- Figure out how terms and Prop terms (i.e. theorems) are stored internally in Lean.
- Figure out the natural encode of them into S-expressions.
- Tactics:
- Enumerate a list of all the most common and useful tactics
- Figure out what type of parameters they each need (terms, theorems, lists of theorems, flags, etc.)
- Find a convenient way to represent these tactics in the S-expression format.
- It is ok to not faithfully mirror the tactic language exactly. For example, if a tactic has an optional parameter, that can be ignored if it isn’t that important.
- Terms:
- Communication:
- Information
- Figure out how to communicate the information transferred in the commands ApplyTactic, RegisterProof, VerifyTheorem (found in the protobuf spec) between Lean and an outside process.
- I’ve already mapped out the interface in this jupyter notebook
- The interface doesn’t have to be very fast, so maybe try the simplest thing first. For example, I think there are Python bindings to Lean.
- gRPC server
- Build a gRPC server (in Python, C++, Java, or Scala) which does the following:
- Reads in the protobuf command from the client.
- Interprets the s-expressions as needed.
- Sends the appropriate information and commands to Lean, getting a response.
- Converts this response to an appropriate s-expression.
- Sends this s-expression to the client via gRPC/protobuf.
- Build a gRPC server (in Python, C++, Java, or Scala) which does the following:
- Information
- Training data:
- Get a list of all Lean theorems in mathlib. Maybe also in other projects as well. (I think this is stored internally in lean in the “environment”.)
- We also need an order of dependencies. Either a flat list of theorems in order they are loaded or a dependency graph. The point is that any premises come before any theorem which use that dependency.
- Convert the theorems to s-expressions.
- Store the theorem list in the appropriate protobufs format.
- WE DON’T NEED PROOFS! It would be better to have them, but there are many reasons why proofs would be difficult to get from Lean. (
have
tactics, term mode, no current support for proof recording, etc) Without proofs, it is still possible to do reinforcement learning.
- WE DON’T NEED PROOFS! It would be better to have them, but there are many reasons why proofs would be difficult to get from Lean. (
@brando90 The s-expressions are discussed here https://leanprover.zulipchat.com/#narrow/stream/219941-Machine-Learning.20for.20Theorem.20Proving/topic/HOList . I don't know yet how to check. I'm asking around more on Zulip.