Skip to content

Instantly share code, notes, and snippets.

@jasonrute
Last active February 20, 2023 04:37
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save jasonrute/00109af2bdc0974d2e8e79faf26ba556 to your computer and use it in GitHub Desktop.
Save jasonrute/00109af2bdc0974d2e8e79faf26ba556 to your computer and use it in GitHub Desktop.

Everything one needs to do to build a Lean (or Coq) version of HOList which can be used with DeepHOL. (This is assuming the current DeepHOL/HOList API naturally extends to Lean. It still needs to be checked that it does. If not, it might be needed to build a new Lean-specific system.)

  • API:
    • Terms:
      • Figure out how terms and Prop terms (i.e. theorems) are stored internally in Lean.
      • Figure out the natural encode of them into S-expressions.
    • Tactics:
      • Enumerate a list of all the most common and useful tactics
      • Figure out what type of parameters they each need (terms, theorems, lists of theorems, flags, etc.)
      • Find a convenient way to represent these tactics in the S-expression format.
      • It is ok to not faithfully mirror the tactic language exactly. For example, if a tactic has an optional parameter, that can be ignored if it isn’t that important.
  • Communication:
    • Information
      • Figure out how to communicate the information transferred in the commands ApplyTactic, RegisterProof, VerifyTheorem (found in the protobuf spec) between Lean and an outside process.
      • I’ve already mapped out the interface in this jupyter notebook
      • The interface doesn’t have to be very fast, so maybe try the simplest thing first. For example, I think there are Python bindings to Lean.
    • gRPC server
      • Build a gRPC server (in Python, C++, Java, or Scala) which does the following:
        1. Reads in the protobuf command from the client.
        2. Interprets the s-expressions as needed.
        3. Sends the appropriate information and commands to Lean, getting a response.
        4. Converts this response to an appropriate s-expression.
        5. Sends this s-expression to the client via gRPC/protobuf.
  • Training data:
    • Get a list of all Lean theorems in mathlib. Maybe also in other projects as well. (I think this is stored internally in lean in the “environment”.)
    • We also need an order of dependencies. Either a flat list of theorems in order they are loaded or a dependency graph. The point is that any premises come before any theorem which use that dependency.
    • Convert the theorems to s-expressions.
    • Store the theorem list in the appropriate protobufs format.
      • WE DON’T NEED PROOFS! It would be better to have them, but there are many reasons why proofs would be difficult to get from Lean. (have tactics, term mode, no current support for proof recording, etc) Without proofs, it is still possible to do reinforcement learning.
@brando90
Copy link

This is assuming the current DeepHOL/HOList API naturally extends to Lean. It still needs to be checked that it does. If not, it might be needed to build a new Lean-specific system.

this seems like an important assumption. How do we do this check? Is it possible to check without lean4 existing yet?

@brando90
Copy link

brando90 commented Jan 12, 2020

I still don't understand why the formal methods community likes s-expressions so much. I find them annoying and very difficult to read (for humans at least). Isn't something like json possible? I know Emilio for Serapi/Coq was planning to switch to that but I never found out what happened in the end.

Is there something advantageous about s-expressions that I am missing? Isn't Json (like dictionaries) much better (e.g. there are names for things etc)?

@jasonrute
Copy link
Author

@brando90 The s-expressions are discussed here https://leanprover.zulipchat.com/#narrow/stream/219941-Machine-Learning.20for.20Theorem.20Proving/topic/HOList . I don't know yet how to check. I'm asking around more on Zulip.

@sloos
Copy link

sloos commented Jan 28, 2020

S-expressions are more of an implementation detail. They are easy to parse, so we can convert between formats quickly e.g. hol light terms, graph representations in tensorflow, images of graphs using python, etc. They are a little denser than json format, so maybe more memory efficient. Also, I grew up on scheme, so I find lots of parentheses comforting. ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment