@shagunsodhani
Created July 16, 2016 10:09
Notes for Deep Math paper

DeepMath: Deep Sequence Models for Premise Selection

Introduction

  • Automated Theorem Proving (ATP) - Attempting to prove mathematical theorems automatically.
  • Bottlenecks in ATP:
    • Autoformalization - Semantic or formal parsing of informal proofs.
    • Automated Reasoning - Reasoning about already formalised proofs.
  • The paper evaluates the effectiveness of neural sequence models for premise selection (related to the automated reasoning bottleneck) without using hand-engineered features.
  • Link to the paper

Premise Selection

  • Given a large set of premises P, an ATP system A with given resource limits, and a new conjecture C, predict those premises from P that will most likely lead to an automatically constructed proof of C by A (a small sketch of how such a ranking is consumed follows).
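
A minimal Python sketch of how a trained selector could be used downstream; select_premises and score_fn are hypothetical names, and the top-k cutoff is illustrative:

```python
# Hypothetical helper: rank all premises by the model's usefulness score for
# the conjecture and hand only the top-k to the ATP system.
def select_premises(conjecture, premises, score_fn, k=100):
    """score_fn(conjecture, premise) -> estimated probability the premise helps."""
    ranked = sorted(premises, key=lambda p: score_fn(conjecture, p), reverse=True)
    return ranked[:k]
```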

Dataset

  • Mizar Mathematical Library (MML) used as the formal corpus.
  • The premise length varies from 5 to 84299 characters, and over 60% of the words occur fewer than 10 times (rare word problem).

Approach

  • The model predicts the probability that a given axiom is useful for proving a given conjecture.
  • Conjecture and axiom sequences are separately embedded into fixed-length real vectors, then concatenated and passed through a few fully connected layers with a logistic loss (see the sketch after this list).
  • The two embedding networks and the joint predictor path are trained jointly.
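
A minimal PyTorch sketch of that joint architecture, assuming generic sequence embedders; layer sizes and the classifier depth are illustrative, not the paper's hyperparameters:

```python
import torch
import torch.nn as nn

class PremiseSelector(nn.Module):
    """Two sequence embedders feed a small fully connected classifier."""

    def __init__(self, conjecture_embedder, axiom_embedder, embed_dim=256):
        super().__init__()
        self.conjecture_embedder = conjecture_embedder  # sequence -> fixed-length vector
        self.axiom_embedder = axiom_embedder
        self.classifier = nn.Sequential(                # a few fully connected layers
            nn.Linear(2 * embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),                          # logit for "axiom is useful"
        )

    def forward(self, conjecture, axiom):
        c = self.conjecture_embedder(conjecture)        # (batch, embed_dim)
        a = self.axiom_embedder(axiom)                  # (batch, embed_dim)
        return self.classifier(torch.cat([c, a], dim=-1)).squeeze(-1)

loss_fn = nn.BCEWithLogitsLoss()  # logistic loss on the usefulness label
```

Because the embedders are passed in as modules, the character-level and word-level variants below can be swapped in without changing the classifier.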

Stage 1: Character-level Models

  • Premises are treated at the character level, with each character represented as an 80-dimensional one-hot vector.
  • Architectures for embedding (a character-level CNN sketch follows this list):
    • pure recurrent LSTM and GRU networks
    • CNN (with max pooling)
    • recurrent-convolutional network that shortens the sequence with convolutional layers before feeding it to an LSTM
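
A sketch of the CNN-with-max-pooling variant in PyTorch; the filter sizes, channel counts, and output dimension are assumptions for illustration:

```python
import torch.nn as nn

VOCAB_SIZE = 80  # 80-dimensional one-hot character alphabet

class CharCNNEmbedder(nn.Module):
    """Character-level CNN that max-pools over the sequence to a fixed vector."""

    def __init__(self, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(VOCAB_SIZE, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, embed_dim, kernel_size=5, padding=2),
            nn.ReLU(),
        )

    def forward(self, one_hot_chars):
        # one_hot_chars: (batch, VOCAB_SIZE, seq_len)
        features = self.conv(one_hot_chars)
        return features.max(dim=-1).values  # max pooling over positions
```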

Stage 2: Word-level Models

  • MML dataset contains both implicit and explicit definitions.
  • To avoid manually encoding the implicit definitions, the entire statement defining an identifier is embedded, and these definition embeddings are used as the word-level embeddings.
  • This is better than recursively expanding and embedding the word definitions, as definition chains can be very deep.
  • Once word-level embeddings are obtained, the architectures from the character-level models can be reused (a sketch follows this list).
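
A sketch of how the definition-based word embeddings could be built from the character-level embedder above; encode_chars is a hypothetical helper that one-hot encodes a string:

```python
def build_word_embeddings(definitions, char_embedder, encode_chars):
    """definitions: dict mapping identifier -> the statement that defines it.

    encode_chars(statement) is a hypothetical helper returning a
    (1, 80, seq_len) one-hot character tensor for the statement.
    """
    word_embeddings = {}
    for identifier, statement in definitions.items():
        one_hot = encode_chars(statement)
        word_embeddings[identifier] = char_embedder(one_hot).squeeze(0)
    return word_embeddings
```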

Experiments

Metrics

  • For each conjecture, the model ranks the possible premises.
  • Primary metric is the number of conjectures proved from top-k premises.
  • Average Max Relative Rank (AMRR) is a more sophisticated measure, based on the idea that conjectures are easier to prove if all their dependencies occur early in the ranking.
  • Since it is very costly to rank all axioms for a conjecture, AMRR is approximated by ranking the true dependencies against a fixed number of randomly sampled false dependencies (a sketch follows this list).
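
A plain-Python sketch of this approximate AMRR computation, assuming higher model scores mean more useful premises; the exact tie-breaking and sampling details are guesses, not the paper's code:

```python
def max_relative_rank(scores_true, scores_false):
    """Worst relative rank of the true dependencies among all scored candidates."""
    candidates = sorted(scores_true + scores_false, reverse=True)
    worst_rank = max(candidates.index(s) + 1 for s in scores_true)
    return worst_rank / len(candidates)

def average_max_relative_rank(per_conjecture_scores):
    """per_conjecture_scores: list of (scores_true, scores_false) pairs."""
    ranks = [max_relative_rank(t, f) for t, f in per_conjecture_scores]
    return sum(ranks) / len(ranks)
```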

Network Training

  • Asynchronous distributed stochastic gradient descent using the Adam optimizer.
  • Both clipped and non-clipped gradients are evaluated.
  • Maximum sequence length of 2048 for character-level sequences and 500 for word-level sequences (see the training sketch below).
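
A single-worker PyTorch sketch of one training step, reusing the model and loss_fn from the architecture sketch above; the asynchronous distributed setup, learning rate, and clipping threshold are not reproduced here:

```python
import torch

MAX_CHAR_LEN = 2048  # character-level truncation
MAX_WORD_LEN = 500   # word-level truncation

optimizer = torch.optim.Adam(model.parameters())

def train_step(conjecture, axiom, label, clip_norm=None):
    optimizer.zero_grad()
    logits = model(conjecture[..., :MAX_CHAR_LEN], axiom[..., :MAX_CHAR_LEN])
    loss = loss_fn(logits, label)
    loss.backward()
    if clip_norm is not None:  # clipped vs non-clipped variants
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    optimizer.step()
    return loss.item()
```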

Results

  • Best selection pipeline: a Stage 1 character-level CNN that produces word-level embeddings for the next stage.
  • The best models use simple CNNs followed by max pooling, and the two-stage definition-based def-CNN outperforms the naive word-CNN (where word embeddings are learnt directly in a single pass).
@Autoformalization

Thank you for your interest in the concept "autoformalization" (sometimes also spelled "autoformalisation").

Just thought that you might be interested in how it was described when I coined the term about 30 years ago ...
http://www.netvalley.com/cgi-bin/library/autoform_en.pl

@shagunsodhani
Author

Ohh wow thanks @Autoformalization for sharing the link. I am sorry for replying so late - gists do not notify the user when a new comment is made 😄
