@brando90
brando90 / gist:c55c74e840d42c952d4aec7b74e0be6c
Created August 14, 2024 19:01
vllm_python_potential_way_to_resolve_issues.txt
# # 'torch==2.1.2', # 2.2 not supported due to vllm see: https://github.com/vllm-project/vllm/issues/2747
...
# 'transformers>=4.40',
# 'accelerate==0.29.2',
...
# 'datasets==2.14.7',
# 'evaluate==0.4.1',
# 'bitsandbytes==0.43.0',
# 'einops',
# 'flash-attn>=2.5.8',
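If these pins are re-enabled, one potential way to resolve the vllm conflict is to declare them together so pip resolves everything against torch 2.1.2. A minimal setuptools sketch, assuming a setup.py-based project; the package name is hypothetical and the pins simply mirror the commented list above:

# setup.py (sketch): torch is held at 2.1.2 because vllm does not yet support 2.2
# (https://github.com/vllm-project/vllm/issues/2747); remaining pins mirror the list above.
from setuptools import find_packages, setup

setup(
    name="my_project",  # hypothetical package name
    packages=find_packages(),
    install_requires=[
        "torch==2.1.2",
        "transformers>=4.40",
        "accelerate==0.29.2",
        "datasets==2.14.7",
        "evaluate==0.4.1",
        "bitsandbytes==0.43.0",
        "einops",
        "flash-attn>=2.5.8",
    ],
)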
@brando90
brando90 / maf_dual_backtranslation_self_improving.md
Created June 7, 2024 17:53
MAF dual backtranslation self-improving loop

Simplified version of Dually Grounded BackTranslation for AutoFormalization:

def train_to_af_for_maf(mdl : causal_lm,
                        formal_data_set, # e.g., ITP lib like mathlib
                        informal_data_set,  # e.g., a time-tested maths textbook such as Rudin or CLRS.
                        ):
    for (nl, fl*) in formal_data_set; for (nl*, fl) in informal_data_set;
        # -- Learn to Formalize: nl_i->fl* from fl* -> [nl_i]_i -> fl*
        [nl_i]_i := mdl("informalize " + fl*, sampling=top_p, num_out=k)  # noise is good for robustness!
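A minimal runnable sketch of the informalize-then-train round trip above, assuming a Hugging Face causal LM; the model name, the "informalize" prompt prefix, and the sampling parameters are illustrative assumptions, not the author's exact setup:

# Sketch of the fl* -> [nl_i]_i -> fl* back-translation step (hypothetical prompts/model).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM could stand in for mdl
tokenizer = AutoTokenizer.from_pretrained(model_name)
mdl = AutoModelForCausalLM.from_pretrained(model_name)

def sample(prompt, k=4, top_p=0.95, max_new_tokens=128):
    """Draw k top-p samples for one prompt (the noisy [nl_i]_i above)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outs = mdl.generate(**inputs, do_sample=True, top_p=top_p,
                        num_return_sequences=k, max_new_tokens=max_new_tokens,
                        pad_token_id=tokenizer.eos_token_id)
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outs]

def backtranslation_pairs(fl_star, k=4):
    """Informalize a ground-truth formal statement fl* into k noisy nl_i,
    then return (nl_i, fl*) pairs for training the nl -> fl direction."""
    return [(nl_i, fl_star) for nl_i in sample("informalize " + fl_star, k=k)]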
from datasets import load_dataset
from transformers import GPT2Tokenizer, GPT2LMHeadModel, Trainer, TrainingArguments, DataCollatorForLanguageModeling
# Load dataset from a JSON file
data_files = {"train": "path/to/your/train.json", "test": "path/to/your/test.json"}
dataset = load_dataset("json", data_files=data_files)
# Load pre-trained GPT-2 tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
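The snippet cuts off here; a sketch of how the tokenization and Trainer setup might continue, assuming each JSON record has a "text" field -- the field name, output directory, and hyperparameters are illustrative, not the author's settings:

# Continuation sketch (hypothetical schema and hyperparameters).
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset["train"].column_names)

training_args = TrainingArguments(
    output_dir="gpt2-finetuned",
    num_train_epochs=1,
    per_device_train_batch_size=4,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),  # causal LM objective
)
trainer.train()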
# -- HELM prompt, 8 shot, CoT? ref: https://storage.googleapis.com/crfm-helm-public/lite/benchmark_output/runs/v1.0.0/math:subject=algebra,level=1,use_official_examples=False,use_chain_of_thought=True,model=01-ai_yi-34b/scenario_state.json, https://crfm.stanford.edu/helm/lite/latest/#/runs/math:subject=algebra,level=1,use_official_examples=False,use_chain_of_thought=True,model=01-ai_yi-34b
HELM_MATH_PROMPT: str = (
"""Given a mathematics problem, determine the answer. Simplify your answer as much as possible.###
Problem: Let $r=3^s-s$ and $s=2^n+1$. What is the value of $r$ when $n=2$?
Answer: First substitute $n=2$ into the expression for $s$ to find $s=2^2+1=5$. Then substitute $s=5$ into the expression for $r$ to find $r=3^5-5=243-5=\\boxed{238}.###
Problem: If $x^{2y}= 4$ and $x = 4$, what is the value of $y$? Express your answer as a common fraction.
Answer: Plugging $x = 4$ into the first equation, we get $4^{2y} = 4^1 \\Rightarrow 2y = 1 \\Rightarrow y = \\boxed{\\frac{1}{2}}.###
Problem: If $y = \\dis
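The constant is truncated above; a sketch of how an 8-shot prompt like this is typically used, where the Problem/Answer framing and the "###" stop string follow the examples in the prompt (the generate call is a placeholder, not a specific client API):

# Sketch: append the query in the same Problem/Answer format and stop on "###".
def format_helm_math_query(problem: str) -> str:
    return HELM_MATH_PROMPT + f"Problem: {problem}\nAnswer:"

# completion = some_client.generate(format_helm_math_query("What is $1+1$?"), stop=["###"])  # placeholder call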
@brando90
brando90 / minerva_prompt_cot.py
Created May 9, 2024 21:31
minerva_prompt_cot.py
# -- Prompt minerva MATH 3 - better minerva + cot/scratch_pad
# https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/minerva_math/utils.py#L22
H_MATH_MINERVA_PROMPT_TEMPLATE_3_COT = (
r"""Problem:
Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.}
Solution:
Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $\boxed{[2,5)}$.
Problem:
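The template is truncated above; once a solution has been generated from it, the final answer still has to be read out of the \boxed{...} expression. A small sketch of that extraction -- the brace-matching scan is an assumption for illustration; the linked lm-evaluation-harness utils.py implements the real parser:

# Sketch: pull the last \boxed{...} argument out of a generated solution.
def extract_boxed_answer(solution):
    start = solution.rfind(r"\boxed{")
    if start == -1:
        return None
    i, depth, out = start + len(r"\boxed{"), 1, []
    while i < len(solution):
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(ch)
        i += 1
    return "".join(out)

assert extract_boxed_answer(r"the domain is $\boxed{[2,5)}$.") == "[2,5)"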
@brando90
brando90 / gist:769e71f376b4a9a78ad63c6989be6a52
Last active May 9, 2024 05:11
initial_reprover_in_pypentrograph.py
import math
from pantograph.server import Server, ServerError
from pantograph.expr import GoalState, TacticHave, TacticCalc, Tactic
from loguru import logger
from dataclasses import dataclass, field
from typing import Optional, List, Tuple
@dataclass(frozen=True)
class SearchResult:
    """The result of attempting to prove a theorem."""
@brando90
brando90 / vectoring_lowering_uncertainty_research_planning.md
Created April 25, 2024 17:45
vectoring_lowering_uncertainty_research_planning.md

---- Prompts for Assessing & Rubber Ducking discussions on your research plan according to Vectoring ----

references: - https://web.stanford.edu/class/cs197/slides/04-vectoring.pdf

We should have three prompts, each tackling a different way to mitigate risks and uncertainties in research projects -- especially in computer science and machine learning research. They are:

  • Prompt 1: attempt to guess/extrapolate/identify unknown unknowns -- to reduce risks we are unaware of.
  • Prompt 2: assess my vectoring plan and its assumptions for effectively tackling the unknown, i.e., research
@inproceedings{langley00,
author = {P. Langley},
title = {Crafting Papers on Machine Learning},
year = {2000},
pages = {1207--1216},
editor = {Pat Langley},
booktitle = {Proceedings of the 17th International Conference
on Machine Learning (ICML 2000)},
address = {Stanford, CA},
publisher = {Morgan Kaufmann}
}
@brando90
brando90 / gist:0323f9e2a659580978ec5daa72f41333
Created March 2, 2024 23:50
reciprocal_has_unbounded_limit.lean
/-
theorem: lim_{x -> c+} f(x) = +infinity
x + infinity = +infinity
lim_{x -> c} f(x) = L
∀ ε > 0, ∃ δ > 0, 0 < |x - c| < δ → |f(x) - L| < ε
L = + infinity
consider some ε > 0
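Since the comment sketches lim_{x -> c+} f(x) = +infinity only informally, here is a minimal Lean 4 / Mathlib-style statement of that one-sided definition -- the name tendsto_right_top and its phrasing are assumptions, not Mathlib's Filter.Tendsto formulation:

import Mathlib.Data.Real.Basic

-- Sketch: lim_{x → c⁺} f x = +∞ means every bound M is eventually exceeded
-- on some right-neighbourhood (c, c + δ).
def tendsto_right_top (f : ℝ → ℝ) (c : ℝ) : Prop :=
  ∀ M : ℝ, ∃ δ > 0, ∀ x : ℝ, c < x → x < c + δ → M < f x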
@brando90
brando90 / gist:0786c13414a510d4652351a2d96b51fc
Created March 2, 2024 20:52
vertical_asymptote_of_1_over_x.lean
/-
-/
import Mathlib.Data.Real.Basic
-- define 1/x (reciprocal) for the reals
noncomputable def f (x : ℝ) : ℝ := x⁻¹
#check f
-- unit test that f 1 = 1, f 2 = 1/2
theorem test_f1 : f 1 = 1 := by simp[f]
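The comment above also promises a check that f 2 = 1/2; a sketch of that second unit test (whether simp alone closes it is untested here, so norm_num is used to discharge the numeric equality):

-- unit test that f 2 = 1/2 (sketch; may need `import Mathlib.Tactic` for norm_num)
theorem test_f2 : f 2 = 1/2 := by norm_num [f]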