import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F

def labels_to_one_hot(labels, num_classes):
    # One-hot (or multi-hot) encode integer class labels into a float64 vector
    # allocated on the same device as `labels`.
    one_hot = torch.zeros(num_classes, dtype=torch.float64, device=labels.device)
    one_hot[labels] = 1
    return one_hot
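For reference, a quick usage sketch (assuming `labels` is a scalar or 1-D tensor of integer class indices, which is what the indexing above expects):

```python
labels = torch.tensor(3)
print(labels_to_one_hot(labels, num_classes=5))
# tensor([0., 0., 0., 1., 0.], dtype=torch.float64)
```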
Defaults used for the runs below:
- default llama(dim=3072, n_layers=28, n_heads=24, n_kv_heads=8, vocab_size=128256, multiple_of=256, ffn_dim_multiplier=1.0, max_seq_len=2048)
- default RAdamScheduleFree(lr=1e-4, weight_decay=0.05, betas=(0.9, 0.98))
- default AdamWScheduleFree(weight_decay=0.1, betas=(0.9, 0.98), lr=1e-4, warmup_steps=200)
| method (causal) | run | model | optimizer (scheduler) | batch_size | grad_acc | duration | device | dtype | epochs | batches_per_epoch | vloss | correct_lp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lora on [wq, wk, wv] | 1 | llama | RAdamScheduleFree | 2 | 6 | 2h 30min | A100 | float32 | 5 | 420 | 0.2881 | 4955 |
| full finetune | 2 | llama | AdamW | | | | | | | | | |
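For context, a rough sketch of how run 1 might be wired together, assuming the Hugging Face peft package handles the LoRA adapters on wq/wk/wv and the schedulefree package provides the optimizer; both packages and the rank/alpha values are assumptions, since the actual training script isn't included in these notes:

```python
import schedulefree
from peft import LoraConfig, get_peft_model

# LoRA adapters on the attention projections used in run 1 (rank/alpha are illustrative).
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["wq", "wk", "wv"])
model = get_peft_model(model, lora_cfg)  # `model` is the llama instance configured above

# Schedule-free RAdam with the defaults listed above.
optimizer = schedulefree.RAdamScheduleFree(
    model.parameters(), lr=1e-4, weight_decay=0.05, betas=(0.9, 0.98))

optimizer.train()  # schedule-free optimizers must be switched between train/eval modes
# ... training loop ...
optimizer.eval()   # switch before validation or checkpointing
```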
Simple example of S4 in temporal logic:
Branching Time Temporal Logic (BTL): a simple example using the relation "is_accessible_from", where W is the set of all possible states during Wednesday (a tree of all the different things I might do on Wednesday).
- Reflexive: is_accessible_from(eat, eat). If I'm eating, then I'm eating.
- Transitive: if is_accessible_from(wake_up, going_to_school) and is_accessible_from(going_to_school, eat), then is_accessible_from(wake_up, eat).
- Connectedness is not required: not every pair of states is comparable, so some states can remain mutually unreachable. The result is a tree of possible chains of actions (futures) rather than a single line.
(Paint picture of the Wednesday state tree goes here.)
◻ - "it is necessary that". For the current example: "it holds in every possible future branch from the current moment".
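As a toy illustration of that reading of ◻ (my own sketch, not from the original notes): a property is "necessary" at a state if it holds in every state reachable from it under the reflexive-transitive accessibility relation.

```python
# Toy tree of Wednesday states: each state lists the states accessible next.
next_states = {
    "wake_up": ["going_to_school"],
    "going_to_school": ["eat"],
    "eat": [],
}

def reachable(state):
    # Reflexive-transitive closure of accessibility (the S4 conditions above).
    seen, stack = {state}, [state]
    while stack:
        for nxt in next_states.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def necessarily(prop, state):
    # ◻ prop at `state`: prop holds in every possible future branch from here.
    return all(prop(s) for s in reachable(state))

print(necessarily(lambda s: s != "sleep", "wake_up"))  # True: no branch reaches "sleep"
```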
Solve the following reasoning task: given known facts and rules for deducing new facts, conclude whether the queried fact can be deduced from the known facts and rules. Each rule application deduces exactly one new fact.
Here are some examples:
1.
facts: ['95']
rules: [[['133'], '110'], [['86'], '146'], [['117', '110', '146'], '113'], [['110'], '117'], [['95'], '142'], [['0'], '133'], [['17'], '110'], [['133'], '86'], [['95', '0'], '86'], [['133', '86', '113'], '110'], [['142'], '17'], [['146'], '113'], [['113'], '0']]
query: 17
results in True
2.
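Example 1 can be checked mechanically with a small forward-chaining loop (my own sketch, not code from the original notes): keep firing any rule whose premises are already known, adding one new fact per firing, until the query is derived or nothing new can be deduced.

```python
def can_deduce(facts, rules, query):
    # rules: list of [premises, conclusion]; each firing deduces exactly one new fact.
    known = set(facts)
    changed = True
    while changed and query not in known:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
                break
    return query in known

facts = ['95']
rules = [[['133'], '110'], [['86'], '146'], [['117', '110', '146'], '113'],
         [['110'], '117'], [['95'], '142'], [['0'], '133'], [['17'], '110'],
         [['133'], '86'], [['95', '0'], '86'], [['133', '86', '113'], '110'],
         [['142'], '17'], [['146'], '113'], [['113'], '0']]
print(can_deduce(facts, rules, '17'))  # True: 95 -> 142 -> 17
```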
rule_block = 200
deduction_separator = 201
rule_separator = 202
fact_block = 203
query_block = 204
preds_block = 205
end_of_turn = 206
end_of_text = 207
special_tokens = {1: 210, 0: 211}
pad = 208
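These constants read like block-marker token ids for serializing a facts/rules/query example into a flat sequence. The actual layout isn't shown in these notes, so the following is only a hypothetical sketch of one way the markers could be used (premises and conclusion split by deduction_separator, rules split by rule_separator):

```python
def encode_example(facts, rules, query):
    # Hypothetical serialization using the block-marker ids defined above.
    tokens = [fact_block] + [int(f) for f in facts]
    tokens.append(rule_block)
    for premises, conclusion in rules:
        tokens += [int(p) for p in premises]
        tokens += [deduction_separator, int(conclusion), rule_separator]
    tokens += [query_block, int(query), end_of_turn]
    return tokens

print(encode_example(['95'], [[['95'], '142'], [['142'], '17']], '17'))
# [203, 95, 200, 95, 201, 142, 202, 142, 201, 17, 202, 204, 17, 206]
```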
Dataset:
<s> I exist </s>
<s> Not that I want to </s>
<s> I want food </s>
<s> It is not what I want </s>
[2, 3, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[0, 7, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[1, 7, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[2, 1, 128000, 128003, 128009] [-100, -100, -100, 128003, 128009]
[7, 6, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[0, 3, 128000, 128003, 128009] [-100, -100, -100, 128003, 128009]
[4, 4, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[4, 2, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[3, 7, 128000, 128003, 128009] [-100, -100, -100, 128003, 128009]
[2, 5, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
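In the label sequences above, -100 marks positions that should not contribute to the loss; this matches PyTorch's default ignore_index for cross-entropy. A minimal sketch using the first row:

```python
import torch
import torch.nn.functional as F

vocab_size = 128256
input_ids = torch.tensor([[2, 3, 128000, 128002, 128009]])
labels    = torch.tensor([[-100, -100, -100, 128002, 128009]])

logits = torch.randn(1, input_ids.shape[1], vocab_size)  # stand-in for model output
loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1),
                       ignore_index=-100)  # only the two unmasked positions count
```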
######################################### original :
Solve the following math question by detailing every reasoning step.
Question:
Let's imagine a population of 100 humans. At the start of every epoch, every human gives birth to a child. We have to murder X of the children before they grow up into humans by the end of the epoch. If we want to have exactly 1000 humans after 10 epochs, then what is the value of X?
Answer:
At the start of each epoch, the population increases by 100, so after 10 epochs, the population will be 100 * 10 = 1000.
Pretraining a question answering model
- The selected model size of 70M parameters is not nearly enough for the model to fully comprehend context.
- It was enough for the model to start talking on the correct topic and form coherent answers.
- 3 to 4 epochs over the smaller dataset is more than enough; anything beyond that leads to over-training and regression.
- The dataset shouldn't allow for "easy win" answers, such as a specific string being the correct answer 50% of the time.
- Having an inappropriately large vocabulary relative to the model size probably had a negative effect as well.
MixtralForCausalLM(