import numpy as np
import torch
import torch.nn.functional as F
import pandas as pd


def labels_to_one_hot(labels, num_classes):
    # Scatter 1s at the given class indices into a zero vector,
    # keeping the result on the same device as the labels.
    one_hot = torch.zeros(num_classes, dtype=torch.float64, device=labels.device)
    one_hot[labels] = 1
    return one_hot
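A quick usage note (the values below are illustrative, not from the gist): passing a tensor with several indices yields a multi-hot vector on the same device as the labels.

```python
labels = torch.tensor([2, 5])                     # illustrative class indices
print(labels_to_one_hot(labels, num_classes=10))
# tensor([0., 0., 1., 0., 0., 1., 0., 0., 0., 0.], dtype=torch.float64)
```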
default llama(dim=3072, n_layers=28, n_heads=24, n_kv_heads=8, vocab_size=128256, multiple_of=256, ffn_dim_multiplier=1.0, max_seq_len=2048)
default RAdamScheduleFree(lr=1e-4, weight_decay=0.05, betas=(0.9, 0.98))
default AdamWScheduleFree(weight_decay=0.1, betas=(0.9, 0.98), lr=1e-4, warmup_steps=200)

| method (causal)      | run | model | optimizer (scheduler) | batch_size | gdnt_acc | duration | device | dtype   | epochs | batches_per_epoch | vloss  | correct_lp |
|----------------------|-----|-------|-----------------------|------------|----------|----------|--------|---------|--------|-------------------|--------|------------|
| lora on [wq, wk, wv] | 1   | llama | RAdamScheduleFree     | 2          | 6        | 2h 30min | A100   | float32 | 5      | 420               | 0.2881 | 4955       |
| full finetune        | 2   | llama | AdamW                 |            |          |          |        |         |        |                   |        |            |
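For context, a minimal sketch of how a setup like run 1 could be wired together, assuming the peft library for LoRA and the schedulefree package for the optimizer. The stub module, the rank r, and lora_alpha are illustrative assumptions; they are not recorded in the table above.

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model
from schedulefree import AdamWScheduleFree

class AttnStub(nn.Module):
    """Tiny stand-in using the same projection names (wq/wk/wv) as the llama model above."""
    def __init__(self, dim=64):
        super().__init__()
        self.wq, self.wk, self.wv, self.wo = (nn.Linear(dim, dim) for _ in range(4))

# LoRA only on the query/key/value projections, as in run 1.
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["wq", "wk", "wv"])  # r/alpha assumed
model = get_peft_model(AttnStub(), lora_cfg)

# Schedule-free AdamW with the defaults listed above; no separate LR scheduler is used.
opt = AdamWScheduleFree(model.parameters(), lr=1e-4, betas=(0.9, 0.98),
                        weight_decay=0.1, warmup_steps=200)
opt.train()  # schedule-free optimizers must be switched between train()/eval()
```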
Simple example of S4 in temporal logic:
Branching Time Temporal Logic (BTL): a simple example using the relation "is_accessible_from", where W is the set of all possible states during Wednesday (a tree of all the different things I do on Wednesday).
- Reflexive: is_accessible_from(eat, eat). If I'm eating, then I'm eating.
- Transitive: if is_accessible_from(wake_up, going_to_school) and is_accessible_from(going_to_school, eat), then is_accessible_from(wake_up, eat).
- Connectedness is not required: some states can remain mutually unreachable, so the result is a tree of possible chains of actions (branching futures).
Flash the paint picture
◻ - "it is necessary that". For the current example, "in every possible future branch from the current moment".
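For reference, the two characteristic S4 axioms correspond exactly to these properties of the accessibility relation (standard modal-logic facts, added here for context):
- T (reflexivity): ◻p → p, i.e. if p holds in every accessible state, it holds in the current one.
- 4 (transitivity): ◻p → ◻◻p, i.e. what is necessary is necessarily necessary.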
Solve the following reasoning task: given known facts and rules on how to deduce new facts, conclude whether the queried fact can be deduced from the known facts and rules. Per rule, only one new fact is deduced.
Here are some examples:
1.
facts: ['95']
rules: [[['133'], '110'], [['86'], '146'], [['117', '110', '146'], '113'], [['110'], '117'], [['95'], '142'], [['0'], '133'], [['17'], '110'], [['133'], '86'], [['95', '0'], '86'], [['133', '86', '113'], '110'], [['142'], '17'], [['146'], '113'], [['113'], '0']]
query: 17
results in True
2.
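A minimal sketch of the deduction procedure as I read the task above (not code from the gist): repeatedly fire any rule whose premises are all already known, and check whether the queried fact ever becomes known.

```python
def can_deduce(facts, rules, query):
    """Forward chaining over rules of the form [premises, conclusion]."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires once all of its premises are known facts.
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return query in known
```

In example 1, '95' deduces '142' and '142' deduces '17', so `can_deduce(['95'], rules, '17')` returns True.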
rule_block = 200
deduction_separator = 201
rule_separator = 202
fact_block = 203
query_block = 204
preds_block = 205
end_of_turn = 206
end_of_text = 207
special_tokens = {1: 210, 0: 211}
pad = 208 |
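One plausible way these ids could serialize an example like the reasoning task above (purely an assumption for illustration; the actual encoding scheme is not shown in this excerpt, and `encode_example` is a hypothetical helper):

```python
# Hypothetical layout: fact/rule atoms are small ints (< 200), so they never
# collide with the block/separator ids defined above.
def encode_example(facts, rules, query, answer):
    toks = [fact_block] + [int(f) for f in facts] + [rule_block]
    for premises, conclusion in rules:
        toks += [int(p) for p in premises] + [deduction_separator, int(conclusion), rule_separator]
    toks += [query_block, int(query), end_of_turn, special_tokens[int(answer)], end_of_text]
    return toks
```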
Dataset:
<s> I exist </s>
<s> Not that I want to </s>
<s> I want food </s>
<s> It is not what I want </s>
[2, 3, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[0, 7, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[1, 7, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[2, 1, 128000, 128003, 128009] [-100, -100, -100, 128003, 128009]
[7, 6, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[0, 3, 128000, 128003, 128009] [-100, -100, -100, 128003, 128009]
[4, 4, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[4, 2, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
[3, 7, 128000, 128003, 128009] [-100, -100, -100, 128003, 128009]
[2, 5, 128000, 128002, 128009] [-100, -100, -100, 128002, 128009]
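These pairs look like (input_ids, labels) for causal-LM fine-tuning, where -100 masks the prompt positions so that only the answer token (128002/128003) and the end-of-turn token (128009) contribute to the loss. A minimal sketch of how such labels can be built (the helper name is mine, not from the gist):

```python
IGNORE_INDEX = -100  # ignored by PyTorch's cross-entropy loss

def build_labels(input_ids, prompt_len):
    # Copy the inputs and mask every prompt position so the loss
    # is computed only on the answer/end-of-turn tokens.
    labels = list(input_ids)
    labels[:prompt_len] = [IGNORE_INDEX] * prompt_len
    return labels

# First row above: the prompt is the first 3 tokens, the answer is [128002, 128009].
build_labels([2, 3, 128000, 128002, 128009], prompt_len=3)
# -> [-100, -100, -100, 128002, 128009]
```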
######################################### original :
Solve the following math question by detailing every reasoning step.
Question:
Let's imagine a population of 100 humans. At the start of every epoch, every human gives birth to a child. We have to murder X of the children before they grow up to humans by the end of the epoch. If we want to have exactly 1000 humans after 10 epochs, then what is the value of X?
Answer:
At the start of each epoch, the population increases by 100, so after 10 epochs, the population will be 100 * 10 = 1000.
Pretraining a question answering model
- The selected model size of ~70M parameters is not nearly enough for the model to fully comprehend the context.
- It was enough for the model to start talking on the correct topic and form coherent answers.
- 3 to 4 epochs over a smaller dataset is more than enough; anything beyond that leads to over-training and regression.
- The dataset shouldn't allow for "easy win" answers, such as a specific string being the correct answer 50% of the time.
- Having an inappropriately large vocabulary compared to the model size probably also had a negative effect.
MixtralForCausalLM( |
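The model repr above is truncated in this excerpt. As a rough illustration of the scale discussed (not the configuration actually used, whose hyperparameters are not shown here), a small Mixtral-style model in roughly the 70M-parameter range could be instantiated like this, assuming the Hugging Face transformers API:

```python
from transformers import MixtralConfig, MixtralForCausalLM

# Illustrative small configuration; all values below are assumptions.
config = MixtralConfig(
    vocab_size=32000,            # a large vocab relative to this model size was noted above as harmful
    hidden_size=512,
    intermediate_size=1024,
    num_hidden_layers=6,
    num_attention_heads=8,
    num_key_value_heads=4,
    num_local_experts=4,
    num_experts_per_tok=2,
    max_position_embeddings=2048,
)
model = MixtralForCausalLM(config)
print(sum(p.numel() for p in model.parameters()) / 1e6, "M parameters")
```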