import json

j = json.load(open('./census-reviewed.json', 'r'))
headers = None
total_vars = {
    'P1_047N': 0,
    'P1_063N': 0,
    'P1_070N': 0,
    'P2_072N': 0,
    'Hisp': 0,
}
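A sketch of how the tally over that dict might continue. The row structure below is an assumption for illustration only; the real shape of census-reviewed.json is not shown above.

```python
# Hypothetical rows standing in for the parsed JSON -- the real file's
# structure is an assumption here, not taken from the source.
rows = [
    {'P1_047N': 3, 'P1_063N': 1, 'Hisp': 2},
    {'P1_047N': 5, 'P2_072N': 4},
]

total_vars = {'P1_047N': 0, 'P1_063N': 0, 'P1_070N': 0, 'P2_072N': 0, 'Hisp': 0}
for row in rows:
    for key in total_vars:
        # rows may be missing a variable, so default to zero
        total_vars[key] += row.get(key, 0)

print(total_vars['P1_047N'])  # 8
```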
# this should run on a GPU Colab notebook
# pip install langchain xformers transformers datasets bitsandbytes accelerate --quiet
# get access to the meta-llama models, accept the license, and get a read token
hf_auth = '######'

from langchain.chains import ConversationChain
from langchain.llms import HuggingFacePipeline
from langchain.memory import ConversationSummaryBufferMemory
from langchain.prompts.prompt import PromptTemplate
Date: February 25, 2023
Questions in quotes
My comments in bold italics
Hi, I'm going to ask some questions about New York City as a new visitor, and you should respond as an expert resident.
Sure, I'm happy to help! What would you like to know about New York City?
# All I'm looking for in an ML example:
# ! pip install name_of_library
from name_of_library import model, other_stuff

tdata = load_data_from_file()  # not a built-in datasets source where I'd need to write Python to add data
tdata.apply(changes)  # whose dataset is so perfect we don't edit it
model.train(tdata, **explained_params)
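For what it's worth, that shape can be mocked up with only the standard library. Everything below, data and model alike, is made up to match the pseudocode; it is not a real library.

```python
import csv
import io

# toy "dataset file" -- in a real example this would be a local CSV on disk
RAW = "x,y\n1,2\n2,4\n3,6\n"

def load_data_from_file():
    return list(csv.DictReader(io.StringIO(RAW)))

class MeanModel:
    """Toy model: learns the mean slope y/x seen in training."""
    def train(self, rows):
        self.slope = sum(float(r["y"]) / float(r["x"]) for r in rows) / len(rows)

    def predict(self, x):
        return self.slope * x

tdata = load_data_from_file()
# edit the data, because no dataset is perfect
tdata = [r for r in tdata if float(r["x"]) > 0]

model = MeanModel()
model.train(tdata)
print(model.predict(4))  # 8.0
```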
May 6 - June 15, 2021
Once a large pre-trained language model is published, it is a snapshot of language when its corpus was collected. What are ways to update models to support new or newly-frequent terms (BIPOC), phrasing (social distancing), or people and events (Fyre Festival)? What are reliable, low-cost ways to test and benchmark these methods of updating?
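One common recipe for new terms is to grow the tokenizer vocabulary and resize the embedding matrix before fine-tuning (in HF Transformers, `tokenizer.add_tokens` followed by `model.resize_token_embeddings`). A library-free sketch of just the bookkeeping, with a toy vocabulary and toy vectors:

```python
import random

# toy vocabulary and 2-d embedding table standing in for a pretrained model
vocab = {"social": 0, "distancing": 1}
emb = [[0.1, 0.2], [0.3, 0.4]]

def add_tokens(new_tokens):
    """Append unseen tokens with randomly initialized vectors;
    fine-tuning would then move these vectors somewhere useful."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            emb.append([random.uniform(-0.1, 0.1) for _ in emb[0]])
            added += 1
    return added

add_tokens(["BIPOC", "distancing"])  # "distancing" is already in the vocab
print(vocab["BIPOC"], len(emb))  # 2 3
```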
/*
Generally, don't run random JS in your browser console, especially on financial sites, but here we are.
By default this sorts by Percent Change. If you uncomment the next line, it sorts by myDelta (price x your shares).
Caveats:
- I'm not affiliated with Vanguard or any licensed financial advisor or tax preparer. I don't have a clue what's going on with your finances.
- The script assumes you did NOT trade today; it uses today's change and current shares.
- Delta-sort does not handle penny stocks as well, because the UI says 0.01 and we reverse-engineer the price from the current balance.
*/
let sortRule = 'pct';
import functools

import t5

t5.data.TaskRegistry.add(
    "byt5_ex",
    t5.data.TextLineTask,
    split_to_filepattern={
        "train": "gs://BUCKET/train_lines.txt",
        "validation": "gs://BUCKET/validation_lines.txt",
    },
    text_preprocessor=[
        functools.partial(
            t5.data.preprocessors.parse_tsv,
Code: https://colab.research.google.com/drive/1vltPI81atzRvlALv4eCvEB0KdFoEaCOb?usp=sharing
Can these scores be improved? YES!
Rerun with more training data, more epochs of training, or other libraries to set a learning rate and other hyperparameters before training.
The point of a benchmark is to run these models through a reasonable and identical process; you can tweak hyperparameters on any model to improve results.
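A minimal sketch of that "identical process" idea; the config values, model names, and function names below are all invented placeholders, not recommendations.

```python
# every model gets the same epochs, learning rate, and seed --
# placeholder values, chosen only to make the point
CONFIG = {"epochs": 3, "learning_rate": 3e-5, "seed": 42}

def benchmark(models, train_fn, eval_fn):
    """Run each model through the same train/eval process and collect scores."""
    scores = {}
    for name, model in models.items():
        train_fn(model, **CONFIG)  # identical hyperparameters for everyone
        scores[name] = eval_fn(model)
    return scores

# toy usage: "training" just records the epoch count
models = {"model_a": {}, "model_b": {}}
train = lambda m, epochs, learning_rate, seed: m.update(epochs=epochs)
print(benchmark(models, train, lambda m: m["epochs"]))  # {'model_a': 3, 'model_b': 3}
```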