Skip to content

Instantly share code, notes, and snippets.

@neubig
neubig / dispatch_openai_requests.py
Last active February 19, 2024 17:55
A simple script to get results from the OpenAI Asynchronous API
# NOTE:
# You can find an updated, more robust and feature-rich implementation
# in Zeno Build
# - Zeno Build: https://github.com/zeno-ml/zeno-build/
# - Implementation: https://github.com/zeno-ml/zeno-build/blob/main/zeno_build/models/providers/openai_utils.py
import openai
import asyncio
from typing import Any
@yoavg
yoavg / LLMs.md
Last active February 17, 2024 18:39

Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you heard of chatGPT, maybe played with it a little, and was imressed by it (or tried very hard not to be). And that you also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective of my thoughts of this (and similar) models, and where we stand with respect to language understanding.

Intro

Around 2014-2017, right within the rise of neural-network based methods for NLP, I was giving a semi-academic-semi-popsci lecture, revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked in an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs" to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We

This file has been truncated, but you can view the full file.
https://www.coop.ch/de/navigation/meganav/get?categoryCode=m_0475&_=1629798238733
https://www.amazon.com/dp/B0009R5B3U?th=1
https://play.google.com/_/PlayStoreUi/data/batchexecute?rpcids=qnKhOb&bl=boq_playuiserver_20191117.08_p1&gl=my&hl=ms&authuser&soc-app=121&soc-platform=1&soc-device=1&rt=c
https://www.coop.ch/de/lebensmittel/vorraete/pastasaucen-warme-saucen/saucen-gemischt/c/m_0166
https://www.amazon.com.au/gp/aod/ajax/ref=aod_f_freeShipping?asin=B08CXNTJ89&pageno=1&pc=dp
https://losangeles.craigslist.org/lac/ofc/d/san-diego-market-research-project/7369959933.html
https://www.bestbuy.ca/api/offers/v1/products/10434538/offers
https://www.zillow.com/homes/5266-S-Umatilla-Ave-Boise-ID-83709_rb
https://www.yelp.com/not_recommended_reviews/dannys-rv-repair-williams?not_recommended_start=40
https://www.nytimes.com/1910/05/06/archives/foot-caught-train-hit-her-girls-shoe-fast-in-track-frog-as-an.html
This file has been truncated, but you can view the full file.
The Project Gutenberg EBook of The Adventures of Sherlock Holmes
by Sir Arthur Conan Doyle
(#15 in our series by Sir Arthur Conan Doyle)
Copyright laws are changing all over the world. Be sure to check the
copyright laws for your country before downloading or redistributing
this or any other Project Gutenberg eBook.
This header should be the first thing seen when viewing this Project
Gutenberg file. Please do not remove it. Do not change or edit the
@ogrisel
ogrisel / mean_target_encoding.py
Last active September 29, 2017 15:05
Mean target value encoding for categorical variable using dask
#
# XXX: do not use this code, it's broken!
# Use: https://gist.github.com/ogrisel/b6a97ed87939e3b559568ac2f6599cba
#
# See comments.
import os
import os.path as op
from time import time
import dask.dataframe as ddf
@neubig
neubig / dynet-tagger.py
Last active May 21, 2018 06:01
A small sequence labeler in DyNet
"""
DyNet implementation of a sequence labeler (POS taggger).
This is a translation of this tagger in PyTorch: https://gist.github.com/hal3/8c170c4400576eb8d0a8bd94ab231232
Basic architecture:
- take words
- run though bidirectional GRU
- predict labels one word at a time (left to right), using a recurrent neural network "decoder"
The decoder updates hidden state based on:
- most recent word
@hal3
hal3 / mini_sequence_labeler.py
Last active January 24, 2019 20:56
PyTorch implementation of a sequence labeler (POS taggger).
"""
PyTorch implementation of a sequence labeler (POS taggger).
Basic architecture:
- take words
- run though bidirectional GRU
- predict labels one word at a time (left to right), using a recurrent neural network "decoder"
The decoder updates hidden state based on:
- most recent word
@loretoparisi
loretoparisi / macos-tts-say-voices.txt
Created October 26, 2016 21:40
macOS Sierra Text to Speech (TTS) Voices
Alex en_US # Most people recognize me by my voice.
Alice it_IT # Salve, mi chiamo Alice e sono una voce italiana.
Alva sv_SE # Hej, jag heter Alva. Jag är en svensk röst.
Amelie fr_CA # Bonjour, je m’appelle Amelie. Je suis une voix canadienne.
Anna de_DE # Hallo, ich heiße Anna und ich bin eine deutsche Stimme.
Carmit he_IL # שלום. קוראים לי כרמית, ואני קול בשפה העברית.
Damayanti id_ID # Halo, nama saya Damayanti. Saya berbahasa Indonesia.
Daniel en_GB # Hello, my name is Daniel. I am a British-English voice.
Diego es_AR # Hola, me llamo Diego y soy una voz española.
Ellen nl_BE # Hallo, mijn naam is Ellen. Ik ben een Belgische stem.
@meew0
meew0 / bulba-parser.rb
Created May 7, 2016 15:39
Ruby script to parse a dump of Bulbapedia's Pokémon pages into obtainability data
# This script parses a dump of Bulbapedia's Pokémon pages into a JSON file
# with details about what Pokémon are obtainable in respective regions
# (specifically, the latest series of games set in a specific region).
require 'nokogiri'
require 'json'
# An XML dump of all of Bulbapedia's Pokémon pages is required to exist at
# this path. It can be generated using this special page:
# http://bulbapedia.bulbagarden.net/wiki/Special:Export
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.