
@bollwyvl
bollwyvl / pyolite - contents.ipynb
Last active April 2, 2022 18:28
Accessing JupyterLite contents from pyolite
@yoavg
yoavg / stochastic-critique.md
Last active November 9, 2023 04:32
A criticism of Stochastic Parrots

A criticism of "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?"

Yoav Goldberg, Jan 23, 2021.

The FAccT paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" by Bender, Gebru, McMillan-Major and Shmitchell has recently been at the center of a controversy. The final version is now out and, owing largely to this controversy, will undoubtedly be very widely read. I read an earlier draft of the paper, and I think the updated final version is much improved in many ways: kudos to the authors for this upgrade. I also agree with and endorse most of the content. This is important stuff; you should read it.

However, I do find some aspects of the paper (and of the resulting discourse around it and around technology) to be problematic. These were not clear to me when I first read the draft several months ago, but they are very clear to me now. These points are for the most part

Training a Dutch parser

Steps

  1. Get the text data and decompress it: wget http://kyoto.let.vu.nl/~miltenburg/public_data/wikicorpus/corpus/wikicorpus.txt.gz ; gunzip wikicorpus.txt.gz
  2. Get the code for the structured n-grams: wget https://github.com/wlin12/wang2vec/archive/master.zip
  3. Run unzip master.zip ; rm master.zip
  4. Build the word vector code: Run cd wang2vec-master/ ; make ; cd ..
  5. Train CBOW vectors: Run ./wang2vec-master/word2vec -train wikicorpus.txt -output cbow.vectors -type 0 -size 50 -window 5 -negative 10 -nce 0 -hs 0 -sample 1e-4 -threads 1 -iter 5 -cap 0 >> training.log 2>&1 &
  6. Train structured skip-ngram vectors: Run ./wang2vec-master/word2vec -train wikicorpus.txt -output structured_ngram.vectors -type 3 -size 50 -window 5 -negative 10 -nce 0 -hs 0 -sample 1e-4 -threads 1 -iter 5 -cap 0 >> training_ssg.log 2>&1 &
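Once training finishes, the vectors can be inspected with a short script. This is a sketch that assumes the default plain-text output format (the commands above pass no -binary 1 flag): a header line with the vocabulary size and dimension, then one word and its values per line. The function names are hypothetical, and nearest neighbours are found by brute force, which is fine for a quick sanity check but slow over a large vocabulary.

```python
import math
from typing import Dict, Iterable, List, Tuple


def parse_text_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse word2vec-style text output: a "<vocab_size> <dim>" header,
    then one "<word> <v1> ... <v_dim>" line per vocabulary item."""
    it = iter(lines)
    _vocab_size, dim = map(int, next(it).split())
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_text_vectors(path: str) -> Dict[str, List[float]]:
    # errors="replace" guards against stray non-UTF-8 bytes in the vocabulary
    with open(path, encoding="utf-8", errors="replace") as f:
        return parse_text_vectors(f)


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(
    vectors: Dict[str, List[float]], word: str, k: int = 5
) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours of `word` by cosine similarity."""
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, most_similar(load_text_vectors("cbow.vectors"), "koning") would list the five words closest to "koning", assuming that word made it into the vocabulary. Running the same call on structured_ngram.vectors is a quick way to compare the two models.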
@arjanelfassed
arjanelfassed / Inventarisatie Gegevensbestanden en koppelingen.csv
Created November 23, 2015 18:59
Inventarisatie Gegevensbestanden en koppelingen.csv (inventory of data files and linkages)
Overheid (government) Organisatie (organization) Acronym Naam (name) Proces (process) Wet (law) Persoonsgegevens (personal data) Ministerie AZ Kabinet der Koningin Ministerie BZK BRP Bv BSN AIVD 112 centrale Ministerie BuZa Ministerie DEF Ivent KMAR MIVD Ministerie EZ ECD KvK LNV NVWA RVO Staatsbosbeheer Ministerie Financiën AFM BD-belastingen BD-toeslagen DNB Domeinen Douane FIOD Ministerie IenM ANVS Bureau CM CBR Dienst verkeer en scheepvaart Inspectie Leefomgeving en Milieu Kadaster RDW Rijkswaterstaat Ministerie OCW CVTE DUO Inspectie van het Onderwijs Besturen CITO Inspectie instellingen NUFFIC Onderwijscoöperatie Raden RMC's St. Schoolleidersregister Primair Onderwijs Ministerie SZW BKWI CVZ Inlichtingenbureau Inspectie SZW SIOD SVB UWV brancheorganisaties werkgeversorganisaties werknemersorganisaties Regionale Uitvoeringsdiensten VenJ CJIB COA DAD Dienst SSPC DJI DT&V IND JustID Justis LBIO NIFP NFI NCTV OM Parketten Politie Rechtspraak Rijksrecherche RSJ RvdKinderbescherming RvdRechtsbijstand RN Schadefonds Geweldsmisdrijven Stichting Verslavingsreclassering GGZ W
@kyagrd
kyagrd / HM+TCPoly+KindPoly
Last active March 30, 2018 10:49
HM+HigherKindedPoly+KindPoly in 25 lines of hacky Prolog
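%%%% Hindley-Milner + type constructor polymorphism + rank-1 kind polymorphism
:- use_module(library(apply)).
:- use_module(library(gensym)).
:- set_prolog_flag(occurs_check, true).
% Type application F $ G, left-associative.
:- op(500, yfx, $).
% kind(KC, T, K): type T has kind K in kind context KC.
% A type variable gets a fresh instance of its kind scheme from KC
% (first/2 and instantiate/2 are not shown in this excerpt).
kind(KC, var(Z), K1) :- first(Z:K, KC), instantiate(K, K1).
% Application: F needs an arrow kind whose domain is G's kind.
kind(KC, F $ G, K2) :- kind(KC, F, K1 -> K2), kind(KC, G, K1).
% Function type: both sides must have the base kind o.
kind(KC, A -> B, o) :- kind(KC, A, o), kind(KC, B, o).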
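The three kind rules can be mirrored in Python to see what they do. This sketch is deliberately simpler than the Prolog: kinds are monomorphic (there is no instantiate step, so no kind polymorphism), and type variables are looked up directly in a dictionary standing in for KC. Unification with an occurs check plays the role of Prolog's built-in unification under occurs_check=true. The tuple encoding of types and kinds, and all function names, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(eq=False)
class KVar:
    """A kind metavariable; `ref` is set once unification solves it."""
    ref: Optional[object] = None


# A kind is "o", ("->", k1, k2), or an unsolved KVar.
Kind = Union[str, Tuple, KVar]
O = "o"


def arrow(k1: Kind, k2: Kind) -> Kind:
    return ("->", k1, k2)


def prune(k: Kind) -> Kind:
    # Follow solved metavariables to their current value.
    while isinstance(k, KVar) and k.ref is not None:
        k = k.ref
    return k


def occurs(v: KVar, k: Kind) -> bool:
    k = prune(k)
    if k is v:
        return True
    return isinstance(k, tuple) and (occurs(v, k[1]) or occurs(v, k[2]))


def unify(a: Kind, b: Kind) -> None:
    a, b = prune(a), prune(b)
    if a is b:
        return
    if isinstance(a, KVar):
        if occurs(a, b):  # same role as occurs_check=true in the Prolog
            raise TypeError("occurs check failed")
        a.ref = b
    elif isinstance(b, KVar):
        unify(b, a)
    elif isinstance(a, tuple) and isinstance(b, tuple):
        unify(a[1], b[1])
        unify(a[2], b[2])
    elif a != b:
        raise TypeError(f"kind mismatch: {a!r} vs {b!r}")


def kind(env: Dict[str, Kind], ty: Tuple) -> Kind:
    """Types: ("var", name), ("app", f, g) for F $ G, ("fun", a, b) for A -> B."""
    tag = ty[0]
    if tag == "var":
        return env[ty[1]]  # monomorphic lookup; the Prolog instantiates a scheme here
    if tag == "app":
        result = KVar()
        unify(kind(env, ty[1]), arrow(kind(env, ty[2]), result))
        return result
    if tag == "fun":
        unify(kind(env, ty[1]), O)
        unify(kind(env, ty[2]), O)
        return O
    raise ValueError(f"unknown type form: {tag}")


def resolve(k: Kind) -> Kind:
    """Substitute all solved metavariables, for display and comparison."""
    k = prune(k)
    if isinstance(k, tuple):
        return ("->", resolve(k[1]), resolve(k[2]))
    return k
```

With Maybe of kind o -> o and Int of kind o, applying Maybe to Int resolves to o, while applying Int to anything raises a kind mismatch. Checking f a -> a with an unknown kind for f infers o -> o for f, which is the kind of reasoning the Prolog gets for free from unification.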
@casallas
casallas / pub-bib.md
Last active September 6, 2021 20:47
Publishing clean bibtex files

These tricks require BibTool, which can be installed with Homebrew.

To extract only the entries actually cited in a document, call bibtool -x on its .aux file, e.g.

bibtool -x doc.aux -o output.bib
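The .aux file is what makes this work: LaTeX records each \cite as a \citation{...} line, and those keys determine which entries are extracted. Below is a sketch of reading the keys yourself, for example to check what will be picked up. It is not how BibTool is implemented; it assumes a single .aux file (no \@input of included chapters) and treats the \nocite{*} wildcard as an ordinary key. The function name is hypothetical.

```python
import re
from typing import Iterable, List

# LaTeX writes one \citation{...} line per \cite, with comma-separated keys.
CITATION = re.compile(r"\\citation\{([^}]*)\}")


def cited_keys(aux_lines: Iterable[str]) -> List[str]:
    """Return cited keys in first-citation order, without duplicates."""
    seen = set()
    keys: List[str] = []
    for line in aux_lines:
        for match in CITATION.finditer(line):
            for key in match.group(1).split(","):
                key = key.strip()
                if key and key not in seen:
                    seen.add(key)
                    keys.append(key)
    return keys
```

Usage: with open("doc.aux", encoding="utf-8") as f: print(cited_keys(f)).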

To delete fields that you don't want to share, e.g. notes and file locations,

bibtool -- "delete.field {annote}" -i input.bib -o output.bib
@jaxbot
jaxbot / Vim autocmds
Last active September 3, 2023 11:37
Vim MacBook LEDs
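" Assuming keyboard_leds is built and available in your PATH,
" this makes the Caps Lock LED show whether you are in insert mode.
" (system() avoids the "Press ENTER" prompt that :! triggers.)
augroup capslock_insert_led
  autocmd!
  autocmd InsertEnter * call system('keyboard_leds -c1')
  autocmd InsertLeave * call system('keyboard_leds -c0')
augroup END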
" To make Vim control the keyboard backlight, use this.
" Note that it's glitchy and you'll probably toss the idea soon after.
" I can see programmatically controlling the lights to be useful in other cases, though.
" Install Lab Tick and set one hotkey for Toggle and another for Brighten.
" http://labtick.proculo.de/
# Obtain the labels of all instances of a given class (:class1).
SELECT DISTINCT ?c (STR(?l) AS ?lb)
WHERE {
?c a :class1 ;
<http://www.w3.org/2000/01/rdf-schema#label> ?l .
}
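To make the semantics of the query above concrete: `a` abbreviates rdf:type, so the pattern matches instances of :class1 together with their labels; STR() drops any language tag, and DISTINCT removes the duplicate rows that result. A plain-Python sketch over an in-memory list of triples follows. Literals are assumed to be (lexical form, language tag) pairs, and the function name and ex: identifiers are hypothetical.

```python
from typing import List, Set, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

Literal = Tuple[str, str]  # (lexical form, language tag); "" means no tag
Term = Union[str, Literal]
Triple = Tuple[str, str, Term]


def labels_of_instances(triples: List[Triple], cls: str) -> List[Tuple[str, str]]:
    # ?c a cls  ("a" is shorthand for rdf:type)
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    rows: Set[Tuple[str, str]] = set()  # a set gives DISTINCT semantics
    for s, p, o in triples:
        # ?c rdfs:label ?l, projected as (?c, STR(?l))
        if s in instances and p == RDFS_LABEL:
            lexical = o[0] if isinstance(o, tuple) else o  # STR() drops the tag
            rows.add((s, lexical))
    return sorted(rows)  # SPARQL leaves row order unspecified; sort for stable output
```

An instance labelled both "Alice"@en and "Alice"@nl therefore yields a single row, because both labels have the same lexical form after STR().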
# Obtain a list of classes.
SELECT DISTINCT ?c
WHERE {