@bennokr
bennokr / README.txt
Last active October 16, 2017 12:07
Brown Clusters
Brown clusters
The clusters were induced as described in the paper:
Joseph Turian, Lev-Arie Ratinov and Yoshua Bengio (2010), "Word
Representations: A Simple and General Method for Semi-Supervised
Learning",
using the RCV1 corpus, cleaned as described in the paper (roughly 37M words of news text).
brown-rcv1.clean.tokenized-CoNLL03.txt-c*-freq1.txt
Brown clusters for a particular number of induced classes; the * in the file name stands for that number.
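A minimal sketch of reading such a cluster file in Python. It assumes the common Brown-clustering output layout of one word per line as `<bit string><TAB><word><TAB><count>`; the README does not state the layout, so check it against the actual file. The function names and the prefix lengths are illustrative only.

```python
def read_brown_clusters(lines):
    # Assumed line layout: bit string, word, count, separated by tabs.
    word_to_path = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        path, word, _count = line.split("\t")
        word_to_path[word] = path
    return word_to_path


def path_prefixes(path, lengths=(4, 6)):
    # Prefixes of the bit string give coarser clusters; the lengths here
    # are placeholders, not values taken from the README.
    return [path[:n] for n in lengths]


if __name__ == "__main__":
    # Made-up sample lines, used only to show the call pattern.
    sample = ["110100\tbank\t12\n", "110101\tbanks\t7\n"]
    clusters = read_brown_clusters(sample)
    for word, path in clusters.items():
        print(word, path, path_prefixes(path))
```

To use it on a real file, pass an open file handle instead of the sample list.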
@bennokr
bennokr / index.html
Created June 1, 2017 19:46
Eurostat visualizer
<!DOCTYPE html>
<head>
<title>Statistics</title>
<meta charset="utf-8">
<script src="https://code.jquery.com/jquery-3.2.1.min.js"></script>
<script type="text/javascript" src="https://raw.githubusercontent.com/badosa/JSON-stat/master/json-stat.js"></script>
<script src="https://mustache.github.io/extras/mustache.js"></script>
<script src="https://d3js.org/d3.v3.min.js"></script>
\s?(?P<hedge>(
(oorspr\.\s|in\s(oorsprong|aanleg|opzet))?
(wellicht|misschien|vermoedelijke?|waarschijnlijk|mogelijk|omstreeks|verm\.|mog\.|circa|ca|ca\.|tussen|mogelijk\sca\.|rond|tegen|waarsch\.)?
\s?
)+)?
\s?(?P<offset>(korte?\s)?(laat|late|vroege?|vanaf|begin|midden|einde?|na|aanvang|in\sde|voor|minstens))?
\s?(?P<ordinal>(
(eerste?|tweede|derde|vierde|laatste)
\s?(en.of|of)?\s?
)+)?
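The fragment above uses Python-style named groups (`?P<name>`) and is laid out over several lines, so it is presumably meant to be compiled in verbose mode. A minimal sketch under that assumption follows. The pattern is copied from the fragment shown and covers only the `hedge`, `offset` and `ordinal` groups; any remainder of the original expression is not visible here. The helper name `parse_prefix` is hypothetical.

```python
import re

# Dutch date qualifiers: hedge ("waarschijnlijk" = probably, "circa"),
# offset ("begin", "midden", "eind" = start, middle, end) and
# ordinal ("eerste", "tweede" = first, second).
PREFIX_RE = re.compile(r"""
\s?(?P<hedge>(
(oorspr\.\s|in\s(oorsprong|aanleg|opzet))?
(wellicht|misschien|vermoedelijke?|waarschijnlijk|mogelijk|omstreeks|verm\.|mog\.|circa|ca|ca\.|tussen|mogelijk\sca\.|rond|tegen|waarsch\.)?
\s?
)+)?
\s?(?P<offset>(korte?\s)?(laat|late|vroege?|vanaf|begin|midden|einde?|na|aanvang|in\sde|voor|na|minstens))?
\s?(?P<ordinal>(
(eerste?|tweede|derde|vierde|laatste)
\s?(en.of|of)?\s?
)+)?
""", re.VERBOSE)


def parse_prefix(text):
    # Every group is optional, so match() always succeeds; groups that
    # matched nothing (or only whitespace) are reported as None.
    m = PREFIX_RE.match(text)
    return {
        name: (m.group(name) or "").strip() or None
        for name in ("hedge", "offset", "ordinal")
    }


if __name__ == "__main__":
    print(parse_prefix("waarschijnlijk begin tweede helft"))
```

The groups capture trailing whitespace, which is why the helper strips each value.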
@bennokr
bennokr / Matcher.ipynb
Last active October 17, 2019 08:15
Northix schema matching
@bennokr
bennokr / correspondence.txt
Last active February 16, 2017 22:22
Eurostat RAMON classification (http://ec.europa.eu/eurostat/ramon/) descriptions as newline-delimited JSON
# downloadable at http://ec.europa.eu/eurostat/ramon/relations/index.cfm?TargetUrl=ACT_OTH_REL_DLD&StrNomRelCode={}&StrLanguageCode=EN&StrFormat=XML
PRODCOM 2004 - CN 2004
PRODCOM 2005 - CN 2005
PRODCOM 2006 - CN 2006
PRODCOM 2007 - CN 2007
PRODCOM 2008 - CN 2007
PRODCOM 2008 - CN 2008
PRODCOM 2009 - CN 2009
SITC REV. 3 - ISIC REV. 3
CN 2002 - CPA 2002
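The URL in the comment above contains a `{}` placeholder for `StrNomRelCode`. A minimal sketch of filling that placeholder and of splitting a listed relation into its two classifications follows. The relation code value passed in is hypothetical, since the listing does not show which codes RAMON expects; both function names are illustrative.

```python
from urllib.parse import quote

# URL template as given in the file's comment line.
URL_TEMPLATE = (
    "http://ec.europa.eu/eurostat/ramon/relations/index.cfm"
    "?TargetUrl=ACT_OTH_REL_DLD&StrNomRelCode={}"
    "&StrLanguageCode=EN&StrFormat=XML"
)


def download_url(relation_code):
    # Percent-encode the code so spaces and punctuation survive in the URL.
    return URL_TEMPLATE.format(quote(relation_code, safe=""))


def split_relation(title):
    # "PRODCOM 2008 - CN 2007" -> ("PRODCOM 2008", "CN 2007")
    source, target = title.split(" - ", 1)
    return source.strip(), target.strip()


if __name__ == "__main__":
    print(split_relation("SITC REV. 3 - ISIC REV. 3"))
    # "EXAMPLE CODE" is a placeholder, not a real RAMON relation code.
    print(download_url("EXAMPLE CODE"))
```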
@bennokr
bennokr / eurostat_descriptions.csv
Last active October 2, 2018 12:21
Eurostat dataset descriptions
id,parent,title,lastUpdate,lastModified,dataStart,dataEnd,values,unit,shortDescription
data,,Database by themes,,,,,,,
general,data,General and regional statistics,,,,,,,
euroind,general,European and national indicators for short-term analysis,,,,,,,
ei_bcs,euroind,Business and consumer surveys (source: DG ECFIN),,,,,,,
ei_bcs_cs,ei_bcs,Consumer surveys (source: DG ECFIN),,,,,,,
ei_bsco_m,ei_bcs_cs,Consumers - monthly data,02.02.2017,30.01.2017,1980M01,2017M01,225387,,
ei_bsco_q,ei_bcs_cs,Consumers - quarterly data,02.02.2017,30.01.2017,1990Q1,2017Q1,15615,,
ei_bcs_bs,ei_bcs,Business surveys (source: DG ECFIN),,,,,,,
ei_bcs_r1,ei_bcs_bs,Business surveys - NACE Rev. 1.1 activity,,,,,,,
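The `id` and `parent` columns describe a hierarchy of themes and datasets. A minimal sketch of rebuilding that tree with the standard `csv` module follows, using a few rows copied from the preview above; the function names are illustrative.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the preview, enough to show the nesting.
SAMPLE = """id,parent,title,lastUpdate,lastModified,dataStart,dataEnd,values,unit,shortDescription
data,,Database by themes,,,,,,,
general,data,General and regional statistics,,,,,,,
euroind,general,European and national indicators for short-term analysis,,,,,,,
ei_bcs,euroind,Business and consumer surveys (source: DG ECFIN),,,,,,,
ei_bcs_cs,ei_bcs,Consumer surveys (source: DG ECFIN),,,,,,,
ei_bsco_m,ei_bcs_cs,Consumers - monthly data,02.02.2017,30.01.2017,1980M01,2017M01,225387,,
ei_bcs_bs,ei_bcs,Business surveys (source: DG ECFIN),,,,,,,
"""


def build_tree(csv_file):
    # Map each id to its title, and each parent id to its child ids.
    titles = {}
    children = defaultdict(list)
    for row in csv.DictReader(csv_file):
        titles[row["id"]] = row["title"]
        if row["parent"]:
            children[row["parent"]].append(row["id"])
    return titles, children


def render(titles, children, node, depth=0):
    # Return the subtree under `node` as indented text lines.
    lines = ["  " * depth + node + ": " + titles[node]]
    for child in children.get(node, []):
        lines.extend(render(titles, children, child, depth + 1))
    return lines


if __name__ == "__main__":
    titles, children = build_tree(io.StringIO(SAMPLE))
    print("\n".join(render(titles, children, "data")))
```

For the full file, pass `open("eurostat_descriptions.csv", newline="")` instead of the `StringIO` sample.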
@bennokr
bennokr / index.html
Created February 8, 2017 19:42
Convert a PubMed query to an Embase query
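<script type="text/javascript">
function go(){
// Output element for the converted Embase query
var embase = document.getElementById("embase");
// PubMed query entered by the user
var query = document.getElementById("pubmed").value;
// Swap PubMed syntax for single-character placeholders.
// Field tags are matched case-insensitively; \b leaves OR/AND/NOT
// inside longer words (e.g. "DISORDER") untouched.
query = query.replace(/\[title\/abstract\]/gi, "!");
query = query.replace(/\bOR\b/g, "@");
query = query.replace(/\bAND\b/g, "#");
query = query.replace(/\bNOT\b/g, "$");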
@bennokr
bennokr / wikidata_links.py
Created January 16, 2017 15:19
Download a stream of Wikipedia pages in a given language, resolving links to pages as Wikidata entity URIs.
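The script itself is not shown in this listing. As a rough illustration of the resolving step only, the sketch below maps page titles to Wikidata entity URIs through the MediaWiki Action API (`action=query` with `prop=pageprops` and `ppprop=wikibase_item`). This is one possible approach and not necessarily what `wikidata_links.py` does; the function names are hypothetical.

```python
from urllib.parse import urlencode


def pageprops_url(lang, titles):
    # Build an Action API request that returns the Wikidata item of each title.
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "titles": "|".join(titles),
        "format": "json",
    }
    return "https://{}.wikipedia.org/w/api.php?{}".format(lang, urlencode(params))


def entity_uris(response):
    # Map page title -> entity URI for every page that has a Wikidata item.
    uris = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            uris[page["title"]] = "http://www.wikidata.org/entity/" + qid
    return uris


if __name__ == "__main__":
    print(pageprops_url("nl", ["Douglas Adams"]))
    # Hand-written response in the API's JSON shape, for illustration only.
    example = {"query": {"pages": {"1": {
        "title": "Douglas Adams",
        "pageprops": {"wikibase_item": "Q42"},
    }}}}
    print(entity_uris(example))
```

Fetching the URL (for example with `urllib.request.urlopen`) and decoding the JSON body yields the dictionary that `entity_uris` expects.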
@bennokr
bennokr / semcor.ipynb
Last active April 14, 2017 15:14
Supervised Word Sense Disambiguation using SemCor + SVM
@bennokr
bennokr / wikidata-sitelinks-table.sh
Created December 9, 2016 10:08
Create a Wikidata sitelinks table