Brown clusters
They are induced as described in the paper:
Joseph Turian, Lev-Arie Ratinov and Yoshua Bengio (2010) "Word Representations: A Simple and General Method for Semi-Supervised Learning",
on the RCV1 corpus, cleaned as described in the paper (roughly 37M words of news text).
brown-rcv1.clean.tokenized-CoNLL03.txt-c*-freq1.txt
Brown clusters for a particular number of induced classes.
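A minimal sketch of how such a cluster file might be loaded. It assumes the common Brown-clustering output layout of one tab-separated `bit-path, word, frequency` triple per line; that layout is not stated above, so check it against the actual files. The function name `load_brown_clusters`, the `prefix_len` parameter and the sample rows are hypothetical.

```python
from io import StringIO


def load_brown_clusters(lines, prefix_len=None):
    """Map each word to its cluster bit path, optionally cut to a prefix."""
    word_to_cluster = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumed layout: <bit path> TAB <word> TAB <frequency>
        path, word, _freq = line.split("\t")
        word_to_cluster[word] = path[:prefix_len] if prefix_len else path
    return word_to_cluster


# Hypothetical sample rows in the assumed format, not data from the real files.
sample = StringIO("0010\tLondon\t120\n0011\tParis\t95\n110\twalked\t40\n")
clusters = load_brown_clusters(sample, prefix_len=2)
print(clusters)
```

Truncating the bit path to a prefix gives coarser clusters, which is a common way to use one clustering at several granularities.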
<!DOCTYPE html>
<head>
<title>Statistics</title>
<meta charset="utf-8">
<script src="https://code.jquery.com/jquery-3.2.1.min.js"></script>
<script type="text/javascript" src="https://raw.githubusercontent.com/badosa/JSON-stat/master/json-stat.js"></script>
<script src="http://mustache.github.io/extras/mustache.js"></script>
<script src="https://d3js.org/d3.v3.min.js"></script>
\s?(?P<hedge>(
(oorspr\.\s|in\s(oorsprong|aanleg|opzet))?
(wellicht|misschien|vermoedelijke?|waarschijnlijk|mogelijk|omstreeks|verm\.|mog\.|circa|ca|ca\.|tussen|mogelijk\sca\.|rond|tegen|waarsch\.)?
\s?
)+)?
\s?(?P<offset>(korte?\s)?(laat|late|vroege?|vanaf|begin|midden|einde?|na|aanvang|in\sde|voor|na|minstens))?
\s?(?P<ordinal>(
(eerste?|tweede|derde|vierde|laatste)
\s?(en.of|of)?\s?
)+)?
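The multi-line layout with explicit `\s` tokens suggests the pattern is meant to be compiled in verbose mode, though the listing does not say so. The sketch below uses a simplified, hypothetical subset of the named groups to show how such a pattern could be compiled and queried in Python; it is not the full expression.

```python
import re

# Simplified subset of the groups above; the real pattern has more alternatives.
PATTERN = re.compile(
    r"""
    \s?(?P<hedge>(wellicht|misschien|waarschijnlijk|mogelijk|circa|ca\.|rond)\s?)?
    \s?(?P<offset>(begin|midden|einde?|na|voor))?
    \s?(?P<ordinal>(eerste?|tweede|derde|vierde|laatste))?
    """,
    re.VERBOSE | re.IGNORECASE,  # VERBOSE ignores the layout whitespace
)

match = PATTERN.match("waarschijnlijk begin tweede")
# Each named group holds the part of the phrase it recognised (or None).
print(match.groupdict())
```

Because the hedge group includes its optional trailing whitespace, callers may want to strip the captured value before using it.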
# downloadable at http://ec.europa.eu/eurostat/ramon/relations/index.cfm?TargetUrl=ACT_OTH_REL_DLD&StrNomRelCode={}&StrLanguageCode=EN&StrFormat=XML
PRODCOM 2004 - CN 2004
PRODCOM 2005 - CN 2005
PRODCOM 2006 - CN 2006
PRODCOM 2007 - CN 2007
PRODCOM 2008 - CN 2007
PRODCOM 2008 - CN 2008
PRODCOM 2009 - CN 2009
SITC REV. 3 - ISIC REV. 3
CN 2002 - CPA 2002
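The `{}` in the comment line looks like a format placeholder for the relation name. A minimal sketch of filling it in, assuming each listed name is substituted for `{}` after URL-encoding; whether the server accepts names in exactly this form is not confirmed by the listing, and `relation_url` is a hypothetical helper.

```python
from urllib.parse import quote

URL_TEMPLATE = (
    "http://ec.europa.eu/eurostat/ramon/relations/index.cfm"
    "?TargetUrl=ACT_OTH_REL_DLD&StrNomRelCode={}&StrLanguageCode=EN&StrFormat=XML"
)


def relation_url(name):
    """Build a download URL for one relation (assumed: name fills the placeholder)."""
    return URL_TEMPLATE.format(quote(name))


print(relation_url("PRODCOM 2004 - CN 2004"))
```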
id,parent,title,lastUpdate,lastModified,dataStart,dataEnd,values,unit,shortDescription
data,,Database by themes,,,,,,,
general,data,General and regional statistics,,,,,,,
euroind,general,European and national indicators for short-term analysis,,,,,,,
ei_bcs,euroind,Business and consumer surveys (source: DG ECFIN),,,,,,,
ei_bcs_cs,ei_bcs,Consumer surveys (source: DG ECFIN),,,,,,,
ei_bsco_m,ei_bcs_cs,Consumers - monthly data,02.02.2017,30.01.2017,1980M01,2017M01,225387,,
ei_bsco_q,ei_bcs_cs,Consumers - quarterly data,02.02.2017,30.01.2017,1990Q1,2017Q1,15615,,
ei_bcs_bs,ei_bcs,Business surveys (source: DG ECFIN),,,,,,,
ei_bcs_r1,ei_bcs_bs,Business surveys - NACE Rev. 1.1 activity,,,,,,,
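The `id` and `parent` columns describe a hierarchy, so the table can be read as a tree. The sketch below rebuilds that tree from a few of the rows listed above and prints it with indentation; `build_tree` and `walk` are hypothetical helper names.

```python
import csv
from collections import defaultdict
from io import StringIO

# A few rows copied from the listing above.
TOC = """id,parent,title,lastUpdate,lastModified,dataStart,dataEnd,values,unit,shortDescription
data,,Database by themes,,,,,,,
general,data,General and regional statistics,,,,,,,
euroind,general,European and national indicators for short-term analysis,,,,,,,
ei_bcs,euroind,Business and consumer surveys (source: DG ECFIN),,,,,,,
ei_bcs_cs,ei_bcs,Consumer surveys (source: DG ECFIN),,,,,,,
ei_bsco_m,ei_bcs_cs,Consumers - monthly data,02.02.2017,30.01.2017,1980M01,2017M01,225387,,
"""


def build_tree(rows):
    """Group rows by their parent id; the root rows have an empty parent."""
    children = defaultdict(list)
    for row in rows:
        children[row["parent"]].append(row)
    return children


def walk(children, parent="", depth=0):
    """Yield (depth, id, title) in depth-first order."""
    for row in children.get(parent, []):
        yield depth, row["id"], row["title"]
        yield from walk(children, row["id"], depth + 1)


rows = list(csv.DictReader(StringIO(TOC)))
tree = build_tree(rows)
for depth, node_id, title in walk(tree):
    print("  " * depth + f"{node_id}: {title}")
```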
<script type="text/javascript">
function go(){
  console.log("foo");
  embase = document.getElementById("embase");
  query = document.getElementById("pubmed").value;
  query = query.replace(/\[title\/abstract\]/g, "!");
  query = query.replace(/OR/g, "@");
  query = query.replace(/AND/g, "#");
  query = query.replace(/NOT/g, "$");
#!/usr/bin/env python3
"""
Download a stream of Wikipedia pages in a given language, resolving links to
pages as Wikidata entity URIs.
"""
import bz2
import re
import urllib.parse
import urllib.request
# xml.etree.cElementTree was removed in Python 3.9; ElementTree replaces it.
import xml.etree.ElementTree as cElementTree
curl https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2 | bzcat | tail -n +2 | sed 's/,$//' | jq -r '.id as $id | (if .sitelinks!=null then .sitelinks else [] end) | .[] | [.site, .title, $id] | @tsv'
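For readers less familiar with jq, the per-line work of the pipeline above can be sketched in Python. It assumes, as the `tail` and `sed` steps imply, that the dump holds one entity object per line inside a JSON array, each line ending in a comma. The helper `sitelink_rows` is hypothetical, and the sample line is a heavily trimmed illustration, not an actual dump record.

```python
import json


def sitelink_rows(dump_line):
    """Return (site, title, entity id) tuples for one line of the dump."""
    # Drop the trailing comma that separates array elements (the sed step).
    line = dump_line.strip().rstrip(",")
    if line in ("[", "]", ""):
        return []  # array brackets carry no entity
    entity = json.loads(line)
    # Entities without sitelinks yield no rows (the jq if/else).
    sitelinks = entity.get("sitelinks") or {}
    return [(link["site"], link["title"], entity["id"]) for link in sitelinks.values()]


# Hypothetical, heavily trimmed entity line for illustration.
sample = '{"id": "Q42", "sitelinks": {"enwiki": {"site": "enwiki", "title": "Douglas Adams"}}},'
for row in sitelink_rows(sample):
    print("\t".join(row))
```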