Maarten Hermans mhermans

From a reddit-comment:

Stupid question on "semanticweb". How do I actually get data? It says Library of Congress is on 'link-web-data' now. If I want to get a book name by ISBN (using LOC's 'linked data') how would I do that?

Is there a website for the standardized format of link-data? Are there APIs available?

Also how do I cross-correlate link-data? Say Amazon also had a link data set (or other "publisher"). How do I correlate ISBN numbers between Amazon, LOC, the patent office, etc. to verify the integrity of such data? Lots of stuff on Google is inaccurate, but that is "ok" because people are verifying it. But with an application, you need a way to ensure the data is correct and what you are actually looking for.

How do I get the data?

mhermans / strat_example.R
Created June 23, 2010 09:55
Basic example for the R strat package
library(strat)
isco <- c(1200, 3131, 9110)
isei <- recode(isco, informat="isco88", outformat="isei")
esec <- recode(isco, informat="isco88", outformat="esec")
table(isei, esec)
    esec
isei 1 4 6
  29 0 1 0
  48 0 0 1
  68 1 0 0
# .gitconfig file
# ---------------
[alias]
serve = !git daemon --reuseaddr --verbose --base-path=. --export-all ./.git
# Load required R-libraries
library(OpenMx)
library(psych)
library(polycor)
require(mvtnorm)
# SAS's PROBBNRM in R
# ===================
mhermans / kubuntu_movein.md
Created December 4, 2010 22:59
Kubuntu movein/configuration steps

sudo apt-get install vim curl python-pip python-dev git tmux powertop python-virtualenv virtualenvwrapper pandoc libxslt-dev libxml2-dev

enable third-party repositories; install firefox

rm -r Desktop/ Documents/ Downloads/ Music/ Pictures/ Public/ Templates/ Videos/

default-jre

install unison, virtualenv (see other gists)

mhermans / BEL20.py
Created December 30, 2010 01:26
BEL20 interlocking directorates: from Freebase to Gephi
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# pip install python-freebase
# pip install git+git://github.com/networkx/networkx.git
# need github version, as write_gexf() is not in current
# networkx 1.3 release
import freebase as fb
import networkx as nx
sometimes the prefix differs, or sometimes the namespace differs
xmlns:c="http://www.nbb.be/be/fr/pfs/ci/vl/2010-04-01"
xmlns:d="http://www.nbb.be/be/fr/pfs/ci/2010-04-01"
xmlns:pfs="http://www.nbb.be/be/fr/pfs/ci/2010-04-01"
xmlns:pfs-gcd="http://www.nbb.be/be/fr/pfs/ci/gcd/2010-04-01"
xmlns:pfs="http://www.nbb.be/be/fr/pfs/ci/2009-04-01"
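A minimal sketch of one way to cope with both kinds of variation, assuming Python's standard `xml.etree.ElementTree`. ElementTree expands every prefix to `{namespace-uri}localname`, so a differing prefix is harmless by itself; when the namespace URI differs too (2009 vs. 2010), matching on the local name sidesteps it. The element name `Equity`, the two sample documents, and the helper names are hypothetical, not taken from real filings.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets: same element, but a different prefix and namespace year.
DOC_2010 = ('<root xmlns:pfs="http://www.nbb.be/be/fr/pfs/ci/2010-04-01">'
            '<pfs:Equity>100</pfs:Equity></root>')
DOC_2009 = ('<root xmlns:d="http://www.nbb.be/be/fr/pfs/ci/2009-04-01">'
            '<d:Equity>90</d:Equity></root>')


def local_name(tag):
    """Strip the '{namespace-uri}' part that ElementTree puts in front of a tag."""
    return tag.rsplit('}', 1)[-1]


def find_by_local_name(xml_text, name):
    """Return the text of every element with this local name, ignoring prefix and namespace."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter() if local_name(el.tag) == name]


values_2010 = find_by_local_name(DOC_2010, "Equity")
values_2009 = find_by_local_name(DOC_2009, "Equity")
```

Ignoring the namespace entirely is a trade-off: it is convenient across taxonomy versions, but it would also merge elements that merely share a local name.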
mhermans / nosql.md
Created April 18, 2011 19:45
NOSQL

NOSQL: a subjective intro (from someone without DB experience):

The NOSQL label/hype (an alternative to SQL) covers a number of related approaches, two of them interesting:

  • key-value stores, with redis as the nicest example
  • graph databases, with neo4j as the front-runner

Key-value stores are very minimalistic databases, meant for very low-level operations, without schema, indexes, etc. If you know very well what you (and your algorithms) are doing, you can apparently build seriously performant, distributed things with them. That is beyond me and my practical needs; what is interesting, though, is having persistence while you are writing Python code.

To keep your data between, e.g., cleaning sessions, or to pass it on to other scripts, you can use the shelve or pickle modules, or go to the trouble of using a small sqlite database. With the first solution I was always afraid of corrupting something (e.g. through concurrent access by two scripts, overly large …
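For reference, the shelve option mentioned above can be sketched as follows. This is only an illustration using the standard library; the helper names, the key `rows_done`, and the file name are hypothetical, and it does nothing about the concurrent-access worry raised in the note.

```python
import os
import shelve
import tempfile


def save_state(path, key, value):
    """Store one picklable value under `key` in the shelf at `path`."""
    with shelve.open(path) as db:
        db[key] = value


def load_state(path, key, default=None):
    """Read `key` back from the shelf, or return `default` if it is absent."""
    with shelve.open(path) as db:
        return db.get(key, default)


# Usage: write in one "session", read back in a later one.
workdir = tempfile.mkdtemp()
store = os.path.join(workdir, "cleaning_state")  # hypothetical file name
save_state(store, "rows_done", [1, 2, 3])
restored = load_state(store, "rows_done")
```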

mhermans / redis.md
Created April 22, 2011 08:06
redis examples
SDIFF fullstack stack processed error
[show missings]
SDIFFSTORE missings fullstack stack processed error
SUNIONSTORE stack stack missings
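
To make the set commands above concrete, here is a rough sketch that mirrors them with plain Python sets instead of a Redis server. The item ids are made up, and the helpers `sdiff` and `sunion` are hypothetical stand-ins that only imitate what SDIFF and SUNION compute.

```python
def sdiff(first, *others):
    """Like SDIFF: members of `first` that appear in none of `others`."""
    return set(first).difference(*others)


def sunion(*sets):
    """Like SUNION: members that appear in at least one of `sets`."""
    return set().union(*sets)


# Hypothetical ids standing in for the four Redis sets.
fullstack = {"a", "b", "c", "d", "e"}
stack = {"a"}
processed = {"b", "c"}
error = {"d"}

# SDIFFSTORE missings fullstack stack processed error
missings = sdiff(fullstack, stack, processed, error)

# SUNIONSTORE stack stack missings  (puts the missing ids back on the stack)
stack = sunion(stack, missings)
```

In this toy data only `e` is in none of the three tracking sets, so it is the one id that ends up back on `stack`.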

virtualenv

sudo yum install make gcc python-devel
curl -LO https://github.com/pypa/virtualenv/raw/master/virtualenv.py

python virtualenv.py redisenv