Wouter van Atteveldt vanatteveldt

  • VU University
  • Amsterdam
library(reshape)
library(foreign)
library(irr)
# prepare data as 'molten' article - coder - issue - variable - value rows
b = read.spss("betrouwbaarheid.sav", to.data.frame=TRUE)
d = b[, c("Coder", "ArticleId", "Sourcepartytype", "Subjectpartytype", "Objectpartytype", "Sourcedim", "Subjectdim", "Objectdim")]

Day 1: R basics and data manipulation

  • Getting started
    • Principles of R
    • R studio, scripts, projects
    • How to get help?
  • Your data in R
    • Data types: data frames, vectors, lists, …
    • Reading and writing data
    • Simple descriptives: summary, table, …
from api import AmcatAPI
# connect to amcat, password should be in ~/.amcatauth or AMCAT_PASSWORD
a = AmcatAPI("http://amcat.nl")
r = a.list_articles(project=49, articleset=4635, page_size=1)
art = r["results"][0]
print("Got article {id} : {headline!r} with length {n}:\n {txt!r}...".format(txt=art['text'][:30], n=len(art['text']), **art))
[2014-02-11 09:27:40,428 ERROR django.request:226] Internal Server Error: /annotator/project/254/codingjob/3667/codedarticle/1115281/save
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/transaction.py", line 339, in inner
    return func(*args, **kwargs)
  File "/home/wva/amcat/annotator/views/codingjob.py", line 70, in save
    coded_article.replace_codings(codings["codings"])
  File "/home/wva/amcat/amcat/models/coding/codedarticle.py", line 172, in replace_codings
    raise ValueError("intval and strval cannot both be None")
Can you send the (Python?) code you are using? It is often easiest to put it on gist.github.com and send the link.
-- Wouter
from api import AmcatAPI
# geheim (Dutch for "secret") is a placeholder variable for the password
api = AmcatAPI("http://amcat.vu.nl", "chantalvanson", geheim)
print("Token: {}".format(api.token))
print("Host: {}".format(api.host))
articles = api.list_articles(project=1, articleset=1)
print("#Articles: {}".format(len(articles)))
source("~/amcat-r/R/amcatr.r")
source("~/amcat-r/R/codebook.r")
source("~/amcat-r/R/query.r")
source("~/amcat-r-tools/codebook_tools.r")
conn = amcat.connect("http://amcat.vu.nl")
h = amcat.gethierarchy(conn=conn, codebook_id=337, languages=c("nl", "dutch"))

A simple annotation format for AmCAT / xtas / NLP-Lab

Annotation formats have a habit of changing with new technologies and frameworks. That said, there are a number of things that most people working in NLP can probably agree on. Codifying such agreement in a technical implementation makes it easier to collaborate on tools and infrastructure.

This document proposes a simple, extensible JSON-based format intended to capture the part of NLP representation that we can all agree on, and that is simple to use, store, and extend. It does not force every tool or every user to adhere to this format. Rather, it is a suggestion to developers: if the output of their module fits this format, it might be a good idea to use it, so we can all work together better.
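
To make "simple to use, store, and extend" concrete, here is a minimal Python sketch of what a JSON-based annotation document might look like. The excerpt above does not give the actual field names, so every key here (`tokens`, `word`, `lemma`, `pos`, `entities`) is a hypothetical placeholder rather than part of the proposal.

```python
import json

# Purely illustrative: none of these field names come from the proposal itself.
document = {
    "tokens": [
        {"id": 1, "word": "Annotation", "lemma": "annotation", "pos": "N"},
        {"id": 2, "word": "formats", "lemma": "format", "pos": "N"},
        {"id": 3, "word": "change", "lemma": "change", "pos": "V"},
    ],
}

# A module can extend the document by adding its own key,
# leaving the existing keys untouched (hypothetical "entities" layer).
document["entities"] = [{"tokens": [1, 2], "type": "CONCEPT"}]

# Plain dicts and lists survive a JSON round trip unchanged,
# which is what makes the format easy to store and exchange.
serialized = json.dumps(document, sort_keys=True)
restored = json.loads(serialized)
print(restored == document)
```

Because each layer lives under its own key, a tool that only understands `tokens` could ignore `entities` entirely.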

from amcat.models import ArticleSet
import re
for aset in [148, 149, 150]:
    articles = ArticleSet.objects.get(pk=aset).articles.filter(date__gte="2014-02-14", medium_id__in=(224, 225, 226))
    for a in articles[:10]:
        x = eval(a.metastring)
        date = a.date.isoformat()[:10]
        fn = "{a.id}_{date}_{a.medium}_{a.headline}".format(**locals())
        fn = fn.replace(" ", "_")
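
The filename construction above can be tried without a database. The following is a sketch, not the original script: `Article` is a hypothetical stand-in for the AmCAT article model, `article_filename` and `parse_metastring` are illustrative helpers, and the metastring is assumed to hold a Python literal such as a dict. It uses `ast.literal_eval` in place of `eval`, which parses literals without executing arbitrary code, and a regular expression that also replaces path separators.

```python
import ast
import re
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for an AmCAT article; the real model has more fields.
Article = namedtuple("Article", ["id", "date", "medium", "headline", "metastring"])


def article_filename(a):
    """Build '<id>_<date>_<medium>_<headline>' with whitespace and slashes replaced."""
    date = a.date.isoformat()[:10]  # keep only the YYYY-MM-DD part
    fn = "{a.id}_{date}_{a.medium}_{a.headline}".format(a=a, date=date)
    # Collapse runs of whitespace and path separators into a single underscore
    return re.sub(r"[\s/\\]+", "_", fn)


def parse_metastring(metastring):
    """Parse a Python-literal string without executing arbitrary code."""
    return ast.literal_eval(metastring)


art = Article(1, datetime(2014, 2, 14, 9, 30), "Medium 224",
              "Some headline / subtitle", "{'length': 120}")
print(article_filename(art))
print(parse_metastring(art.metastring))
```

Passing the values to `format` explicitly, rather than through `**locals()`, keeps the template's dependencies visible at the call site.
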
$ curl -XDELETE http://localhost:9200/mytest?pretty
{
  "ok" : true,
  "acknowledged" : true
}
$ curl -XPOST http://localhost:9200/mytest?pretty
{
  "ok" : true,
  "acknowledged" : true