Alex Olieman (aolieman)

aolieman / 0_reuse_code.js
Last active August 29, 2015 14:07
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
aolieman / linking_gov_parl_members.md
Last active August 29, 2015 14:07
Strategy for linking mentioned Dutch government and parliament members

Spotting

Mentions of government and parliament members in a text are found using regular expressions. At least three cases can be distinguished:

  1. A single person is mentioned by name
  2. Multiple persons are mentioned by name
  3. A person is mentioned only by his/her function

In the first two cases, the strategy is to exploit the highly regular forms of address used in parliamentary speech, and therefore also in the proceedings. Some examples (in translation) are: sir, madam, member, colleague, minister, and secretary of state. Such a form of address is immediately followed by the member's last name. The pattern used to find these names must span as much of the name as possible without including any subsequent words.
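A minimal Python sketch of this spotting step, under stated assumptions: the Dutch forms of address below (`de heer`, `mevrouw`, `het lid`, `collega`, `minister`, `staatssecretaris`) are a back-translation assumed here, the names `TITLES`, `PREFIX`, `NAME` and `spot_named_mentions` are hypothetical, and only single named mentions are covered. The surname part accepts a capitalized prefix such as "Van der" and a hyphenated double name, but stops at the first space after the name so that following words are not captured.

```python
import re

# Assumed Dutch originals of the forms of address listed above (hypothetical):
# sir, madam, member, colleague, minister, secretary of state.
# The scoped (?i:...) flag also accepts them capitalized at sentence start.
TITLES = r"(?i:de heer|mevrouw|het lid|collega|minister|staatssecretaris)"

# Assumption: directly after a form of address, a surname prefix is capitalized
# ("Van", "De", ...) and may be followed by a lowercase one ("der", "'t", ...).
PREFIX = r"(?:(?:Van|De|Den|Der|Ter|Ten|Te)\s+(?:(?:der|den|de|'t)\s+)?)?"

# One capitalized name part, optionally hyphenated into a double surname.
# Matching stops at the first space after the name, so the following words are
# never captured. Only ASCII capitals are accepted as initial letters.
NAME = r"[A-Z][\w']*(?:-[A-Z][\w']*)*"

MENTION = re.compile(rf"\b(?P<title>{TITLES})\s+(?P<name>{PREFIX}{NAME})")


def spot_named_mentions(text):
    """Return (form of address, last name) pairs for persons mentioned by name."""
    return [(m["title"], m["name"]) for m in MENTION.finditer(text)]


print(spot_named_mentions("Ik geef het woord aan de heer Van der Staaij."))
# [('de heer', 'Van der Staaij')]
```

Because a prefix must be capitalized after a bare form of address, a function-only mention such as "minister van Financiën" (the third case) is not spotted by this pattern and would need a separate rule.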

aolieman / linking_parties.md
Last active August 29, 2015 14:07
Strategy for recognizing and linking Dutch political parties

Strategy for linking political parties

Build a name tree and a lookup map

  1. Retrieve all party identifiers
  2. For each party: parse XML and map known names to its identifier
  3. Build an Aho-Corasick tree from the party names
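Steps 1 and 2 might look like the following sketch. It assumes each party's XML document has `<name>` and `<abbreviation>` elements (a hypothetical schema; the actual source data may differ) and that `fetch_xml` is a caller-supplied function returning the XML for a party identifier; `build_lookup` and the `party:...` identifiers are likewise hypothetical. Names are lowercased so the map can serve case-insensitive matching.

```python
import xml.etree.ElementTree as ET


def build_lookup(party_ids, fetch_xml):
    """Map every known (lowercased) party name to its party identifier.

    fetch_xml is a caller-supplied function returning a party's XML document;
    the <name> and <abbreviation> elements are a hypothetical schema.
    """
    lookup = {}
    for party_id in party_ids:                     # step 1: all party identifiers
        root = ET.fromstring(fetch_xml(party_id))  # step 2: parse the XML
        for tag in ("name", "abbreviation"):
            for element in root.iter(tag):
                name = (element.text or "").strip()
                if name:
                    # If two parties share a name, the last one wins here.
                    lookup[name.lower()] = party_id
    return lookup


docs = {
    "party:vvd": "<party><name>Volkspartij voor Vrijheid en Democratie</name>"
                 "<abbreviation>VVD</abbreviation></party>",
}
print(build_lookup(docs, docs.get))
# {'volkspartij voor vrijheid en democratie': 'party:vvd', 'vvd': 'party:vvd'}
```

Step 3 then builds the Aho-Corasick tree from the keys of this map.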

String matching and linking

The tree is searched with the text of the proceedings (case-insensitively, with leftmost-longest matching), yielding the party names that occur in it. The lookup map is then used to find the party identifier that corresponds to each matched name. A matched name is linked unless it has already been recognized as a member's name. A matched name of a (single-member) party is also not linked if it is part of a longer name, such as that of a motion or committee.
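The matching and linking step could be sketched in plain Python as below. `AhoCorasick`, `leftmost_longest`, `on_word_boundaries` and `link_parties` are hypothetical names, the party identifiers are made up, and the word-boundary check is an added assumption (without it, a short name like "SP" would also match inside "spreken"). Leftmost-longest selection is done by collecting all matches and greedily keeping the earliest, longest non-overlapping ones. Spans that must not be linked (member names, motion or committee names) are assumed to come from earlier steps as `excluded_spans`.

```python
from collections import deque


class AhoCorasick:
    """Aho-Corasick automaton over a set of (lowercased) party names."""

    def __init__(self, patterns):
        self.goto = [{}]  # trie edges: node -> {char: child node}
        self.fail = [0]   # failure link of each node
        self.out = [[]]   # lengths of the patterns that end at each node
        for pattern in patterns:
            if not pattern:
                continue  # an empty name would match everywhere
            node = 0
            for ch in pattern:
                if ch not in self.goto[node]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][ch] = len(self.goto) - 1
                node = self.goto[node][ch]
            self.out[node].append(len(pattern))
        # Breadth-first pass: set failure links and inherit their outputs.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                self.out[child] += self.out[self.fail[child]]

    def find_all(self, text):
        """Yield (start, end) for every occurrence of every pattern in text."""
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for length in self.out[node]:
                yield i - length + 1, i + 1


def leftmost_longest(matches):
    """Keep the earliest-starting, then longest, non-overlapping matches."""
    chosen, last_end = [], 0
    for start, end in sorted(matches, key=lambda m: (m[0], -m[1])):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


def on_word_boundaries(text, start, end):
    """Assumption: only whole-word matches count (so "sp" skips "spreken")."""
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    return not before.isalnum() and not after.isalnum()


def link_parties(text, lookup, excluded_spans=()):
    """Return (matched text, party identifier) pairs for linkable party names.

    lookup maps lowercased party names to identifiers; excluded_spans holds
    (start, end) ranges already claimed by member names or by longer names
    such as motions and committees. Assumes lowercasing keeps the text length.
    """
    lowered = text.lower()
    automaton = AhoCorasick(lookup)
    candidates = [m for m in automaton.find_all(lowered)
                  if on_word_boundaries(lowered, *m)]
    links = []
    for start, end in leftmost_longest(candidates):
        if any(start < e and s < end for s, e in excluded_spans):
            continue  # overlaps a span that must not be linked as a party
        links.append((text[start:end], lookup[lowered[start:end]]))
    return links


lookup = {"vvd": "party:vvd", "sp": "party:sp", "partij voor de dieren": "party:pvdd"}
print(link_parties("De fractie van de VVD en de Partij voor de Dieren spreken.", lookup))
# [('VVD', 'party:vvd'), ('Partij voor de Dieren', 'party:pvdd')]
```

A library such as pyahocorasick could replace the hand-written automaton; the greedy leftmost-longest selection and the exclusion check would stay the same.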

Keybase proof

I hereby claim:

  • I am aolieman on github.
  • I am alioli (https://keybase.io/alioli) on keybase.
  • I have a public key ASD9fx_rJMX8LRULGEOq3ymh-4MgCDo30SsGFyj3PIH_dwo

To claim this, I am signing this object:

aolieman / custom_elasticsearch.md
Last active April 15, 2024 14:09
Extending Haystack's Elasticsearch backend

Haystack provides an interface similar to Django's QuerySet that, instead of querying a database, makes it easy to query one or more popular search backends. Because the Haystack SearchQuerySet API is meant to hook up to several search backends, however, it does not expose all of the functionality of each backend. In this article we show how Haystack's Elasticsearch backend can be extended with advanced querying functionality.

As an exemplary use case, we focus on implementing Elasticsearch's Nested Query in the SearchQuerySet API, which enables features such as weighted tags on documents. We first show how the extended API is used, and then go through the necessary implementation steps.
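For readers unfamiliar with the Nested Query, the sketch below shows what a weighted-tag query could look like on the Elasticsearch side, written as a plain Python request body; it is not the article's ConfigurableSearchQuerySet implementation. It assumes a hypothetical `tags` field mapped as `nested` with `name` and `weight` sub-fields (using Elasticsearch 5+ mapping syntax), and uses `function_score` with `field_value_factor` so that each matching tag contributes its own weight, summed across tags via `score_mode`. `TAGS_MAPPING` and `weighted_tag_query` are hypothetical names.

```python
import json

# Hypothetical mapping: each document carries a list of nested tag objects.
TAGS_MAPPING = {
    "properties": {
        "tags": {
            "type": "nested",
            "properties": {
                "name": {"type": "keyword"},
                "weight": {"type": "float"},
            },
        }
    }
}


def weighted_tag_query(tag_names):
    """Build a query body that scores a document by the summed weights of its
    tags whose name is in tag_names."""
    return {
        "query": {
            "nested": {
                "path": "tags",
                # Add up the scores of all matching nested tag objects.
                "score_mode": "sum",
                "query": {
                    "function_score": {
                        "query": {"terms": {"tags.name": list(tag_names)}},
                        # Use the tag's own weight as its score (1 if missing).
                        "field_value_factor": {"field": "tags.weight", "missing": 1},
                        "boost_mode": "replace",
                    }
                },
            }
        }
    }


print(json.dumps(weighted_tag_query(["energie", "klimaat"]), indent=2))
```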

ConfigurableSearchQuerySet API Usage

import search.custom_elasticsearch as ces
from files import FileObject