@kleem
kleem / README.md
Last active August 29, 2015 14:02
Linguistic annotations

An experiment on visualizing linguistic annotations of a (small) corpus.

The example uses a nonstandard, super-simple JSON format coded by hand (please forgive me for the errors I surely made from a linguistic standpoint).

This visualization focuses on three different aspects of the analysis: sentence splitting (a gray ■ introduces a new sentence), tokenization and lemmatization (each token has an underline and its lemma written under it) and part-of-speech tagging (the color of the underline and the lemma indicates whether the term is a noun, a verb, etc.).

The original text's spacing, punctuation and line breaks are preserved, as can be seen in the last two lines.

The layout relies on various CSS hacks (line heights, relative positioning and the like), so features such as text selection are broken.

@kleem
kleem / README.md
Last active August 29, 2015 14:02
Ruby annotations

A different take on the previous example: linguistic annotations are represented using Ruby annotations and their related CSS properties. See this article by Richard Ishida from W3C for more information.

This implementation should be better than the previous one from a semantic web perspective, since ruby tags more or less describe the semantics of an annotation. It also has the advantage of needing no CSS voodoo (with the exception of some -webkit- prefixed properties). Unfortunately, browser support is still incomplete, so it may not work in your browser of choice (it works in Chrome 31 for sure).
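
As a rough illustration of the idea, the sketch below turns a list of annotated tokens into ruby markup, with the lemma in `<rt>` and the part of speech as a class name. The token records, their field names and the `to_ruby_html` helper are assumptions made for this example; the gist's actual JSON format and code are not shown here.

```python
from html import escape

# Hypothetical token records (not the gist's real JSON layout).
# Whitespace and punctuation carry no lemma, so they are emitted unchanged.
tokens = [
    {"text": "Dogs", "lemma": "dog", "pos": "noun"},
    {"text": " ", "lemma": None, "pos": None},
    {"text": "bark", "lemma": "bark", "pos": "verb"},
    {"text": ".", "lemma": None, "pos": None},
]


def to_ruby_html(tokens):
    """Wrap each annotated token in <ruby>, putting its lemma in <rt>."""
    parts = []
    for tok in tokens:
        text = escape(tok["text"])
        if tok["lemma"] is None:
            # Unannotated text keeps the original spacing and punctuation
            parts.append(text)
        else:
            parts.append(
                '<ruby class="{pos}">{text}<rt>{lemma}</rt></ruby>'.format(
                    pos=escape(tok["pos"]),
                    text=text,
                    lemma=escape(tok["lemma"]),
                )
            )
    return "".join(parts)


html = to_ruby_html(tokens)
print(html)
```

Coloring by part of speech and placing the lemma below the base text would then be left to CSS rules on the class names (for instance via the `ruby-position` property, where the browser supports it).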

@kleem
kleem / README.md
Last active August 29, 2015 14:02
OpeNER - Text annotation visualization
@kleem
kleem / README.md
Last active August 29, 2015 14:02
Clavius - Latin text annotation visualization
@kleem
kleem / README.md
Last active February 22, 2023 09:52
WordNet noun graph

This experiment converts an SQL version of WordNet 3.0 into a graph, using the Python library graph-tool. In order to create a taxonomical structure, only noun synsets, hyponym links and hypernym links are considered.
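
The selection step can be sketched with the standard library alone. Everything about the database below is an assumption made for illustration: the table names, column names and sample rows are hypothetical and much simpler than a real WordNet SQL dump, and plain `sqlite3` plus `xml.etree` stand in for graph-tool.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified schema and toy rows (NOT the real WordNet SQL layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE synsets (synsetid INTEGER PRIMARY KEY, pos TEXT, definition TEXT);
    CREATE TABLE links (source INTEGER, target INTEGER, linktype TEXT);
    INSERT INTO synsets VALUES (1, 'n', 'entity'), (2, 'n', 'animal'),
                               (3, 'n', 'dog'), (4, 'v', 'bark');
    INSERT INTO links VALUES (2, 1, 'hypernym'), (3, 2, 'hypernym'),
                             (3, 4, 'related');
""")

# Keep noun synsets only
nouns = conn.execute(
    "SELECT synsetid, definition FROM synsets WHERE pos = 'n' ORDER BY synsetid"
).fetchall()

# A hypernym link points from the specific synset to the general one;
# flip it so that edges run parent -> child, as in a taxonomy.
edges = [
    (general, specific)
    for specific, general in conn.execute(
        """SELECT l.source, l.target FROM links l
           JOIN synsets a ON a.synsetid = l.source
           JOIN synsets b ON b.synsetid = l.target
           WHERE l.linktype = 'hypernym' AND a.pos = 'n' AND b.pos = 'n'
           ORDER BY l.source"""
    )
]

# Serialize nodes and edges as a minimal GraphML document
graphml = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
graph = ET.SubElement(graphml, "graph", id="G", edgedefault="directed")
for synsetid, _definition in nouns:
    ET.SubElement(graph, "node", id=f"n{synsetid}")
for parent, child in edges:
    ET.SubElement(graph, "edge", source=f"n{parent}", target=f"n{child}")
xml_text = ET.tostring(graphml, encoding="unicode")
print(xml_text)
```

Hyponym links are the inverse relation, so in this sketch they would be added without flipping; duplicates of an already flipped hypernym link could then be dropped.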

The result of the conversion is saved as GraphML, then rendered as the following hairball:

WordNet 3.0 taxonomy as a graph

Since the graph can be considered a tangled tree, i.e. a tree in which some nodes have multiple parents, two untangled versions (using longest and shortest paths) are also provided as GraphML. Only a few links are lost (about 2%), making the tree a good approximation of the noun taxonomy graph.
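
One plausible reading of the untangling step is sketched below: every node keeps only the parent that lies on its longest (or shortest) path from a root, and all other incoming links are dropped. The `untangle` function and the toy edge list are assumptions made for this example, not the gist's actual code.

```python
from collections import defaultdict, deque


def untangle(edges, longest=True):
    """Keep one parent per node in a DAG given as (parent, child) pairs.

    Returns a {child: parent} mapping. The kept parent is the one through
    which the longest (or shortest) path from a root reaches the child.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    # Roots are the nodes with no incoming link
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    depth = {n: 0 for n in queue}
    chosen = {}
    better = (lambda a, b: a > b) if longest else (lambda a, b: a < b)

    # Kahn's topological order: a node is dequeued only after all of its
    # parents have been processed, so its depth is final at that point.
    while queue:
        node = queue.popleft()
        for child in children[node]:
            candidate = depth[node] + 1
            if child not in depth or better(candidate, depth[child]):
                depth[child] = candidate
                chosen[child] = node
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return chosen


# Toy tangled tree: "b" and "c" both have two parents
edges = [("root", "a"), ("root", "b"), ("a", "b"), ("a", "c"), ("b", "c")]
longest_tree = untangle(edges, longest=True)
shortest_tree = untangle(edges, longest=False)
lost_links = len(edges) - len(longest_tree)
print(longest_tree, shortest_tree, lost_links)
```

Ties are resolved in favor of the first parent encountered, which is an arbitrary choice of this sketch. The number of lost links is the difference between the original edge count and the size of the returned mapping.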

@kleem
kleem / README.md
Last active December 12, 2016 18:52
Core WordNet noun graph

This experiment is similar to the previous one, but we attached word senses to the synsets, selecting only core noun senses (fewer than 5,000).

The following image depicts the tree obtained for this graph after a longest path untangling (synsets are shown in red, senses in blue):

Untangled core noun graph

@kleem
kleem / README.md
Last active April 28, 2020 22:27
WordNet verb graph

This experiment is like the previous one, but focused on verbs rather than nouns. The following picture shows the "islands" of the core verb taxonomy graph (click here to see the one for nouns):

Core verb taxonomy graph

A cycle between synsets is removed (it is identified as a human error in Richens 2008), then the graph is fed into a longest-path untangler to produce a tree.
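
A longest-path untangler only makes sense on an acyclic graph, so a cycle has to be found and broken first. The sketch below shows a generic depth-first search for one directed cycle; the `find_cycle` function and the toy edges are assumptions made for this example, and which link of the cycle to drop remains an editorial decision (here, the source relies on Richens 2008).

```python
def find_cycle(edges):
    """Return one directed cycle as a list of nodes, or None if there is none."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in children}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in children[node]:
            if color[nxt] == GREY:
                # Back edge: the cycle is the part of the path from nxt onward
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        color[node] = BLACK
        path.pop()
        return None

    for node in children:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Toy graph with one cycle between "b" and "c"
edges = [("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")]
cycle = find_cycle(edges)

# Dropping the offending link makes the graph acyclic
fixed = [e for e in edges if e != ("c", "b")]
print(cycle, find_cycle(fixed))
```

The search is recursive, so a very deep graph would need an explicit stack instead.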

@kleem
kleem / README.md
Last active August 29, 2015 14:04
Arc diagram: Italian tongue-twister

An example of an arc diagram visualizing repetitions of sequences of two or more characters in an Italian tongue-twister.

Arc diagrams were first introduced in Wattenberg 2002. To avoid clutter, not all repetitions of sequences are shown; only the ones considered fundamental for understanding the structure are displayed. Refer to the paper for more details. In this example, meaningful matches are manually selected.
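
To give an idea of the raw material behind such a diagram, the sketch below lists every sequence of two or more characters that occurs more than once, together with its start positions, and links consecutive occurrences as candidate arcs. It is a brute-force enumeration on a placeholder string, not the selection procedure from the paper, and the function name is an assumption made for this example.

```python
from collections import defaultdict


def repeated_sequences(text, min_length=2):
    """Map each substring of at least min_length characters that occurs
    more than once to the list of its start positions."""
    positions = defaultdict(list)
    n = len(text)
    for length in range(min_length, n):
        for start in range(n - length + 1):
            positions[text[start:start + length]].append(start)
    return {seq: starts for seq, starts in positions.items() if len(starts) > 1}


# Placeholder text, not the tongue-twister used in the gist
repeats = repeated_sequences("abcabc")

# One candidate arc per pair of consecutive occurrences: (sequence, from, to)
arcs = sorted(
    (seq, a, b)
    for seq, starts in repeats.items()
    for a, b in zip(starts, starts[1:])
)
print(repeats)
print(arcs)
```

The enumeration is quadratic in the length of the text, which is acceptable for a tongue-twister. Choosing which of these candidate arcs to draw is the filtering step that this example handles manually.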