Dmitry Ustalov dustalov

dustalov /
Last active Jan 31, 2021
Answer Aggregation with Dawid-Skene and Bradley-Terry
#!/usr/bin/env python3
__author__ = 'Dmitry Ustalov'
__copyright__ = 'Copyright 2021 Dmitry Ustalov'
__license__ = 'MIT'
import numpy as np
EPS = 1e-8
dustalov / dirbackup
Last active May 24, 2020
Miscellaneous scripts for nearly everyday use
CWD=$(basename "$PWD")
XZ_OPT="-T 0" tar --exclude '*~' -C ../ -cJvf "../$CWD.tar.xz" "$CWD"
dustalov / Makefile
Last active Aug 31, 2019
Chinese Whispers and Telephone Game Performance Evaluation
WATSET ?= ../watset-java/target/watset.jar
LCC ?= ../lcc
export LANG:=en_US.UTF-8
export LC_COLLATE:=C
cut -f1,2 $(LCC)/eng_news_2016_10K/eng_news_2016_10K-co_s.txt | sed -re 's/\t/\n/g' | sort -u | wc -l
cut -f1,2 $(LCC)/eng_news_2016_30K/eng_news_2016_30K-co_s.txt | sed -re 's/\t/\n/g' | sort -u | wc -l
dustalov /
Last active May 24, 2020
An implementation of the sigf toolkit for randomization tests in Python 3
#!/usr/bin/env python3
__author__ = 'Dmitry Ustalov'
__credits__ = 'Sebastian Padó'
__license__ = 'MIT'
# This is an MIT-licensed implementation of the sigf toolkit for randomization tests:
import random
dustalov / collocation.groovy
Last active Jun 23, 2019
Watset (Java) Performance Measurement
#!/usr/bin/env groovy
import org.apache.commons.math3.stat.descriptive.moment.Mean
import org.apache.commons.math3.stat.descriptive.moment.StandardDeviation
import org.jgrapht.graph.SimpleWeightedGraph
import org.jgrapht.util.SupplierUtil
import org.nlpub.watset.graph.ChineseWhispers
import org.nlpub.watset.graph.NodeWeighting
import org.nlpub.watset.graph.MaxMax
import org.nlpub.watset.eval.Measurer
import org.nlpub.watset.graph.Watset
dustalov / Makefile
Last active Jan 11, 2018
Extracting and cross-validating version 1.0 of the WCL dataset
SEED = 1337
WCL_WRAPPER = /srv/definitions/wcl-extract
kfold: wiki_really_all.txt
./ --seed=$(SEED) $<
dustalov /
Last active Jan 2, 2018
Normalized Modified Purity in Python.
#!/usr/bin/env python
# This script computes the normalized modified purity and inverse purity
# according to this paper:
# In fact, this program is currently quite a rough translation of
# the evaluation-verb-classes.perl script provided by Daisuke Kawahara.
import argparse
import re
import sys
dustalov / ztest.awk
Last active Feb 8, 2017
Pairwise statistical significance test in AWK using Z-test.
#!/usr/bin/awk -f
# significance level
if (length(ALPHA) == 0) ALPHA = 0.05;
# standard error estimation method: "basic" or "pooled"
if (length(SE) == 0) SE = "basic";
# one-tailed or two-tailed?
if (TAILS != 2) TAILS = 1;
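The comments in the ztest.awk preview name three settings: a significance level, a "basic" or "pooled" standard error, and a one-tailed or two-tailed test. The Python sketch below shows how such settings commonly interact. It assumes a two-proportion Z-test on success counts; the function name `z_test`, its arguments, and the choice of a two-proportion test are illustrative assumptions and are not taken from the AWK script, whose body is not shown here.

```python
import math


def z_test(x1, n1, x2, n2, se="basic", tails=1, alpha=0.05):
    """Two-proportion Z-test (hypothetical helper, not the gist's code).

    x1, x2 are success counts; n1, n2 are sample sizes.
    """
    # Mirror the preview's defaulting rule: anything other than 2 means 1.
    tails = 2 if tails == 2 else 1
    p1, p2 = x1 / n1, x2 / n2

    if se == "pooled":
        # Pooled estimate: a single proportion shared by both samples.
        p = (x1 + x2) / (n1 + n2)
        stderr = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        # "basic" is assumed here to mean the unpooled estimate.
        stderr = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    if stderr == 0:
        raise ValueError("standard error is zero; the test is undefined")

    z = (p1 - p2) / stderr
    # Upper-tail probability of the standard normal at |z|.
    upper_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    p_value = tails * upper_tail
    return z, p_value, p_value < alpha


if __name__ == "__main__":
    print(z_test(80, 100, 50, 100, se="pooled", tails=2))
```

The one-tailed case here tests in the direction of the observed difference, which is one of several possible conventions.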
dustalov / extract-relations.groovy
Last active Jun 5, 2018
Extract semantic relations from Wiktionary using JWKTL.
#!/usr/bin/env groovy
import de.tudarmstadt.ukp.jwktl.JWKTL
import de.tudarmstadt.ukp.jwktl.api.filter.WiktionaryEntryFilter
import de.tudarmstadt.ukp.jwktl.api.util.Language
final languages = [en: Language.ENGLISH, ru: Language.RUSSIAN, de: Language.GERMAN]
if (args.length != 2 || !languages.containsKey(args[1] = args[1].toLowerCase())) {
throw new IllegalArgumentException('Required arguments: <PARSED-WIKTIONARY> en|ru|de')
dustalov /
Created Sep 13, 2016
A brute force decoder of Cyrillic strings with unknown charset combination.
#!/bin/bash -e
S=$(head -1)
CHARSETS=(utf8 cp1251 cp1252 koi8r koi8u iso-8859-5 maccyrillic)
for c1 in ${CHARSETS[*]}; do
for c2 in ${CHARSETS[*]}; do
for c3 in ${CHARSETS[*]}; do
for c4 in ${CHARSETS[*]}; do
echo -ne "$c1\t$c2\t$c3\t$c4\t"
<<<$S iconv -f=$c1 -t=$c2 -c | iconv -f=$c3 -t=$c4 -c
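The nested loops in the preview above try every four-charset combination and pipe the input through two conversions. The same brute-force idea can be sketched in Python. The generator name `candidates` is hypothetical, and the Python codec names are assumed to correspond to the charsets in the shell array. Undecodable or unencodable characters are dropped, roughly as `iconv -c` does.

```python
from itertools import product

# Python codec names assumed to match the shell script's charset list.
CHARSETS = ["utf-8", "cp1251", "cp1252", "koi8_r", "koi8_u",
            "iso8859_5", "mac_cyrillic"]


def candidates(text):
    """Yield (c1, c2, c3, c4, decoded) for every charset combination."""
    raw = text.encode("utf-8")  # bytes as they would arrive on stdin
    for c1, c2, c3, c4 in product(CHARSETS, repeat=4):
        # First conversion: read the bytes as c1, write them as c2.
        step = raw.decode(c1, errors="ignore").encode(c2, errors="ignore")
        # Second conversion: read the result as c3, write it as c4.
        step = step.decode(c3, errors="ignore").encode(c4, errors="ignore")
        # Show the final bytes the way a UTF-8 terminal would.
        yield c1, c2, c3, c4, step.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Build a mojibake sample: UTF-8 bytes misread as cp1251.
    garbled = "Привет".encode("utf-8").decode("cp1251")
    for c1, c2, c3, c4, out in candidates(garbled):
        if out == "Привет":
            print(c1, c2, c3, c4, out, sep="\t")
            break
```

With seven charsets the search covers 7^4 = 2401 combinations, so scanning the output by eye or filtering it for expected words is still practical.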