Dmitry Ustalov dustalov

dustalov / dirbackup
Last active May 24, 2020
Miscellaneous scripts for nearly everyday use
View dirbackup
CWD=$(basename "$PWD")
XZ_OPT="-T 0" tar --exclude '*~' -C ../ -cJvf "../$CWD.tar.xz" "$CWD"
dustalov / Makefile
Last active Aug 31, 2019
Chinese Whispers and Telephone Game Performance Evaluation
View Makefile
WATSET ?= ../watset-java/target/watset.jar
LCC ?= ../lcc
export LANG:=en_US.UTF-8
export LC_COLLATE:=C
cut -f1,2 $(LCC)/eng_news_2016_10K/eng_news_2016_10K-co_s.txt | sed -re 's/\t/\n/g' | sort -u | wc -l
cut -f1,2 $(LCC)/eng_news_2016_30K/eng_news_2016_30K-co_s.txt | sed -re 's/\t/\n/g' | sort -u | wc -l
dustalov /
Last active May 24, 2020
An implementation of the sigf toolkit for randomization tests in Python 3
#!/usr/bin/env python3
__author__ = 'Dmitry Ustalov'
__credits__ = 'Sebastian Padó'
__license__ = 'MIT'
# This is an MIT-licensed implementation of the sigf toolkit for randomization tests:
import random
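The preview above stops before the test itself. As a rough illustration of what a paired approximate randomization test looks like, here is a minimal sketch. It assumes the statistic is the absolute difference of means; the function names `mean` and `randomization_test` and the parameters `trials` and `seed` are hypothetical and are not taken from the gist.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def randomization_test(a, b, trials=10000, seed=None):
    """Two-sided paired approximate randomization test.

    Assumption: the test statistic is the absolute difference of means.
    Returns the estimated p-value with add-one smoothing.
    """
    if len(a) != len(b):
        raise ValueError('paired samples must have equal length')
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    hits = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for x, y in zip(a, b):
            # Swap the two paired observations with probability 0.5.
            if rng.random() < 0.5:
                x, y = y, x
            shuffled_a.append(x)
            shuffled_b.append(y)
        # Count shuffles at least as extreme as the observed difference.
        if abs(mean(shuffled_a) - mean(shuffled_b)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)
```

Calling `randomization_test(scores_a, scores_b, seed=1337)` on two equally long lists of per-item scores gives an estimated p-value; fixing `seed` makes the run reproducible.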
dustalov / collocation.groovy
Last active Jun 23, 2019
Watset (Java) Performance Measurement
View collocation.groovy
#!/usr/bin/env groovy
import org.apache.commons.math3.stat.descriptive.moment.Mean
import org.apache.commons.math3.stat.descriptive.moment.StandardDeviation
import org.jgrapht.graph.SimpleWeightedGraph
import org.jgrapht.util.SupplierUtil
import org.nlpub.watset.graph.ChineseWhispers
import org.nlpub.watset.graph.NodeWeighting
import org.nlpub.watset.graph.MaxMax
import org.nlpub.watset.eval.Measurer
import org.nlpub.watset.graph.Watset
dustalov / Makefile
Last active Jan 11, 2018
Extracting and cross-validating the WCL dataset of the 1.0 version
View Makefile
SEED = 1337
WCL_WRAPPER = /srv/definitions/wcl-extract
kfold: wiki_really_all.txt
./ --seed=$(SEED) $<
dustalov /
Last active Jan 2, 2018
Normalized Modified Purity in Python.
#!/usr/bin/env python
# This script computes the normalized modified purity and the normalized
# inverse purity as described in this paper:
# In fact, this program is currently quite a rough translation of
# the evaluation-verb-classes.perl script provided by Daisuke Kawahara.
import argparse
import re
import sys
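The preview does not show how the scores are computed. The sketch below illustrates the general idea for the simplified case of a hard clustering with unit weights, which is an assumption: modified purity skips singleton clusters, and inverse purity swaps the roles of clusters and gold classes. The function names are hypothetical, and the weighting used by the actual script may differ.

```python
def modified_purity(clusters, classes):
    """Purity that ignores singleton clusters (hard clustering, unit weights)."""
    total = sum(len(cluster) for cluster in clusters)
    score = sum(
        max(len(cluster & gold) for gold in classes)
        for cluster in clusters
        if len(cluster) > 1  # singleton clusters contribute nothing
    )
    return score / total


def inverse_purity(clusters, classes):
    """Purity computed from the point of view of the gold classes."""
    total = sum(len(gold) for gold in classes)
    score = sum(max(len(gold & cluster) for cluster in clusters) for gold in classes)
    return score / total


def f1(precision, recall):
    """Harmonic mean of the two scores; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Both functions expect non-empty lists of Python sets, for example `clusters = [{'a', 'b'}, {'c'}]` and `classes = [{'a', 'b', 'c'}]`.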
dustalov / ztest.awk
Last active Feb 8, 2017
Pairwise statistical significance test in AWK using Z-test.
View ztest.awk
#!/usr/bin/awk -f
BEGIN {
    # significance level
    if (length(ALPHA) == 0) ALPHA = 0.05;
    # standard error estimation method: "basic" or "pooled"
    if (length(SE) == 0) SE = "basic";
    # one-tailed or two-tailed?
    if (TAILS != 2) TAILS = 1;
}
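The three settings above (significance level, standard error method, number of tails) map onto a standard Z-test. The following is a minimal sketch, assuming the compared values are two proportions with known sample sizes. The preview does not confirm this, and the names `normal_cdf` and `z_test` are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_test(p1, n1, p2, n2, se='basic', tails=1, alpha=0.05):
    """Z-test for two proportions p1 and p2 with sample sizes n1 and n2.

    Assumption: 'basic' means the unpooled standard error and 'pooled'
    means the standard error under a shared proportion.
    Returns (z, p_value, significant).
    """
    tails = 2 if tails == 2 else 1  # same defaulting as in the AWK preview
    if se == 'pooled':
        p = (p1 * n1 + p2 * n2) / (n1 + n2)
        error = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / error
    p_value = tails * (1.0 - normal_cdf(abs(z)))
    return z, p_value, p_value < alpha
```

The sketch does not guard against a zero standard error, which occurs when both proportions are exactly 0 or 1.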
dustalov / extract-relations.groovy
Last active Jun 5, 2018
Extract semantic relations from Wiktionary using JWKTL.
View extract-relations.groovy
#!/usr/bin/env groovy
import de.tudarmstadt.ukp.jwktl.JWKTL
import de.tudarmstadt.ukp.jwktl.api.filter.WiktionaryEntryFilter
import de.tudarmstadt.ukp.jwktl.api.util.Language
final languages = [en: Language.ENGLISH, ru: Language.RUSSIAN, de: Language.GERMAN]
if (args.length != 2 || !languages.containsKey(args[1] = args[1].toLowerCase())) {
    throw new IllegalArgumentException('Required arguments: <PARSED-WIKTIONARY> en|ru|de')
}
dustalov /
Created Sep 13, 2016
A brute force decoder of Cyrillic strings with unknown charset combination.
#!/bin/bash -e
S=$(head -1)
CHARSETS=(utf8 cp1251 cp1252 koi8r koi8u iso-8859-5 maccyrillic)
for c1 in "${CHARSETS[@]}"; do
  for c2 in "${CHARSETS[@]}"; do
    for c3 in "${CHARSETS[@]}"; do
      for c4 in "${CHARSETS[@]}"; do
        # print the charset combination, then the string decoded with it
        echo -ne "$c1\t$c2\t$c3\t$c4\t"
        <<<"$S" iconv -f=$c1 -t=$c2 -c | iconv -f=$c3 -t=$c4 -c
      done
    done
  done
done
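The same brute-force idea can be sketched in Python: apply two conversion steps with every combination of four charsets and leave it to the reader to pick the readable result. This is an illustrative sketch, not a port of the script. The function name `candidates` is hypothetical, the charset names are Python codec names rather than iconv names, and `errors='ignore'` only approximates the `-c` flag of iconv.

```python
from itertools import product

# Python codec names standing in for the iconv charsets in the script above.
CHARSETS = ('utf-8', 'cp1251', 'cp1252', 'koi8_r', 'koi8_u',
            'iso8859_5', 'mac_cyrillic')


def candidates(data, charsets=CHARSETS):
    """Yield ((c1, c2, c3, c4), result_bytes) for every charset combination.

    The input bytes are read as c1 and written as c2, then read as c3 and
    written as c4. Characters that cannot be converted are dropped.
    """
    for c1, c2, c3, c4 in product(charsets, repeat=4):
        step = data.decode(c1, errors='ignore').encode(c2, errors='ignore')
        result = step.decode(c3, errors='ignore').encode(c4, errors='ignore')
        yield (c1, c2, c3, c4), result
```

With seven charsets this produces 7^4 = 2401 candidates, so the output is best filtered, for example by keeping only results that decode to mostly Cyrillic letters.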
dustalov / ruscorpora.rb
Last active Apr 20, 2016
Fetch sentences from the Russian National Corpus.
View ruscorpora.rb
#!/usr/bin/env ruby
require 'net/http'
require 'uri'
require 'nokogiri'
Example =, :source)
def ruscorpora(word)
uri = URI('')