Kaarel Kaljurand Kaljurand

@Kaljurand
Kaljurand / estonian_tts_test.txt
Last active April 22, 2018 21:58
Estonian text for testing a text-to-speech engine. Save it into Pocket on Android.
Täita eesmärk.
Koolidirektor Johannes Paulus II, kes tänavuaastal peab kooli vahetama, lubas, et tema aastapalk saab olema suurem kui kooli puupalk: 123.456,50 EUR. Liina sai lina eest 5 pudelit viina. Kus on kass? Kas läks kooli või kooli taha?
Vihma- ja lörtsisadu levib keskpäevaks saartelt mandrile ja laieneb edasi ida-kirde suunas, sisemaal tuleb sekka ka lund. Puhub lõuna- ja kagutuul 5-12, rannikul puhanguti 15, saartel ja Liivi lahe ümbruses kuni 20 m/s. Õhutemperatuur on 0..+4°C.
Žiguli (VAZ-2101) kaalub 100kg, kasutab bensiini AИ-93 ГОСТ 2084-77, saavutab 2 sek-iga kiiruse 100 km/h. Töömaht dm3 (l): 1,198.
Aoäia õe uue oaõieaia õueaua ööau aoäia õe uue oaõieaia õueaua ööau. Anna õlu üle Ülo õe õla.
Kaljurand / my-server.py
Created August 3, 2017 23:06
Simple web server that allows talking to Mycroft via HTTP queries like "curl http://192.168.0.23:8000?q=tell+me+a+joke" and getting a JSON response
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Simple web server that allows talking to Mycroft via HTTP queries like
# curl http://192.168.0.23:8000?q=tell+me+a+joke
# The response is a JSON object. It could be something general or something
# caller specific. (The examples are callbacks for K6nele.)
#
# Installation:
# 1. Allow access to the HTTP port: sudo ufw allow 8000
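The query format described in the comments above can also be produced from Python instead of curl. The following is a minimal client-side sketch, assuming only what the description states: the server accepts a GET request with a `q` parameter and answers with a JSON object. The helper names `build_query_url` and `ask` are hypothetical and not part of my-server.py; the host and port are the ones from the curl example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(host, port, utterance):
    """Return a URL of the form http://HOST:PORT?q=..., with spaces encoded as '+'."""
    return "http://{}:{}?{}".format(host, port, urlencode({"q": utterance}))


def ask(host, port, utterance, timeout=10):
    """Send the utterance to the server and decode the JSON response.

    Assumption: the response body is UTF-8 encoded JSON.
    """
    with urlopen(build_query_url(host, port, utterance), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the URL; calling ask() requires a running server.
    print(build_query_url("192.168.0.23", 8000, "tell me a joke"))
```

Building the query string with `urlencode` keeps the `+` encoding of spaces shown in the curl example and also escapes non-ASCII utterances.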
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import division, unicode_literals, print_function
import sys
import re
import argparse
from collections import *
Verifying that +kaarel is my Bitcoin username. You can send me #bitcoin here: https://onename.io/kaarel
Kaljurand / gist:8333643
Created January 9, 2014 12:50
Experiment with segmenting Estonian placenames. The main goal was to split off a meaningful suffix. In the experiment, we kept all parts that were at least 4 characters long.
$ wc placenames.txt
4416 4422 36452 placenames.txt
$ morfessor -t placenames.txt -s placenames_model.pickled
INFO:morfessor.io:Reading corpus from 'placenames.txt'...
INFO:morfessor.io:Detected utf-8 encoding
INFO:morfessor.io:Detected utf-8 encoding
INFO:morfessor.io:Done.
INFO:morfessor.baseline:Compounds in training data: 4417 types / 4417 tokens
INFO:morfessor.baseline:Starting batch training
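The length filter mentioned in the description (keeping parts of at least 4 characters) could be sketched as follows. This is one reading of that sentence, not the code used in the experiment: the function name `keep_long_parts` is hypothetical, and the example segmentations are made up for illustration rather than taken from the trained Morfessor model.

```python
def keep_long_parts(parts, min_length=4):
    """Keep only the parts that have at least min_length characters."""
    return [part for part in parts if len(part) >= min_length]


# Made-up segmentations for illustration only; not actual Morfessor output.
examples = {
    "Vanamõisa": ["Vana", "mõisa"],
    "Kuusiku": ["Kuusi", "ku"],
}

for name, parts in examples.items():
    print(name, keep_long_parts(parts))
```

In Python 3, `len` counts Unicode code points, so letters such as õ, ä, ö and ü each count as one character.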