Ilnar Salimzianov IlnarSelimcan

selimcan@patroclus:~/src/apertium/incubator/apertium-tat-eng$ echo "социалистир" | hfst-ospell -S ../../languages/apertium-tat/tat.zhfst
"социалистир" is NOT in the lexicon:
Corrections for "социалистир":
социалистик 1.000000
selimcan@patroclus:~/src/apertium/incubator/apertium-tat-eng$ echo "социалистир" | hfst-ospell -S tat-eng.zhfst
"социалистир" is NOT in the lexicon:
Unable to correct "социалистир"!
selimcan@patroclus:~/src/apertium/incubator/apertium-tat-eng$ echo "гйынвар" | hfst-ospell -S tat-eng.zhfst
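The sessions above show the two outcomes hfst-ospell reported for a misspelled word: a list of weighted corrections (with tat.zhfst) or "Unable to correct" (with tat-eng.zhfst). To compare spellers over many words it can help to collect that output programmatically. Below is a minimal sketch, assuming the output looks exactly like the lines shown above. `parse_ospell_output` is a hypothetical helper, not part of hfst-ospell, and it expects the tool's output only, without shell prompt lines.

```python
import re

NOT_IN_LEXICON_RE = re.compile(r'^"(.*)" is NOT in the lexicon:$')
CORRECTIONS_RE = re.compile(r'^Corrections for "(.*)":$')
UNABLE_RE = re.compile(r'^Unable to correct "(.*)"!$')


def parse_ospell_output(text):
    """Map each misspelled word to a list of (suggestion, weight) pairs.

    A word that hfst-ospell could not correct maps to an empty list.
    Assumes the output format seen in the terminal session above.
    """
    results = {}
    current = None  # word whose suggestion lines we are currently reading
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m_not = NOT_IN_LEXICON_RE.match(line)
        m_corr = CORRECTIONS_RE.match(line)
        m_unable = UNABLE_RE.match(line)
        if m_not:
            results.setdefault(m_not.group(1), [])
            current = None
        elif m_corr:
            current = m_corr.group(1)
            results.setdefault(current, [])
        elif m_unable:
            results.setdefault(m_unable.group(1), [])
            current = None
        elif current is not None:
            # Suggestion lines look like "социалистик 1.000000".
            suggestion, weight = line.rsplit(None, 1)
            results[current].append((suggestion, float(weight)))
    return results


tat = '"социалистир" is NOT in the lexicon:\nCorrections for "социалистир":\nсоциалистик 1.000000\n'
tat_eng = '"социалистир" is NOT in the lexicon:\nUnable to correct "социалистир"!\n'
print(parse_ospell_output(tat))      # {'социалистир': [('социалистик', 1.0)]}
print(parse_ospell_output(tat_eng))  # {'социалистир': []}
```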
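#!/usr/bin/env bash
# Install apertium-quality from the upstream repository:
# first the Python 3 tools, then the autotools build.
set -e  # stop at the first failing command
git clone https://github.com/apertium/apertium-quality.git
cd apertium-quality/mwtools/python3
sudo python3 ./setup.py install
cd ../../  # back to the apertium-quality root
./autogen.sh && make && sudo make install
cd ..  # return to the directory the script was started from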
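#!/usr/bin/env bash
# Same installation, but from the IlnarSelimcan fork of apertium-quality.
set -e  # stop at the first failing command
git clone https://github.com/IlnarSelimcan/apertium-quality.git
cd apertium-quality/mwtools/python3
sudo python3 ./setup.py install
cd ../../  # back to the apertium-quality root
./autogen.sh && make && sudo make install
cd ..  # return to the directory the script was started from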
## A little script to test morphology/morphophonology, originally written by spectie
## for apertium-chv.
##
## USAGE: python3 test.py <lang code>
##        python3 test.py <lang code> <tsv file>
## where <tsv file> is a tab-separated file with three columns:
## 1. direction restriction: _ (no restriction), > (test generation only)
##    or < (test analysis only);
## 2. lexical form;
## 3. surface form.
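The three-column format described above is easy to read with Python's csv module. The sketch below is not spectie's test.py; it is a hypothetical reimplementation of the idea, showing how the direction column decides which checks a row produces. `analyse` and `generate` are assumed callables returning, respectively, the set of analyses of a surface form and the set of surface forms of a lexical form (in practice they would wrap the language's compiled transducers; here toy dictionaries stand in for them).

```python
import csv

# Which checks each value of the first column turns on.
DIRECTIONS = {
    "_": ("analysis", "generation"),  # no restriction
    ">": ("generation",),             # test generation only
    "<": ("analysis",),               # test analysis only
}


def read_test_cases(lines):
    """Yield (check, lexical_form, surface_form) tuples from TSV lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != 3:
            raise ValueError(f"expected 3 tab-separated columns, got {row!r}")
        restriction, lexical, surface = row
        for check in DIRECTIONS[restriction]:
            yield check, lexical, surface


def run_tests(cases, analyse, generate):
    """Return the list of cases that failed."""
    failures = []
    for check, lexical, surface in cases:
        if check == "analysis":
            ok = lexical in analyse(surface)
        else:
            ok = surface in generate(lexical)
        if not ok:
            failures.append((check, lexical, surface))
    return failures


# Toy usage with dictionaries standing in for real transducers.
tsv = ["_\tcat<n>\tcat", "<\tcat<n><pl>\tcats", ">\tcat<n><pl>\tcatz", ""]
analyses = {"cat": {"cat<n>"}, "cats": {"cat<n><pl>"}}
generations = {"cat<n>": {"cat"}, "cat<n><pl>": {"cats"}}
failures = run_tests(
    read_test_cases(tsv),
    lambda surface: analyses.get(surface, set()),
    lambda lexical: generations.get(lexical, set()),
)
print(failures)  # [('generation', 'cat<n><pl>', 'catz')]
```

Keeping parsing and checking separate means the same TSV reader works whatever tool ends up doing the lookups.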
#!/usr/bin/env bash
## Downloads a "pages-articles-multistream.xml.bz2" Wikipedia dump:
## - for the language LANG (iso2 or iso3 code),
## - from day DATE (in yyyymmdd format or "latest")
## makes a frequency list out of it,
## measures MODE's coverage on that frequency list,
## and compares it with the coverage of the previous revision of that mode.
##
## USAGE: ./test-cov-on-wiki.sh <lang> <date> <mode>
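The frequency-list step of this script can be pictured as follows. This is a minimal Python sketch, not the script itself: it assumes the article text has already been extracted from the XML dump into plain-text lines (extraction is not shown), and it uses a simple Unicode word regex as the tokeniser, which may differ from what the real script does. The `count word` output format mirrors the "Top unknown words" list in the nog coverage report.

```python
import re
from collections import Counter

# A word is a run of Unicode word characters, optionally joined by - or '.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*")


def frequency_list(lines):
    """Count every token in an iterable of plain-text lines."""
    counts = Counter()
    for line in lines:
        counts.update(TOKEN_RE.findall(line))
    return counts


def format_frequency_list(counts):
    """Render counts as 'count word' lines, most frequent first."""
    return "\n".join(f"{n} {word}" for word, n in counts.most_common())


text = ["Казан шәһәре", "Казан, башкала"]
counts = frequency_list(text)
print(format_frequency_list(counts))
```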
nog: commit 6f65e512b45e04ef9f177ea8e1adf6ba26cb648e
stems: 1367
bible coverage
Number of tokenised words in the corpus: 189329
Coverage: 81.88%
Top unknown words in the corpus:
343 Масих
341 а
306 Раббий
233 Кие
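The figures above can be read as token coverage: the share of running words in the corpus that the analyser recognises. A minimal sketch of that computation, assuming a frequency list as input and a hypothetical `is_known(word)` predicate that says whether the analyser returns an analysis (in practice it would wrap the compiled transducer). The report layout copies the nog output above, and subtracting two `coverage` results gives a revision-to-revision comparison.

```python
from collections import Counter


def coverage(freqs, is_known):
    """Percentage of tokens (not types) whose word form is known."""
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    known = sum(n for word, n in freqs.items() if is_known(word))
    return 100.0 * known / total


def coverage_report(freqs, is_known, top=10):
    """Format a report in the style of the nog coverage output."""
    unknown = Counter({w: n for w, n in freqs.items() if not is_known(w)})
    lines = [
        f"Number of tokenised words in the corpus: {sum(freqs.values())}",
        f"Coverage: {coverage(freqs, is_known):.2f}%",
        "Top unknown words in the corpus:",
    ]
    lines += [f"{n} {w}" for w, n in unknown.most_common(top)]
    return "\n".join(lines)


# Toy data: the old analyser knows only "a"; the new one also knows "b".
freqs = Counter({"a": 3, "b": 1})
old_lexicon = {"a"}
new_lexicon = {"a", "b"}
print(coverage_report(freqs, lambda w: w in old_lexicon))
delta = coverage(freqs, lambda w: w in new_lexicon) - coverage(freqs, lambda w: w in old_lexicon)
print(f"Change: {delta:+.2f} points")
```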
#lang rash
;; ASSUME: - this script is placed into apertium-all/
;; - hfst-covtest is in the PATH
;; USAGE: racket apertium-turkic-bilingual-stats.rkt > /tmp/bilingual 2>&1
;; REQUIRES: racket
;; rash (install with "raco pkg install rash")
(provide MONOLINGUAL BILINGUAL)
IlnarSelimcan / scrape_coos_county.py
Created February 7, 2020 06:00
An example of me scraping a website using Python 3 (with the Requests and BeautifulSoup libraries)
## A script to scrape all listings on this site:
## https://www.point2homes.com/US/Land-For-Sale/NH/Coos-County.html
##
## into the following csv format:
##
## Name,Address,Amount,Acres,Type,Misc
##
## e.g.
## Name,Address,Amount,Acres,Type,Misc
## "L52 Cloutier, Stark, NH","Stark, NH","$27,500","5.16","5 days on Point2 Homes"
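Only the output side of the scraper is described here, so the sketch below covers just that part: writing listings in the CSV layout above, with an unquoted header and every data field quoted. It uses Python's standard csv module; fetching and parsing with Requests and BeautifulSoup are not shown because the page's HTML structure is not given. `listings_to_csv` and the sample listing are hypothetical.

```python
import csv
import io

FIELDS = ["Name", "Address", "Amount", "Acres", "Type", "Misc"]


def listings_to_csv(listings):
    """Serialise listing dicts to CSV; missing fields become empty strings."""
    buf = io.StringIO()
    buf.write(",".join(FIELDS) + "\n")  # header unquoted, as in the example
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for listing in listings:
        writer.writerow([listing.get(field, "") for field in FIELDS])
    return buf.getvalue()


# Hypothetical listing, for illustration only.
sample = {
    "Name": "Lot 1, Berlin, NH",
    "Address": "Berlin, NH",
    "Amount": "$10,000",
    "Acres": "2.5",
    "Type": "Land",
    "Misc": "3 days on Point2 Homes",
}
print(listings_to_csv([sample]), end="")
```

Quoting every data field keeps values such as "$27,500" or "Stark, NH", which contain commas, in a single column.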
IlnarSelimcan / conll18_ud_eval_lax.py
Last active March 9, 2020 22:00
#!/usr/bin/env python3
# Compatible with Python 2.7 and 3.2+, can be used either as a module
# or a standalone executable.
#
# Copyright 2017, 2018 Institute of Formal and Applied Linguistics (UFAL),
# Faculty of Mathematics and Physics, Charles University, Czech Republic.
#
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
apertium-kaz$ echo "Біздің компания да осы процеске белсене қатысқысы келеді." \
  | apertium-destxt -n \
  | apertium -f none -d . kaz-morph \
  | cg-conv -la \
  | apertium-retxt \
  | python3 ~/src/sourceforge-apertium/branches/kaz-tagger/kaz_tagger.py \
  | vislcg3 -g apertium-kaz.kaz.rlx \
  | python3 ../ud-scripts/vislcg3-to-conllu.py "" 2> /dev/null \
  | python3 ../ud-scripts/conllu-feats.py apertium-kaz.kaz.udx 2> /dev/null \
  | python3 ../ud-scripts/conllu-nospaceafter.py 2> /dev/null
# sent_id = :1:0
# text = Біздің компания да осы процеске белсене қатысқысы келеді.
1 Біздің біз NOUN n Case=Gen 2 nmod:poss _ _
2-3 компания да _ _ _ _ _ _ _ _
2 компания компания NOUN n Case=Nom 7 nsubj _ _
3 да да ADV postadv _ 7 X _ _
4 осы осы NOUN n Case=Nom 5 obj _ _
5 процеске процесс NOUN n Case=Dat 7 obl _ _
6 белсене белсен VERB v Aspect=Imp|VerbForm=Cov 7 X _ _
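The output above is CoNLL-U (shown with its tab separators collapsed to spaces); lines such as `2-3` are multiword-token ranges and do not take part in the dependency tree. The sketch below reads CoNLL-U and computes unlabelled and labelled attachment scores (UAS/LAS) between a gold and a system parse. It is a simplified illustration, not conll18_ud_eval_lax.py: it assumes both inputs have identical sentence segmentation and tokenisation, whereas the official CoNLL 2018 evaluation script aligns differing tokenisations. This sketch also compares only the part of each relation label before the colon, so `nmod:poss` matches `nmod`.

```python
# CoNLL-U column indices (0-based).
ID, FORM, HEAD, DEPREL = 0, 1, 6, 7


def read_conllu(text):
    """Return sentences as lists of (id, form, head, deprel) tuples.

    Comment lines, multiword-token ranges ("2-3") and empty nodes ("5.1")
    are skipped, so only syntactic words remain.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[ID] or "." in cols[ID]:
            continue
        current.append((cols[ID], cols[FORM], cols[HEAD], cols[DEPREL]))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold, system):
    """UAS and LAS, assuming identical sentences and tokens in both inputs."""
    if len(gold) != len(system):
        raise ValueError("sentence counts differ; this sketch cannot align them")
    total = uas = las = 0
    for g_sent, s_sent in zip(gold, system):
        if len(g_sent) != len(s_sent):
            raise ValueError("tokenisation differs; this sketch cannot align it")
        for (_, _, g_head, g_rel), (_, _, s_head, s_rel) in zip(g_sent, s_sent):
            total += 1
            if g_head == s_head:
                uas += 1
                # Compare only the label part before ":" (nmod:poss -> nmod).
                if g_rel.split(":")[0] == s_rel.split(":")[0]:
                    las += 1
    if total == 0:
        return 0.0, 0.0
    return uas / total, las / total


# Toy sentence with a multiword-token line, in the same shape as above.
gold_rows = [
    ["1", "A", "a", "NOUN", "_", "_", "2", "nmod:poss", "_", "_"],
    ["2-3", "Bc", "_", "_", "_", "_", "_", "_", "_", "_"],
    ["2", "B", "b", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "c", "c", "VERB", "_", "_", "0", "root", "_", "_"],
]
system_rows = [list(r) for r in gold_rows]
system_rows[0][DEPREL] = "nmod"  # subtype dropped: still correct for LAS
system_rows[2][HEAD] = "1"       # wrong head


def to_text(rows):
    return "# text = toy\n" + "\n".join("\t".join(r) for r in rows) + "\n"


uas, las = attachment_scores(read_conllu(to_text(gold_rows)), read_conllu(to_text(system_rows)))
print(f"UAS {uas:.3f}, LAS {las:.3f}")  # UAS 0.667, LAS 0.667
```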