Vee Satayamas veer66

@veer66
veer66 / read_expanded_dix.rb
Last active August 29, 2015 14:01
reading expanded (Apertium) dix
module LDIX
  class Parser
    def initialize
    end

    def parse_tag(raw)
      if raw.nil?
        []
      else
        raw.map{|t| t[1..-2]}
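The preview above is cut off before `parse_tag` closes. As a minimal, self-contained sketch of the same idea, assuming `raw` is an array of tags still wrapped in angle brackets (the `scan` regex and the sample analysis string are hypothetical, not taken from the gist):

```ruby
# Strip the surrounding angle brackets from each tag and return []
# when nothing was matched, mirroring the previewed parse_tag.
def parse_tag(raw)
  return [] if raw.nil?
  raw.map { |t| t[1..-2] }
end

# Hypothetical input: tags pulled out of an analysis such as "book<n><sg>".
tags = "book<n><sg>".scan(/<[^>]+>/)
p parse_tag(tags)   # => ["n", "sg"]
p parse_tag(nil)    # => []
```

The `t[1..-2]` slice drops the first and last character of each tag, so it relies on every element being of the form `<...>`.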
veer66 / stream_parser.rb
Last active August 29, 2015 14:01
Parsing Apertium stream without chunk
# Copyright (C) 2014 Vee Satayamas
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#
veer66 / unk.t1x
Created July 20, 2014 14:13
Unknown words in Apertium
<spectre> <def-cat n="unknown">
<spectre> <cat-item tags=""/>
<spectre> </def-cat>
veer66 / apertium.rb
Created July 23, 2014 12:38
Partial Apertium stream parser (for parsing biltrans results)
module Apertium
  OUTSIDE_WORD = 0
  INSIDE_WORD = 1

  class B
    attr_reader :text

    def initialize(text)
      @text = text
    end
veer66 / apertium_dix_merger.rb
Last active August 29, 2015 14:06
Dictionaries merger for Apertium
require "nokogiri"
require "pp"
include Nokogiri
class Extra
  def initialize
  end

  def child2txt(t)
veer66 / rbmt.bash
Created September 18, 2014 09:20
running apertium with step-by-step logging
#!/bin/bash
LOGPREFIX=log_rbmt200_$$
ENGBIN=data/agri200p.dix.eng.trimmed.bin
BIDIX="data/agri200p.dix.engtha.bin"
LRX="data/eng-tha.lrx.bin"
T1X=data/_eng-tha.t1x
T1XBIN=data/_eng-tha.t1x.bin
T2X=data/eng-tha.t2x
T2XBIN=data/eng-tha.t2x.bin
veer66 / chart_to_trees.py
Created September 21, 2014 16:37
Convert Earley's chart to trees (brute force)
import json
import sys
import re
import pprint
import copy
pp = pprint.PrettyPrinter(indent=4)
class Node(object):
    def __init__(self, label, s=None, e=None, is_terminal=False, maxe=None):
veer66 / Cargo.toml
Created November 16, 2014 06:31
Unlike some package managers, in Cargo, we can create multiple commands easily ^^.
[package]
name = "titi"
version = "0.0.1"
authors = ["Vee Satayamas <v@v.v>"]
[[bin]]
name = "x"
test = false
doc = false
veer66 / server.rb
Last active August 29, 2015 14:09
A morphological tagging server using Sinatra.rb and Apertium
require "sinatra"
$p = IO.popen("lt-proc -z eng.automorf.bin | cg-proc -z eng-tha.rlx.bin", "r+")
def escape_stream(t)
  # Parenthesized call avoids Ruby's "ambiguous first argument" warning.
  t.gsub(/([\^\$\/])/, '\\\\\1')
end

def tag(line)
  $p.write escape_stream(line.chomp)
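The escaping step in the preview above can be tried on its own, without starting `lt-proc`. This is a minimal sketch that assumes only what the gist's regex shows, namely that `^`, `$` and `/` are the characters to escape. The block form of `gsub` and the sample input are illustrative choices, not the author's code:

```ruby
# Prefix each character the gist's regex treats as reserved in the
# Apertium stream (^, $ and /) with a backslash.
def escape_stream(text)
  text.gsub(/([\^\$\/])/) { "\\#{$1}" }
end

# Hypothetical input line containing two reserved characters.
puts escape_stream('3/4 costs $5')   # prints 3\/4 costs \$5
```

The block form sidesteps the double layer of backslash escaping that a replacement string needs, since the block's return value is inserted literally.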
pw = nil
pc = 0
$stdin.each do |line|
  line.chomp!
  w, year, c1, c2 = line.split(/\t/)
  if not pw.nil? and pw != w
    puts "#{pw}\t#{pc}"
    pw = nil
  end