Vee Satayamas veer66

# encoding: UTF-8
require 'thailang4r/word_breaker'
word_breaker = ThaiLang::WordBreaker.new
File.open("data1.txt", "r:UTF-8") do |file|
  txt = file.read
  puts word_breaker.break_into_words(txt)
end
veer66 / pali_thai_roman
Last active August 29, 2015 13:57
Thai character and latin alphabet mapping for Pali
อ a
อิ i
อุ u
อา ā
อี ī
อู ū
เอ e
โอ o
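A minimal Ruby sketch of how the table above could be applied. It covers only the eight vowel forms listed here; `PALI_VOWELS` and `romanize_vowels` are hypothetical names introduced for illustration and are not part of the original gist.

```ruby
# Vowel mapping copied from the table above (Thai => Roman).
PALI_VOWELS = {
  "อ"  => "a",
  "อิ" => "i",
  "อุ" => "u",
  "อา" => "ā",
  "อี" => "ī",
  "อู" => "ū",
  "เอ" => "e",
  "โอ" => "o"
}.freeze

# Hypothetical helper: romanize text that consists of the forms above.
# Longer keys are tried first so that "อา" is not read as a bare "อ".
def romanize_vowels(text)
  keys = PALI_VOWELS.keys.sort_by { |key| -key.length }
  text.gsub(Regexp.union(keys)) { |match| PALI_VOWELS[match] }
end

puts romanize_vowels("อา อิ เอ")
```

Characters outside the table pass through unchanged, so consonants would need their own mapping before this is useful on real Pali text.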
veer66 / server.js
Created March 19, 2014 12:25
JSON express.js
var getBody = require('raw-body');
// ...
app.post("/do_sth_with_json", function(req, res) {
  getBody(req, {
    limit: '1mb',
    length: req.headers['content-length'],
    encoding: 'utf8'
  }, function (err, buf) {
require "nokogiri"
require "pp"
class EngDix
  def initialize(monodix_path)
    @word_hash = {}
    File.open(monodix_path) do |file|
      while file.gets
        line = $_.chomp
        if line =~ /^\s+<e/
veer66 / thwiktionary_extract.go
Created April 17, 2014 17:54
There are still so many errors.
package main

// Based on https://github.com/dps/go-xml-parse/blob/master/go-xml-parse.go
import (
	"fmt"
	"os"
	"flag"
	"encoding/xml"
	"strings"
veer66 / thaidict.json
Created April 17, 2014 18:26
Vocabulary extracted from th.wikipedia.org. The license is probably GPL?
{"Li":"mathematics","Gloss":["คณิตศาสตร์"]}
{"Li":"calculus","Gloss":["แคลคูลัส","กรวด","หิน"]}
{"Li":"a","Gloss":["สัทอักษรสากล"]}
{"Li":"car","Gloss":["รถราง"]}
{"Li":"nose","Gloss":["จมูก"]}
{"Li":"I love you","Gloss":["ฉันรักคุณ"]}
{"Li":"poet","Gloss":["กวี"]}
{"Li":"eat","Gloss":["กิน","รับประทาน"]}
{"Li":"consume","Gloss":["ใช้","กิน","เผลาผลาญ"]}
{"Li":"sweet","Gloss":["หวาน","น่ารัก","ยอดเยี่ยม","ขั้นเทพ"]}
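The records above are one JSON object per line, so they can be loaded line by line. The Ruby sketch below shows one way to build a headword lookup from them; `SAMPLE` and `load_dict` are hypothetical names, and the two sample records are copied from the listing rather than read from the actual file.

```ruby
require "json"

# Two records copied from the sample above; a real run would read the file.
SAMPLE = <<~JSONL
  {"Li":"eat","Gloss":["กิน","รับประทาน"]}
  {"Li":"nose","Gloss":["จมูก"]}
JSONL

# Hypothetical helper: build a headword => glosses hash from JSON Lines text.
# Glosses of repeated headwords are appended to the same list.
def load_dict(jsonl)
  dict = Hash.new { |hash, key| hash[key] = [] }
  jsonl.each_line do |line|
    line = line.strip
    next if line.empty?
    entry = JSON.parse(line)
    dict[entry["Li"]].concat(entry["Gloss"])
  end
  dict
end

dict = load_dict(SAMPLE)
puts dict["eat"].join(", ")
```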
veer66 / gcide_extract.rb
Last active August 29, 2015 14:00
Parse GCIDE, extract headwords and parts of speech, and save them to GDBM
require "nokogiri"
require "json"
require "gdbm"
class LiPosFromGcideExtractor
  def parse_each_file(filename)
    File.open(filename, "r:ISO-8859-1") do |file|
      chunks = file.read
        .split(/\n\n/)
        .select { |chunk| chunk =~ /^[<\[]\w/ }

Not

case 1

I am not a man

ฉันไม่ใช่ผู้ชาย

chan/prn mai-chai/adv phu-chai/n
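The line above pairs each romanized word with what appears to be a part-of-speech tag, separated by a slash. A small Ruby sketch for splitting such a line into pairs follows; `parse_tagged` is a hypothetical helper and the `word/tag` reading of the format is an assumption.

```ruby
# Hypothetical helper: split "word/tag" tokens into [word, tag] pairs.
# rpartition splits on the last slash, so hyphens inside the word are kept.
def parse_tagged(line)
  line.split.map do |token|
    word, _separator, tag = token.rpartition("/")
    [word, tag]
  end
end

pairs = parse_tagged("chan/prn mai-chai/adv phu-chai/n")
pairs.each { |word, tag| puts "#{word}\t#{tag}" }
```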

case 2

<?xml version="1.0" encoding="UTF-8"?>
<!-- -*- nxml -*- -->
<transfer default="chunk">
  <section-def-cats>
    <def-cat n="nom">
      <cat-item tags="n.*"/>
    </def-cat>
    <def-cat n="adj">
      <cat-item tags="adj.*"/>
    </def-cat>
veer66 / metadix_expand.rb
Created May 10, 2014 04:32
An attempt to expand the English metadix from apertium-en-es, similar to what lt-expand does
require "nokogiri"
include Nokogiri::XML
def main
  en_dix_path = File.join(File.dirname(__FILE__),
                          "apertium-dix",
                          "apertium-en-es.en.metadix")