- ruby stemmer (expose libstemmer_c to Ruby)
- stuff classifier (text classifier; naive bayes & tf/idf)
- ve linguistic framework (base form of words, sentence detection, POS, transliterations, Japanese (日本語), English, MeCab, FreeLing)
- ruby nlp (n-gram extraction, corpus extractor, Brown Corpus)
- lexeme (simple lexical analyzer)
require "bundler/inline"
# allows for declaring a Gemfile inline in a ruby script
# optionally installing any gems that aren't already installed
gemfile(true) do
  source "https://rubygems.org"
  gem "rails", "6.1.4.1"
  gem "sqlite3"
  gem "graphql", "~> 1.12"
// WebL Website (archived): http://web.archive.org/web/20070507043202/http://www.hpl.hp.com/downloads/crl/webl/index.html
// get unique country names for the 500 top world universities
// get number of universities per country, pretty print statistics with a horizontal bar chart
// data from 2009 version of http://www.arwu.org/ARWU2009.jsp
import Str;
var startpage = GetURL("http://www.arwu.org/ARWU2009.jsp");
The challenge was to write a short script, in a scripting language of choice, that takes an image of shuffled, uniformly sized shreds and pieces them back together into the original image. That is, the original image was divided into an even number of columns of the same size, and then those columns were shuffled randomly. Additionally, the script should auto-detect how wide the uniform strips are.
Before I started on my solution, I made some quick assumptions to simplify things:
- I wanted to code it in Ruby, simply because it's a great language and I'm quite productive in it.
- I wanted to define a simple distance measure which can be used for auto-detecting the column size and putting the shredded image back into its original state.
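As a rough illustration of how one distance measure could serve both purposes, here is a minimal sketch. It is not the original solution: it assumes the image is already loaded as an array of pixel columns, each column an array of grayscale integers, and it uses the sum of absolute differences as the distance. The method names (`column_distance`, `detect_strip_width`, `unshred`) and the greedy chaining strategy are hypothetical choices made for this example.

```ruby
# Distance between two pixel columns: sum of absolute differences.
# (Assumption: columns are equally long arrays of grayscale integers.)
def column_distance(col_a, col_b)
  col_a.zip(col_b).sum { |a, b| (a - b).abs }
end

def mean(values)
  values.sum.to_f / values.size
end

# Guess the strip width: for every width that divides the image evenly,
# compare the average distance at would-be strip boundaries with the
# average distance everywhere else, and keep the width with the largest gap.
# Returns nil if no candidate width exists (e.g. a prime number of columns).
def detect_strip_width(columns)
  gaps = columns.each_cons(2).map { |left, right| column_distance(left, right) }
  candidates = (2...columns.size).select { |w| (columns.size % w).zero? }
  candidates.max_by do |w|
    seams, inner = gaps.each_with_index.partition { |_, i| ((i + 1) % w).zero? }
    mean(seams.map(&:first)) - mean(inner.map(&:first))
  end
end

# Greedy reassembly: start with the first strip and keep attaching the
# best-matching remaining strip to whichever end of the chain fits better.
def unshred(columns, width)
  strips = columns.each_slice(width).to_a
  chain = [strips.shift]
  until strips.empty?
    right = strips.min_by { |s| column_distance(chain.last.last, s.first) }
    left  = strips.min_by { |s| column_distance(s.last, chain.first.first) }
    right_cost = column_distance(chain.last.last, right.first)
    left_cost  = column_distance(left.last, chain.first.first)
    if right_cost <= left_cost
      chain.push(strips.delete_at(strips.index(right)))
    else
      chain.unshift(strips.delete_at(strips.index(left)))
    end
  end
  chain.flatten(1)
end

# Toy data: a two-row horizontal gradient, cut into strips of width 2 and shuffled.
original = (0...8).map { |i| [i * 10, i * 10] }
shuffled = original.each_slice(2).to_a.values_at(2, 0, 3, 1).flatten(1)

width = detect_strip_width(shuffled)
puts "detected strip width: #{width}"
puts "restored: #{unshred(shuffled, width) == original}"
```

On real photographs a greedy chain like this can go wrong when two strip edges look alike, so treat it as a starting point rather than a robust solver.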
#!/usr/bin/perl -w
use strict;
use Mail::MboxParser;
use constant MAILBOX_DIR => "/var/mail/mymailboxdir";
use constant STORAGE_DIR => "/data";
# Call the constant with () so that -z tests the path, not a filehandle named MAILBOX_DIR.
if ( -z MAILBOX_DIR() ) {
    print STDERR "no mail.\n";
require './lib/indexer.rb'
# rake task to build index
task :buildindex do
  # mixin to overwrite get_class method
  Indexer.class_eval do
    def get_class f
      f.split('/')[-2] # get class assignment implicitly with folder structure (first hierarchy level)
    end
# from http://www.matasano.com/log/basic-uncommented-crappy-binary-radix-trie/
class Fixnum
  def to_b(l = 8)
    "0′" + self.to_s(2).rjust(l, "0")
  end
  def set?(i)
    if((self & (1 << i)) != 0)
      return true
    else
# source: http://stackoverflow.com/questions/3002650/parsing-dictionary-entries-with-regex
#
# example_url: http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1ZUJ%E5%85%88%E7%94%9F
# output:
# 先生 [せんせい] /(n) (1) teacher/master/doctor/(suf) (2) with names of teachers, etc. as an honorific/(P)/
# 先生に就く [せんせいにつく] /(exp,v5k) to study under (a teacher)/
# 先生の述 [せんせいのじゅつ] /(n) teachers statement (expounding)/
# 先生方 [せんせいがた] /(n) doctors/teachers/
# regexp
class Time
  def self.random(years_back=5)
    year = Time.now.year - rand(years_back) - 1
    month = rand(12) + 1
    day = rand(31) + 1
    Time.local(year, month, day)
  end
end
# from http://snippets.dzone.com/posts/show/4638
require 'rubygems'
require 'open-uri'
require 'net/http'
def remote_file_exists?(url)
  url = URI.parse(url)
  Net::HTTP.start(url.host, url.port) do |http|
    return http.head(url.request_uri).code == "200"