@jimregan
jimregan / apertium_aprilfirst.cc
Created February 25, 2012 11:20
April 1st filter for Apertium
/*
* Copyright (C) 2005 Universitat d'Alacant / Universidad de Alicante
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
jimregan / expected-output.xml
Created February 25, 2012 11:39
Generates XML rules for LanguageTool for words that are sometimes written separately but ought to be written together. LGPL. (From this thread: https://sourceforge.net/mailarchive/forum.php?thread_name=4F2E7385.5070404%40wp.pl&forum_name=languagetool-devel)
<rule id="NA_WZAJEM" name="„na wzajem” (nawzajem)">
<pattern>
<token>na</token>
<token>wzajem</token>
</pattern>
<message>Ten wyraz zwykle pisze się łącznie: <suggestion>\1\2</suggestion>.</message>
<short>Prawdopodobna literówka</short>
<example correction="nawzajem" type="incorrect">Oni kochają się <marker>na wzajem</marker>.</example>
<example type="correct">Oni kochają się nawzajem.</example>
</rule>
jimregan / rect64tomagick.pl
Created February 25, 2012 12:21
Perl subs for getting an ImageMagick region from a Picasa rect64
sub padrect {
my $rin = shift;
my $rout;
if (length($rin) == 16) {
return $rin;
} elsif (length($rin) > 16) {
return $rout; # can't process, return undef
} else {
my $diff = (16 - length($rin));
for (my $i = 0; $i < $diff; $i++) {
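The preview above is cut off, so the following is only a rough Python sketch of the idea in the description, not the gist's own code. It assumes that a Picasa rect64 is a hex string of up to 16 digits whose leading zeros may have been dropped, that the four 16-bit fields are left, top, right and bottom, and that each is a fraction of the image size scaled by 65535. The function names `pad_rect64` and `rect64_to_geometry` are hypothetical.

```python
def pad_rect64(rect):
    """Left-pad a rect64 hex string with zeros to 16 digits.

    Returns None when the input is longer than 16 digits, mirroring the
    "can't process" branch of the Perl preview. Left-padding is an
    assumption: the Perl loop body is not visible in the preview.
    """
    if len(rect) > 16:
        return None
    return rect.rjust(16, "0")


def rect64_to_geometry(rect, width, height):
    """Convert a rect64 to an ImageMagick-style geometry string WxH+X+Y.

    Assumes four 16-bit fields (left, top, right, bottom), each a fraction
    of the image dimensions scaled by 65535.
    """
    padded = pad_rect64(rect)
    if padded is None:
        return None
    left, top, right, bottom = (
        int(padded[i:i + 4], 16) / 65535 for i in range(0, 16, 4)
    )
    x = round(left * width)
    y = round(top * height)
    w = round((right - left) * width)
    h = round((bottom - top) * height)
    return f"{w}x{h}+{x}+{y}"


if __name__ == "__main__":
    # "ffffffff" pads to "00000000ffffffff": the whole image.
    print(rect64_to_geometry("ffffffff", 800, 600))  # 800x600+0+0
```

The resulting string has the shape ImageMagick expects for a crop or region geometry; whether rounding or truncation is the better choice for pixel coordinates is left open here.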
jimregan / spotlight.diff
Created February 26, 2012 01:50
Diff of DBpedia Spotlight for Polish, to get it to run.
Index: demo/pom.xml
===================================================================
--- demo/pom.xml (revision 367)
+++ demo/pom.xml (working copy)
@@ -24,7 +24,8 @@
<parent>
<artifactId>spotlight</artifactId>
<groupId>org.dbpedia.spotlight</groupId>
- <version>${dbpedia.spotlight.version}</version>
+ <version>0.6</version>
jimregan / blacklistedURIPatterns.pl.txt
Created February 26, 2012 15:01
Blacklisted URI patterns for pl.wikipedia
^Lista_.+
^Wikiprojekt:.+
^Portal:.+
.+\(ujednoznacznienie\)$
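As a rough illustration of how patterns like these could be used to filter page titles, here is a minimal Python sketch. It is not how DBpedia Spotlight itself loads or applies the file; the names `PATTERNS` and `is_blacklisted` are hypothetical, and the parentheses in the last pattern are escaped on the assumption that they are meant to match the literal "(ujednoznacznienie)" suffix of disambiguation pages.

```python
import re

# Patterns for pl.wikipedia titles that should be skipped: lists,
# WikiProject pages, portals and disambiguation pages.
PATTERNS = [
    r"^Lista_.+",
    r"^Wikiprojekt:.+",
    r"^Portal:.+",
    r".+\(ujednoznacznienie\)$",  # parentheses escaped to match literally
]

COMPILED = [re.compile(p) for p in PATTERNS]


def is_blacklisted(title):
    """Return True if the title matches any blacklist pattern."""
    return any(p.search(title) for p in COMPILED)


if __name__ == "__main__":
    for title in ["Lista_miast_w_Polsce", "Zamek_(ujednoznacznienie)", "Warszawa"]:
        print(title, is_blacklisted(title))
```

Compiling the patterns once and testing each title with `search` keeps the check cheap when it is run over a full dump of titles.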
jimregan / hallucination.ttl
Created March 1, 2012 15:42
RDFa data mirage
@prefix og: <http://ogp.me/ns#> .
@prefix fb: <http://ogp.me/ns/fb#> .
@prefix zimbiofb: <http://ogp.me/ns/fb/zimbiofb#> .
<http://www.zimbio.com/photos/Aiste+Paskeviciute/Luck+Attitude+Launch+Party+3/XFhu3CPOWRW>
fb:app_id "137068566357971" ;
og:site_name "Zimbio";
og:type zimbiofb:photostream ;
og:url <http://www.zimbio.com/photos/Aiste+Paskeviciute/Luck+Attitude+Launch+Party+3/XFhu3CPOWRW> ;
og:title "Aiste Paskeviciute Photostream" ;
#!/usr/bin/perl
use warnings;
use strict;
use Encode::Escape;
use utf8;
binmode STDIN, ":utf8";
binmode STDERR, ":utf8";
jimregan / facedetect.pl
Created March 14, 2012 23:22
Face detector demo.
#!/usr/bin/perl
use warnings;
use strict;
use Image::ObjectDetect;
use Digest::MD5 qw(md5_hex);
use Image::ExifTool qw(:Public);
use POSIX qw/strftime/;
use Image::Magick;
jimregan / ru-cawikipedia.translit.txt
Created March 15, 2012 13:31
An attempt at the Russian transliteration scheme used by ca.wikipedia
А <> A;
а <> a;
Б <> B;
б <> b;
В <> V;
в <> v;
Г } [А] <> GU;
г } [а] <> Gu;
Г <> G;
г <> g;
jimregan / czech_light.Unicode.sbl
Created March 15, 2012 16:34
Trying to figure out snowball
routines (
RV R1
palatalise
mark_regions
do_possessive
do_case
do_comparative
do_diminutive
do_augmentative
do_derivational