@lukhnos
lukhnos / whoosh-cjk-analyser.md
Created February 4, 2014 09:12
How to Use Whoosh to Index Documents that Contain CJK Characters (First Take)
View whoosh-cjk-analyser.md

Whoosh's default analyzer does not handle CJK characters (in particular Chinese and Japanese) well. If you pass it typical Chinese or Japanese paragraphs, you'll often find that an entire sentence is treated as one token.

A Whoosh analyzer consists of one tokenizer and zero or more filters. As a result, we can easily adapt this recipe from Lucene's CJKAnalyzer:

An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK with CJKBigramFilter, and filters stopwords with StopFilter

This inspired me to make this first take:

class CJKFilter(Filter):
    def __call__(self, tokens):
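
The preview stops at the method signature. As a rough illustration of the bigram step named in the Lucene recipe, here is a minimal sketch in plain Python with no Whoosh dependency. The names `is_cjk`, `char_class`, and `cjk_bigrams` are hypothetical and not taken from the gist, and the Unicode ranges cover only the most common ideographs and kana, which is an assumption made to keep the example short.

```python
from itertools import groupby


def is_cjk(ch):
    """Rough check covering CJK Unified Ideographs, Hiragana, and Katakana only."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3040 <= cp <= 0x309F   # Hiragana
            or 0x30A0 <= cp <= 0x30FF)  # Katakana


def char_class(ch):
    """Classify a character so consecutive characters of one kind group together."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "other"  # whitespace and punctuation act as separators


def cjk_bigrams(text):
    """Return lowercased words for non-CJK runs and overlapping bigrams for CJK runs."""
    tokens = []
    for kind, group in groupby(text, key=char_class):
        chunk = "".join(group)
        if kind == "word":
            tokens.append(chunk.lower())
        elif kind == "cjk":
            if len(chunk) == 1:
                # A lone CJK character has no neighbor, so keep it as a unigram.
                tokens.append(chunk)
            else:
                tokens.extend(chunk[i:i + 2] for i in range(len(chunk) - 1))
    return tokens


print(cjk_bigrams("Whoosh 搜尋引擎"))
```

In a real Whoosh filter the same logic would operate on token objects rather than raw strings; this sketch only shows how overlapping bigrams keep a whole sentence from collapsing into a single token.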
lukhnos / otpbox.py
Created February 6, 2015 08:24
A simple command line pyotp wrapper
View otpbox.py
import argparse
import json
import pyotp
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('json', nargs=1)
    parser.add_argument('key', nargs=1)
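
One detail worth keeping in mind when reading the two `add_argument` calls: with `nargs=1`, argparse stores each positional as a one-element list rather than a bare string. The sketch below shows only that behavior. The helper `build_parser` and the sample values `secrets.json` and `github` are hypothetical, and reading `json` as a file path and `key` as an entry name is an assumption, since the preview is cut off before the arguments are used.

```python
import argparse


def build_parser():
    """Build a parser with the same two positionals as the preview."""
    parser = argparse.ArgumentParser()
    parser.add_argument('json', nargs=1)
    parser.add_argument('key', nargs=1)
    return parser


# Hypothetical command line: otpbox.py secrets.json github
args = build_parser().parse_args(['secrets.json', 'github'])

# nargs=1 yields a list, so the value has to be unwrapped with [0].
json_path = args.json[0]
key_name = args.key[0]
print(args.json, json_path, key_name)
```

Dropping `nargs=1` would store plain strings instead, which avoids the `[0]` indexing.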
lukhnos / PreadBug.java
Last active August 29, 2015 14:23
libcore.io.Posix.preadBytes bug reproducible sample
View PreadBug.java
// To reproduce the bug:
//
// j2objc PreadBug.java
// j2objcc PreadBug.m
// ./a.out PreadBug
//
// You'll see the following exception:
//
// java.io.IOException: pread failed: EBADF (Bad file descriptor)
View NSString+ LocalizedDoubleValue.m
@interface NSString (LocalizedDoubleValue)
- (double)localizedDoubleValue;
@end
@implementation NSString (LocalizedDoubleValue)
- (double)localizedDoubleValue
{
    NSScanner *scanner = [NSScanner localizedScannerWithString:self];
    double newValue = 0.0;
View gist:180813
rdar://7198300
04-Sep-2009 05:29 PM Lukhnos D. Liu:
Summary:
Terminal.app in 10B503 often crashes when it wraps a line of overlong text, most often when an IME is in effect and the overlong text includes uncommitted buffer (composing buffer/reading text).
Steps to Reproduce:
1. Switch to Kotoeri (Hiragana) input method
2. Open Terminal.app under 10B503
3. Open irssi, a popular IRC client
View gist:180814
rdar://7198283
Terminal.app pukes huge amount of HIToolbox exceptions onto Console, CJK-related
04-Sep-2009 05:13 PM Lukhnos D. Liu:
Summary:
Terminal.app in 10B503 pukes huge amount of HIToolbox exceptions onto Console. Often it's input method-related.
Steps to Reproduce:
1. Switch to Kotoeri (Hiragana) input method
View gist:181045
Hi Mr. Schiller,
I would like to inform you of a defect in the currently shipping Snow Leopard that I think is sub-Apple standard and afflicts many of your loyal users, particularly visual designers, web designers and people who care about typefaces, in Taiwan and Hong Kong.
My name is Lukhnos. I'm a Taiwanese Mac and iPhone software developer. I would like to congratulate you on your successful release of Mac OS X Snow Leopard. For many of us the upgrade process was smooth, and we appreciate that Apple continues to deliver a high-quality, high-performance operating system that is suitable for both daily life and professional demands.
There is unfortunately, however, a defect in the currently shipping Snow Leopard. It causes daily visual pain for many of us Traditional Chinese users.
Snow Leopard ships with a new set of sans serif Chinese fonts. One of them, Hei TC (TC stands for Traditional Chinese), seems to be Apple's official replacement of the long-serving LiHei Pro. I can see Apple has been spending efforts
View gist:182734
Hi Mr. Jobs:
I would like to inform you that an erroneous font shipped with Snow Leopard is sub-Apple standard. The font afflicts many of your loyal users, particularly visual designers, web designers and people who care about typefaces, in Taiwan and Hong Kong.
My name is Lukhnos. I'm a Taiwanese Mac and iPhone software developer. I would like to congratulate you on your successful release of Mac OS X Snow Leopard. For many of us, the upgrade process was smooth. We appreciate that Apple continues delivering such a high-quality, high-performance operating system that is suitable for both daily life and professional needs.
There is, however, one problem that gravely undermines that experience. It's a new font that is causing daily visual pain to many Traditional Chinese users.
Snow Leopard ships with a new set of sans serif Chinese fonts. One of them, Hei TC ("TC" for Traditional Chinese), seems to be Apple's official replacement of the long-serving LiHei Pro. I can see Apple has been spending efforts in provi
lukhnos / ThrowableLeaks.java
Last active September 4, 2015 15:34
Demonstrates that Throwable leaks memory in j2objc, see https://github.com/google/j2objc/issues/601
View ThrowableLeaks.java
import com.google.j2objc.annotations.AutoreleasePool;
public class ThrowableLeaks {
  public static void main(String args[]) {
    for (int i = 0; ; i++) {
      foo(i);
      bar(i);
      try {
        Thread.sleep(1000);
      } catch (Exception e) {
lukhnos / NIOLeaks.java
Last active September 4, 2015 15:34
Demonstrates that FileChannelImpl and FileChannel leak memory in j2objc, see https://github.com/google/j2objc/issues/603
View NIOLeaks.java
import com.google.j2objc.annotations.AutoreleasePool;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
public class NIOLeaks {
  public static void main(String args[]) {