@lukhnos
lukhnos / whoosh-cjk-analyser.md
Created February 4, 2014 09:12
How to Use Whoosh to Index Documents that Contain CJK Characters (First Take)

Whoosh's default analyzer does not handle CJK characters (in particular Chinese and Japanese) well. If you pass it typical Chinese or Japanese paragraphs, you'll often find that an entire sentence is treated as one token.

A Whoosh analyzer consists of one tokenizer and zero or more filters. As a result, we can easily use this recipe from Lucene's CJKAnalyzer:

An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK with CJKBigramFilter, and filters stopwords with StopFilter

Which inspired me to make this first take:

class CJKFilter(Filter):
    def __call__(self, tokens):
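The preview of the filter is cut off at this point. As a separate illustration of the bigram step named in the Lucene description above, here is a minimal sketch in plain Python. It does not use Whoosh and is not the author's filter: the names `is_cjk` and `cjk_bigrams` are hypothetical, and the Unicode ranges are an assumed, incomplete set chosen for the example.

```python
from itertools import groupby


def is_cjk(ch):
    """Return True for characters in a few common CJK blocks (assumed, incomplete)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x3040 <= cp <= 0x309F   # Hiragana
        or 0x30A0 <= cp <= 0x30FF   # Katakana
    )


def _kind(ch):
    """Classify a character as part of a CJK run, a word run, or a separator."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"


def cjk_bigrams(text):
    """Yield lowercased words for non-CJK runs and overlapping bigrams for CJK runs."""
    for kind, group in groupby(text, key=_kind):
        run = "".join(group)
        if kind == "word":
            yield run.lower()
        elif kind == "cjk":
            if len(run) == 1:
                # A lone CJK character has no neighbour to pair with.
                yield run
            else:
                for i in range(len(run) - 1):
                    yield run[i:i + 2]
        # Separators (whitespace, punctuation) produce no tokens.


if __name__ == "__main__":
    print(list(cjk_bigrams("Whoosh 支援中文")))
```

Overlapping bigrams mean a query for any two adjacent characters can match without a dictionary-based word segmenter, at the cost of a larger index. In a real Whoosh analyzer the same idea would sit in a filter placed after the tokenizer, which this sketch does not attempt to reproduce.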
@lukhnos
lukhnos / otpbox.py
Created February 6, 2015 08:24
A simple command line pyotp wrapper
import argparse
import json
import pyotp
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('json', nargs=1)
    parser.add_argument('key', nargs=1)
@lukhnos
lukhnos / PreadBug.java
Last active August 29, 2015 14:23
libcore.io.Posix.preadBytes bug reproducible sample
// To reproduce the bug:
//
// j2objc PreadBug.java
// j2objcc PreadBug.m
// ./a.out PreadBug
//
// You'll see the following exception:
//
// java.io.IOException: pread failed: EBADF (Bad file descriptor)
@interface NSString (LocalizedDoubleValue)
- (double)localizedDoubleValue;
@end
@implementation NSString (LocalizedDoubleValue)
- (double)localizedDoubleValue
{
    NSScanner *scanner = [NSScanner localizedScannerWithString:self];
    double newValue = 0.0;
rdar://7198300
04-Sep-2009 05:29 PM Lukhnos D. Liu:
Summary:
Terminal.app in 10B503 often crashes when it wraps a line of overlong text, most often when an IME is in effect and the overlong text includes an uncommitted buffer (composing buffer/reading text).
Steps to Reproduce:
1. Switch to Kotoeri (Hiragana) input method
2. Open Terminal.app under 10B503
3. Open irssi, a popular IRC client
rdar://7198283
Terminal.app pukes huge amount of HIToolbox exceptions onto Console, CJK-related
04-Sep-2009 05:13 PM Lukhnos D. Liu:
Summary:
Terminal.app in 10B503 pukes a huge number of HIToolbox exceptions onto Console. It's often input method-related.
Steps to Reproduce:
1. Switch to Kotoeri (Hiragana) input method
Hi Mr. Schiller,
I would like to inform you of a defect in the currently shipping Snow Leopard that I think is sub-Apple standard and afflicts many of your loyal users, particularly visual designers, web designers and people who care about typefaces, in Taiwan and Hong Kong.
My name is Lukhnos. I'm a Taiwanese Mac and iPhone software developer. I would like to congratulate you on your successful release of Mac OS X Snow Leopard. For many of us the upgrade process was smooth, and we appreciate that Apple continues to deliver a high-quality, high-performance operating system that is suitable for both daily life and professional demands.
There is unfortunately, however, a defect in the currently shipping Snow Leopard. It causes daily visual pain for many of us Traditional Chinese users.
Snow Leopard ships with a new set of sans serif Chinese fonts. One of them, Hei TC (TC stands for Traditional Chinese), seems to be Apple's official replacement of the long-serving LiHei Pro. I can see Apple has been spending efforts
Hi Mr. Jobs:
I would like to inform you that an erroneous font shipped with Snow Leopard is sub-Apple standard. The font afflicts many of your loyal users, particularly visual designers, web designers and people who care about typefaces, in Taiwan and Hong Kong.
My name is Lukhnos. I'm a Taiwanese Mac and iPhone software developer. I would like to congratulate you on your successful release of Mac OS X Snow Leopard. For many of us, the upgrade process was smooth. We appreciate that Apple continues to deliver such a high-quality, high-performance operating system that is suitable for both daily life and professional needs.
There is, however, one problem that gravely undermines that experience. It's a new font that is causing daily visual pain to many Traditional Chinese users.
Snow Leopard ships with a new set of sans serif Chinese fonts. One of them, Hei TC ("TC" for Traditional Chinese), seems to be Apple's official replacement of the long-serving LiHei Pro. I can see Apple has been spending efforts in provi
@lukhnos
lukhnos / ThrowableLeaks.java
Last active September 4, 2015 15:34
Demonstrates that Throwable leaks memory in j2objc, see https://github.com/google/j2objc/issues/601
import com.google.j2objc.annotations.AutoreleasePool;
public class ThrowableLeaks {
  public static void main(String args[]) {
    for (int i = 0; ; i++) {
      foo(i);
      bar(i);
      try {
        Thread.sleep(1000);
      } catch (Exception e) {
@lukhnos
lukhnos / NIOLeaks.java
Last active September 4, 2015 15:34
Demonstrates that FileChannelImpl and FileChannel leak memory in j2objc, see https://github.com/google/j2objc/issues/603
import com.google.j2objc.annotations.AutoreleasePool;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
public class NIOLeaks {
  public static void main(String args[]) {