
Dingyuan Wang gumblex

@gumblex
gumblex / getparallel.py
Last active June 4, 2022 21:58
Convert Tatoeba dumps into a SQLite database.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
Get parallel corpus in Moses-style text from converted Tatoeba SQLite database.
Copyright (c) 2016 gumblex
This work is free. You can redistribute it and/or modify it under the
terms of the Do What The Fuck You Want To Public License, Version 2,
gumblex / findbadlines.py
Last active January 19, 2016 03:00
Find lines with encoding errors.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
This script tries to find encoding errors in stdin and prints out the bad lines.
Usage:
python3 findbadlines.py [encoding]
The default encoding is utf-8.
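
The preview above is cut off before the script body. As a rough illustration of the idea it describes (decode each input line and report the ones that fail), here is a minimal sketch. The names `find_bad_lines` and `main` are hypothetical and are not taken from the gist; the actual script may work differently.

```python
import sys


def find_bad_lines(lines, encoding="utf-8"):
    """Yield (line_number, raw_bytes) for every line that fails to decode.

    `lines` is any iterable of bytes objects, for example sys.stdin.buffer.
    Hypothetical helper, not the gist's own function.
    """
    for lineno, raw in enumerate(lines, 1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            yield lineno, raw


def main(argv):
    """Read stdin as bytes and echo the undecodable lines to stdout."""
    # The optional first argument selects the encoding; utf-8 is the default.
    encoding = argv[1] if len(argv) > 1 else "utf-8"
    for lineno, raw in find_bad_lines(sys.stdin.buffer, encoding):
        # Write bytes back out so the bad line is shown exactly as received.
        sys.stdout.buffer.write(b"%d: " % lineno + raw)
```

Reading from `sys.stdin.buffer` rather than `sys.stdin` matters here: text-mode stdin would decode the input before the check could run. Calling `main(sys.argv)` wires the sketch to the command line.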
gumblex / coffeescript.txt
Created January 13, 2016 02:08
CoffeeScript Grammar
/* converted on Wed Jan 13, 2016, 10:06 (UTC+08) by jison-to-w3c v0.35.1152 which is Copyright (c) 2011-2015 by Gunther Rademacher <grd@gmx.net> */
Root ::= Body?
Body ::= Line ( TERMINATOR Line | TERMINATOR )*
Line ::= Expression
| Statement
Statement
::= Return
| Comment
| STATEMENT
gumblex / 词性标记.md
Last active July 12, 2022 07:05 — forked from luw2007/词性标记.md
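Part-of-speech (POS) tags: covers the ICTPOS3.0 POS tag set, the ICTCLAS Chinese POS tagging set, the POS tags that appear in the jieba dictionary, and some POS tags that can be ignored in simhash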

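Word classes

  • Content words: nouns, verbs, adjectives, state words, distinguishing words, numerals, measure words, pronouns
  • Function words: adverbs, prepositions, conjunctions, particles, onomatopoeia, interjections.

ICTPOS3.0 POS tag set

n noun

nr personal name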

gumblex / zip64.py
Created November 14, 2015 11:16
Simple Python command line utility to create Zip64 files.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Simple command line utility to create Zip64 files.
For Python 3.3+
Most of the code is from the standard library modules `zipfile` and `shutil`.
"""
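
The preview stops at the docstring, so the gist's own code is not shown here. As a hedged sketch of the general approach using only the standard library, the function below archives a directory with Zip64 extensions allowed. The name `make_zip64` and its parameters are hypothetical. Note that `allowZip64=True` permits Zip64 records when an archive needs them (very large files or many entries); it does not force them for small archives.

```python
import os
import zipfile


def make_zip64(archive_path, source_dir):
    """Write every file under source_dir into a ZIP archive at archive_path.

    Hypothetical helper, not the gist's implementation. Zip64 extensions
    are enabled so that large archives do not raise LargeZipFile.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to source_dir, not absolute paths.
                zf.write(full, os.path.relpath(full, source_dir))
    return archive_path
```

Passing `allowZip64=True` explicitly is the relevant design choice for the "Python 3.3+" target stated in the docstring: the default was `False` in 3.3 and became `True` in 3.4. The archive path should sit outside `source_dir`, otherwise the walk may pick up the archive being written.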

Keybase proof

I hereby claim:

  • I am gumblex on github.
  • I am gumblex (https://keybase.io/gumblex) on keybase.
  • I have a public key whose fingerprint is 8C4E 0F2D B084 A9FB E50B 4AE3 B3E4 D83E 3F3E 5FDC

To claim this, I am signing this object:

gumblex / PathFitter.py
Created April 4, 2015 12:01
Path fitter in Python - An Algorithm for Automatically Fitting Digitized Curves
"""
Ported from Paper.js - The Swiss Army Knife of Vector Graphics Scripting.
http://paperjs.org/
Copyright (c) 2011 - 2014, Juerg Lehni & Jonathan Puckey
http://scratchdisk.com/ & http://jonathanpuckey.com/
Distributed under the MIT license. See LICENSE file for details.
All rights reserved.
gumblex / figcaptcha.py
Created March 28, 2015 11:45
Use FIGlet (ASCII art) as CAPTCHA, with a noise generator
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright 2015 Gumble
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
gumblex / num2chinese.py
Created February 8, 2015 02:46
Converts numbers to their Chinese representations, in Python (中文数字转换, "Chinese numeral conversion").
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Licensed under WTFPL or the Unlicense or CC0.
# This uses Python 3, but it's easy to port to Python 2 by changing
# strings to u'xx'.
import itertools
def num2chinese(num, big=False, simp=True, o=False, twoalt=False):
gumblex / 65-source-sans-fonts.conf
Created August 18, 2014 08:28
Fontconfig file for Source Sans Pro and Source Han Sans series.
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
<match target="pattern">
<test qual="any" name="family">
<string>Source Sans</string>
</test>
<edit name="family" mode="assign">
<string>Source Sans Pro</string>
</edit>