hankcs

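Classification of words

  • Content words: nouns, verbs, adjectives, state words, distinguishing words, numerals, measure words, pronouns.
  • Function words: adverbs, prepositions, conjunctions, particles, onomatopoeia, interjections.

ICTPOS3.0 part-of-speech tag set

n noun

nr personal name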

@hankcs
hankcs / bdcurl.sh
Created October 18, 2015 03:59 — forked from meoow/bdcurl.sh
Baidu Yun command-line (bash) upload and download script
#!/bin/bash
# Baidu Yun Command Line Interface
# Depends: bash, curl, grep, awk, sed, od
# (They are basically built-in tools of any *nix system.)
# Additionally, fastupload depends: head, wc, md5sum or md5, cksum
# (Which are also builtin tools)
#### Variables ####
@hankcs
hankcs / gist:53a20ef3cd5f246b9dd1
Created November 23, 2015 07:54
bpnn with hidden layer bias node
# coding=utf-8
# Back-propagation neural network
#
# Written in Python. See http://www.python.org/
# Placed in the public domain.
# Neil Schemenauer <nas@arctrix.com>
import math
import random
@hankcs
hankcs / alpha blend图层合并算法
Created May 6, 2016 03:49 — forked from ruilin/alpha blend图层合并算法
Layer-merging algorithm for layers with an alpha channel: computes the RGBA data of the new layer produced by stacking multiple layers
int clamp(int val) {
    if (val < 0) return 0;
    if (val > 255) return 255;
    return val;
}
unsigned char layerMerge(unsigned char **layers, unsigned int layerCount, unsigned int layerWidth, unsigned int layerHeight) {
    if (1 >= layerCount) return 0;
    unsigned char isNewLayer = 0;
    unsigned int i, j, byteCount;
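The preview above stops before the blending loop. As a rough illustration of the kind of computation the description refers to, here is a minimal Python sketch of standard "source over" alpha compositing. It is not the gist's own code: `blend_pixel` and `merge_layers` are hypothetical names, layers are assumed to be equally sized lists of `(r, g, b, a)` tuples with 0..255 channels, and the bottom layer is assumed to come first.

```python
def clamp(val):
    """Clamp an integer to the 0..255 byte range, like the C helper above."""
    return max(0, min(255, val))


def blend_pixel(dst, src):
    """Composite one RGBA pixel `src` over `dst` (hypothetical helper)."""
    sa = src[3] / 255.0  # source alpha in 0..1
    da = dst[3] / 255.0  # destination alpha in 0..1
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        return (0, 0, 0, 0)  # both pixels fully transparent
    rgb = tuple(
        clamp(round((s * sa + d * da * (1.0 - sa)) / out_a))
        for s, d in zip(src[:3], dst[:3])
    )
    return rgb + (clamp(round(out_a * 255)),)


def merge_layers(layers):
    """Merge equally sized layers (assumed bottom layer first) into one."""
    merged = list(layers[0])
    for layer in layers[1:]:
        merged = [blend_pixel(d, s) for d, s in zip(merged, layer)]
    return merged
```

With this sketch, an opaque pixel fully replaces whatever lies beneath it, and a fully transparent pixel leaves the lower layer unchanged.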
@hankcs
hankcs / iso2usb.sh
Last active May 6, 2016 22:20
Bootable ISO to USB disk for Mac OSX
#!/bin/bash
##
# AUTHOR: Andy Savage <andy@savage.hk>
# GITHUB: www.github.com/hongkongkiwi
# DESCRIPTION: This script is for converting ISO files and burning them to a USB drive
##
HELP="USAGE: iso2usb blah.iso /dev/disk#"
cmake -DCMAKE_BUILD_TYPE=RELEASE ..
make
@hankcs
hankcs / OOV.py
Created November 23, 2017 03:26
OOV recognition trick in convseg
# -*- coding:utf-8 -*-
# Filename: OOV.py
# Author:hankcs
# Date: 2017-11-21 17:51
def load_words(path, dict):
    with open(path) as src:
        for line in src:
            dict.update(line.split())
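A small usage sketch for `load_words`, assuming the caller passes a `set` as the second argument and that the word list holds whitespace-separated words. The parameter name `vocab`, the explicit UTF-8 encoding, and the file name `words.txt` are assumptions made for illustration, not part of the gist.

```python
import os
import tempfile


def load_words(path, vocab):
    """Add every whitespace-separated token in the file at `path` to the set `vocab`."""
    with open(path, encoding="utf-8") as src:
        for line in src:
            vocab.update(line.split())


# Hypothetical word list: one or more words per line.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.txt")
    with open(path, "w", encoding="utf-8") as out:
        out.write("北京 上海\n深圳\n")
    vocab = set()
    load_words(path, vocab)
```

Keeping the vocabulary in a set makes later membership checks constant time on average.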
@hankcs
hankcs / restore_collapse_edges.py
Created May 6, 2020 16:02
Script to restore empty nodes for IWPT 2020
# -*- coding:utf-8 -*-
def load_conll_to_str(path):
    """
    Load a conll file to a list of strings, each string represents a sentence in conll format
    :rtype: list
    """
    with open(path) as src:
        text = src.read()
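The preview ends once the file has been read. One plausible way to turn that text into a list of sentence strings is sketched below, assuming sentences are separated by blank lines, as is conventional in CoNLL files. `split_conll_sentences` is a hypothetical helper and may differ from what the gist actually does.

```python
def split_conll_sentences(text):
    """Split CoNLL-formatted text into one string per sentence.

    Assumes sentences are separated by one or more blank lines.
    """
    sentences = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the sentence collected so far.
            sentences.append("\n".join(current))
            current = []
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append("\n".join(current))
    return sentences
```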
@hankcs
hankcs / compile_coref_data.sh
Created July 10, 2020 01:04
CoNLL Ontonotes 2012 data preprocessing, adapted from AllenNLP
#!/bin/bash
# This script downloads and compiles the Ontonotes 2012 data in a helpful format
# for co-reference resolution. It generates 3 files: {train, dev, test}.english.v4_gold_conll,
# as well as a directory 'conll-2012' which contains the raw extracted data.
# The script downloads and runs some python scripts which require python 2.X.
ONTONOTES_PATH=$1
if [ ! -n "$ONTONOTES_PATH" ] ; then
@hankcs
hankcs / ontonotes_to_conll.sh
Last active August 17, 2022 10:41
This script downloads and compiles the Ontonotes 2012 data into conll format. Modified from https://github.com/allenai/allennlp/blob/c4c532d25e012dbe6ab1ac14bca75e53e0acc621/scripts/compile_coref_data.sh
#!/bin/bash
# This script downloads and compiles the Ontonotes 2012 data in a helpful format
# for co-reference resolution. It generates 3 files: {train, dev, test}.english.v4_gold_conll,
# as well as a directory 'conll-2012' which contains the raw extracted data.
# The script downloads and runs some python scripts which require python 2.X.
ONTONOTES_PATH=$1
LANGUAGE=$2