Yohei Tamura tamuhey

default_language_version:
  python: python3.7
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.4.0
    hooks:
      - id: check-added-large-files
        args: ['--maxkb=1000']
      - id: check-merge-conflict

Camphr: spaCy plugin for Transformers, Udify, ELMo, etc.

Hi, I'm Yohei Tamura, a software engineer at PKSHA Technology. I recently published Camphr, a spaCy plugin that integrates a wide range of NLP techniques, from state-of-the-art to conventional ones. With it, you can use Transformers, Udify, ELMo, and more on spaCy.

This post shows how to use Camphr in a nutshell.

Why I chose spaCy

spaCy is an awesome NLP framework and, in my opinion, has the following advantages:

"""Script to convert bccwj NER dataset to jsonl
Usage:
$ python bccwj2jsonl xml/ output/
# convert to irex
$ python bccwj2jsonl xml/ output/ irex
"""
@tamuhey
tamuhey / oss_license.md
Last active February 4, 2020 10:26
OSS license considerations

Use Apache 2.0 instead of MIT.

References

@tamuhey
tamuhey / install_mecab.sh
Last active February 10, 2020 06:24
install mecab
export MECAB_URL="https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7cENtOXlicTFaRUE" && \
export IPADIC_URL="https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7MWVlSDBCSXZMTXM" && \
cd /tmp && \
wget --no-check-certificate ${MECAB_URL} -O mecab.tar.gz && \
tar xzvf mecab.tar.gz && cd mecab-0.996 && ./configure && make && make check && make install && \
rm -rf /tmp/* && \
cd /tmp && \
wget --no-check-certificate ${IPADIC_URL} -O ipadic.tar.gz && \
tar xzvf ipadic.tar.gz && cd mecab-ipadic-2.7.0-20070801 && ./configure --with-charset=utf8 && ldconfig && make && make install && \
rm -rf /tmp/*
[tool.poetry]
name = "foo"
version = "0.1.0"
description = ""
[tool.poetry.dependencies]
python = "^3.7"
bar = {path = "bar"}
[tool.poetry.dev-dependencies]
@tamuhey
tamuhey / mecab_setup.sh
Last active January 16, 2020 12:01
mecab installation one-liner
pip install gdown --user
gdown "https://drive.google.com/uc?id=0B4y35FiV1wh7cENtOXlicTFaRUE"
gdown "https://drive.google.com/uc?id=0B4y35FiV1wh7MWVlSDBCSXZMTXM"
tar xzvf mecab-0.996.tar.gz
cd mecab-0.996
./configure
make
make check
sudo make install
@tamuhey
tamuhey / tokenizations.md
Last active January 2, 2020 16:38
tokenization alignment
@tamuhey
tamuhey / torch_beemsearch.py
Last active January 21, 2020 10:44
simple and efficient beamsearch function for pytorch
import torch


def beamsearch(probs: torch.Tensor, k: int) -> torch.Tensor:
    """Beam search for sequential probabilities.

    Args:
        probs: tensor of shape (length, d). Requires d > 0. All items in `probs` are assumed to be in the range [0, 1].
        k: beam width

    Returns: (k, length) tensor
    """
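The preview above stops at the docstring, so the gist's actual implementation is not shown here. As a rough illustration of the contract it describes (per-step probabilities of shape (length, d) in, the k best index sequences out), here is a minimal sketch. It is not the gist's code: the name `beamsearch_sketch` is hypothetical, it uses plain Python lists instead of torch tensors so that it runs without dependencies, and it assumes that a sequence is scored by the product of its per-step probabilities.

```python
import heapq
import math
from typing import List, Tuple


def beamsearch_sketch(probs: List[List[float]], k: int) -> List[List[int]]:
    """Return up to k index sequences, best first (hypothetical helper).

    probs[t][i] is the probability of picking index i at step t.
    Assumption: a sequence's score is the product of its step
    probabilities, accumulated here as a sum of logs for stability.
    """
    # Each beam is (log-score, indices chosen so far); start with one empty beam.
    beams: List[Tuple[float, List[int]]] = [(0.0, [])]
    for step in probs:
        candidates: List[Tuple[float, List[int]]] = []
        for score, seq in beams:
            for idx, p in enumerate(step):
                # log(0) is undefined, so treat zero probability as -inf.
                log_p = math.log(p) if p > 0.0 else float("-inf")
                candidates.append((score + log_p, seq + [idx]))
        # Keep only the k highest-scoring partial sequences.
        beams = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return [seq for _, seq in beams]


if __name__ == "__main__":
    # Two steps, two choices per step, beam width 2.
    print(beamsearch_sketch([[0.1, 0.9], [0.6, 0.4]], k=2))
```

With fewer than k possible sequences the result is shorter than k; a tensor version would need to decide how to pad or restrict that case.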
@tamuhey
tamuhey / ldcc2spacygold.py
Created November 19, 2019 17:57
convert livedoor news corpus to spacy gold jsonl
from pathlib import Path
import itertools as it
import copy
import srsly
from tqdm.notebook import tqdm
labels = [
"movie-enter",
"it-life-hack",