

Takuya Kitazawa takuti

@takuti
takuti / td-wf-partial-upload.sh
Created Jun 16, 2019
Uploading specific files in a folder to TD Workflow
#!/bin/bash
project_name=foo
project_files=(
"config/"
"scripts/"
"queries/"
"workflow1.dig"
"workflow2.dig"
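The preview cuts off after the file list. One way to get the same effect, sketched here in Python rather than the original Bash, is to copy only the listed entries into a temporary directory and push that directory as the workflow project. This assumes TD Toolbelt's `td wf push <project>` uploads the current working directory as the project; the helper names `stage_files` and `push_selected` and the `dry_run` switch are hypothetical and not taken from the gist.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def stage_files(src_dir, project_files, dst_dir):
    """Copy only the listed files/folders from src_dir into dst_dir, keeping relative paths."""
    for entry in project_files:
        source = Path(src_dir) / entry
        target = Path(dst_dir) / entry
        if source.is_dir():
            shutil.copytree(source, target)  # e.g. "config/" copies the whole folder
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # e.g. "workflow1.dig"


def push_selected(project_name, project_files, src_dir=".", dry_run=True):
    """Stage the selected files in a temp dir, then (unless dry_run) run `td wf push` there."""
    with tempfile.TemporaryDirectory() as tmp:
        stage_files(src_dir, project_files, tmp)
        # Relative paths of everything that would be uploaded.
        staged = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))
        cmd = ["td", "wf", "push", project_name]
        if not dry_run:
            # Assumed behavior: `td wf push <project>` uploads the current working directory.
            subprocess.run(cmd, cwd=tmp, check=True)
        return cmd, staged
```

Calling `push_selected("foo", ["config/", "scripts/", "queries/", "workflow1.dig", "workflow2.dig"], dry_run=False)` would push only those entries; with the default `dry_run=True` it just reports what would be uploaded, which makes it easy to check the selection first.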
what-is-auc.md

Assume we have a binary classifier that outputs the probability of a sample being positive, in the [0.0, 1.0] range. The Area Under the ROC Curve (AUC) quantitatively measures the quality of the predictions made by such a classification model. Intuitively, AUC checks whether the positive (i.e., label=1) samples in a validation set receive a higher probability of being positive than the negative samples do.

The AUC metric yields a single value in [0.0, 1.0]. Suppose we have five test samples, sorted by their predicted probabilities as follows. The classifier assigns a higher probability to every positive sample (#1, #2, and #4) than to any of the others; this best-case scenario corresponds to an AUC of 1.0.

| Test sample # | Probability of label=1 | True label |
|---|---|---|
| 1 | 0.8 | 1 |
| 2 | 0.7 | 1 |
| 4 | 0.6 | 1 |
| 3 | 0.5 | 0 |
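The intuition above has a standard pairwise form: AUC equals the fraction of (positive, negative) pairs in which the positive sample gets the higher probability, counting ties as half a pair. A minimal brute-force Python sketch of that definition (the helper name `auc` is hypothetical), checked against the four rows visible in the table; sample #5 is not shown in this preview, so it is left out:

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0  # positive ranked above negative
            elif p == n:
                correct += 0.5  # tie: half credit
    return correct / (len(pos) * len(neg))


# Samples #1, #2, #4 (positive) and #3 (negative) from the table.
print(auc([0.8, 0.7, 0.6, 0.5], [1, 1, 1, 0]))  # 1.0
```

This version costs O(P·N) comparisons; libraries such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently by sorting the scores.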
hivemall.md

Generic functions

  • convert_label(const int|const float) - Convert from -1|1 to 0.0f|1.0f, or from 0.0f|1.0f to -1|1

  • each_top_k(int K, Object group, double cmpKey, *) - Returns top-K values (or tail-K values when K is less than 0)

  • generate_series(const int|bigint start, const int|bigint end) - Generate a series of values, from start to end. A similar function to PostgreSQL's generate_series. http://www.postgresql.org/docs/current/static/functions-srf.html

    select generate_series(1,9);
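To make the documented semantics of these three functions concrete, here is a pure-Python illustration. It is not Hivemall's implementation (the real functions run inside Hive); the argument names, the tuple-based rows, and the assumption that `each_top_k` receives rows already grouped by key (as with a `CLUSTER BY` in Hive) are choices made for this sketch. `generate_series` is inclusive on both ends, mirroring PostgreSQL's two-argument form.

```python
from itertools import groupby


def convert_label(label):
    """-1|1 (int) -> 0.0|1.0 (float); 0.0|1.0 (float) -> -1|1 (int)."""
    if isinstance(label, float):
        return 1 if label == 1.0 else -1
    return 1.0 if label == 1 else 0.0


def each_top_k(k, rows, group_key, cmp_key):
    """Top-k rows per group by cmp_key, or tail-|k| rows when k < 0.

    Assumes rows with the same group are contiguous (already clustered by group).
    """
    result = []
    for _, group in groupby(rows, key=group_key):
        # Descending order for top-k, ascending for tail-k.
        ordered = sorted(group, key=cmp_key, reverse=(k > 0))
        result.extend(ordered[:abs(k)])
    return result


def generate_series(start, end):
    """Integer series from start to end, both inclusive."""
    return list(range(start, end + 1))


print(generate_series(1, 9))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```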
    
    
pandas-td-sample.ipynb
takuti / Dockerfile
Created Jan 21, 2018
Mock dockerfile for takuti.me
FROM node:alpine
ENV HUGO_VERSION=0.30.2
ADD https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-64bit.tar.gz /tmp
ADD . /src
WORKDIR /src
RUN \
# install hugo
euroscipy-2017.ipynb
anompy.ipynb
takuti / jawikicorpus.py
Last active May 27, 2019 — forked from yuku/jawikicorpus.py
A script for loading Japanese Wikipedia into gensim
# coding: utf-8
"""USAGE: %(program)s WIKI_XML_DUMP OUTPUT_PREFIX
"""
import logging
import os.path
import sys
import gensim.corpora.wikicorpus as wikicorpus
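The preview stops at the imports. Newer gensim releases let `WikiCorpus` take a custom `tokenizer_func`, called as `tokenizer_func(text, token_min_len, token_max_len, lower)`, which is one way to plug in Japanese tokenization; this is not necessarily how the forked script does it. The sketch below assumes that API. The script-boundary regex is a crude stand-in for a proper morphological analyzer such as MeCab (it splits on changes between kanji, hiragana, katakana, and ASCII, and drops everything else), and the helper names `ja_tokenize` and `build_corpus` and the output file suffixes are hypothetical.

```python
import re

# Crude segmentation by script boundaries; a real pipeline would use a morphological analyzer.
_TOKEN_RE = re.compile(
    r"[\u4e00-\u9fff]+"    # runs of kanji
    r"|[\u3040-\u309f]+"   # runs of hiragana
    r"|[\u30a0-\u30ff]+"   # runs of katakana (including the long-vowel mark)
    r"|[A-Za-z0-9]+"       # ASCII words and numbers
)


def ja_tokenize(content, token_min_len, token_max_len, lower):
    """Tokenizer matching the argument order WikiCorpus uses for `tokenizer_func`."""
    if lower:
        content = content.lower()
    return [
        token
        for token in _TOKEN_RE.findall(content)
        if token_min_len <= len(token) <= token_max_len
    ]


def build_corpus(dump_path, output_prefix):
    """Parse a jawiki XML dump and save the dictionary and a bag-of-words corpus."""
    # Imported lazily so the tokenizer above works without gensim installed.
    from gensim.corpora import MmCorpus, WikiCorpus

    wiki = WikiCorpus(dump_path, tokenizer_func=ja_tokenize)
    wiki.dictionary.save_as_text(output_prefix + "_wordids.txt")
    MmCorpus.serialize(output_prefix + "_bow.mm", wiki)
```

For example, `ja_tokenize("日本経済新聞はニュースを伝える", 2, 15, True)` keeps `["日本経済新聞", "ニュース", "える"]`, which also shows the limitation: the verb 伝える is split at the kanji/hiragana boundary.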
lucene-user-dict.csv
# Custom segmentation for long entries
日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞
関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,テスト名詞
# Custom reading for sumo wrestler
朝青龍,朝青龍,アサショウリュウ,カスタム人名
hivemall-churn-rf.ipynb