Takuya Kitazawa takuti

@takuti
takuti / activities.csv
Last active January 28, 2021 22:20
Fitbit exported data (2020)
Date,Calories Burned,Steps,Distance,Floors,Minutes Sedentary,Minutes Lightly Active,Minutes Fairly Active,Minutes Very Active,Activity Calories
2020-01-01,"2,313","9,179",6.52,4,901,152,4,51,916
2020-01-02,"2,367","10,634",7.56,17,718,206,24,23,"1,032"
2020-01-03,"2,366","10,002",7.01,5,789,241,15,10,"1,033"
2020-01-04,"2,740","15,201",10.72,15,622,315,42,16,"1,542"
2020-01-05,"2,346","9,737",6.83,13,847,232,20,9,"1,023"
2020-01-06,"2,714","15,106",10.68,12,662,238,29,67,"1,461"
2020-01-07,"2,486","12,440",8.83,5,587,218,11,49,"1,158"
2020-01-08,"2,869","19,895",14,6,648,181,48,107,"1,599"
2020-01-09,"2,495","9,844",6.97,5,766,251,7,22,"1,136"
time,user_id,source,conversion
1590863740,xobepz7opw,direct,0
1590863754,vpo60mcha1,facebook,0
1590864169,89u9knmqni,direct,0
1590864169,cdmgdvf6oo,google,0
1590864380,h0czqgvxbg,google,0
1590864409,cj98eurd91,google,0
1590864574,fqu9t0sd02,facebook,0
1590864646,89pp6xf2pb,google,0
1590864929,k1jtp0bz2j,facebook,0
@takuti
takuti / td-wf-partial-upload.sh
Created June 16, 2019 07:03
Uploading specific files in a folder to TD Workflow
#!/bin/bash
project_name=foo
project_files=(
"config/"
"scripts/"
"queries/"
"workflow1.dig"
"workflow2.dig"

Assume we have a binary classifier that outputs the probability of a sample being positive, in the [0.0, 1.0] range. Area Under the ROC Curve (AUC) quantitatively measures how accurate the predictions of such a classification model are. Intuitively, AUC checks whether positive (i.e., label=1) samples in a validation set receive a higher predicted probability of being positive than negative ones.

The AUC metric yields a single value in [0.0, 1.0]. Given five test samples sorted by their predicted probability as follows, we can see that the classifier assigned a higher probability to all positive samples, #1, #2, and #4, than to the others. This best-case scenario corresponds to an AUC of 1.0.

Test sample #    Probability of label=1    True label
1                0.8                       1
2                0.7                       1
4                0.6                       1
3                0.5                       0
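
The intuition above can be checked with a minimal sketch, assuming the common pairwise reading of AUC: the fraction of (positive, negative) sample pairs in which the positive sample gets the higher score, with ties counted as half. The function name `pairwise_auc` and the second, swapped example are hypothetical illustrations, not part of the original text, and only the four rows visible in the table are used.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for pos_score, neg_score in product(positives, negatives):
        if pos_score > neg_score:
            wins += 1.0
        elif pos_score == neg_score:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


# The four rows visible in the table (samples #1, #2, #4, #3).
auc_perfect = pairwise_auc([0.8, 0.7, 0.6, 0.5], [1, 1, 1, 0])

# Hypothetical variation (not from the original): one positive now scores
# below the negative sample, so one of the three pairs is ranked incorrectly.
auc_swapped = pairwise_auc([0.8, 0.7, 0.4, 0.5], [1, 1, 1, 0])

print(auc_perfect, auc_swapped)
```

Comparing every pair is quadratic in the number of samples, which is fine for illustration; practical implementations typically sort the scores once instead.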

Generic functions

  • convert_label(const int|const float) - Convert from -1|1 to 0.0f|1.0f, or from 0.0f|1.0f to -1|1

  • each_top_k(int K, Object group, double cmpKey, *) - Returns top-K values (or tail-K values when K is less than 0)

  • generate_series(const int|bigint start, const int|bigint end) - Generates a series of values from start to end. Similar to PostgreSQL's generate_series. http://www.postgresql.org/docs/current/static/functions-srf.html

    select generate_series(1,9);
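
As a rough illustration of what two of the functions above describe, here is a Python sketch of their semantics. It is not the library's implementation: the Python function bodies are hypothetical, the int-versus-float dispatch in `convert_label` is one reading of the description, and the inclusive end point of `generate_series` is an assumption carried over from the PostgreSQL function it is compared to.

```python
def convert_label(label):
    """Map -1|1 (int) to 0.0|1.0 (float), or 0.0|1.0 (float) to -1|1 (int)."""
    if isinstance(label, bool):
        raise TypeError("label must be an int or a float, not a bool")
    if isinstance(label, int):
        mapping = {-1: 0.0, 1: 1.0}
    elif isinstance(label, float):
        mapping = {0.0: -1, 1.0: 1}
    else:
        raise TypeError("label must be an int or a float")
    if label not in mapping:
        raise ValueError(f"unsupported label value: {label!r}")
    return mapping[label]


def generate_series(start, end):
    """Return the integers from start to end, assuming end is inclusive."""
    return list(range(start, end + 1))


series = generate_series(1, 9)  # mirrors `select generate_series(1,9);`
print(convert_label(-1), convert_label(1.0), series)
```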
    
    
@takuti
takuti / Dockerfile
Created January 21, 2018 23:12
Mock dockerfile for takuti.me
FROM node:alpine
ENV HUGO_VERSION=0.30.2
ADD https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-64bit.tar.gz /tmp
ADD . /src
WORKDIR /src
RUN \
# install hugo
@takuti
takuti / jawikicorpus.py
Last active May 27, 2019 02:49 — forked from yuku/jawikicorpus.py
Script for loading Japanese Wikipedia into gensim
# coding: utf-8
"""USAGE: %(program)s WIKI_XML_DUMP OUTPUT_PREFIX
"""
import logging
import os.path
import sys
import gensim.corpora.wikicorpus as wikicorpus