Takuya Kitazawa takuti

@takuti
takuti / activities.csv
Last active January 28, 2021 22:20
Fitbit exported data (2020)
View activities.csv
Date,Calories Burned,Steps,Distance,Floors,Minutes Sedentary,Minutes Lightly Active,Minutes Fairly Active,Minutes Very Active,Activity Calories
2020-01-01,"2,313","9,179",6.52,4,901,152,4,51,916
2020-01-02,"2,367","10,634",7.56,17,718,206,24,23,"1,032"
2020-01-03,"2,366","10,002",7.01,5,789,241,15,10,"1,033"
2020-01-04,"2,740","15,201",10.72,15,622,315,42,16,"1,542"
2020-01-05,"2,346","9,737",6.83,13,847,232,20,9,"1,023"
2020-01-06,"2,714","15,106",10.68,12,662,238,29,67,"1,461"
2020-01-07,"2,486","12,440",8.83,5,587,218,11,49,"1,158"
2020-01-08,"2,869","19,895",14,6,648,181,48,107,"1,599"
2020-01-09,"2,495","9,844",6.97,5,766,251,7,22,"1,136"
View touchpoints.csv
time,user_id,source,conversion
1590863740,xobepz7opw,direct,0
1590863754,vpo60mcha1,facebook,0
1590864169,89u9knmqni,direct,0
1590864169,cdmgdvf6oo,google,0
1590864380,h0czqgvxbg,google,0
1590864409,cj98eurd91,google,0
1590864574,fqu9t0sd02,facebook,0
1590864646,89pp6xf2pb,google,0
1590864929,k1jtp0bz2j,facebook,0
takuti / td-wf-partial-upload.sh
Created June 16, 2019 07:03
Uploading specific files in a folder to TD Workflow
View td-wf-partial-upload.sh
#!/bin/bash
project_name=foo
project_files=(
"config/"
"scripts/"
"queries/"
"workflow1.dig"
"workflow2.dig"
View what-is-auc.md

Assume we have a binary classifier that outputs the probability of a sample being positive, in the [0.0, 1.0] range. Area Under the ROC Curve (AUC) quantifies the quality of the predictions made by such a classification model. Intuitively, AUC checks whether positive (i.e., label=1) samples in a validation set receive a higher predicted probability than negative ones.

The AUC metric yields a single value in [0.0, 1.0]. Suppose we have five test samples sorted by their predicted probability, as follows. The classifier assigned a higher probability to all positive samples (#1, #2, and #4) than to the others. This best-case scenario corresponds to an AUC of 1.0.

| Test sample # | Probability of label=1 | True label |
|---|---|---|
| 1 | 0.8 | 1 |
| 2 | 0.7 | 1 |
| 4 | 0.6 | 1 |
| 3 | 0.5 | 0 |
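
The intuition above can be sketched as a pairwise comparison: AUC equals the fraction of (positive, negative) pairs in which the positive sample has the higher predicted probability. The snippet below is only an illustrative sketch, not the gist author's code. The function name `pairwise_auc` is hypothetical, ties are assumed to count as half a correct pair, and the input uses only the four rows visible in the table above.

```python
def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly.

    Assumption: a tie between a positive and a negative score
    counts as half a correctly ranked pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0  # positive ranked above negative
            elif p == n:
                correct += 0.5  # tie: half credit (assumption)
    return correct / (len(positives) * len(negatives))


# Only the four rows visible in the table above.
scores = [0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 1, 0]
auc_value = pairwise_auc(scores, labels)
print(auc_value)
```

Because every positive sample in these rows scores higher than the single negative one, all pairs are ranked correctly. The nested loop is quadratic in the number of samples, so for large validation sets a sort-based computation would be the more practical choice.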
View hivemall.md

Generic functions

  • convert_label(const int|const float) - Convert from -1|1 to 0.0f|1.0f, or from 0.0f|1.0f to -1|1

  • each_top_k(int K, Object group, double cmpKey, *) - Returns top-K values (or tail-K values when k is less than 0)

  • generate_series(const int|bigint start, const int|bigint end) - Generate a series of values, from start to end. Similar to PostgreSQL's generate_series function. http://www.postgresql.org/docs/current/static/functions-srf.html

    select generate_series(1,9);
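
To make the descriptions of `convert_label` and `generate_series` concrete, here is a rough Python sketch of the semantics as stated in the list above. It is not Hivemall's implementation. Two points are assumptions: the direction of the label conversion is chosen by the input type (int versus float), and the series includes the `end` value, as PostgreSQL's function does.

```python
def convert_label(label):
    """Mimic the described mapping: -1|1 -> 0.0|1.0, or 0.0|1.0 -> -1|1.

    Assumption: the input type (int vs. float) selects the direction.
    """
    if isinstance(label, bool):
        raise TypeError("bool is not a supported label type")
    if isinstance(label, int):
        mapping = {-1: 0.0, 1: 1.0}
    elif isinstance(label, float):
        mapping = {0.0: -1, 1.0: 1}
    else:
        raise TypeError("label must be an int or a float")
    if label not in mapping:
        raise ValueError(f"unsupported label value: {label!r}")
    return mapping[label]


def generate_series(start, end):
    """Return the integers from start to end.

    Assumption: `end` is inclusive, following PostgreSQL's behavior.
    """
    return list(range(start, end + 1))


print(convert_label(-1))      # int label converted to a float label
print(convert_label(0.0))     # float label converted to an int label
print(generate_series(1, 9))  # same arguments as the SQL example above
```

In SQL the function emits one row per value, so the Python list here stands in for the rows of the result set.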
    
    
View pandas-td-sample.ipynb
takuti / Dockerfile
Created January 21, 2018 23:12
Mock dockerfile for takuti.me
View Dockerfile
FROM node:alpine
ENV HUGO_VERSION=0.30.2
ADD https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-64bit.tar.gz /tmp
ADD . /src
WORKDIR /src
RUN \
# install hugo
View euroscipy-2017.ipynb
View anompy.ipynb
takuti / jawikicorpus.py
Last active May 27, 2019 02:49 — forked from yuku/jawikicorpus.py
Script for loading Japanese Wikipedia into gensim
View jawikicorpus.py
# coding: utf-8
"""USAGE: %(program)s WIKI_XML_DUMP OUTPUT_PREFIX
"""
import logging
import os.path
import sys
import gensim.corpora.wikicorpus as wikicorpus