@soonraah
soonraah / CalcGaussianDistance.cpp
Created February 9, 2013 11:37
Distance calculation between multidimensional Gaussian distributions (diagonal covariance).
#include <cmath>
// ----------------------------------------------------------------------------
// Kullback-Leibler divergence (KL divergence, asymmetric)
// ----------------------------------------------------------------------------
float calcKld(
    const int nDim,
    const float* mean1,
    const float* var1,
    const float* mean2,
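The preview above is cut off, so the body of calcKld is not shown. As an illustration only, the standard closed form for the KL divergence between two Gaussians with diagonal covariance is

KL(N1 || N2) = 0.5 * sum_d [ log(var2_d / var1_d) + (var1_d + (mean1_d - mean2_d)^2) / var2_d - 1 ]

The Python sketch below implements that formula. It is not the gist's C++ code: the name kl_divergence_diag is hypothetical, and it assumes var1 and var2 hold per-dimension variances (not standard deviations), all strictly positive.

```python
import math


def kl_divergence_diag(mean1, var1, mean2, var2):
    """KL(N1 || N2) for Gaussians with diagonal covariance.

    Hypothetical helper: var1 and var2 are assumed to be per-dimension
    variances and must be strictly positive.
    """
    if not (len(mean1) == len(var1) == len(mean2) == len(var2)):
        raise ValueError("all inputs must have the same number of dimensions")
    total = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        # Per-dimension contribution of the closed-form expression.
        total += math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0
    return 0.5 * total


if __name__ == "__main__":
    # The divergence is asymmetric: swapping the arguments changes the value.
    print(kl_divergence_diag([0.0], [1.0], [0.0], [4.0]))
    print(kl_divergence_diag([0.0], [4.0], [0.0], [1.0]))
```

Because the divergence is asymmetric, a symmetric distance is often built by summing or averaging the two directions; whether the gist does so is not visible in the preview.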
soonraah / LanceWilliamsUpdatingFormula.cpp
Created February 9, 2013 12:03
Distance calculation between clusters using the Lance-Williams updating formula.
#include <cmath>
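// When cluster C1 = C1a ∪ C1b, the distance between clusters C1 and C2 can be computed from the distances between C2 and each of C1a and C1b.
// The computation time does not depend on the number of cluster members.
// Reference: http://ibisforest.org/index.php?Lance-Williams%20updating%20formula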
//! Lance-Williams Updating Formula
/*!
@param[in] dist_1a2 distance between C1a and C2
@param[in] dist_1b2 distance between C1b and C2
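The rest of the signature is truncated in this preview. As a sketch of the general technique, the Lance-Williams update combines the two known distances with the distance between the merged parts:

d(C1, C2) = alpha_a * d(C1a, C2) + alpha_b * d(C1b, C2) + beta * d(C1a, C1b) + gamma * |d(C1a, C2) - d(C1b, C2)|

The Python version below is illustrative and is not the gist's C++ code. The names dist_1a2 and dist_1b2 follow the parameters shown above; dist_1a1b, the coefficient arguments, and the two wrapper functions are assumptions introduced here.

```python
def lance_williams(dist_1a2, dist_1b2, dist_1a1b, alpha_a, alpha_b, beta, gamma):
    """Distance between C1 = C1a ∪ C1b and C2 from already-known distances.

    dist_1a1b and the coefficient parameters are hypothetical names; the
    coefficients select the linkage criterion.
    """
    return (alpha_a * dist_1a2
            + alpha_b * dist_1b2
            + beta * dist_1a1b
            + gamma * abs(dist_1a2 - dist_1b2))


def single_linkage(dist_1a2, dist_1b2):
    # alpha = 1/2, beta = 0, gamma = -1/2 reduces to min(dist_1a2, dist_1b2).
    return lance_williams(dist_1a2, dist_1b2, 0.0, 0.5, 0.5, 0.0, -0.5)


def complete_linkage(dist_1a2, dist_1b2):
    # alpha = 1/2, beta = 0, gamma = +1/2 reduces to max(dist_1a2, dist_1b2).
    return lance_williams(dist_1a2, dist_1b2, 0.0, 0.5, 0.5, 0.0, 0.5)


if __name__ == "__main__":
    print(single_linkage(3.0, 5.0))
    print(complete_linkage(3.0, 5.0))
```

The update uses only three stored distances, which is why the cost is independent of how many members each cluster has.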

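A summary of git-svn commands that I use only occasionally.


Apply changes from the SVN repository to the local Git repository

git svn rebase only covers the specified branch, whereas this command fetches changes for the entire associated repository.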

$ git svn fetch
soonraah / DrawColoredPointDiagram
Created January 5, 2014 06:13
How to specify the color of each point in a scatter plot
library(ggplot2)
# Read the CSV
color_plot_data <- read.csv("points.csv")
# Draw a scatter plot using columns x, y as coordinates and columns r, g, b as color data
ggplot(data=color_plot_data, aes(x=x, y=y, col=rgb(r, g, b))) +
  geom_point() +
  scale_color_identity()
soonraah / do_skmeans.R
Last active August 29, 2015 14:00
Running cosine-distance-based k-means with the skmeans package
library(slam)
library(skmeans)
# Read data from CSV
x <- read.csv("data.csv")
# Convert to a data structure that skmeans can handle (simple triplet matrix)
s <- as.simple_triplet_matrix(x)
# Run k-means
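The actual skmeans call is cut off in this preview. To illustrate what cosine-distance-based (spherical) k-means does, here is a minimal Python sketch of the basic fixed-point iteration: normalize every row to unit length, assign each row to the prototype with the highest cosine similarity, then replace each prototype with the normalized sum of its members. This is an assumption-laden illustration, not the algorithm the skmeans package runs; the function names and the n_iter and seed parameters are hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("a zero vector has no direction")
    return [v / norm for v in vec]


def cosine_similarity(u, v):
    # For unit-length inputs the dot product equals the cosine similarity.
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(rows, k, n_iter=20, seed=0):
    """Hypothetical helper: cluster rows by direction rather than magnitude."""
    rng = random.Random(seed)
    data = [normalize(row) for row in rows]
    centers = rng.sample(data, k)  # initial prototypes drawn from the data
    labels = [0] * len(data)
    for _ in range(n_iter):
        # Assignment step: the most similar prototype wins.
        labels = [
            max(range(k), key=lambda j: cosine_similarity(x, centers[j]))
            for x in data
        ]
        # Update step: prototype = normalized sum of its members.
        for j in range(k):
            members = [x for x, label in zip(data, labels) if label == j]
            if members:
                summed = [sum(column) for column in zip(*members)]
                if any(summed):
                    centers[j] = normalize(summed)
    return labels, centers


if __name__ == "__main__":
    rows = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0]]
    labels, _ = spherical_kmeans(rows, k=2)
    print(labels)
```

Rows that point in similar directions end up in the same cluster regardless of their length, which is the main difference from Euclidean k-means.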
soonraah / multi_dimensional_gmm.stan
Last active August 29, 2015 14:05
Stan code for a multidimensional GMM with full covariance
data {
  int<lower=1> D; // number of dimensions
  int<lower=1> N; // number of samples
  int<lower=1> M; // number of mixture components
  vector[D] X[N]; // data to train
}
parameters {
  simplex[M] weights; // mixture weights
  vector[D] mu[M]; // means
  cov_matrix[D] sigma[M]; // covariance matrix
soonraah / train_gmm.py
Created August 30, 2014 11:59
Python code to train a GMM with PyStan.
# -*- coding: utf-8 -*-
from sklearn.datasets import make_classification
import numpy as np
import matplotlib.pyplot as plt
import pystan
NUM_MIXTURE_COMPONENTS = 4
NUM_DIMENSIONS = 2
soonraah / multi_dimension_gmm_diagonal.stan
Created October 5, 2014 16:22
Stan code to train a multidimensional GMM (Gaussian Mixture Model) with diagonal covariance.
data {
  int<lower=1> D; // number of dimensions
  int<lower=1> N; // number of samples
  int<lower=1> M; // number of mixture components
  vector[D] X[N]; // data to train
}
parameters {
  simplex[M] weights; // mixture weights
  vector[D] mu[M]; // means
  vector<lower=0.0>[D] sigma[M]; // standard deviation
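The model block is not visible in this preview. A mixture likelihood of this kind is usually evaluated by summing, for each sample, the log of the weighted component densities with a log-sum-exp for numerical stability. The Python sketch below shows that computation for diagonal covariance using the same names as the Stan declarations (X, weights, mu, sigma). It is an assumption about what such a model computes, not a transcription of the gist; the function names are hypothetical and every weight is assumed to be strictly positive.

```python
import math


def log_normal_diag(x, mu, sigma):
    """Log density of a Gaussian with independent dimensions.

    sigma holds per-dimension standard deviations, matching the Stan
    declaration above.
    """
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((xi - m) / s) ** 2
        for xi, m, s in zip(x, mu, sigma)
    )


def log_sum_exp(values):
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def gmm_log_likelihood(X, weights, mu, sigma):
    """Total log likelihood of samples X under the mixture.

    Hypothetical helper: weights must be strictly positive and sum to 1.
    """
    total = 0.0
    for x in X:
        total += log_sum_exp([
            math.log(w) + log_normal_diag(x, m, s)
            for w, m, s in zip(weights, mu, sigma)
        ])
    return total


if __name__ == "__main__":
    X = [[0.0, 0.0], [3.0, 3.0]]
    weights = [0.5, 0.5]
    mu = [[0.0, 0.0], [3.0, 3.0]]
    sigma = [[1.0, 1.0], [1.0, 1.0]]
    print(gmm_log_likelihood(X, weights, mu, sigma))
```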
soonraah / em_vs_mcmc.py
Created October 5, 2014 18:11
Comparison of the EM algorithm and MCMC for GMM training.
import numpy as np
from sklearn import cross_validation, mixture
import pickle
import os
import pystan
import time
import matplotlib.pyplot as plt
def dump_stan_model(stan_model, compiled_file_name):
soonraah / LinearClassifier.scala
Created June 5, 2016 11:07
A base class for online learning of binary linear classifiers
package mlp.onlineml.classification.binary
import breeze.linalg.{DenseMatrix, DenseVector}
/**
* A base class of binary linear classification
*
* @param w weight vector
* @param sigma covariance matrix
*/