@ymurase
Created January 5, 2015 14:53
generate test data for K-means clustering
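# Generates n random 2D points clustered around k centers in the unit square,
# for use as K-means test data. Prints one "x y" pair per line.
unless ARGV.size == 2
  $stderr.puts "Usage: ruby #{__FILE__} <n> <expected_k>"
  exit 1
end

# Scale applied to each standard-normal offset. Despite the name, this acts
# as the standard deviation of a cluster, not its variance.
VARIANCE = 0.05

# Box-Muller transform: returns two independent standard-normal samples.
def draw_gaussian
  r1 = 1.0 - rand  # in (0, 1], so Math.log never receives 0
  r2 = rand
  radius = Math.sqrt(-2 * Math.log(r1))
  angle = 2 * Math::PI * r2
  [radius * Math.cos(angle), radius * Math.sin(angle)]
end

# Picks k centers uniformly at random, then draws n points, each offset from
# a randomly chosen center by Gaussian noise.
def draw_random_from_gaussian(n, k)
  centers = Array.new(k) { [rand, rand] }
  Array.new(n) do
    cx, cy = centers[rand(k)]
    dx, dy = draw_gaussian.map { |x| x * VARIANCE }
    # Wrap coordinates back into the unit square (periodic boundary).
    [cx + dx, cy + dy].map { |x| x - x.floor }
  end
end

points = draw_random_from_gaussian(ARGV[0].to_i, ARGV[1].to_i)
$stdout.puts points.map { |point| point.join(' ') }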
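
The script above draws from Ruby's global random generator, so every run produces different data. When K-means results need to be compared across runs, a seeded variant may be more convenient. The following is a minimal sketch under that assumption: `gaussian_pair`, `wrapped_points`, the `sigma` parameter, and the seed value are hypothetical names and choices for illustration, not part of the original gist.

```ruby
# Sketch only: gaussian_pair and wrapped_points are hypothetical helpers,
# not part of the original script.

# Box-Muller transform using an explicit Random instance.
def gaussian_pair(rng)
  r1 = 1.0 - rng.rand  # in (0, 1], so Math.log never receives 0
  r2 = rng.rand
  radius = Math.sqrt(-2 * Math.log(r1))
  angle = 2 * Math::PI * r2
  [radius * Math.cos(angle), radius * Math.sin(angle)]
end

# Draws n points around k uniformly placed centers, scales the Gaussian
# offsets by sigma, and wraps each coordinate back into the unit square.
def wrapped_points(n, k, sigma, rng)
  centers = Array.new(k) { [rng.rand, rng.rand] }
  Array.new(n) do
    cx, cy = centers[rng.rand(k)]
    dx, dy = gaussian_pair(rng).map { |v| v * sigma }
    [cx + dx, cy + dy].map { |v| v - v.floor }
  end
end

# The same seed yields the same data set on every run.
points = wrapped_points(1000, 3, 0.05, Random.new(42))
puts points.size
puts points.flatten.all? { |v| v.between?(0.0, 1.0) }
```

Passing the generator as an argument keeps the sampling functions free of global state, so two calls with `Random.new` and the same seed return identical point sets.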