@mrdrozdov
Created April 29, 2015 19:00
# Chapter 8: Distance-Based Models
---
# K Nearest Neighbors
Time to train: O(N) (just store the N training points)
Time to classify: O(N) distance computations per query with brute-force search
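
A minimal sketch of brute-force classification, to show where the per-query O(N) scan comes from. The names `knn_classify` and `train` and the toy data are illustrative assumptions, not taken from the linked tutorial. It needs Python 3.8+ for `math.dist`.

```python
import heapq
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Brute-force k-NN: scan every training point, vote among the k closest."""
    # "Training" is just keeping `train`, a list of (features, label) pairs.
    # nsmallest does one pass over all N points (O(N log k) comparisons).
    nearest = heapq.nsmallest(k, train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]
print(knn_classify(train, (1.1, 1.0)))
print(knn_classify(train, (5.1, 5.0)))
```
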
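The curse of dimensionality! In high dimensions distances tend to concentrate, so the nearest neighbor is barely closer than the farthest one.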
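
One way to see the curse of dimensionality is to measure how much closer the nearest point is than the farthest. The sketch below assumes points drawn uniformly from the unit hypercube. The function name `relative_contrast` and the sample sizes are illustrative choices.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

# The contrast should shrink as the dimension grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

When the contrast is close to zero, "nearest" carries little information, which is the practical problem for K Nearest Neighbors.
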
Python Implementation: http://machinelearningmastery.com/tutorial-to-implement-k-nearest-neighbors-in-python-from-scratch/
---
# K Means
NP-Hard Clustering Problem (the standard algorithm, Lloyd's, is a heuristic that only finds a local optimum)
Two Biggest Issues:
1. Is the solution it converges to any good? (It depends on the initial centroids.)
2. What should K be?
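
Both issues can be poked at with a small pure-Python sketch of Lloyd's algorithm. It assumes Euclidean distance and random initialization from the data points. Random restarts address issue 1, and printing the within-cluster sum of squares for several K values gives a rough "elbow" view of issue 2. The names `lloyd` and `kmeans` and the toy data are illustrative, not taken from the linked articles.

```python
import math
import random

def lloyd(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm; returns (centroids, inertia)."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Inertia: sum of squared distances to the nearest centroid.
    inertia = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
    return centroids, inertia

def kmeans(points, k, restarts=10, seed=0):
    """Keep the best of several random restarts (lowest inertia)."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(restarts)),
               key=lambda result: result[1])

# Hypothetical toy data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
for k in (1, 2, 3, 4):
    print(k, round(kmeans(points, k)[1], 3))
```

The inertia always drops as K grows, so the usual heuristic is to look for the K after which it stops dropping sharply.
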
Great Links
1. http://www.onmyphd.com/?p=k-means.clustering
2. http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
3. http://papers.nips.cc/paper/2526-learning-the-k-in-k-means.pdf
Other Links
1. https://datasciencelab.wordpress.com/2013/12/12/clustering-with-k-means-in-python/
2. http://stackoverflow.com/questions/9847026/plotting-output-of-kmeanspycluster-impl
3. https://spark-summit.org/2013/exercises/machine-learning-with-spark.html
---
# Hierarchical Clustering
Dendrogram
Single/Complete/Average/Centroid Linkage
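
The linkage choices differ only in how the distance between two clusters is measured. Below is a naive agglomerative sketch, assuming Euclidean distance. The merge distances it records are the heights a dendrogram would draw. The names `cluster_distance` and `agglomerate` and the toy points are illustrative. For real data a library routine such as SciPy's hierarchical clustering is the usual choice.

```python
import math

def centroid(cluster):
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

def cluster_distance(a, b, linkage):
    """Distance between clusters a and b under the named linkage."""
    if linkage == "centroid":
        return math.dist(centroid(a), centroid(b))
    pairwise = [math.dist(p, q) for p in a for q in b]
    if linkage == "single":      # closest pair
        return min(pairwise)
    if linkage == "complete":    # farthest pair
        return max(pairwise)
    if linkage == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(points, linkage="single"):
    """Merge the two closest clusters until one remains; return merge history."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], linkage)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical 1-D toy points; compare merge heights across linkages.
points = [(0.0,), (1.0,), (3.0,), (10.0,)]
for linkage in ("single", "complete", "average", "centroid"):
    print(linkage, [round(d, 3) for _, _, d in agglomerate(points, linkage)])
```
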
Links
1. http://stackoverflow.com/questions/11917779/how-to-plot-and-annotate-hierarchical-clustering-dendrograms-in-scipy-matplotlib
2. http://brandonrose.org/clustering (?)
---
# Kernels
WTF
---
Slides built using http://remarkjs.com/#1