Basic problems: NN and ANN.
Locality sensitive hashing is a family of algorithms for NN.
Given a collection of
n
points, build a data structure which, given any query point, reports the data point, that is closest to the query.
An important case is where the points live in a d-dimensional space under some (Euclidean) distance function.
Objects of interest (documents, images, ...) are represented as a point
in R^d
. d
ranges from tens to millions.
For low dimensions, there are traditional data structures, such as k-d-trees (k-dimensional tree, Bentley, 1975).
A k-d tree is a binary tree, in which every node is a k-dimensional point.
More on k-d trees:
- http://www.autonlab.org/autonweb/14665/version/2/part/5/data/moore-tutorial.pdf
- https://www.cs.umd.edu/class/spring2008/cmsc420/L19.kd-trees.pdf
Space partitioning data structure.
Cool viz: