Clustering groups similar samples together into the same cluster. The more similar the samples within a cluster are (and conversely, the more dissimilar the samples in separate clusters), the better the clustering algorithm has performed. Since clustering is unsupervised, this similarity must be measured automatically and based solely on your data.
The implementation details and definition of similarity are what differentiate the many clustering algorithms.
K-Neighbours is a supervised classification algorithm. If clustering is the process of separating your samples into groups, then classification is the process of assigning samples into those groups. Given a set of groups, take a set of samples and mark each sample as being a member of a group. Each group is the correct answer, label, or classification for its samples.
The K-Nearest Neighbours - or K-Neighbours - classifier is one of the simplest machine learning algorithms.
K-Nearest Neighbours works by first simply storing all of your training data samples.
Then in the future, when you attempt to check the classification of a new, never-before-seen sample, it finds the nearest "K" number of samples to it from within your training data. You must have numeric features in order for 'nearest' to be meaningful. There are other methods you can use to handle categorical features; for example, you can use bag of words to vectorize your data. SciKit-Learn's K-Nearest Neighbours only supports numeric features, so you'll have to do whatever has to be done to get your data into that format before proceeding. The distance is measured as a standard Euclidean distance.
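As a rough sketch of that preprocessing step, here is one way the bag-of-words idea might look with scikit-learn's CountVectorizer; the tiny corpus and labels below are invented purely for illustration and are not part of any particular dataset:

```python
# Minimal sketch: turning text-like categorical data into numeric features
# with a bag-of-words vectorizer before handing it to K-Neighbours.
# The corpus and labels are made up for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

corpus = ["red apple", "green apple", "red car", "blue car"]
labels = ["fruit", "fruit", "vehicle", "vehicle"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)   # sparse numeric word-count matrix

knn = KNeighborsClassifier(n_neighbors=3)  # default metric is Euclidean
knn.fit(X, labels)

print(knn.predict(vectorizer.transform(["green car"])))
```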
With the nearest neighbours found, K-Neighbours looks at their classes and takes a majority (mode) vote to assign a label to the new data point. Further extensions of K-Neighbours can take into account the distance to the samples to weigh their voting power. For each new prediction or classification made, the algorithm has to again find the nearest neighbours to that sample in order to call a vote for it. This process is where the majority of the time is spent, so instead of using brute force to search the training data as if it were stored in a list, tree structures are used instead to optimize the search times. Due to this, the number of classes in your dataset doesn't have a bearing on its execution speed; only the number of records in your training data set does.
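The voting and search behaviour described above maps onto scikit-learn's KNeighborsClassifier roughly as follows; the toy 2-D points are invented, and weights='distance' plus algorithm='kd_tree' are just one possible configuration for distance-weighted voting with a tree-based search:

```python
# Sketch of the prediction step: find the K nearest training samples,
# then take a (distance-weighted) vote among their labels.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3, weights='distance', algorithm='kd_tree')
knn.fit(X_train, y_train)

# Each call to predict() re-runs the neighbour search for the new sample
# before calling the vote.
print(knn.predict([[0.1, 0.2]]))   # -> [0]
```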
A unique feature of supervised classification algorithms is their decision boundary, or more generally, their n-dimensional decision surface: a threshold or region that, once crossed, results in your sample being assigned that class.
The decision surface isn't always spherical. In fact, it can take many different types of shapes depending on the algorithm that generated it.
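One way to see a decision surface is to ask a trained classifier to predict the class of every point on a dense grid and colour the plane by the result. The sketch below does this for K-Neighbours on made-up two-dimensional data; the data generation and grid resolution are assumptions chosen purely for illustration:

```python
# Rough sketch of visualizing a decision surface: predict the class for
# every point on a grid and colour the plane accordingly.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

xx, yy = np.meshgrid(np.linspace(-2, 4, 200), np.linspace(-2, 4, 200))
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)           # the decision surface
plt.scatter(X[:, 0], X[:, 1], c=y, s=15)     # the training samples
plt.show()
```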
For K-Neighbours, generally the higher your "K" value, the smoother and less jittery your decision surface becomes. Higher K values also result in your model providing probabilistic information about the ratio of samples per class. There is a tradeoff though, as higher K values mean the algorithm is less sensitive to local fluctuations since farther samples are taken into account. This causes it to model only the overall classification function without much attention to detail, and it also increases the computational complexity of the classification.
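To make the probabilistic point concrete, here is a small sketch using predict_proba on invented data: the reported probabilities are simply the fraction of the K neighbours belonging to each class, so you can watch the output become less extreme as "K" grows:

```python
# Sketch of the probabilistic output: class ratios among the K nearest
# neighbours of a borderline point, for several values of K.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(1)
X = np.vstack([rng.normal(0, 1.0, (50, 2)), rng.normal(2, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

for k in (1, 15, 51):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, knn.predict_proba([[1.0, 1.0]]))
```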
K-Neighbours is particularly useful when no other model fits your data well, as it is a parameter-free approach to classification. So, for example, you don't have to worry about things like whether your data is linearly separable or not.
Some caution points to keep in mind while using K-Neighbours: your data needs to be measurable. If there is no metric for discerning distance between your features, K-Neighbours cannot help you. As with all algorithms dependent on distance measures, it is also sensitive to feature scaling. K-Neighbours is also sensitive to perturbations and the local structure of your dataset, particularly at lower "K" values.
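Because feature scaling matters so much, a common pattern is to standardize the features before the distance computation. A minimal sketch using a scikit-learn Pipeline is shown below; the two features (metres and grams) and their values are invented to exaggerate the scale difference:

```python
# Sketch: standardize features so no single large-scale feature dominates
# the Euclidean distance used by K-Neighbours.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# One feature in metres, one in grams: wildly different scales.
X = np.array([[1.70, 68000], [1.80, 82000], [1.60, 55000], [1.75, 90000]])
y = np.array([0, 1, 0, 1])

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)
print(model.predict([[1.72, 70000]]))
```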