JoeKurokawa


Learning ML part VIII. Nearest Neighbor Algorithm.

The Nearest Neighbor algorithm is an instance-based learning algorithm that can handle both classification and regression. It is one of the simplest algorithms because any data point can be inferred from its nearest neighbor. See the diagram below: let's say you have a graph of red and blue points, and we want to figure out what the green point is.

[Diagram: red and blue points scattered on a plane, with an unlabeled green point among them.]

We can see that the nearest point to the green circle is red, so logically we could assume the green point is also red. There is also the k-Nearest Neighbors (k-NN) method, which takes into account not just the single closest point but the k nearest points. For example, in the same graph, if we look at the 3 nearest neighbors of the green circle, there are two reds and one blue. What should the green circle be then? If we are doing classification, we implement a voting system (count up the number of blues and reds); if we are doing regression, we average the values of the k nearest neighbors. A minimal sketch of the voting rule is shown below.
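Here is a minimal Python sketch of that voting rule, assuming Euclidean distance; the red/blue sample points are made up to mirror the diagram above:

```python
from collections import Counter
import math

def knn_classify(points, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest points."""
    # Distance from the query to every stored point, paired with its label.
    distances = [(math.dist(p, query), label) for p, label in zip(points, labels)]
    # Keep the k closest points and count their labels.
    k_nearest = sorted(distances)[:k]
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

# Hypothetical points standing in for the red/blue diagram.
points = [(1, 1), (2, 1.5), (3, 4), (5, 5), (5.5, 4.5)]
labels = ["red", "red", "blue", "blue", "blue"]
print(knn_classify(points, labels, query=(2, 2), k=3))  # -> "red" (two reds, one blue)
```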

So this algorithm is helpful when you are looking at data with a geographic or spatial structure, such as estimating the cost of a house from the prices of nearby houses in the same neighborhood, or mapping a classification onto a graph of data. For example, if we want to assign credit scores based on the financial data of people in a database, we can plot them and look at their nearest neighbors.
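The regression variant fits the house-price framing: instead of voting, we average the neighbors' values. A sketch under that assumption, with made-up coordinates and prices:

```python
import math

def knn_regress(points, values, query, k=3):
    """Predict a value for `query` by averaging its k nearest neighbors' values."""
    distances = sorted((math.dist(p, query), v) for p, v in zip(points, values))
    # Mean of the target values of the k closest points.
    return sum(v for _, v in distances[:k]) / k

# Hypothetical houses: (latitude, longitude) -> sale price.
houses = [(47.61, -122.33), (47.62, -122.35), (47.60, -122.30), (47.70, -122.40)]
prices = [550_000, 610_000, 480_000, 720_000]
print(knn_regress(houses, prices, query=(47.615, -122.32), k=3))
```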

 

Lazy Learning 

So then, we come to the idea of instance-based learning, or lazy learning. k-NN is considered a lazy learning method because training takes a relatively short amount of time while querying takes relatively longer. What is meant by that is that k-NN does not produce a function to predict outputs the way a regression does; the training data itself is the model, so "training" amounts to little more than storing the data points, which makes it extremely efficient. However, k-NN pays for this on the back end: every query must compute distances to the stored points, and all of the data points must be kept in memory. The opposite of this is eager learning methods, like regression, decision trees, and neural networks, which build their models up front.
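To make the lazy/eager contrast concrete, here is a short sketch using scikit-learn (assuming it is installed; the toy data is made up): `fit` essentially just stores and indexes the training data, while each `predict` call pays the cost of the neighbor search.

```python
from sklearn.neighbors import KNeighborsClassifier

# "Training" a lazy learner mostly memorizes the data; no function is fit.
X_train = [[1, 1], [2, 1.5], [3, 4], [5, 5]]
y_train = ["red", "red", "blue", "blue"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)     # fast: stores (and indexes) X_train

# The real work happens at query time: distances to stored points
# are computed for every prediction.
print(model.predict([[2, 2]]))  # -> ['red']
```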

Non-Parametric

k-NN can also be described as a non-parametric method because it does not produce a function with a fixed set of parameters. It makes no assumptions about the form of the function underlying the data; the data is the model.

That's it for today and the k-Nearest Neighbors algorithm.
