KNN Algorithm - PyiHub

Data Science and Machine Learning Using Python

About Lesson

A KNN (k-nearest neighbors) is a supervised machine learning model used mostly for classification tasks. It uses the distance formula to classify the testing dataset. The model is suitable for both binary and multi-class classification tasks. It is mostly efficient when you have a clustered dataset. This method might not work best for overlapping classes.

What is the KNN Model?

A KNN model is a traditional method of classifying objects. To understand the workings of the model, we need to understand the distance formula.

You can read a more comprehensive article on the KNN model from here: KNN hyperparameter tuning.

When the KNN model has to make a prediction, it will find the distance from the testing/incoming data to every data point in the training data and sort the distances in ascending order. Based on the K value, the model will check which is the top nearest distance. Then, the model will predict the testing dataset based on the majority voting.

Why is KNN a Lazy model?

Among all the machine learning models, the KNN model is considered to be a lazy model. The reason is very simple. Unlike many other machine learning models which try to understand the dataset and find the relation between the input and the target variables when training the model. The KNN does nothing in the training part. It just stores the training dataset. When the model has to make predictions, it then starts calculating the distances and arranging them. For every prediction, it will calculate the distances again and again and that is the reason some people called the KNN as lazy learner.