Instance based learning (and KNN) - Part 1
Vivek Yadav, PhD
Overview
Instance-based algorithms are a class of machine learning algorithms that do not rely on a parametric model to make predictions. Instead, they store the observed data and retrieve it from memory when asked to generalize or predict on unseen data. Because no parametric model is fit to relate the inputs to the output variable of interest, the training step is very fast and consists of merely storing the data and target pairs. When a new prediction is to be made, however, the incoming point is compared against all points in the training set, and some score of ‘nearness’ is used to make the prediction. The prediction therefore depends heavily on the metric used to measure ‘nearness’.
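As a minimal sketch of this idea (the variable names here are purely illustrative, not from any library), training amounts to storing the data, and predicting amounts to looking up the nearest stored point:

import numpy as np

# 'Training': nothing is fitted; the observed pairs are simply kept in memory.
X_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([0.1, 2.2, 5.9, 12.1])

def predict_1nn(x_new):
    # Score of 'nearness': squared distance to every stored point
    dist = (X_train - x_new)**2
    # Return the target of the single closest stored point
    return y_train[np.argmin(dist)]

print(predict_1nn(1.4))   # -> 2.2, the target of the nearest stored point (x = 1.0)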
In this two-part post, I will first illustrate instance-based learning with a few simple examples, and then apply it to classify images from the CIFAR-10 data set.
1. ‘Regression’ example
I first generate some test data and then apply instance-based learning concepts to it. The first step is to import the required packages.
# import packages
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
%matplotlib inline
# Simple example: noisy samples from a known quadratic function
np.random.seed(2016)

def my_fun(x):
    y = x + x**2
    return y

num_pts = 23
x = np.linspace(1,10,num_pts)
y_m = my_fun(x)                         # true (noise-free) values
y = y_m + 10*np.random.randn(num_pts)   # noisy observations
plt.plot(x,y_m,x,y,'rs')
Now, to make a prediction at a new point x_pred, I measure the distance from x_pred to each training point and use the target value of the nearest one as the guess.
x_pred = 5.0
x_dist = [(x[i]-x_pred)**2 for i in np.arange(0,num_pts)]  # squared distance to every training point
min_dist = np.argmin(x_dist)                               # index of the nearest training point
y_pred = y[min_dist]
err = np.sqrt((my_fun(x_pred) - y_pred)**2)
print('Predicted value for %f is %f.' % (x_pred, y_pred))
print('Prediction error is %f.' % err)
Predicted value for 5.000000 is 15.992849.
Prediction error is 14.007151.
plt.plot(x,y_m,x,y,'rs',x_pred,y_pred,'go')
As is evident, using just one point to make the prediction can result in large deviations from the desired value. One way to reduce sensitivity to noise is to average over multiple points around the query. This method is formally called k-NN, or k-nearest neighbors: first the distances to all training points are computed, then the k points with the smallest distances are identified, and the value at the new point is predicted as the average of their targets. Below is an example of predicting the value using the 4 nearest neighbors.
num_nn = 4
x_dist = [(x[i]-x_pred)**2 for i in np.arange(0,num_pts)]
x_dist = np.asarray(x_dist)
arg_nn = x_dist.argsort()[:num_nn]   # indices of the num_nn closest training points
y_pred_nn = np.mean(y[arg_nn])       # average of their targets
plt.plot(x,y_m,x,y,'rs',alpha = 0.5)
plt.plot(x_pred,y_pred,'go',x_pred,y_pred_nn,'bs')
# Repeat the 4-NN prediction over a grid of query points
num_nn = 4
x_pred_arr = np.linspace(1,10,9)
y_pred_arr = []
for x_val in x_pred_arr:
    x_dist = [(x[i]-x_val)**2 for i in np.arange(0,num_pts)]
    x_dist = np.asarray(x_dist)
    arg_nn = x_dist.argsort()[:num_nn]
    y_val = np.mean(y[arg_nn])
    y_pred_arr.append(y_val)
y_pred_arr = np.asarray(y_pred_arr)
plt.plot(x,y_m,x,y,'rs',alpha = 0.5)
plt.plot(x_pred,y_pred,'go',x_pred_arr,y_pred_arr,'bs')
Averaging over more neighbors therefore gives predictions closer to the underlying function. However, one must be careful in choosing the number of nearest neighbors. As an extreme example, when the number of neighbors equals the number of points in the training set, every prediction is simply the mean of all target values.
# Extreme case: use all 23 training points as neighbors
num_nn = 23
x_pred_arr = np.linspace(1,10,9)
y_pred_arr = []
for x_val in x_pred_arr:
    x_dist = [(x[i]-x_val)**2 for i in np.arange(0,num_pts)]
    x_dist = np.asarray(x_dist)
    arg_nn = x_dist.argsort()[:num_nn]
    y_val = np.mean(y[arg_nn])
    y_pred_arr.append(y_val)
y_pred_arr = np.asarray(y_pred_arr)
plt.plot(x,y_m,x,y,'rs',alpha = 0.5)
plt.plot(x_pred,y_pred,'go',x_pred_arr,y_pred_arr,'bs')
Important takeaways
- Instance-based algorithms are fast to train but computationally expensive during the prediction phase.
- Choosing an appropriate number of neighbors is important in the kNN algorithm. Small k results in noisy predictions and large k results in over-simplified predictions; one simple way to pick k with a held-out split is sketched after this list.
- Large k has high bias and low variance, and small k has low bias and high variance. (To be added later)
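As a concrete (if simplistic) way to act on this, the sketch below picks k by holding out part of the data and comparing prediction errors. It assumes the x and y arrays from the regression example above are still in scope; the helper names knn_predict and choose_k_holdout are mine, not from any library.

def knn_predict(x_train, y_train, x_query, k):
    # Average the targets of the k nearest training points
    dist = (x_train - x_query)**2
    return np.mean(y_train[np.argsort(dist)[:k]])

def choose_k_holdout(x, y, candidate_ks, holdout_frac=0.3):
    # Randomly split the stored data into a kept part and a held-out part
    idx = np.random.permutation(len(x))
    n_hold = int(holdout_frac*len(x))
    hold, keep = idx[:n_hold], idx[n_hold:]
    errors = {}
    for k in candidate_ks:
        preds = [knn_predict(x[keep], y[keep], x_q, k) for x_q in x[hold]]
        errors[k] = np.mean((np.asarray(preds) - y[hold])**2)
    # Return the k with the smallest held-out squared error
    return min(errors, key=errors.get), errors

best_k, errs = choose_k_holdout(x, y, candidate_ks=[1, 2, 4, 8, 16])
print('Best k on the held-out split: %d' % best_k)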
2. Multiclass classification
# Generate three overlapping clusters of 2-D points, one per class
x1 = 1+1.2*np.random.rand(100)
y1 = 1+1.2*np.random.rand(100)
c1 = 0*np.ones(100)
x2 = 2+1.2*np.random.rand(100)
y2 = 1+1.2*np.random.rand(100)
c2 = 1*np.ones(100)
x3 = 1.5+1.2*np.random.rand(100)
y3 = 2+1.2*np.random.rand(100)
c3 = 2*np.ones(100)
train_data = np.asarray([np.hstack((x1,x2,x3)),np.hstack((y1,y2,y3))])
c = np.asarray(np.hstack((c1,c2,c3)))
col_class = ['rs','bs','gs']
plt.plot(x1,y1,col_class[0])
plt.plot(x2,y2,col_class[1])
plt.plot(x3,y3,col_class[2])
plt.ylim(0.8,3.3)
plt.xlim(0.8,3.3)
Now I will predict which class a new point belongs to. The approach is similar to before: measure the distance from the new point to all points in the training set, identify its nearest neighbors, and assign the new point to the class most common among them.
def get_class_knn(train_data,new_data,num_nn):
    # Euclidean distance from the new point to every stored training point
    dist = np.sqrt( (train_data[0] - new_data[0])**2\
                   +(train_data[1] - new_data[1])**2)
    # Indices and class labels of the num_nn nearest points
    arg_nn = dist.argsort()[:num_nn]
    c_pred = c[arg_nn]
    # Majority vote among the nearest neighbors
    count = Counter(c_pred)
    most_common = count.most_common(1)[0][0]
    return int(most_common)
num_nn = 4
new_data = [2,3]
most_common = get_class_knn(train_data,new_data,num_nn)
plt.plot(new_data[0],new_data[1],'o')
plt.plot(new_data[0],new_data[1],col_class[most_common])
plt.plot(x1,y1,col_class[0],alpha = 0.15)
plt.plot(x2,y2,col_class[1],alpha = 0.15)
plt.plot(x3,y3,col_class[2],alpha = 0.15)
plt.ylim(0.8,3.3)
plt.xlim(0.8,3.3)
If the new point lies well within the region belonging to a class, the k-nearest neighbor algorithm works well. However, when points lie close to the boundary between classes, the algorithm can produce ‘noisy’ predictions. Below I illustrate this by drawing the decision regions for each class: I first generate a grid of points and compute the predicted class for each point on the grid, then shade the regions with contourf; the decision boundaries are the lines across which the predicted class changes.
num_neigh = 1
# Plotting decision regions for the 1-nearest-neighbor classifier
x_min, x_max = train_data[0,:].min() - 1, train_data[0,:].max() + 1.1
y_min, y_max = train_data[1,:].min() - 1, train_data[1,:].max() + 1.1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.05),
                     np.arange(y_min, y_max, 0.05))
Z = np.zeros(np.shape(xx))
for i in np.arange(0,xx.shape[0]):
    for j in np.arange(0,xx.shape[1]):
        Z[i,j] = get_class_knn(train_data,[xx[i,j],yy[i,j]],num_neigh)
plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.contourf(xx, yy, Z, alpha=0.4)
plt.plot(x1,y1,col_class[0],alpha = 0.5)
plt.plot(x2,y2,col_class[1],alpha = 0.5)
plt.plot(x3,y3,col_class[2],alpha = 0.5)
plt.ylim(0.8,3.3)
plt.xlim(0.8,3.3)
plt.title('1-NN')
num_neigh = 20
# Plotting decision regions for the 20-nearest-neighbor classifier
x_min, x_max = train_data[0,:].min() - 1, train_data[0,:].max() + 1.1
y_min, y_max = train_data[1,:].min() - 1, train_data[1,:].max() + 1.1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
Z = np.zeros(np.shape(xx))
for i in np.arange(0,xx.shape[0]):
    for j in np.arange(0,xx.shape[1]):
        Z[i,j] = get_class_knn(train_data,[xx[i,j],yy[i,j]],num_neigh)
plt.subplot(1,2,2)
plt.contourf(xx, yy, Z, alpha=0.4)
plt.plot(x1,y1,col_class[0],alpha = 0.5)
plt.plot(x2,y2,col_class[1],alpha = 0.5)
plt.plot(x3,y3,col_class[2],alpha = 0.5)
plt.ylim(0.8,3.3)
plt.xlim(0.8,3.3)
plt.title('20-NN')
Conclusion
Instance-based algorithms (such as KNN) are simple algorithms that do not try to learn a parametric model of the data. Instead, they store all the values seen in the training set, and when a new data point arrives they identify the ‘most similar’ stored points and use their values for the prediction. Although instance-based algorithms are simple, they have several shortcomings. One must consider the following guidelines while designing a kNN algorithm.
- The choice of k is crucial in kNN algorithms.
- Large k results in smoother boundaries and is prone to high bias and low variance.
- Small k results in noisy boundaries and is prone to low bias and high variance.
- Instance-based algorithms are memory intensive, as the ‘model building’ part involves storing all the data.
- Instance-based models are very fast in the learning part, but tend to be computationally expensive during the prediction part; one common mitigation, vectorizing the distance computation, is sketched after this list.
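As one illustration of that cost, the decision-region code above calls get_class_knn once per grid point inside a Python loop. A sketch of a vectorized alternative (get_class_knn_vectorized is my name for it, not part of the code above) computes all distances at once with NumPy broadcasting:

def get_class_knn_vectorized(train_data, new_points, labels, num_nn):
    # train_data: 2 x N stored points; new_points: 2 x M query points
    # Broadcasting gives all N x M squared distances in one shot
    dx = train_data[0][:, None] - new_points[0][None, :]
    dy = train_data[1][:, None] - new_points[1][None, :]
    dist = dx**2 + dy**2                          # shape (N, M)
    nn_idx = np.argsort(dist, axis=0)[:num_nn]    # num_nn nearest stored points per query
    nn_labels = labels[nn_idx]                    # shape (num_nn, M)
    # Majority vote per query point (column)
    preds = [Counter(col).most_common(1)[0][0] for col in nn_labels.T]
    return np.asarray(preds, dtype=int)

With the grid variables from above, the two nested loops could then be replaced by something like:

pts = np.vstack([xx.ravel(), yy.ravel()])
Z = get_class_knn_vectorized(train_data, pts, c, num_neigh).reshape(xx.shape)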
In the next post, I will test some modified instance-based algorithms that were designed to address some of the shortcomings mentioned above.