Vivek Yadav, PhD

In previous posts, we saw how instance-based methods can be used for classification and regression. In this post, we will investigate the performance of the k-nearest neighbor (kNN) algorithm for classifying images. The kNN classifier consists of two stages:

- During training, the classifier takes the training data and simply remembers it
- During testing, kNN classifies every test image by comparing it to all training images and transferring the labels of the k most similar training examples
- The value of k is cross-validated

**This post is based on the CS231n course on using convolutional neural networks for image classification.**

# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the notebook
# rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print 'Training data shape: ', X_train.shape
print 'Training labels shape: ', y_train.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()
# Subsample the data for more efficient code execution in this exercise
num_training = 5000
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

num_test = 500
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]
# Reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
print X_train.shape, X_test.shape
from cs231n.classifiers import KNearestNeighbor

# Create a kNN classifier instance. 
# Remember that training a kNN classifier is a noop: 
# the Classifier simply remembers the data and does no further processing 
classifier = KNearestNeighbor()
classifier.train(X_train, y_train)

kNN learning:

Two important hyperparameters to be defined in a kNN algorithm are k and an appropriate distance metric. In this post, the distance is defined as the sum of squared differences in the pixel values. The kNN algorithm is then applied in the following two steps (each sketched below):

1. Compute the distances between all test examples and all training examples.
2. For each test image, find the k nearest training examples and have them vote for the label.
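Step 1 maps onto the compute_distances_* methods of the CS231n KNearestNeighbor class used below. As a rough sketch of what the straightforward two-loop version might look like (the standalone function name and signature here are illustrative, not necessarily the starter code's actual implementation):

import numpy as np

# Illustrative sketch of step 1 with two explicit Python loops; the starter
# code's KNearestNeighbor.compute_distances_two_loops may differ in details.
def compute_distances_two_loops(X_train, X_test):
    num_test, num_train = X_test.shape[0], X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # Sum of squared pixel differences between test image i and
            # training image j.
            dists[i, j] = np.sum((X_test[i] - X_train[j]) ** 2)
    return dists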

Next, a distance matrix is computed between the 500 test images and the 5000 training images, resulting in a 500 x 5000 distance matrix. These distances are plotted below, where white indicates a higher distance and black a lower distance. A white (or black) row indicates that the corresponding test image is very different from (or similar to) all the images in the training set, whereas a white (or black) column indicates that the corresponding training image is very different from (or similar to) all the images in the test set.

dists = classifier.compute_distances_two_loops(X_test)
print dists.shape
# We can visualize the distance matrix: each row is a single test example and
# its distances to training examples
plt.imshow(dists, interpolation='none')
plt.show()

Prediction:

The number of neighbors, k, is a very important hyperparameter to determine in kNN algorithms. Choosing a very small k can result in an algorithm with high variance and low bias, whereas a large k can result in high bias and low variance. With k = 1, an accuracy of 27.4% is obtained, and with k = 5, 27.8%.
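Step 2, the voting, is handled by the classifier's predict_labels method. A minimal sketch of the idea (assuming majority voting with ties broken toward the smaller label index; the starter code may handle ties differently):

# Illustrative sketch of step 2: each test image takes the majority label of
# its k nearest training examples.
def predict_labels(dists, y_train, k=1):
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test, dtype=y_train.dtype)
    for i in range(num_test):
        # Indices of the k closest training examples for test image i.
        closest = np.argsort(dists[i])[:k]
        # np.bincount tallies the votes; argmax picks the most common label
        # (ties go to the smaller label index).
        y_pred[i] = np.argmax(np.bincount(y_train[closest]))
    return y_pred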

y_test_pred = classifier.predict_labels(dists, k=1)
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print 'For k = 1,'
print 'Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy)
For k = 1,
Got 137 / 500 correct => accuracy: 0.274000
y_test_pred = classifier.predict_labels(dists, k=5)
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print 'For k = 5,'
print 'Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy)
For k = 5,
Got 139 / 500 correct => accuracy: 0.278000

Calculating distances:

Next, the effect of how the distance computation is implemented is investigated. The fully vectorized scheme, without any for loops, took about 0.35 seconds, compared to over 40 seconds each for the one-loop and two-loop versions.
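The no-loop version relies on expanding the squared difference, ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, so the entire distance matrix comes from one matrix multiplication plus two sums of squared norms. A sketch of the trick (again a standalone illustrative function, not necessarily identical to compute_distances_no_loops in the starter code):

# Illustrative fully vectorized sketch using the expansion
# ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2.
def compute_distances_no_loops(X_train, X_test):
    test_sq = np.sum(X_test ** 2, axis=1)[:, np.newaxis]   # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1)                 # (num_train,)
    cross = X_test.dot(X_train.T)                           # (num_test, num_train)
    # Broadcasting yields the full (num_test, num_train) matrix of summed
    # squared differences; taking a square root would give Euclidean distances
    # but would not change the neighbor ranking.
    return test_sq - 2 * cross + train_sq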

dists_one = classifier.compute_distances_one_loop(X_test)
difference = np.linalg.norm(dists - dists_one, ord='fro')
print 'Difference was: %f' % (difference, )
if difference < 0.001:
  print 'Good! The distance matrices are the same'
else:
  print 'Uh-oh! The distance matrices are different'
Difference was: 0.000000
Good! The distance matrices are the same
dists_two = classifier.compute_distances_no_loops(X_test)

# check that the distance matrix agrees with the one we computed before:
difference = np.linalg.norm(dists - dists_two, ord='fro')
print 'Difference was: %f' % (difference, )
if difference < 0.001:
  print 'Good! The distance matrices are the same'
else:
  print 'Uh-oh! The distance matrices are different'
Difference was: 0.000000
Good! The distance matrices are the same
# Let's compare how fast the implementations are
def time_function(f, *args):
  """
  Call a function f with args and return the time (in seconds) that it took to execute.
  """
  import time
  tic = time.time()
  f(*args)
  toc = time.time()
  return toc - tic

two_loop_time = time_function(classifier.compute_distances_two_loops, X_test)
print 'Two loop version took %f seconds' % two_loop_time

one_loop_time = time_function(classifier.compute_distances_one_loop, X_test)
print 'One loop version took %f seconds' % one_loop_time

no_loop_time = time_function(classifier.compute_distances_no_loops, X_test)
print 'No loop version took %f seconds' % no_loop_time

# you should see significantly faster performance with the fully vectorized implementation
Two loop version took 40.809804 seconds
One loop version took 42.902647 seconds
No loop version took 0.354949 seconds

Cross-validation

Next, 5-fold cross-validation is performed to determine the best k for the classification task.

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

num_data = len(y_train)
ind_split = np.array_split(np.arange(num_data), num_folds)
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)
#print np.shape(X_train_folds)
k_to_accuracies = {}
for k in k_choices:
    #print 'Processing kNN for k = %d' %(k)
    k_to_accuracies[k] = []
    for j in range(0,num_folds):
        X_train_v = np.vstack(X_train_folds[0:j]+X_train_folds[j+1:])
        y_train_v = np.hstack(y_train_folds[0:j]+y_train_folds[j+1:])
        
        X_test_v = X_train_folds[j]
        y_test_v = y_train_folds[j]
        
        classifier.train(X_train_v, y_train_v)
        dists_cv = classifier.compute_distances_no_loops(X_test_v)
        dists_cv = np.asarray(dists_cv)
        #print 'predicting now'
        y_test_pred = classifier.predict_labels(dists_cv, k)
        num_correct = np.sum(y_test_pred == y_test_v)
        accuracy = float(num_correct) / float(len(y_test_v))
        k_to_accuracies[k].append(accuracy)
        
        


# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    print 'k = %d, accuracy = %f' % (k, np.mean(k_to_accuracies[k]))
k = 1, accuracy = 0.265600
k = 3, accuracy = 0.249600
k = 5, accuracy = 0.273200
k = 8, accuracy = 0.276000
k = 10, accuracy = 0.280200
k = 12, accuracy = 0.279400
k = 15, accuracy = 0.275000
k = 20, accuracy = 0.279000
k = 50, accuracy = 0.274400
k = 100, accuracy = 0.261600
# plot the raw observations
for k in k_choices:
  accuracies = k_to_accuracies[k]
  plt.scatter([k] * len(accuracies), accuracies)

# plot the trend line with error bars that correspond to standard deviation
accuracies_mean = np.array([np.mean(v) for k,v in sorted(k_to_accuracies.items())])
accuracies_std = np.array([np.std(v) for k,v in sorted(k_to_accuracies.items())])
plt.errorbar(k_choices, accuracies_mean, yerr=accuracies_std)
plt.title('Cross-validation on k')
plt.xlabel('k')
plt.ylabel('Cross-validation accuracy')
plt.show()
best_k = 10

classifier = KNearestNeighbor()
classifier.train(X_train, y_train)
y_test_pred = classifier.predict(X_test, k=best_k)

# Compute and display the accuracy
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print 'Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy)
Got 141 / 500 correct => accuracy: 0.282000

Conclusion

Based on a simple kNN algorithm, a 28% accuracy is obtained for image classification. This performance is better than chance (10%), but the kNN algorithm still fails more than 72% of the time. Therefore, the kNN algorithm is not well suited for image classification. Future posts will investigate other methods for image classification.