← back

Implement K-Nearest Neighbors

#173 · Machine Learning · Medium

⊣ Solve on deep-ml.com

Problem

Implement the K-Nearest Neighbors (KNN) algorithm for classification. Given training data with labels, classify a new point by finding its K closest neighbors and returning the majority label.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
import numpy as np
from collections import Counter
from typing import List

def knn_classify(X_train: np.ndarray, y_train: np.ndarray,
                 X_query: np.ndarray, k: int = 3) -> List:
    predictions = []
    for query in X_query:
        distances = np.sqrt(np.sum((X_train - query) ** 2, axis=1))
        nearest_indices = np.argsort(distances)[:k]
        nearest_labels = y_train[nearest_indices].tolist()
        counts = Counter(nearest_labels)
        predictions.append(counts.most_common(1)[0][0])
    return predictions

Explanation

  1. For each query point, compute the Euclidean distance to every training point.
  2. Sort distances and select the indices of the K nearest neighbors.
  3. Collect the labels of these K neighbors.
  4. The prediction is the most common label among the neighbors (majority vote).
  5. KNN is a lazy learner: it stores all training data and does all computation at prediction time.

Complexity

  • Time: O(q (n d + n * log n)) where q is the number of queries, n is the training set size, and d is the feature dimension
  • Space: O(n * d) for storing the training data