Implement K-Nearest Neighbors

#173 · Machine Learning · Medium

Problem

Implement the K-Nearest Neighbors (KNN) algorithm for classification. Given training data with labels, classify a new point by finding its K closest neighbors and returning the majority label.

Solution

import numpy as np
from collections import Counter
from typing import List

def knn_classify(X_train: np.ndarray, y_train: np.ndarray,
                 X_query: np.ndarray, k: int = 3) -> List:
    predictions = []
    for query in X_query:
        distances = np.sqrt(np.sum((X_train - query) ** 2, axis=1))
        nearest_indices = np.argsort(distances)[:k]
        nearest_labels = y_train[nearest_indices].tolist()
        counts = Counter(nearest_labels)
        predictions.append(counts.most_common(1)[0][0])
    return predictions

Explanation

For each query point, compute the Euclidean distance to every training point.
Sort distances and select the indices of the K nearest neighbors.
Collect the labels of these K neighbors.
The prediction is the most common label among the neighbors (majority vote).
KNN is a lazy learner: it stores all training data and does all computation at prediction time.

Complexity

Time: O(q (n d + n * log n)) where q is the number of queries, n is the training set size, and d is the feature dimension
Space: O(n * d) for storing the training data

← #172 #174 →