One-Hot Encoding of Nominal Values

#34 · Machine Learning · Easy

Problem

Implement one-hot encoding for nominal (categorical) values. Given a 1D array of labels, return a 2D binary matrix where each row corresponds to a sample and each column corresponds to a unique class.
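
As a concrete illustration of the expected shape, here is a hand-written input and output pair. The labels are made up for this example, and the column order (sorted unique labels) is an assumption that matches the solution on this page, since the problem statement itself does not fix one.

```python
import numpy as np

# Hypothetical input: four samples drawn from three classes.
labels = ["cat", "dog", "bird", "cat"]

# Assumed column order: sorted unique labels -> bird, cat, dog.
expected = np.array([
    [0, 1, 0],  # cat
    [0, 0, 1],  # dog
    [1, 0, 0],  # bird
    [0, 1, 0],  # cat
])

# One row per sample, one column per class, exactly one 1 per row.
print(expected.shape)
```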

Solution

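import numpy as np

def one_hot_encode(labels):
    # Sort the unique labels so the column order is deterministic.
    unique = sorted(set(labels))
    # Map each label to its column index.
    label_to_idx = {label: i for i, label in enumerate(unique)}
    n_samples = len(labels)
    n_classes = len(unique)
    # Start from an all-zero (n_samples, n_classes) matrix.
    encoded = np.zeros((n_samples, n_classes))
    # Set a single 1 per row, in the column of that sample's class.
    for i, label in enumerate(labels):
        encoded[i, label_to_idx[label]] = 1
    return encoded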

Explanation

  1. Find all unique labels and sort them for deterministic column ordering.
  2. Map each label to a column index.
  3. Create a zero matrix of shape (n_samples, n_classes).
  4. For each sample, set the entry at the corresponding class column to 1.
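
The same four steps can also be written without an explicit Python loop. The sketch below is one possible alternative, not the page's solution: the function name `one_hot_encode_vectorized` is hypothetical, and it assumes the labels form a 1D sequence that NumPy can sort (for example all strings or all integers).

```python
import numpy as np

def one_hot_encode_vectorized(labels):
    # np.unique returns the sorted unique labels and, with
    # return_inverse=True, each sample's index into that sorted array.
    classes, inverse = np.unique(np.asarray(labels), return_inverse=True)
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing by `inverse` picks one row per sample.
    return np.eye(len(classes))[inverse]

encoded = one_hot_encode_vectorized(["cat", "dog", "bird", "cat"])
print(encoded)
```

Here `np.unique` covers steps 1 and 2 (sorting and index mapping), and indexing into `np.eye` covers steps 3 and 4.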

Complexity

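  • Time: O(n * k) where n is the number of samples and k is the number of unique classes. Allocating the zero matrix dominates; sorting the unique labels adds O(k log k) and filling in the ones adds O(n)
  • Space: O(n * k) for the dense output matrix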
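
To make the space bound concrete, the sketch below measures the memory of a dense matrix for made-up sizes. The sizes are hypothetical, and the `uint8` variant is an optional change rather than part of the solution, which uses NumPy's default `float64`.

```python
import numpy as np

n_samples, n_classes = 1000, 10  # hypothetical sizes

dense_f64 = np.zeros((n_samples, n_classes))                 # float64 by default
dense_u8 = np.zeros((n_samples, n_classes), dtype=np.uint8)  # 1 byte per entry

print(dense_f64.nbytes)  # n_samples * n_classes * 8 bytes
print(dense_u8.nbytes)   # n_samples * n_classes * 1 byte
```

Both matrices grow as n * k. The dtype only changes the constant factor.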