← back

Measure Disorder in Apple Colors

#108 · Machine Learning · Easy

⊣ Solve on deep-ml.com

Problem

Measure the disorder (entropy) in apple colors. Given a collection of items with categorical labels (e.g., apple colors: red, green, yellow), calculate the Shannon entropy to quantify the impurity or disorder of the distribution.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
import math
from collections import Counter

def entropy(labels: list) -> float:
    n = len(labels)
    if n == 0:
        return 0.0

    counts = Counter(labels)
    h = 0.0
    for count in counts.values():
        p = count / n
        if p > 0:
            h -= p * math.log2(p)

    return round(h, 4)

Explanation

  1. Count frequencies: Use a Counter to tally how many times each label appears.
  2. Compute probabilities: Divide each count by the total number of items.
  3. Shannon entropy formula: H = -sum(p_i * log2(p_i)) for all categories.
  4. Interpretation: Entropy is 0 when all items are the same class (pure) and maximized at log2(k) when items are uniformly distributed across k classes.
  5. This is used in decision trees (ID3, C4.5) to determine the best feature to split on.

Complexity

  • Time: O(n) where n is the number of labels
  • Space: O(k) where k is the number of unique labels