← back

Implement Gini Impurity Calculation for a Set of Classes

#64 · Machine Learning · Easy

⊣ Solve on deep-ml.com

Problem

Calculate the Gini Impurity for a set of classes. Gini impurity measures the probability of incorrectly classifying a randomly chosen element if it were labeled according to the class distribution.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def gini_impurity(labels):
    n = len(labels)
    if n == 0:
        return 0.0

    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1

    impurity = 1.0
    for count in counts.values():
        prob = count / n
        impurity -= prob ** 2

    return round(impurity, 4)

Explanation

  1. Count occurrences of each class label.
  2. Compute the proportion p_i for each class.
  3. Gini Impurity = 1 - sum(p_i^2) for all classes.
  4. A value of 0 means all elements belong to a single class (pure).
  5. Maximum impurity occurs when classes are equally distributed.

Complexity

  • Time: O(n) where n is the number of elements
  • Space: O(k) where k is the number of unique classes