← back

Silhouette Score for Clustering Evaluation

#254 · Machine Learning · Medium

⊣ Solve on deep-ml.com

Problem

Compute the Silhouette Score for a set of clustered data points. For each point, measure how similar it is to its own cluster versus the nearest other cluster, and return the mean silhouette value across all points.

Solution

For each point, compute a(i) (mean intra-cluster distance) and b(i) (mean distance to nearest other cluster). The silhouette for point i is (b(i) - a(i)) / max(a(i), b(i)).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
import math

def euclidean_distance(p1: list[float], p2: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

def silhouette_score(X: list[list[float]], labels: list[int]) -> float:
    n = len(X)
    if n <= 1:
        return 0.0

    unique_labels = list(set(labels))
    if len(unique_labels) <= 1:
        return 0.0

    # Group indices by cluster
    clusters: dict[int, list[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    silhouettes = []
    for i in range(n):
        own_cluster = labels[i]
        own_members = clusters[own_cluster]

        # a(i): mean distance to own cluster (excluding self)
        if len(own_members) <= 1:
            a_i = 0.0
        else:
            a_i = sum(euclidean_distance(X[i], X[j]) for j in own_members if j != i) / (len(own_members) - 1)

        # b(i): min mean distance to any other cluster
        b_i = float("inf")
        for lab in unique_labels:
            if lab == own_cluster:
                continue
            members = clusters[lab]
            mean_dist = sum(euclidean_distance(X[i], X[j]) for j in members) / len(members)
            b_i = min(b_i, mean_dist)

        if max(a_i, b_i) == 0:
            silhouettes.append(0.0)
        else:
            silhouettes.append((b_i - a_i) / max(a_i, b_i))

    return round(sum(silhouettes) / len(silhouettes), 4)

Explanation

  1. For each point, compute the average distance to every other point in the same cluster — this is a(i).
  2. For each other cluster, compute the average distance from the point to all members of that cluster. Take the minimum — this is b(i).
  3. Silhouette value: (b - a) / max(a, b). Ranges from -1 (bad) to +1 (good).
  4. The overall score is the mean across all points.

Complexity

  • Time: O(n^2) — pairwise distance computation
  • Space: O(n) for storing cluster memberships and silhouette values