← back

K-Means Clustering

#17 · Machine Learning · Medium

⊣ Solve on deep-ml.com

Problem

Implement the K-Means clustering algorithm from scratch. Given a dataset and K initial centroids, iteratively assign points to the nearest centroid and update centroids until convergence or a maximum number of iterations.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
def k_means_clustering(points: list[tuple[float, float]], k: int,
                        initial_centroids: list[tuple[float, float]],
                        max_iterations: int) -> list[tuple[float, float]]:
    centroids = [list(c) for c in initial_centroids]
    num_features = len(points[0])

    for _ in range(max_iterations):
        # Assign each point to nearest centroid
        clusters = [[] for _ in range(k)]
        for point in points:
            min_dist = float('inf')
            best = 0
            for i, centroid in enumerate(centroids):
                dist = sum((point[d] - centroid[d]) ** 2 for d in range(num_features))
                if dist < min_dist:
                    min_dist = dist
                    best = i
            clusters[best].append(point)

        # Update centroids
        new_centroids = []
        for i in range(k):
            if clusters[i]:
                new_centroid = [
                    sum(p[d] for p in clusters[i]) / len(clusters[i])
                    for d in range(num_features)
                ]
                new_centroids.append(new_centroid)
            else:
                new_centroids.append(centroids[i])

        # Check convergence
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return [tuple(round(c, 4) for c in centroid) for centroid in centroids]

Explanation

  1. Initialize centroids from the provided starting positions.
  2. Assignment step: For each point, find the nearest centroid using Euclidean distance (squared).
  3. Update step: Recompute each centroid as the mean of all points assigned to it.
  4. Repeat until centroids no longer change or max iterations is reached.

Complexity

  • Time: O(iterations n k * f) where n is points, k is clusters, f is features
  • Space: O(n) for cluster assignments