← back

Implement 2D Average Pooling

#265 · Deep Learning · Easy

⊣ Solve on deep-ml.com

Problem

Implement 2D Average Pooling from scratch. Given a 2D input matrix, a kernel size, and a stride, slide the pooling window across the input and compute the average of each region.

Solution

Iterate over valid positions with the given stride, extract the window, and compute the mean.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
def avg_pool2d(
    matrix: list[list[float]],
    kernel_size: int | tuple[int, int] = 2,
    stride: int | tuple[int, int] | None = None,
    padding: int = 0,
) -> list[list[float]]:
    if isinstance(kernel_size, int):
        kh, kw = kernel_size, kernel_size
    else:
        kh, kw = kernel_size

    if stride is None:
        sh, sw = kh, kw
    elif isinstance(stride, int):
        sh, sw = stride, stride
    else:
        sh, sw = stride

    h = len(matrix)
    w = len(matrix[0]) if h > 0 else 0

    # Apply padding
    if padding > 0:
        padded = []
        padded_w = w + 2 * padding
        for _ in range(padding):
            padded.append([0.0] * padded_w)
        for row in matrix:
            padded.append([0.0] * padding + row + [0.0] * padding)
        for _ in range(padding):
            padded.append([0.0] * padded_w)
        matrix = padded
        h = len(matrix)
        w = len(matrix[0])

    out_h = (h - kh) // sh + 1
    out_w = (w - kw) // sw + 1

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            count = 0
            for ki in range(kh):
                for kj in range(kw):
                    total += matrix[i * sh + ki][j * sw + kj]
                    count += 1
            row.append(round(total / count, 6))
        output.append(row)

    return output

Explanation

  1. Parse kernel size and stride (default stride equals kernel size).
  2. Optionally pad the input with zeros.
  3. Compute output dimensions: (input_size - kernel_size) // stride + 1.
  4. Slide the window across the input, computing the mean of values in each window position.
  5. Average pooling reduces spatial dimensions and provides translation invariance.

Complexity

  • Time: O(out_h out_w kh * kw)
  • Space: O(out_h * out_w) for the output