Implement 2D Average Pooling

#265 · Deep Learning · Easy

Problem

Implement 2D Average Pooling from scratch. Given a 2D input matrix, a kernel size, and a stride, slide the pooling window across the input and compute the average of each region.

Solution

Iterate over valid positions with the given stride, extract the window, and compute the mean.

def avg_pool2d(
    matrix: list[list[float]],
    kernel_size: int | tuple[int, int] = 2,
    stride: int | tuple[int, int] | None = None,
    padding: int = 0,
) -> list[list[float]]:
    if isinstance(kernel_size, int):
        kh, kw = kernel_size, kernel_size
    else:
        kh, kw = kernel_size

    if stride is None:
        sh, sw = kh, kw
    elif isinstance(stride, int):
        sh, sw = stride, stride
    else:
        sh, sw = stride

    h = len(matrix)
    w = len(matrix[0]) if h > 0 else 0

    # Apply padding
    if padding > 0:
        padded = []
        padded_w = w + 2 * padding
        for _ in range(padding):
            padded.append([0.0] * padded_w)
        for row in matrix:
            padded.append([0.0] * padding + row + [0.0] * padding)
        for _ in range(padding):
            padded.append([0.0] * padded_w)
        matrix = padded
        h = len(matrix)
        w = len(matrix[0])

    out_h = (h - kh) // sh + 1
    out_w = (w - kw) // sw + 1

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            count = 0
            for ki in range(kh):
                for kj in range(kw):
                    total += matrix[i * sh + ki][j * sw + kj]
                    count += 1
            row.append(round(total / count, 6))
        output.append(row)

    return output

Explanation

Parse kernel size and stride (default stride equals kernel size).
Optionally pad the input with zeros.
Compute output dimensions: (input_size - kernel_size) // stride + 1.
Slide the window across the input, computing the mean of values in each window position.
Average pooling reduces spatial dimensions and provides translation invariance.

Complexity

Time: O(out_h out_w kh * kw)
Space: O(out_h * out_w) for the output

← #264 #266 →