← back

Overlapping Max Pooling

#190 · Deep Learning · Medium

⊣ Solve on deep-ml.com

Problem

Implement Overlapping Max Pooling. Unlike standard max pooling where stride equals kernel size, overlapping pooling uses a stride smaller than the kernel size, so pooling windows overlap.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
import numpy as np

def overlapping_max_pool(x: np.ndarray, kernel_size: int = 3,
                         stride: int = 2) -> np.ndarray:
    if x.ndim == 2:
        H, W = x.shape
        out_h = (H - kernel_size) // stride + 1
        out_w = (W - kernel_size) // stride + 1
        output = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                h_start = i * stride
                w_start = j * stride
                region = x[h_start:h_start + kernel_size,
                           w_start:w_start + kernel_size]
                output[i, j] = np.max(region)
        return output

    # 4D: (batch, channels, H, W)
    batch, channels, H, W = x.shape
    out_h = (H - kernel_size) // stride + 1
    out_w = (W - kernel_size) // stride + 1
    output = np.zeros((batch, channels, out_h, out_w))

    for b in range(batch):
        for c in range(channels):
            for i in range(out_h):
                for j in range(out_w):
                    h_start = i * stride
                    w_start = j * stride
                    region = x[b, c,
                               h_start:h_start + kernel_size,
                               w_start:w_start + kernel_size]
                    output[b, c, i, j] = np.max(region)
    return output

Explanation

  1. Compute the output spatial dimensions: out = (input_size - kernel_size) // stride + 1.
  2. Slide the pooling window across the input with the given stride. Since stride < kernel_size, adjacent windows overlap.
  3. For each window position, take the maximum value.
  4. Handles both 2D (single feature map) and 4D (batch of multi-channel feature maps) inputs.

Complexity

  • Time: O(B C out_h out_w kernel_size^2)
  • Space: O(B C out_h * out_w) for the output