
Multi-term Memory Patchification for Video

#456 · Deep Learning · Medium


Problem

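Implement multi-term memory patchification for video frames. Given a sequence of video frames, a patch size, and a list of term boundaries, divide each frame into non-overlapping spatial patches of size patch_size × patch_size, then group the patchified frames along the time axis into temporal terms (for example short-term, mid-term, and long-term). Each entry in term_boundaries is the frame index at which one term ends and the next begins.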
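To make the boundary convention concrete, here is a minimal sketch of how the frame indices could be split. The helper name term_slices is hypothetical and only illustrates the half-open ranges implied by the boundaries; it is not part of the required interface.

```python
def term_slices(num_frames, term_boundaries):
    """Return (start, end) frame-index pairs, one per temporal term.

    Hypothetical helper: each pair is a half-open range [start, end).
    """
    bounds = [0] + list(term_boundaries) + [num_frames]
    return list(zip(bounds[:-1], bounds[1:]))


# Six frames split at indices 2 and 4 give three terms of two frames each.
print(term_slices(6, [2, 4]))  # [(0, 2), (2, 4), (4, 6)]
```

With k boundaries the result always has k + 1 terms, so two boundaries yield the short-term, mid-term, and long-term groups.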

Solution

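def patchify_video(
    frames: list, patch_size: int, term_boundaries: list[int]
) -> list[list]:
    # frames is a nested list indexed as frames[t][row][col][channel].
    T, H, W, C = len(frames), len(frames[0]), len(frames[0][0]), len(frames[0][0][0])
    # Number of patches along height and width; leftover rows/columns are dropped.
    pH, pW = H // patch_size, W // patch_size
    patches = []
    for t in range(T):
        frame_patches = []
        # Visit patches in row-major order over the patch grid.
        for pi in range(pH):
            for pj in range(pW):
                patch = []
                # Flatten the patch pixel by pixel, appending every channel of each pixel.
                for di in range(patch_size):
                    for dj in range(patch_size):
                        patch.extend(frames[t][pi * patch_size + di][pj * patch_size + dj])
                frame_patches.append(patch)
        patches.append(frame_patches)
    # Slice the time axis into half-open ranges [0, b1), [b1, b2), ..., [bk, T).
    boundaries = [0] + term_boundaries + [T]
    return [patches[boundaries[i]:boundaries[i + 1]] for i in range(len(boundaries) - 1)]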

Explanation

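  1. Patchification: Split each frame of size (H, W, C) into a grid of (pH, pW) patches, where pH = H // patch_size and pW = W // patch_size. Each patch has size (patch_size, patch_size, C) and is flattened into a vector of length patch_dim = patch_size * patch_size * C. Patches are ordered row by row; any rows or columns left over when H or W is not divisible by patch_size are dropped.
  2. Temporal grouping: Use the provided term_boundaries to split the temporal axis into consecutive segments, for example short-term, mid-term, and long-term memory. With k boundaries the result contains k + 1 terms.
  3. Output shape: Each returned term is a nested list of shape (num_frames_in_term, num_patches, patch_dim), where num_patches = pH * pW.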
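The same steps can also be sketched with array reshapes. This is an illustrative alternative, assuming NumPy is available and that frames converts to a regular array of shape (T, H, W, C); the function name patchify_video_np is hypothetical, and it returns NumPy arrays rather than the nested lists used in the solution above.

```python
import numpy as np


def patchify_video_np(frames, patch_size, term_boundaries):
    """Vectorized sketch of the patchify-then-group steps (hypothetical name)."""
    x = np.asarray(frames)
    T, H, W, C = x.shape
    pH, pW = H // patch_size, W // patch_size
    # Drop leftover rows/columns so H and W become multiples of patch_size.
    x = x[:, : pH * patch_size, : pW * patch_size, :]
    # Split each spatial axis into (grid index, offset inside the patch).
    x = x.reshape(T, pH, patch_size, pW, patch_size, C)
    # Bring the two grid axes together: (T, pH, pW, patch_size, patch_size, C).
    x = x.transpose(0, 1, 3, 2, 4, 5)
    # Flatten the grid into num_patches and each patch into patch_dim.
    x = x.reshape(T, pH * pW, patch_size * patch_size * C)
    # Slice the time axis into half-open ranges between consecutive boundaries.
    bounds = [0, *term_boundaries, T]
    return [x[a:b] for a, b in zip(bounds[:-1], bounds[1:])]


# Four single-channel 4x4 frames, 2x2 patches, boundaries at frames 1 and 3.
frames = np.arange(4 * 4 * 4).reshape(4, 4, 4, 1)
terms = patchify_video_np(frames, 2, [1, 3])
print([t.shape for t in terms])  # [(1, 4, 4), (2, 4, 4), (1, 4, 4)]
```

The transpose is what keeps each patch's pixels contiguous before flattening; reshaping directly from (T, H, W, C) to (T, num_patches, patch_dim) would interleave pixels from different patches.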

Complexity

  • Time: O(T * H * W * C) for reshaping all frames
  • Space: O(T * pH * pW * patch_dim) for the patchified representation