Implement multi-term memory patchification for video frames. Given a sequence of video frames and a patch size, divide each frame into non-overlapping spatial patches and then group patches across multiple temporal terms (short-term, mid-term, long-term) based on provided term boundaries.
def patchify_video(
    frames: list, patch_size: int, term_boundaries: list[int]
) -> list[list]:
    # frames is a nested list indexed as frames[t][row][col][channel].
    T, H, W, C = len(frames), len(frames[0]), len(frames[0][0]), len(frames[0][0][0])
    # Whole patches along the height and width; leftover pixels are dropped.
    pH, pW = H // patch_size, W // patch_size
    patches = []
    for t in range(T):
        frame_patches = []
        for pi in range(pH):
            for pj in range(pW):
                # Flatten one patch in row-major order (row, column, channel).
                patch = []
                for di in range(patch_size):
                    for dj in range(patch_size):
                        patch.extend(frames[t][pi * patch_size + di][pj * patch_size + dj])
                frame_patches.append(patch)
        patches.append(frame_patches)
    # Term i covers frames boundaries[i] up to, but not including, boundaries[i + 1].
    boundaries = [0] + term_boundaries + [T]
    return [patches[boundaries[i]:boundaries[i + 1]] for i in range(len(boundaries) - 1)]

Divide each frame of shape (H, W, C) into a grid of (pH, pW) patches, each of size (patch_size, patch_size, C), then flatten each patch into a vector.

Use term_boundaries to split the temporal axis into segments representing short-term, mid-term, and long-term memory.

Each term in the result has shape (num_frames_in_term, num_patches, patch_dim).
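
The same steps can also be sketched with NumPy, where a reshape and a transpose replace the nested loops. This is an illustrative alternative, not part of the original solution: the name `patchify_video_np` is hypothetical, and the sketch assumes `term_boundaries` is sorted and lies strictly between 0 and the number of frames.

```python
import numpy as np


def patchify_video_np(frames, patch_size, term_boundaries):
    """Hypothetical NumPy sketch of the patchify-and-split steps."""
    video = np.asarray(frames)  # shape (T, H, W, C)
    T, H, W, C = video.shape
    pH, pW = H // patch_size, W // patch_size
    # Drop rows/columns that do not fill a whole patch (same as the loop version).
    video = video[:, : pH * patch_size, : pW * patch_size]
    # Split H into (patch row, offset) and W into (patch column, offset).
    grid = video.reshape(T, pH, patch_size, pW, patch_size, C)
    # Put the two patch indices next to each other: (T, pH, pW, p, p, C).
    grid = grid.transpose(0, 1, 3, 2, 4, 5)
    # Flatten each patch in row-major order (row, column, channel).
    patches = grid.reshape(T, pH * pW, patch_size * patch_size * C)
    # Cut the temporal axis at the given boundaries, one array per term.
    return np.split(patches, term_boundaries, axis=0)


# Four 4x4 single-channel frames, split into terms of 1, 2, and 1 frames.
video = np.arange(4 * 4 * 4 * 1).reshape(4, 4, 4, 1)
terms = patchify_video_np(video, patch_size=2, term_boundaries=[1, 3])
print([t.shape for t in terms])  # [(1, 4, 4), (2, 4, 4), (1, 4, 4)]
```

Each returned array has shape (num_frames_in_term, num_patches, patch_dim), and the values within each patch appear in the same order as in the loop-based version.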