Rotary Positional Embeddings (RoPE)

#381 · Deep Learning · Medium

Problem

Implement Rotary Positional Embeddings (RoPE), which encode position by rotating pairs of dimensions in the query and key vectors through position-dependent angles. As a result, the dot product between a rotated query and a rotated key depends on their positions only through the relative offset between them.

Solution

import numpy as np

def precompute_freqs(d_model: int, max_len: int, base: float = 10000.0) -> tuple[np.ndarray, np.ndarray]:
    # One frequency per dimension pair: theta_i = base^(-2i/d_model), shape (d_model/2,)
    freqs = 1.0 / (base ** (np.arange(0, d_model, 2) / d_model))
    positions = np.arange(max_len)
    # angles[p, i] = p * theta_i, shape (max_len, d_model/2)
    angles = np.outer(positions, freqs)
    cos = np.cos(angles)
    sin = np.sin(angles)
    return cos, sin

def apply_rope(x: np.ndarray, cos: np.ndarray, sin: np.ndarray) -> np.ndarray:
    # x shape: (batch, seq_len, d_model) or (seq_len, d_model); d_model must be even
    d = x.shape[-1]
    # Dimension i is paired with dimension i + d/2 (half-split layout)
    x1 = x[..., :d // 2]
    x2 = x[..., d // 2:]

    # Keep only the rows for the positions present; shape (seq_len, d_model/2)
    seq_len = x.shape[-2]
    c = cos[:seq_len]
    s = sin[:seq_len]

    # Rotate pairs: (x1, x2) -> (x1*cos - x2*sin, x1*sin + x2*cos)
    out1 = x1 * c - x2 * s
    out2 = x1 * s + x2 * c
    return np.concatenate([out1, out2], axis=-1)
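
As a quick sanity check, the sketch below applies the two functions to a random batch and confirms three things that follow from the rotation being a pure rotation: the output keeps the input shape, every vector keeps its norm, and position 0 is left unchanged because its angle is zero. The batch shape, seed, and the names `shape_ok`, `norm_ok`, and `pos0_ok` are illustrative assumptions, and the functions are repeated in compact form only so the snippet runs on its own.

```python
import numpy as np

def precompute_freqs(d_model, max_len, base=10000.0):
    # Same computation as the solution, condensed
    freqs = 1.0 / (base ** (np.arange(0, d_model, 2) / d_model))
    angles = np.outer(np.arange(max_len), freqs)
    return np.cos(angles), np.sin(angles)

def apply_rope(x, cos, sin):
    # Same computation as the solution, condensed
    d = x.shape[-1]
    x1, x2 = x[..., :d // 2], x[..., d // 2:]
    c, s = cos[:x.shape[-2]], sin[:x.shape[-2]]
    return np.concatenate([x1 * c - x2 * s, x1 * s + x2 * c], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 8))        # (batch, seq_len, d_model), illustrative sizes
cos, sin = precompute_freqs(d_model=8, max_len=16)
y = apply_rope(x, cos, sin)

# The tables broadcast over the batch axis, so the shape is preserved
shape_ok = y.shape == x.shape
# Each pair is rotated, so the norm of every vector is unchanged
norm_ok = np.allclose(np.linalg.norm(y, axis=-1), np.linalg.norm(x, axis=-1))
# Position 0 has angle 0 (cos = 1, sin = 0), so it passes through untouched
pos0_ok = np.allclose(y[:, 0], x[:, 0])
```

Because `cos` and `sin` have shape (max_len, d_model/2), they line up with the last two axes of `x1` and `x2`, which is why the same call works with or without a leading batch axis.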

Explanation

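  1. Precompute rotation frequencies: pair i (for i = 0, ..., d/2 - 1) rotates at its own frequency theta_i = 1 / base^(2i/d), so early pairs rotate quickly and later pairs rotate slowly.
  2. For each position p, compute cos and sin of the angle p * theta_i.
  3. Apply the rotation to pairs of dimensions: split the vector into its first half (x1) and second half (x2), so that dimension i is paired with dimension i + d/2, then apply a 2D rotation to each pair: (x1*cos - x2*sin, x1*sin + x2*cos). This half-split layout differs from pairing adjacent dimensions (2i, 2i+1) only by a fixed permutation of the dimensions.
  4. When computing QK^T, the rotation encodes relative position: since `R(m)^T R(n) = R(n-m)`, the score between a query at position m and a key at position n is (R(m)q)^T (R(n)k) = q^T R(n-m) k, which depends on the positions only through the offset n - m.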
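
The relative-position property in step 4 can be checked numerically. The sketch below places the same query and the same key at every position, rotates both, and confirms that the score matrix is constant along each diagonal, that is, the score depends only on n - m. The sizes, seed, and the names `same_offset` and `is_toeplitz` are illustrative assumptions; the two functions are condensed copies of the solution so the snippet is self-contained.

```python
import numpy as np

def precompute_freqs(d_model, max_len, base=10000.0):
    # Same computation as the solution, condensed
    freqs = 1.0 / (base ** (np.arange(0, d_model, 2) / d_model))
    angles = np.outer(np.arange(max_len), freqs)
    return np.cos(angles), np.sin(angles)

def apply_rope(x, cos, sin):
    # Same computation as the solution, condensed
    d = x.shape[-1]
    x1, x2 = x[..., :d // 2], x[..., d // 2:]
    c, s = cos[:x.shape[-2]], sin[:x.shape[-2]]
    return np.concatenate([x1 * c - x2 * s, x1 * s + x2 * c], axis=-1)

d_model, max_len = 8, 16                  # illustrative sizes
rng = np.random.default_rng(0)
q = rng.standard_normal(d_model)
k = rng.standard_normal(d_model)
cos, sin = precompute_freqs(d_model, max_len)

# Row p of Q is q rotated to position p; likewise for K
Q = apply_rope(np.tile(q, (max_len, 1)), cos, sin)
K = apply_rope(np.tile(k, (max_len, 1)), cos, sin)
scores = Q @ K.T                          # scores[m, n] = (R(m) q) . (R(n) k)

# Pairs with the same offset n - m = 4 give the same score
same_offset = bool(np.isclose(scores[3, 7], scores[10, 14])
                   and np.isclose(scores[0, 4], scores[3, 7]))
# More generally, every diagonal of the score matrix is constant
is_toeplitz = all(
    np.allclose(np.diag(scores, o), np.diag(scores, o)[0])
    for o in range(-max_len + 1, max_len)
)
```

Without the rotation, `scores` would hold the single value q . k everywhere; with it, each diagonal carries one value per offset, which is the sense in which attention scores become a function of relative position.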

Complexity

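  • Time: O(seq_len * d_model) per sequence for applying rotations (multiplied by the batch size for batched input), plus a one-time O(max_len * d_model) to precompute the tables
  • Space: O(max_len * d_model/2) for each of the precomputed cos and sin tables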