Dynamic Tanh: Normalization-Free Transformer Activation

#128 · Deep Learning · Easy

Problem

Implement the Dynamic Tanh (DyTanh) activation function, a normalization-free alternative for transformers. Given an input array x and a learnable parameter alpha, compute tanh(alpha * x) elementwise. This provides an adaptive, bounded nonlinearity without requiring layer normalization.

Solution

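import numpy as np

def dynamic_tanh(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    # Core DyTanh: scale the input by alpha, then apply tanh elementwise.
    # The result is bounded to [-1, 1].
    return np.tanh(alpha * x)

def dynamic_tanh_with_params(x: np.ndarray, alpha: np.ndarray, gamma: np.ndarray, beta: np.ndarray) -> np.ndarray:
    # Full version: gamma * tanh(alpha * x) + beta
    # alpha, gamma, and beta must be broadcastable against x.
    return gamma * np.tanh(alpha * x) + beta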
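As a minimal usage sketch, the full version can be applied to a small batch. The shapes and parameter values here are illustrative assumptions, not part of the original problem: a scalar alpha of 0.5, and per-feature gamma and beta that NumPy broadcasts across the batch axis.

```python
import numpy as np

def dynamic_tanh_with_params(x, alpha, gamma, beta):
    # Same formula as the full version: gamma * tanh(alpha * x) + beta
    return gamma * np.tanh(alpha * x) + beta

# Hypothetical input: 2 tokens with 3 features each.
x = np.array([[0.0, 1.0, -2.0],
              [3.0, -0.5, 0.25]])

alpha = np.array(0.5)  # scalar steepness (assumed value for illustration)
gamma = np.ones(3)     # per-feature scale, broadcast over the batch axis
beta = np.zeros(3)     # per-feature shift, broadcast over the batch axis

y = dynamic_tanh_with_params(x, alpha, gamma, beta)
print(y.shape)  # (2, 3)
```

With gamma set to ones and beta to zeros, the full version reduces to the core tanh(alpha * x).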

Explanation

  1. The core operation is tanh(alpha * x), where alpha controls the steepness of the activation around zero.
  2. When alpha is large, DyTanh approaches the sign function (a step between -1 and 1); when alpha is small, it is nearly linear, approximately alpha * x.
  3. The full parameterized version adds a learnable scale (gamma) and shift (beta), which makes the model more expressive and lets the output leave the [-1, 1] range.
  4. The full version is intended as a drop-in replacement for normalization layers such as LayerNorm in transformer blocks.
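The two limiting regimes of alpha can be checked numerically. This is a small sketch with arbitrarily chosen values (alpha of 0.01 and 100.0, and a four-element test vector), meant only to illustrate the near-linear and near-step behavior.

```python
import numpy as np

def dynamic_tanh(x, alpha=1.0):
    return np.tanh(alpha * x)

x = np.array([-2.0, -0.5, 0.5, 2.0])

# Small alpha: tanh(z) is close to z for small z, so the output tracks alpha * x.
small = dynamic_tanh(x, alpha=0.01)
print(np.allclose(small, 0.01 * x, atol=1e-5))  # True

# Large alpha: the output saturates toward sign(x) for nonzero x.
large = dynamic_tanh(x, alpha=100.0)
print(np.allclose(large, np.sign(x)))  # True
```

Note that at x = 0 the output is exactly 0 for any alpha, so the step-like behavior applies only to nonzero inputs.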

Complexity

  • Time: O(n) where n is the number of elements in x
  • Space: O(n) for the output