Dynamic Tanh: Normalization-Free Transformer Activation

#128 · Deep Learning · Easy

Problem

Implement the Dynamic Tanh (DyTanh) activation function, a normalization-free alternative for transformers. Given an input array x and a learnable parameter alpha, compute tanh(alpha * x) elementwise. Because tanh is bounded, every output lies in [-1, 1], so the function provides an adaptive nonlinearity without requiring layer normalization.

Solution

import numpy as np

def dynamic_tanh(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    # Elementwise tanh(alpha * x); alpha scales the input before squashing.
    return np.tanh(alpha * x)

def dynamic_tanh_with_params(x: np.ndarray, alpha: np.ndarray, gamma: np.ndarray, beta: np.ndarray) -> np.ndarray:
    # Full version: gamma * tanh(alpha * x) + beta
    # alpha, gamma, and beta must be broadcastable against x.
    return gamma * np.tanh(alpha * x) + beta
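
As a usage sketch, the full version can be applied to a batch of token embeddings by relying on NumPy broadcasting. The shapes (batch 2, sequence 3, 4 features), the scalar alpha of 0.5, and the per-feature gamma and beta are assumptions chosen for illustration; the problem statement does not fix them.

```python
import numpy as np

def dynamic_tanh_with_params(x, alpha, gamma, beta):
    # Same formula as the solution: gamma * tanh(alpha * x) + beta
    return gamma * np.tanh(alpha * x) + beta

# Hypothetical input shape: (batch, sequence, features)
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4))

alpha = np.array(0.5)  # assumed: one shared scalar
gamma = np.ones(4)     # assumed: one scale per feature
beta = np.zeros(4)     # assumed: one shift per feature

# gamma and beta broadcast over the last axis; alpha over every element
y = dynamic_tanh_with_params(x, alpha, gamma, beta)
print(y.shape)  # (2, 3, 4)
```

With gamma set to ones and beta to zeros, this reduces to the basic tanh(alpha * x).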

Explanation

The core operation is tanh(alpha * x) where alpha controls the steepness of the activation.
When alpha is large, DyTanh saturates quickly and approaches the sign function, a step from -1 to +1 at x = 0; when alpha is small, tanh(alpha * x) ≈ alpha * x, so it is nearly linear.
The full parameterized version adds learnable scale (gamma) and shift (beta) to give the model more expressiveness.
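The full version is meant as a drop-in replacement for LayerNorm in transformer blocks, with gamma and beta playing the same role as LayerNorm's learnable scale and shift.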
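
The two limiting cases of alpha can be checked numerically. This is a minimal sketch; the sample inputs and the alpha values 100.0 and 0.01 are arbitrary choices for illustration.

```python
import numpy as np

def dynamic_tanh(x, alpha=1.0):
    return np.tanh(alpha * x)

x = np.array([-2.0, -0.5, 0.5, 2.0])

# Large alpha: outputs saturate at -1 or +1, matching sign(x) for x != 0
steep = dynamic_tanh(x, alpha=100.0)
print(np.allclose(steep, np.sign(x)))  # True

# Small alpha: tanh(alpha * x) stays close to the line alpha * x
gentle = dynamic_tanh(x, alpha=0.01)
print(np.allclose(gentle, 0.01 * x, atol=1e-5))  # True
```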

Complexity

Time: O(n) where n is the number of elements in x
Space: O(n) for the output
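
If the caller no longer needs the input, the extra O(n) output can be avoided by writing the result back into x. The sketch below is one way to do that with the `out=` argument of NumPy ufuncs; `dynamic_tanh_inplace` is a hypothetical helper, not part of the solution, and it assumes x is a floating-point array that may be overwritten.

```python
import numpy as np

def dynamic_tanh_inplace(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    # Hypothetical helper: overwrites x, which must have a floating-point dtype
    np.multiply(x, alpha, out=x)  # x <- alpha * x
    np.tanh(x, out=x)             # x <- tanh(x)
    return x

x = np.array([-1.0, 0.0, 1.0])
expected = np.tanh(0.5 * x)  # reference result, computed before x is overwritten
result = dynamic_tanh_inplace(x, alpha=0.5)
print(result is x)  # True
```

Time remains O(n); only the additional memory drops to O(1).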
