Implement the Squared ReLU Activation Function

#373 · Deep Learning · Easy

Problem

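Implement the Squared ReLU activation function, defined as f(x) = max(0, x)^2, along with its derivative for the backward pass. This activation has been reported to improve performance in certain transformer architectures. Like ReLU, it outputs exactly zero for all non-positive inputs, and it additionally pushes small positive activations closer to zero.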

Solution

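import numpy as np

def squared_relu(x: np.ndarray) -> np.ndarray:
    # Apply ReLU (clamp negatives to 0), then square element-wise.
    return np.maximum(0, x) ** 2

def squared_relu_derivative(x: np.ndarray) -> np.ndarray:
    # d/dx max(0, x)^2 is 2x for x > 0 and 0 for x <= 0.
    return 2 * np.maximum(0, x)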

Explanation

  1. Apply standard ReLU: zero out all negative values.
  2. Square the result: positive values become their squares, zeros remain zero.
  3. The derivative used in the backward pass is 2 * max(0, x), which is twice the ReLU output: 2x for x > 0 and 0 otherwise.
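  4. Squaring shrinks positive values below 1 (for example, 0.1 becomes 0.01) and amplifies values above 1, so more activations end up close to zero than with standard ReLU. The set of exactly-zero outputs is the same as for ReLU.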
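
The steps above can be checked numerically. The following is a minimal sketch, not part of the original solution: it repeats the two functions so that it runs on its own, evaluates them on a few sample inputs chosen for illustration, and compares the analytic derivative against a central finite difference. The sample values and the step size `eps` are assumptions.

```python
import numpy as np

def squared_relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0, x) ** 2

def squared_relu_derivative(x: np.ndarray) -> np.ndarray:
    return 2 * np.maximum(0, x)

# Illustrative inputs: negatives, zero, small positives, and values >= 1.
x = np.array([-2.0, -0.5, 0.0, 0.1, 0.5, 1.0, 3.0])
y = squared_relu(x)

# Central finite difference as an independent estimate of the derivative.
eps = 1e-6  # assumed step size
numeric_grad = (squared_relu(x + eps) - squared_relu(x - eps)) / (2 * eps)
analytic_grad = squared_relu_derivative(x)

# Largest disagreement between the two estimates across all sample points.
max_grad_error = np.max(np.abs(numeric_grad - analytic_grad))

print("output:       ", y)
print("analytic grad:", analytic_grad)
print("max grad error:", max_grad_error)
```

The two gradient estimates should agree to within a small tolerance everywhere, including at x = 0, where the function is differentiable with derivative 0 (unlike standard ReLU, which has a kink there).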

Complexity

  • Time: O(n) where n is the number of elements
  • Space: O(n) for the output
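
If the extra O(n) output buffer matters, the same computation can be done in place. This is a sketch under the assumption that the caller no longer needs the original input values; `squared_relu_inplace` is a hypothetical name, not part of the original solution. It relies on the `out=` argument that NumPy ufuncs such as `np.maximum` and `np.square` accept.

```python
import numpy as np

def squared_relu_inplace(x: np.ndarray) -> np.ndarray:
    """Hypothetical in-place variant: overwrites x with max(0, x)^2."""
    np.maximum(x, 0, out=x)  # clamp negatives to 0, writing into x
    np.square(x, out=x)      # square element-wise, again writing into x
    return x

a = np.array([-1.0, 2.0, 0.5])
result = squared_relu_inplace(a)
print(result)
```

This reduces the additional space to O(1), at the cost of destroying the input, so it is only suitable when the pre-activation values are not needed afterwards (for example, at inference time).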