Implement the Square ReLU Activation Function

#373 · Deep Learning · Easy

Problem

Implement the Squared ReLU activation function, defined as f(x) = max(0, x)^2. This activation has been shown to improve performance in certain transformer architectures by producing sparser activations.

Solution

1

2

3

4

5

6

7

import numpy as np

def squared_relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0, x) ** 2

def squared_relu_derivative(x: np.ndarray) -> np.ndarray:
    return 2 * np.maximum(0, x)

Explanation

Apply standard ReLU: zero out all negative values.
Square the result: positive values become their squares, zeros remain zero.
The derivative is 2 * max(0, x) for the backward pass — twice the ReLU output.
Squared ReLU creates sparser activations than standard ReLU since small positive values become very small after squaring.

Complexity

Time: O(n) where n is the number of elements
Space: O(n) for the output

← #372 #374 →