#373 · Deep Learning · Easy
Implement the Squared ReLU activation function, defined as f(x) = max(0, x)^2. This activation has been shown to improve performance in certain transformer architectures by producing sparser activations.
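Written piecewise, the definition shows the behavior on each side of zero. The sample values are a small worked example added for illustration and are not part of the original problem statement.

$$
f(x) = \max(0, x)^2 =
\begin{cases}
x^2, & x > 0 \\
0, & x \le 0
\end{cases}
\qquad
f(-1.5) = 0,\quad f(0) = 0,\quad f(0.5) = 0.25,\quad f(3) = 9
$$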
import numpy as np
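
def squared_relu(x: np.ndarray) -> np.ndarray:
    # Element-wise f(x) = max(0, x)^2
    return np.maximum(0, x) ** 2

def squared_relu_derivative(x: np.ndarray) -> np.ndarray:
    # Element-wise f'(x) = 2 * max(0, x); zero wherever x <= 0
    return 2 * np.maximum(0, x)

The backward pass uses the derivative 2 * max(0, x), which is twice the ReLU output.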
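One way to gain confidence in the analytic derivative is to compare it against a central finite difference. The helper `numerical_derivative` and the step size `h = 1e-6` are hypothetical choices made for this sketch. At x = 0, the left and right derivatives of max(0, x)^2 are both 0. This means the function is differentiable everywhere, unlike plain ReLU, so the value 0 returned there is exact rather than a convention.

```python
import numpy as np

def squared_relu(x: np.ndarray) -> np.ndarray:
    # Forward pass: f(x) = max(0, x)^2, element-wise
    return np.maximum(0, x) ** 2

def squared_relu_derivative(x: np.ndarray) -> np.ndarray:
    # Analytic derivative: f'(x) = 2 * max(0, x), element-wise
    return 2 * np.maximum(0, x)

def numerical_derivative(f, x: np.ndarray, h: float = 1e-6) -> np.ndarray:
    # Hypothetical helper: central finite difference (f(x+h) - f(x-h)) / 2h,
    # applied element-wise to an array of inputs
    return (f(x + h) - f(x - h)) / (2 * h)

x = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])
forward = squared_relu(x)
analytic = squared_relu_derivative(x)
numeric = numerical_derivative(squared_relu, x)

# In a backward pass, the local derivative scales the upstream gradient
upstream = np.ones_like(x)
grad_x = upstream * analytic

print(np.allclose(analytic, numeric, atol=1e-4))  # True
```

The central difference is used because its error shrinks with h^2, so it tracks the analytic derivative closely even near the kink at zero.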