Implement the Swish activation function. Swish is defined as f(x) = x * sigmoid(x) = x / (1 + exp(-x)). It was discovered through automated search over candidate activation functions and often matches or outperforms ReLU in deep networks.
import numpy as np

def swish(x: np.ndarray) -> np.ndarray:
    # Clip the input so np.exp never overflows for large-magnitude values.
    sigmoid = 1 / (1 + np.exp(-np.clip(x, -500, 500)))
    return x * sigmoid

Swish computes x * sigmoid(x), combining the input with a gating mechanism.
For large positive x, sigmoid approaches 1, so Swish approaches the identity function (like ReLU).
For large negative x, sigmoid approaches 0, so Swish approaches 0 (like ReLU).
np.clip prevents overflow in the exponential computation.
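
The limiting behavior described above can be checked numerically. The following is a minimal sketch, not part of the original solution: it repeats the swish definition so the snippet runs on its own, and the sample inputs and tolerances are arbitrary choices made for illustration.

```python
import numpy as np

def swish(x: np.ndarray) -> np.ndarray:
    # Same definition as the solution, repeated so this snippet is self-contained.
    sigmoid = 1 / (1 + np.exp(-np.clip(x, -500, 500)))
    return x * sigmoid

# Illustrative inputs (assumed for this sketch): extreme and moderate values on both sides of zero.
x = np.array([-1000.0, -20.0, 0.0, 20.0, 1000.0])
y = swish(x)

# Large positive inputs pass through almost unchanged (identity-like).
assert np.allclose(y[3:], x[3:])
# Large negative inputs are squashed to roughly zero.
assert np.allclose(y[:2], 0.0, atol=1e-6)
# At zero the gate equals 0.5, so the output is 0 * 0.5 = 0.
assert y[2] == 0.0
# Clipping keeps the exponential finite even for extreme inputs.
assert np.isfinite(y).all()
```

Note that the clip bounds of -500 and 500 suit float64 inputs, where exp(500) is still representable. With float32 arrays the exponential would likely overflow well before 500, so tighter bounds may be needed there.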