Implement the Swish Activation Function

#102 · Deep Learning · Easy

Problem

Implement the Swish activation function. Swish is defined as f(x) = x * sigmoid(x) = x / (1 + exp(-x)). It was discovered through an automated search over candidate activation functions and has been reported to match or outperform ReLU in deep networks.

Solution

import numpy as np

def swish(x: np.ndarray) -> np.ndarray:
    # Clip the input so np.exp stays finite in float64, then gate x by sigmoid(x).
    sigmoid = 1 / (1 + np.exp(-np.clip(x, -500, 500)))
    return x * sigmoid
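
A quick usage sketch, assuming the function is called on a small float64 array. The sample inputs are illustrative only and are not taken from the problem's test cases.

```python
import numpy as np

def swish(x: np.ndarray) -> np.ndarray:
    # Same definition as the solution, repeated so the snippet runs on its own.
    sigmoid = 1 / (1 + np.exp(-np.clip(x, -500, 500)))
    return x * sigmoid

# Illustrative inputs (an assumption, not the site's test data).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = swish(x)

# Values are approximately -0.2384, -0.2689, 0, 0.7311, 1.7616.
print(np.round(y, 4))
```

Note that the negative inputs map to small negative outputs rather than to exactly zero, which is the main visible difference from ReLU on this sample.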

Explanation

  1. Swish computes x * sigmoid(x), combining the input with a gating mechanism.
  2. For large positive x, sigmoid approaches 1, so Swish approaches the identity function (like ReLU).
  3. For large negative x, sigmoid approaches 0, so Swish approaches 0 (like ReLU).
  4. Unlike ReLU, Swish is smooth and non-monotonic: it dips below zero for negative inputs, with a minimum of about -0.278 near x = -1.28, and returns toward zero as x grows more negative. The smoothness and the small negative response are thought to help optimization.
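  5. np.clip bounds the argument of np.exp to [-500, 500], which keeps the exponential finite for float64 inputs and avoids overflow warnings. For float32 inputs, exp overflows beyond roughly 88, so a tighter bound would be needed.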
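
The behaviour described in points 2 to 4 can be checked numerically. The following is a minimal sketch, assuming float64 inputs; the grid range and step size are arbitrary choices made for illustration.

```python
import numpy as np

def swish(x: np.ndarray) -> np.ndarray:
    # Same definition as the solution, repeated so the snippet runs on its own.
    sigmoid = 1 / (1 + np.exp(-np.clip(x, -500, 500)))
    return x * sigmoid

# Points 2 and 3: far from zero, Swish behaves like ReLU.
tails = swish(np.array([-20.0, 20.0]))
# tails[0] is a tiny negative number; tails[1] is very close to 20.

# Point 4: scan negative inputs to find where the dip is deepest.
grid = np.linspace(-5.0, 0.0, 5001)  # step of 0.001 (arbitrary choice)
values = swish(grid)
i_min = int(np.argmin(values))
x_min, y_min = grid[i_min], values[i_min]
print(f"minimum of about {y_min:.4f} near x = {x_min:.3f}")

# Non-monotonic: on the negative side the function first falls, then rises.
diffs = np.diff(values)
has_decrease = bool(np.any(diffs < 0))
has_increase = bool(np.any(diffs > 0))
print(has_decrease, has_increase)
```

A grid search like this only locates the minimum to within the grid spacing, which is enough to confirm the shape of the curve.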

Complexity

  • Time: O(n) where n is the number of elements
  • Space: O(n) for the output array
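
These bounds hold for the solution as written, although NumPy may also materialize several intermediate arrays of size n along the way (for the clip, the negation, the exponential, and so on). As a sketch, assuming float64 input and using a hypothetical helper name `swish_low_alloc` that is not part of the original solution, the same result can be computed inside a single output buffer:

```python
import numpy as np

def swish_low_alloc(x: np.ndarray) -> np.ndarray:
    """Hypothetical variant: compute x / (1 + exp(-x)) in one output buffer."""
    # No copy is made if x is already a float64 array; other dtypes are converted.
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)            # the single O(n) allocation
    np.clip(x, -500, 500, out=out)    # out = clip(x)
    np.negative(out, out=out)         # out = -clip(x)
    np.exp(out, out=out)              # out = exp(-clip(x))
    out += 1.0                        # out = 1 + exp(-clip(x))
    np.divide(x, out, out=out)        # out = x / (1 + exp(-clip(x)))
    return out

# Illustrative inputs, including values far outside the clipping range.
x_demo = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
y_demo = swish_low_alloc(x_demo)
print(y_demo)
```

The asymptotic complexity is unchanged; the only difference is a smaller constant factor on memory, which mainly matters for large arrays. The input array is left unmodified.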