Implement the Mish Activation Function

#262 · Deep Learning · Easy


Problem

Implement the Mish activation function: mish(x) = x * tanh(softplus(x)) where softplus(x) = ln(1 + e^x).

Solution

Apply the Mish formula element-wise, and return x directly for large positive inputs so that math.exp(x) cannot overflow.

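import math


def mish(x: float) -> float:
    """Return mish(x) = x * tanh(softplus(x)) for a single float."""
    # For x > 20, tanh(softplus(x)) rounds to 1.0 in double precision,
    # and skipping math.exp here avoids OverflowError for x > ~709.
    if x > 20.0:
        return x
    # log1p keeps precision when e^x is tiny (large negative x).
    softplus = math.log1p(math.exp(x))
    return x * math.tanh(softplus)


def mish_list(values: list[float]) -> list[float]:
    """Apply mish element-wise, rounding each result to 6 decimal places."""
    return [round(mish(v), 6) for v in values]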
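
One way to sanity-check an implementation like the one above is to compare it with an algebraically equivalent form. Since tanh(y) = (e^(2y) - 1) / (e^(2y) + 1) and e^(2 * softplus(x)) = (1 + e^x)^2, the tanh and log can be eliminated. The sketch below is illustrative only: the names mish_direct and mish_reference are hypothetical, and the closed form is meant for moderate x, since it does not guard against overflow.

```python
import math


def mish_direct(x: float) -> float:
    # Direct formula: x * tanh(ln(1 + e^x)); fine for moderate x.
    return x * math.tanh(math.log1p(math.exp(x)))


def mish_reference(x: float) -> float:
    # Equivalent closed form: with u = (1 + e^x)^2,
    # tanh(softplus(x)) = (u - 1) / (u + 1).
    u = (1.0 + math.exp(x)) ** 2
    return x * (u - 1.0) / (u + 1.0)


# The two forms should agree up to floating-point rounding.
for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    assert math.isclose(
        mish_direct(x), mish_reference(x), rel_tol=1e-9, abs_tol=1e-12
    )
```

Agreement between two independently derived forms catches sign and formula mistakes that a single hand-computed value might miss.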

Explanation

  1. Compute softplus(x) = ln(1 + e^x).
  2. Apply tanh to the softplus result.
  3. Multiply by x: mish(x) = x * tanh(softplus(x)).
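  4. For large positive x, softplus(x) ≈ x and tanh(x) ≈ 1, so mish(x) ≈ x; the code takes this shortcut for x > 20, where the difference is below double precision.
  5. Mish is smooth and non-monotonic: unlike ReLU, it lets small negative values through (its minimum is about -0.31, near x ≈ -1.2) instead of clamping them to zero, which can help gradient flow during training.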
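
The large-x shortcut in step 4 is one way to stay numerically safe. As an alternative sketch (not the solution's method), softplus can be rewritten with the identity ln(1 + e^x) = max(x, 0) + ln(1 + e^(-|x|)), so that exp never receives a positive argument and no threshold is needed. The names softplus_stable and mish_stable are hypothetical.

```python
import math


def softplus_stable(x: float) -> float:
    # ln(1 + e^x) = max(x, 0) + ln(1 + e^(-|x|));
    # exp only ever sees a non-positive argument, so it cannot overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish_stable(x: float) -> float:
    # Same definition as before: x * tanh(softplus(x)).
    return x * math.tanh(softplus_stable(x))


print(mish_stable(1000.0))  # 1000.0, with no OverflowError
```

Evaluating mish_stable near x = -1.19 gives a value close to -0.31, which is the small negative dip described in step 5.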

Complexity

  • Time: O(n) for n elements
  • Space: O(n) for output