Implement the SwiGLU activation function used in modern transformer architectures (e.g., LLaMA). SwiGLU is defined as SwiGLU(x, W, V, b, c) = Swish(xW + b) * (xV + c) where Swish(t) = t * sigmoid(t).
import numpy as np
def swiglu(x: np.ndarray, W: np.ndarray, V: np.ndarray,
b: np.ndarray, c: np.ndarray) -> np.ndarray:
def sigmoid(t):
return 1.0 / (1.0 + np.exp(-t))
def swish(t):
return t * sigmoid(t)
gate = swish(x @ W + b)
linear = x @ V + c
return gate * linearxW + b, then apply Swish activation (t * sigmoid(t)).xV + c.