Implement a Simple Residual Block with Shortcut Connection

#113 · Deep Learning · Easy

Problem

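Implement a simple residual block with a shortcut (skip) connection. Given an input x, apply two transformations (e.g., linear layers with an activation between them), add the original input to the result, and apply a final activation. Because the input is added elementwise, the second transformation must produce an output with the same shape as x. This is the fundamental building block of ResNet architectures.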

Solution

import numpy as np

def residual_block(x: np.ndarray, W1: np.ndarray, b1: np.ndarray, W2: np.ndarray, b2: np.ndarray) -> np.ndarray:
    def relu(z):
        return np.maximum(0, z)

    # First transformation: linear + ReLU
    out = relu(x @ W1 + b1)

    # Second transformation: linear (no activation before adding residual)
    out = out @ W2 + b2

    # Shortcut connection: add the input
    out = out + x

    # Apply ReLU after addition
    out = relu(out)

    return out
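
A small usage sketch with hand-checkable numbers may help confirm the shapes. The weights below are hypothetical values chosen for illustration (an identity first layer and a scaled identity second layer), and the function is restated compactly so the snippet runs on its own.

```python
import numpy as np

def residual_block(x, W1, b1, W2, b2):
    # Same computation as the solution: linear + ReLU, linear, add input, ReLU.
    hidden = np.maximum(0, x @ W1 + b1)
    return np.maximum(0, hidden @ W2 + b2 + x)

# Illustrative values only: one sample with two features.
x = np.array([[1.0, -2.0]])
W1 = np.eye(2)            # first layer keeps the features as they are
b1 = np.zeros(2)
W2 = 0.5 * np.eye(2)      # second layer halves the hidden activations
b2 = np.zeros(2)

# hidden = relu([1, -2]) = [1, 0]
# F(x)   = [0.5, 0]
# F(x) + x = [1.5, -2], and the final ReLU gives [1.5, 0]
y = residual_block(x, W1, b1, W2, b2)
print(y.tolist())  # [[1.5, 0.0]]
```

Note that the output has the same shape as the input, which is what makes the elementwise addition in the shortcut possible.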

Explanation

First layer: Apply a linear transformation followed by ReLU activation to extract features.
Second layer: Apply another linear transformation without activation.
Skip connection: Add the original input x to the output of the second layer. Addition passes gradients through unchanged, so the shortcut gives them a direct path back to earlier layers, which helps against vanishing gradients in deep networks.
Final activation: Apply ReLU after the addition.
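Key insight: If H(x) is the mapping the block should represent, the two layers only need to learn the residual F(x) = H(x) - x, and the block outputs F(x) + x. Learning the residual is easier because the shortcut already supplies the identity mapping: when the best mapping is close to the identity, the weights only have to push F(x) toward zero.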
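
The key insight can be checked numerically. The sketch below assumes hypothetical sizes (d = 4 features, hidden width h = 8) and sets the second layer to zero, so F(x) = 0. With non-negative inputs, the residual block then returns x unchanged, while the same layers without the shortcut return only zeros.

```python
import numpy as np

def residual_block(x, W1, b1, W2, b2):
    # Compact restatement of the solution so this snippet is self-contained.
    hidden = np.maximum(0, x @ W1 + b1)
    return np.maximum(0, hidden @ W2 + b2 + x)

rng = np.random.default_rng(0)
d, h = 4, 8  # hypothetical feature and hidden sizes

# Strictly positive inputs, so the final ReLU leaves them unchanged.
x = rng.uniform(0.1, 1.0, size=(3, d))

W1 = rng.normal(size=(d, h))
b1 = rng.normal(size=h)
W2 = np.zeros((h, d))  # zero second layer, so F(x) = 0
b2 = np.zeros(d)

# With the shortcut: output is relu(0 + x) = x.
y = residual_block(x, W1, b1, W2, b2)
identity_recovered = np.allclose(y, x)

# Without the shortcut: the same layers output relu(0) = 0.
plain = np.maximum(0, np.maximum(0, x @ W1 + b1) @ W2 + b2)

print(identity_recovered)        # True
print(bool(np.all(plain == 0)))  # True
```

The final ReLU means the block is an exact identity only for non-negative inputs; negative entries of x would be clipped to zero.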

Complexity

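Time: O(n * d^2) for the matrix multiplications, where n is the number of input rows (batch size) and d is the feature dimension, assuming the hidden width also equals d. With a hidden width h, the cost is O(n * d * h).
Space: O(n * d) for intermediate activations, or O(n * (d + h)) with a hidden width h.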
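
To make these bounds concrete, the following sketch counts the multiply-adds and stored activation values for given sizes. The helper name `residual_block_cost` and its parameters are hypothetical, and the count ignores the cheaper elementwise operations (bias addition, ReLU, shortcut addition).

```python
def residual_block_cost(n: int, d: int, h: int) -> dict:
    """Rough cost model for x of shape (n, d), W1 of shape (d, h), W2 of shape (h, d)."""
    # x @ W1 takes n*d*h multiply-adds; hidden @ W2 takes n*h*d.
    multiply_adds = n * d * h + n * h * d
    # Intermediate activations: hidden (n, h) and the output (n, d).
    activation_values = n * h + n * d
    return {"multiply_adds": multiply_adds, "activation_values": activation_values}

# Example: batch of 32, with feature and hidden width both 64 (so h = d).
cost = residual_block_cost(n=32, d=64, h=64)
print(cost)  # {'multiply_adds': 262144, 'activation_values': 4096}
```

With h = d the multiply-add count is 2 * n * d^2, which matches the O(n * d^2) bound.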
