Implementing a Custom Dense Layer in Python

#40 · Deep Learning · Hard

Problem

Implement a custom dense (fully connected) layer. The layer performs a linear transformation output = X @ W + b and optionally applies an activation function. Implement both forward pass and backward pass (gradient computation).

Solution

import numpy as np

class DenseLayer:
    def __init__(self, input_size, output_size, activation=None, seed=None):
        if seed is not None:
            np.random.seed(seed)
        self.W = np.random.randn(input_size, output_size) * 0.01
        self.b = np.zeros((1, output_size))
        self.activation = activation
        self.input = None
        self.z = None

    def relu(self, x):
        return np.maximum(0, x)

    def relu_deriv(self, x):
        return (x > 0).astype(float)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

    def sigmoid_deriv(self, x):
        s = self.sigmoid(x)
        return s * (1 - s)

    def forward(self, X):
        self.input = X
        self.z = X @ self.W + self.b
        if self.activation == 'relu':
            return self.relu(self.z)
        elif self.activation == 'sigmoid':
            return self.sigmoid(self.z)
        return self.z

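    def backward(self, d_out, lr=0.01):
        # Chain rule through the activation: turn dL/d(output) into dL/dz.
        if self.activation == 'relu':
            d_out = d_out * self.relu_deriv(self.z)
        elif self.activation == 'sigmoid':
            d_out = d_out * self.sigmoid_deriv(self.z)

        # Parameter gradients, averaged over the batch dimension.
        dW = self.input.T @ d_out / self.input.shape[0]
        db = np.mean(d_out, axis=0, keepdims=True)
        # Gradient for the previous layer, computed before W is updated.
        d_input = d_out @ self.W.T

        # Plain gradient-descent step.
        self.W -= lr * dW
        self.b -= lr * db

        return d_input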
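
A finite-difference comparison is a quick way to sanity-check the backward formulas. The sketch below is not part of the solution. So that it runs on its own, it restates the sigmoid layer's math in two hypothetical helpers, dense_forward and dense_backward, and it assumes a toy per-sample loss of 0.5 * sum(out ** 2), whose upstream gradient is simply out. The helper names, the loss, and the tolerance are all illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -500, 500)))

def dense_forward(X, W, b):
    """Sigmoid dense layer; returns the output and the pre-activation z."""
    z = X @ W + b
    return sigmoid(z), z

def dense_backward(X, W, z, d_out):
    """Same gradient formulas as backward() above, without the update step."""
    s = sigmoid(z)
    dz = d_out * s * (1 - s)
    dW = X.T @ dz / X.shape[0]               # averaged over the batch
    db = np.mean(dz, axis=0, keepdims=True)  # averaged over the batch
    d_input = dz @ W.T                       # not averaged
    return dW, db, d_input

def summed_loss(X, W, b):
    """Assumed toy loss: 0.5 * sum(out ** 2), summed over the whole batch."""
    out, _ = dense_forward(X, W, b)
    return 0.5 * np.sum(out ** 2)

def numerical_grad(f, A, eps=1e-6):
    """Central-difference gradient of f() with respect to array A (perturbed in place)."""
    grad = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        old = A[idx]
        A[idx] = old + eps
        plus = f()
        A[idx] = old - eps
        minus = f()
        A[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2)) * 0.5
b = np.zeros((1, 2))
n = X.shape[0]

out, z = dense_forward(X, W, b)
dW, db, d_input = dense_backward(X, W, z, d_out=out)

loss = lambda: summed_loss(X, W, b)
num_dW = numerical_grad(loss, W) / n  # dW and db are batch averages
num_db = numerical_grad(loss, b) / n
num_dX = numerical_grad(loss, X)      # d_input is not averaged

print("max |dW - num_dW|:", np.max(np.abs(dW - num_dW)))
print("max |db - num_db|:", np.max(np.abs(db - num_db)))
print("max |dX - num_dX|:", np.max(np.abs(d_input - num_dX)))
```

The division by n on the numerical side mirrors the layer's convention: dW and db are gradients of the batch-averaged loss, while d_input is the gradient of the summed loss with respect to each input row.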

Explanation

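  1. Initialization: Weights are drawn from a standard normal distribution and scaled by 0.01; biases start at zero. Passing a seed makes the initialization reproducible.
  2. Forward pass: Cache the input X and the pre-activation z = X @ W + b for the backward pass, then return z with the activation ('relu' or 'sigmoid') applied, or z unchanged if no activation is set.
  3. Backward pass: Multiply the upstream gradient by the activation derivative evaluated at z, compute the gradients for W and b averaged over the batch, apply a gradient-descent step with learning rate lr, and return the gradient with respect to the input so the previous layer can continue backpropagation.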
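
Written as equations, the backward pass in the code corresponds to the following. The symbols are notation introduced here rather than taken from the problem statement: n is the batch size, f is the activation (the identity when none is set), a = f(z) is the layer output, and δ is the gradient with respect to z.

```latex
\begin{aligned}
z &= XW + b, \qquad a = f(z) \\
\delta &= \frac{\partial L}{\partial a} \odot f'(z) \\
dW &= \frac{1}{n}\, X^{\top} \delta \\
db &= \frac{1}{n} \sum_{i=1}^{n} \delta_{i} \\
d_{\text{input}} &= \delta\, W^{\top}
\end{aligned}
```

Here δ_i is the i-th row of δ. The 1/n factor appears only in the parameter gradients, matching the code, where d_input is passed back without averaging.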

Complexity

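  • Time: O(n d_in d_out) for each of the forward and backward passes, where n is the batch size and d_in, d_out are the input and output sizes
  • Space: O(d_in d_out) for the weights plus O(n (d_in + d_out)) for the cached input and pre-activation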
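
To make these bounds concrete, the sketch below counts multiply-adds and stored floats for one layer. The helper dense_layer_cost and the example sizes are hypothetical, chosen only for illustration. It counts the dominant matrix products and ignores the elementwise activation work, and it counts the cached input as well as the cached pre-activation.

```python
def dense_layer_cost(n, d_in, d_out):
    """Rough operation and memory counts for one dense layer (illustrative helper)."""
    return {
        # Forward: X @ W is an (n x d_in) by (d_in x d_out) product.
        "forward_macs": n * d_in * d_out,
        # Backward: X.T @ d_out and d_out @ W.T each cost n * d_in * d_out.
        "backward_macs": 2 * n * d_in * d_out,
        # Parameters: weight matrix W plus bias row b.
        "param_floats": d_in * d_out + d_out,
        # Cache: the input X (n x d_in) and the pre-activation z (n x d_out).
        "cache_floats": n * d_in + n * d_out,
    }

# Assumed example sizes: a batch of 32 inputs with 784 features and 128 output units.
cost = dense_layer_cost(n=32, d_in=784, d_out=128)
for name, value in cost.items():
    print(f"{name}: {value:,}")
```

The backward pass costs about twice the forward pass in multiply-adds, which is still O(n d_in d_out).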