The Pattern Weaver's Code

#89 · Deep Learning · Medium

Problem

Implement a pattern recognition neural network (The Pattern Weaver's Code). Given input data, build a simple feedforward neural network with one hidden layer that can learn to classify patterns. You need to implement forward propagation with sigmoid activations and train using backpropagation with gradient descent.

Solution

import numpy as np

def pattern_recognition_nn(X: np.ndarray, y: np.ndarray, hidden_size: int = 4, lr: float = 0.1, epochs: int = 1000) -> np.ndarray:
    """Train a one-hidden-layer sigmoid network on (X, y) and return 0/1 predictions for X."""
    np.random.seed(42)  # fixed seed so the result is reproducible
    input_size = X.shape[1]
    output_size = 1
    m = X.shape[0]            # number of samples
    y_col = y.reshape(-1, 1)  # column vector matching the shape of a2

    # Small random weights, zero biases
    W1 = np.random.randn(input_size, hidden_size) * 0.5
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size) * 0.5
    b2 = np.zeros((1, output_size))

    def sigmoid(z):
        # Clip the input so np.exp cannot overflow
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

    for _ in range(epochs):
        # Forward pass
        z1 = X @ W1 + b1
        a1 = sigmoid(z1)
        z2 = a1 @ W2 + b2
        a2 = sigmoid(z2)

        # Backpropagation (binary cross-entropy loss with a sigmoid output)
        dz2 = a2 - y_col
        dW2 = (a1.T @ dz2) / m
        db2 = np.sum(dz2, axis=0, keepdims=True) / m

        dz1 = (dz2 @ W2.T) * a1 * (1 - a1)
        dW1 = (X.T @ dz1) / m
        db1 = np.sum(dz1, axis=0, keepdims=True) / m

        # Gradient descent update
        W2 -= lr * dW2
        b2 -= lr * db2
        W1 -= lr * dW1
        b1 -= lr * db1

    # Final predictions: one more forward pass, thresholded at 0.5
    z1 = X @ W1 + b1
    a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2
    predictions = sigmoid(z2)
    return (predictions > 0.5).astype(int).flatten()

Explanation

  1. Initialize weights randomly and biases to zero for a two-layer network (input -> hidden -> output).
  2. Forward pass: Compute linear transformations followed by sigmoid activations at each layer.
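  3. Backpropagation: Compute gradients of the binary cross-entropy loss with respect to each weight and bias using the chain rule. With a sigmoid output, the loss gradient with respect to z2 simplifies to a2 - y, so no explicit sigmoid derivative appears at the output layer. At the hidden layer the sigmoid derivative a1 * (1 - a1) is applied.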
  4. Gradient descent: Update all parameters by subtracting the learning rate times the gradient.
  5. After training, run a final forward pass and threshold predictions at 0.5.
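
One way to sanity-check the backpropagation step is to compare the analytic gradients against central finite differences of the binary cross-entropy loss. The sketch below is illustrative only: the helper names (bce_loss, analytic_grads, numeric_grads), the random test data, and the step size eps are assumptions made for this check and are not part of the solution above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(params, X, y):
    """Mean binary cross-entropy of the one-hidden-layer network."""
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return float(-np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2)))

def analytic_grads(params, X, y):
    """Gradients computed with the same formulas as the training loop."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    dz2 = a2 - y                          # BCE + sigmoid output simplification
    dW2 = a1.T @ dz2 / m
    db2 = dz2.sum(axis=0, keepdims=True) / m
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)    # sigmoid derivative at the hidden layer
    dW1 = X.T @ dz1 / m
    db1 = dz1.sum(axis=0, keepdims=True) / m
    return [dW1, db1, dW2, db2]

def numeric_grads(params, X, y, eps=1e-6):
    """Central finite differences, perturbing one parameter entry at a time."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            up = bce_loss(params, X, y)
            p[idx] = old - eps
            down = bce_loss(params, X, y)
            p[idx] = old                  # restore the original value
            g[idx] = (up - down) / (2 * eps)
        grads.append(g)
    return grads

# Tiny random problem: 5 samples, 3 features, 4 hidden units (arbitrary choices)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=(5, 1)).astype(float)
params = [rng.normal(size=(3, 4)) * 0.5, np.zeros((1, 4)),
          rng.normal(size=(4, 1)) * 0.5, np.zeros((1, 1))]

max_err = max(
    np.max(np.abs(a - n))
    for a, n in zip(analytic_grads(params, X, y), numeric_grads(params, X, y))
)
print("largest gradient mismatch:", max_err)
```

If the formulas in step 3 are right, the largest mismatch should be tiny compared with the gradients themselves. A large mismatch would point to an error in one of the gradient expressions.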

Complexity

  • Time: O(epochs * n * d * h), where n is the number of samples, d is the number of input features, and h is the hidden size
  • Space: O(n * h) for the hidden activations, plus O(d * h) for the weights
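
To see where the time bound comes from, it can help to count the multiply-adds in the five matrix products of a single epoch. The helper below is a hypothetical illustration, not part of the solution. It writes d for the number of input features, assumes a single output unit as in the code above, and ignores the elementwise work, which is only O(n * h).

```python
def matmul_madds_per_epoch(n: int, d: int, h: int) -> int:
    """Multiply-adds in the matrix products of one training epoch (output size 1)."""
    forward = n * d * h + n * h            # X @ W1, then a1 @ W2
    backward = h * n + n * h + d * n * h   # a1.T @ dz2, dz2 @ W2.T, X.T @ dz1
    return forward + backward

# Example: 4 samples, 2 features, 4 hidden units
print(matmul_madds_per_epoch(4, 2, 4))  # 112
```

The two products that involve X dominate the count at n * d * h each, so the cost per epoch grows with the product of all three sizes.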