Train Logistic Regression with Gradient Descent

#106 · Machine Learning · Hard

Problem

Train a logistic regression model using gradient descent on binary classification data. Implement the training loop from scratch, computing the binary cross-entropy loss gradient and updating weights iteratively.

Solution

import numpy as np

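def train_logistic_regression(X: np.ndarray, y: np.ndarray, lr: float = 0.01, epochs: int = 1000, tol: float = 1e-6) -> dict:
    """Fit logistic regression with batch gradient descent.

    X has shape (n_samples, n_features); y is a 1-D array of 0/1 labels.
    Returns the learned weights, bias, and the per-epoch loss history.
    """
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    losses = []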

    for epoch in range(epochs):
        # Forward pass: linear scores, then sigmoid (z is clipped so np.exp cannot overflow)
        z = X @ weights + bias
        predictions = 1 / (1 + np.exp(-np.clip(z, -500, 500)))

        # Binary cross-entropy loss
        eps = 1e-15
        loss = -np.mean(y * np.log(predictions + eps) + (1 - y) * np.log(1 - predictions + eps))
        losses.append(loss)

        # Gradients
        error = predictions - y
        dw = (X.T @ error) / n_samples
        db = np.mean(error)

        weights -= lr * dw
        bias -= lr * db

        # Convergence check
        if epoch > 0 and abs(losses[-2] - losses[-1]) < tol:
            break

    return {"weights": weights, "bias": bias, "losses": losses}
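The returned dictionary holds everything needed to classify new points. The sketch below shows one way to turn it into predictions; the helper names predict_proba and predict, the 0.5 threshold, and the example weights are illustrative assumptions, not part of the original solution.

```python
import numpy as np

def predict_proba(X: np.ndarray, weights: np.ndarray, bias: float) -> np.ndarray:
    """Return P(y = 1 | x) for each row of X, using the same clipped sigmoid as training."""
    z = X @ weights + bias
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

def predict(X: np.ndarray, weights: np.ndarray, bias: float, threshold: float = 0.5) -> np.ndarray:
    """Return hard 0/1 labels by thresholding the predicted probabilities."""
    return (predict_proba(X, weights, bias) >= threshold).astype(int)

# Made-up parameters standing in for model["weights"] and model["bias"]
example_weights = np.array([2.0, -1.0])
example_bias = 0.5
X_new = np.array([[1.0, 0.0], [-2.0, 1.0]])
labels = predict(X_new, example_weights, example_bias)
```

With a trained model, the call would look like predict(X_new, model["weights"], model["bias"]), where model is the dictionary returned by train_logistic_regression.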

Explanation

Initialize weights to zeros and bias to 0.
Forward pass: Compute sigmoid(Xw + b) to get predicted probabilities.
Loss: Binary cross-entropy -mean(y*log(p) + (1-y)*log(1-p)) measures how far predictions are from true labels. Add epsilon to avoid log(0).
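Gradients: The gradient of BCE with respect to the weights is X^T(predictions - y) / n, and with respect to the bias it is mean(predictions - y). This compact form arises because the sigmoid derivative p(1 - p) cancels the p(1 - p) denominator in the derivative of the log loss.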
Convergence: Stop early if the absolute change in loss between consecutive epochs falls below the tolerance tol.
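One way to gain confidence in the gradient formulas above is to compare them against central finite differences of the loss. The following is a minimal sketch under stated assumptions: the helper names (sigmoid, bce_loss, analytic_grad, numeric_grad), the random test data, and the step size h are all chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

def bce_loss(w, b, X, y, eps=1e-15):
    # Same loss as in the solution, written as a function of the parameters
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def analytic_grad(w, b, X, y):
    # Closed-form gradients: X^T (p - y) / n for the weights, mean(p - y) for the bias
    error = sigmoid(X @ w + b) - y
    return X.T @ error / X.shape[0], np.mean(error)

def numeric_grad(w, b, X, y, h=1e-6):
    # Central differences, perturbing one parameter at a time
    dw = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = h
        dw[j] = (bce_loss(w + step, b, X, y) - bce_loss(w - step, b, X, y)) / (2 * h)
    db = (bce_loss(w, b + h, X, y) - bce_loss(w, b - h, X, y)) / (2 * h)
    return dw, db

# Random data and parameters (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
w = rng.normal(size=3)
b = 0.1

dw_a, db_a = analytic_grad(w, b, X, y)
dw_n, db_n = numeric_grad(w, b, X, y)
max_diff = max(np.max(np.abs(dw_a - dw_n)), abs(db_a - db_n))
```

If the analytic gradients are correct, max_diff should be tiny compared with the gradient magnitudes; a large gap usually points to a sign or scaling error.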

Complexity

Time: O(epochs · n · d), where n is the number of samples and d is the number of features
Space: O(n + d) for predictions and weight vectors, plus O(epochs) for the stored loss history
