Elastic Net Regression via Gradient Descent

#139 · Machine Learning · Medium

Problem

Implement Elastic Net Regression using gradient descent. Elastic Net combines the L1 (Lasso) and L2 (Ridge) regularization penalties. Given training data X and y, the regularization strength alpha, and the L1 ratio l1_ratio, fit a linear model and return the learned weights followed by the bias.

Solution

import numpy as np

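def elastic_net(X: np.ndarray, y: np.ndarray, alpha: float = 1.0, l1_ratio: float = 0.5,
                learning_rate: float = 0.01, n_iterations: int = 1000) -> np.ndarray:
    """Fit Elastic Net regression with batch (sub)gradient descent.

    Returns a 1-D array of length n_features + 1: the weights followed by the bias.
    The bias is not regularized.
    """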
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0

    for _ in range(n_iterations):
        # Forward
        y_pred = X @ weights + bias
        error = y_pred - y

        # Gradients of the data term (1 / (2N)) * sum(error^2), i.e. half the MSE
        dw = (1.0 / n_samples) * (X.T @ error)
        db = (1.0 / n_samples) * np.sum(error)

        # L2 gradient: alpha * (1 - l1_ratio) * weights
        l2_grad = alpha * (1 - l1_ratio) * weights

        # L1 subgradient: alpha * l1_ratio * sign(weights)
        l1_grad = alpha * l1_ratio * np.sign(weights)

        # Update
        weights -= learning_rate * (dw + l2_grad + l1_grad)
        bias -= learning_rate * db

    return np.append(weights, bias)
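
The returned array packs the weights first and the bias last. As a minimal sketch of how a caller might unpack it to make predictions (`predict` is a hypothetical helper, not part of the original solution, and the parameter values are hand-picked rather than fitted):

```python
import numpy as np

def predict(X: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Apply a fitted linear model; `params` is the weights followed by the bias."""
    weights, bias = params[:-1], params[-1]
    return X @ weights + bias

# Hand-picked parameters standing in for the output of elastic_net(X, y).
params = np.array([2.0, -1.0, 0.5])  # weights = [2, -1], bias = 0.5
X_new = np.array([[1.0, 1.0],
                  [0.0, 2.0]])

# Row 1: 2*1 - 1*1 + 0.5 = 1.5; row 2: 2*0 - 1*2 + 0.5 = -1.5
predictions = predict(X_new, params)
print(predictions)
```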

Explanation

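The objective being minimized is (1 / (2N)) * ||Xw + b - y||_2^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2.
The data term is half the MSE, so its gradient with respect to w is X^T (y_pred - y) / N with no factor of 2, and its gradient with respect to b is the mean error.
The L2 penalty gradient is alpha * (1 - l1_ratio) * w.
The L1 penalty uses the subgradient alpha * l1_ratio * sign(w), since |w| is not differentiable at 0; np.sign returns 0 there, which is a valid subgradient.
The bias is not penalized, so it is updated with the data-term gradient only.
When l1_ratio = 1 the penalty reduces to Lasso; when l1_ratio = 0 it reduces to Ridge.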
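
One way to sanity-check the gradient formulas above is to compare them with central finite differences of the objective. The sketch below assumes the data term is scaled as half the MSE, which is what the X^T (y_pred - y) / N gradient implies; the names `elastic_net_objective` and `elastic_net_gradient` are illustrative, not part of the original solution. The test point uses nonzero weights so that the L1 term is differentiable there.

```python
import numpy as np

def elastic_net_objective(X, y, weights, bias, alpha=1.0, l1_ratio=0.5):
    """Half-MSE data term plus the L1 and L2 penalties (assumed scaling)."""
    n_samples = X.shape[0]
    residual = X @ weights + bias - y
    data_term = 0.5 * np.sum(residual ** 2) / n_samples
    l1_term = alpha * l1_ratio * np.sum(np.abs(weights))
    l2_term = 0.5 * alpha * (1 - l1_ratio) * np.sum(weights ** 2)
    return data_term + l1_term + l2_term

def elastic_net_gradient(X, y, weights, bias, alpha=1.0, l1_ratio=0.5):
    """Analytic (sub)gradient with respect to the weights and the bias."""
    n_samples = X.shape[0]
    error = X @ weights + bias - y
    dw = X.T @ error / n_samples
    dw = dw + alpha * (1 - l1_ratio) * weights      # L2 penalty gradient
    dw = dw + alpha * l1_ratio * np.sign(weights)   # L1 subgradient
    db = np.sum(error) / n_samples                  # bias is not penalized
    return dw, db

# Small synthetic problem (arbitrary values, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = np.array([0.7, -1.2, 0.4])  # all nonzero, so |w| is differentiable here
b = 0.3

dw, db = elastic_net_gradient(X, y, w, b)

# Central finite differences of the objective, one coordinate at a time.
eps = 1e-6
numeric_dw = np.zeros_like(w)
for j in range(w.size):
    step = np.zeros_like(w)
    step[j] = eps
    numeric_dw[j] = (elastic_net_objective(X, y, w + step, b)
                     - elastic_net_objective(X, y, w - step, b)) / (2 * eps)
numeric_db = (elastic_net_objective(X, y, w, b + eps)
              - elastic_net_objective(X, y, w, b - eps)) / (2 * eps)

print("weights gradient matches:", np.allclose(dw, numeric_dw, atol=1e-6))
print("bias gradient matches:", abs(db - numeric_db) < 1e-6)
```

If the data term were the full MSE instead, the analytic data gradient would need an extra factor of 2 for this check to pass.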

Complexity

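Time: O(iterations * N * F), where N = samples and F = features; each iteration is dominated by the products X @ weights and X.T @ error
Space: O(N + F) beyond the input: O(F) for the weights and gradients, O(N) for the predictions and errors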
