Implement Gradient Boosting Regressor Step

#344 · Machine Learning · Medium

Problem

Implement a single step of gradient boosting regression. Given the current ensemble predictions, compute pseudo-residuals, fit a simple decision tree to them, and update the ensemble.

Solution

import numpy as np
from typing import Dict, Tuple

def decision_stump(X: np.ndarray, y: np.ndarray) -> Tuple[int, float, float, float]:
    best_feat, best_thresh = 0, 0.0
    best_mse = float('inf')
    best_left_val, best_right_val = 0.0, 0.0

    for feat in range(X.shape[1]):
        thresholds = np.unique(X[:, feat])
        for thresh in thresholds:
            left_mask = X[:, feat] <= thresh
            right_mask = ~left_mask
            if left_mask.sum() == 0 or right_mask.sum() == 0:
                continue

            left_val = y[left_mask].mean()
            right_val = y[right_mask].mean()
            pred = np.where(left_mask, left_val, right_val)
            mse = np.mean((y - pred) ** 2)

            if mse < best_mse:
                best_mse = mse
                best_feat = feat
                best_thresh = thresh
                best_left_val = left_val
                best_right_val = right_val

    return best_feat, best_thresh, best_left_val, best_right_val

def gradient_boosting_step(
    X: np.ndarray,
    y: np.ndarray,
    current_predictions: np.ndarray,
    learning_rate: float = 0.1
) -> Dict:
    # Compute pseudo-residuals (negative gradient of MSE loss)
    residuals = y - current_predictions

    # Fit a decision stump to the residuals
    feat, thresh, left_val, right_val = decision_stump(X, residuals)

    # Predict with the stump
    left_mask = X[:, feat] <= thresh
    stump_pred = np.where(left_mask, left_val, right_val)

    # Update predictions
    new_predictions = current_predictions + learning_rate * stump_pred
    new_mse = float(np.mean((y - new_predictions) ** 2))

    return {
        "feature": int(feat),
        "threshold": float(thresh),
        "left_value": float(left_val),
        "right_value": float(right_val),
        "new_predictions": new_predictions,
        "mse": new_mse,
        "residual_mean": float(np.mean(np.abs(residuals)))
    }

Explanation

Pseudo-residuals: For MSE loss, the negative gradient is simply y - current_prediction (the residuals).
Fit a weak learner: A decision stump (depth-1 tree) is fit to the residuals, finding the best feature and threshold to split on.
Update: Add the stump's predictions scaled by the learning rate to the ensemble predictions.
The learning rate controls the contribution of each tree, trading off convergence speed for accuracy.

Complexity

Time: O(n * p) where n is samples and p is features (for the stump fitting)
Space: O(n) for residuals and predictions

← #343 #345 →