← back

Implement Grid Search

#288 · Machine Learning · Medium

⊣ Solve on deep-ml.com

Problem

Implement grid search for hyperparameter tuning. Given a model class, a parameter grid (dictionary of parameter names to lists of values), training data, and a scoring function, find the best combination of hyperparameters using exhaustive search with cross-validation.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
import itertools
import numpy as np

def cross_val_score(model_class, params, X, y, k=5):
    n = len(y)
    indices = np.arange(n)
    fold_size = n // k
    scores = []

    for i in range(k):
        val_start = i * fold_size
        val_end = val_start + fold_size if i < k - 1 else n
        val_idx = indices[val_start:val_end]
        train_idx = np.concatenate([indices[:val_start], indices[val_end:]])

        X_train, X_val = X[train_idx], X[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]

        model = model_class(**params)
        model.fit(X_train, y_train)
        score = model.score(X_val, y_val)
        scores.append(score)

    return np.mean(scores)

def grid_search(model_class, param_grid: dict, X: np.ndarray, y: np.ndarray, k: int = 5) -> dict:
    keys = list(param_grid.keys())
    values = list(param_grid.values())
    best_score = -float('inf')
    best_params = None

    for combo in itertools.product(*values):
        params = dict(zip(keys, combo))
        score = cross_val_score(model_class, params, X, y, k)
        if score > best_score:
            best_score = score
            best_params = params

    return {"best_params": best_params, "best_score": best_score}

Explanation

  1. Generate all combinations of hyperparameter values using itertools.product.
  2. For each combination, evaluate the model using k-fold cross-validation.
  3. In each fold, train on k-1 folds and validate on the remaining fold. Average the scores.
  4. Track and return the parameter combination that achieves the highest mean cross-validation score.

Complexity

  • Time: O(P k T) where P is the total number of parameter combinations, k is the number of folds, and T is the time to train and evaluate one model
  • Space: O(n) for fold indices