Implement K-Fold Cross-Validation

#18 · Machine Learning · Medium

Problem

Implement K-Fold Cross-Validation from scratch. Split a dataset into K folds and, for each fold, use that fold as the validation set while the remaining K - 1 folds form the training set.

Solution

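def cross_validation_split(data: list, k: int) -> list:
    """Split data into k contiguous folds and return [training, validation] pairs."""
    n = len(data)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(data)")
    # divmod gives the base fold size and the leftover count; the first
    # `extra` folds each take one leftover element so nothing is dropped.
    base, extra = divmod(n, k)
    folds = []
    start = 0
    for i in range(k):
        end = start + base + (1 if i < extra else 0)
        validation = data[start:end]
        # Training set: everything before and after the validation slice.
        training = data[:start] + data[end:]
        folds.append([training, validation])
        start = end
    return folds

Explanation

Check that 2 <= k <= len(data); otherwise some fold would be empty or there would be no training data.
Compute the base fold size len(data) // k and the remainder len(data) % k. Using the base size alone would leave the last len(data) % k elements out of every validation set whenever len(data) is not divisible by k, so the first len(data) % k folds each take one extra element.
For fold i, the validation set is the slice data[start:end], where end = start + base, plus one if i < remainder; the next fold starts where this one ends.
The training set is everything before and after that slice, concatenated.
Return a list of [training, validation] pairs. The folds are contiguous and unshuffled, so shuffle the data first if it is ordered (for example, sorted by label).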
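A common next step is to turn the folds into a cross-validated score. The sketch below is illustrative only: `k_fold_indices` and `cross_val_mse` are hypothetical helpers, the seeded shuffle is an added assumption (the solution above does not shuffle), and the "model" is a stand-in baseline that predicts the mean of the training targets. It works on indices rather than copies of the data and reports one mean squared error per fold.

```python
import random
from statistics import mean


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and cut them into k folds whose sizes differ by at most 1."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        end = start + base + (1 if i < extra else 0)
        folds.append(indices[start:end])
        start = end
    return folds


def cross_val_mse(y: list[float], k: int, seed: int = 0) -> list[float]:
    """Score a mean-predictor baseline on each fold; returns one MSE per fold."""
    scores = []
    for val_idx in k_fold_indices(len(y), k, seed):
        held_out = set(val_idx)
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        prediction = mean(train_y)  # "model" fitted only on the training folds
        scores.append(mean((y[i] - prediction) ** 2 for i in val_idx))
    return scores


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
scores = cross_val_mse(y, k=3)
print(len(scores), mean(scores))
```

Averaging the per-fold scores gives a single estimate; the spread across folds is a rough indication of how sensitive that estimate is to the particular split.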

Complexity

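Time: O(k * n), where n is the dataset size: each of the k iterations copies about n elements when slicing and concatenating.
Space: O(k * n), since every fold stores its own copy of the training and validation data.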
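If holding all k folds at once is too much memory, one option is to generate folds lazily and yield index lists instead of copied data. The generator below, `k_fold_index_splits`, is a hypothetical helper that uses the same contiguous, remainder-spreading fold layout; as long as the caller does not keep earlier folds, peak extra memory is O(n), while total time stays O(k * n).

```python
from collections.abc import Iterator


def k_fold_index_splits(n: int, k: int) -> Iterator[tuple[list[int], list[int]]]:
    """Lazily yield (train_indices, val_indices) for k contiguous folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    base, extra = divmod(n, k)
    start = 0
    for i in range(k):
        end = start + base + (1 if i < extra else 0)
        # Training indices are everything outside [start, end).
        train = list(range(0, start)) + list(range(end, n))
        yield train, list(range(start, end))
        start = end


data = ["a", "b", "c", "d", "e"]
for train_idx, val_idx in k_fold_index_splits(len(data), k=2):
    print([data[i] for i in train_idx], [data[i] for i in val_idx])
# prints:
# ['d', 'e'] ['a', 'b', 'c']
# ['a', 'b', 'c'] ['d', 'e']
```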
