Random Shuffle of Dataset

#29 · Machine Learning · Easy

Problem

Implement a function that randomly shuffles a dataset (a list of data points) so that every permutation is equally likely. The standard algorithm for this is the Fisher-Yates shuffle.

Solution

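import random
from typing import Optional

def shuffle_data(data: list, seed: Optional[int] = None) -> list:
    # A local generator keeps seeding from resetting the global random state.
    rng = random.Random(seed)
    shuffled = list(data)  # copy so the caller's list is left untouched
    n = len(shuffled)
    # Walk from the last index down to 1; index 0 needs no swap.
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # uniform over 0..i, both ends inclusive
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled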

Explanation

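Create a copy of the data so the original list is not modified.
Use the Fisher-Yates shuffle: iterate i from the last index down to 1, and at each step swap the element at index i with the element at a randomly chosen index j, where 0 <= j <= i (j may equal i, which leaves the element in place).
Position i receives one of i + 1 candidates uniformly at random, so the loop makes n * (n - 1) * ... * 2 = n! equally likely sequences of choices, each yielding a distinct permutation. Every permutation therefore has probability 1/n!, which makes the shuffle unbiased.
An optional seed parameter makes the result reproducible: the same seed and input always produce the same order.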
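
The uniformity claim above can be checked empirically on a small input. The following is a minimal sketch, not part of the original solution: `fisher_yates` restates the same loop but takes an explicit `random.Random` generator, and `permutation_counts` is a hypothetical helper introduced here only for illustration.

```python
import random
from collections import Counter


def fisher_yates(data: list, rng: random.Random) -> list:
    """Return a shuffled copy of data, drawing indices from rng."""
    shuffled = list(data)
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


def permutation_counts(items: list, trials: int, seed: int = 0) -> Counter:
    """Count how often each ordering of items appears over many shuffles."""
    rng = random.Random(seed)  # one generator shared across all trials
    counts = Counter()
    for _ in range(trials):
        counts[tuple(fisher_yates(items, rng))] += 1
    return counts


if __name__ == "__main__":
    counts = permutation_counts([1, 2, 3], trials=60_000)
    for perm, count in sorted(counts.items()):
        print(perm, count)
```

With three items there are 3! = 6 orderings, so each count should land near trials / 6 (10,000 here). A biased variant, such as drawing j from the full range 0..n-1 at every step, would show visibly uneven counts.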

Complexity

Time: O(n) where n is the length of the dataset
Space: O(n) for the copy of the data
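
The O(n) space comes only from the copy. If the caller does not need the original order, the same loop can run directly on the input with O(1) extra space. This is a sketch under that assumption; `shuffle_in_place` is a hypothetical name, not part of the solution above. (Python's built-in `random.shuffle` also shuffles a list in place.)

```python
import random
from typing import Optional


def shuffle_in_place(data: list, seed: Optional[int] = None) -> None:
    """Shuffle data in place with Fisher-Yates; returns None like list.sort()."""
    rng = random.Random(seed)  # local generator, global state untouched
    for i in range(len(data) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform over 0..i inclusive
        data[i], data[j] = data[j], data[i]


points = [10, 20, 30, 40, 50]
shuffle_in_place(points, seed=42)
print(points)  # same five values, reordered
```

Returning None signals that the argument itself was mutated, which avoids confusion with the copying version.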
