Random Shuffle of Dataset

#29 · Machine Learning · Easy

Problem

Implement a function that randomly shuffles a dataset (a list of data points) so that every permutation is equally likely. The Fisher-Yates shuffle is the standard algorithm for this.

Solution

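import random
from typing import Optional

def shuffle_data(data: list, seed: Optional[int] = None) -> list:
    """Return a shuffled copy of data using the Fisher-Yates algorithm."""
    if seed is not None:
        random.seed(seed)  # note: reseeds the module-level generator
    shuffled = list(data)  # copy so the caller's list is left untouched
    n = len(shuffled)
    # Walk from the last index down to 1; index 0 needs no swap.
    for i in range(n - 1, 0, -1):
        j = random.randint(0, i)  # randint is inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled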

Explanation

  1. Create a copy of the data to avoid modifying the original.
  2. Use the Fisher-Yates shuffle: iterate from the last index down to 1, and at each step swap the current element with a randomly chosen element from index 0 to i (inclusive).
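  3. Every permutation is equally likely: the loop makes n - 1 independent uniform choices with i + 1 options each, giving n × (n - 1) × ... × 2 = n! equally likely choice sequences, and each sequence produces a distinct permutation. This makes the shuffle unbiased.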
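  4. An optional seed parameter makes the result reproducible. Note that random.seed(seed) reseeds Python's module-level generator, so it also affects later calls to the random module elsewhere in the program.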
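
The unbiasedness claim can be checked empirically. The sketch below is an illustration, not part of the original solution: the helper names (fisher_yates, naive_shuffle, permutation_counts), the trial count, and the seed are all assumptions chosen for the demo. It counts how often each ordering of a 3-element list appears, and contrasts Fisher-Yates with a common faulty variant that draws j from the whole range 0 to n - 1 at every step. That variant has 3^3 = 27 equally likely choice sequences, which cannot be split evenly over 3! = 6 permutations.

```python
import random
from collections import Counter

def fisher_yates(items, rng):
    """Fisher-Yates: swap position i with a random position in [0, i]."""
    out = list(items)
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        out[i], out[j] = out[j], out[i]
    return out

def naive_shuffle(items, rng):
    """Biased variant: swap position i with a random position in [0, n - 1]."""
    out = list(items)
    n = len(out)
    for i in range(n):
        j = rng.randint(0, n - 1)
        out[i], out[j] = out[j], out[i]
    return out

def permutation_counts(shuffle_fn, trials=60_000, seed=0):
    """Count how often each ordering of [0, 1, 2] is produced."""
    rng = random.Random(seed)  # local generator; the seed value is arbitrary
    return Counter(tuple(shuffle_fn([0, 1, 2], rng)) for _ in range(trials))

fy_counts = permutation_counts(fisher_yates)
naive_counts = permutation_counts(naive_shuffle)

# Fisher-Yates counts should all sit near trials / 6;
# the naive counts should split into two visibly different groups.
print("fisher-yates:", sorted(fy_counts.values()))
print("naive:       ", sorted(naive_counts.values()))
```

With enough trials, the Fisher-Yates counts cluster around one sixth of the total, while the naive variant favors some orderings over others.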

Complexity

  • Time: O(n) where n is the length of the dataset
  • Space: O(n) for the copy of the data
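
The O(n) space cost comes entirely from the copy. If mutating the caller's list is acceptable, the same loop can run in place with O(1) extra space. The following is a minimal sketch under that assumption; shuffle_in_place and its rng parameter are hypothetical names, not part of the original solution. Passing a random.Random instance also keeps seeding local instead of reseeding the module-level generator.

```python
import random
from typing import Optional

def shuffle_in_place(data: list, rng: Optional[random.Random] = None) -> None:
    """Hypothetical in-place variant: reorders data itself, returns nothing."""
    if rng is None:
        rng = random.Random()  # fresh generator seeded from the OS
    for i in range(len(data) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        data[i], data[j] = data[j], data[i]

first = [10, 20, 30, 40, 50]
second = [10, 20, 30, 40, 50]
shuffle_in_place(first, random.Random(7))   # seed 7 is arbitrary
shuffle_in_place(second, random.Random(7))

print(first == second)                        # True: same seed, same order
print(sorted(first) == [10, 20, 30, 40, 50])  # True: elements are preserved
```

The trade-off is that the original ordering is lost, so this form suits cases where the caller no longer needs it.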