Generate Random Subsets of a Dataset

#33 · Machine Learning · Medium

Problem

Given a 2D NumPy array X and a number of subsets n_subsets, return a list of n_subsets random subsets of the data. Each subset is sampled with replacement (a bootstrap sample) and has the same number of rows as X. An optional seed makes the result reproducible.

Solution

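import numpy as np

def generate_random_subsets(X, n_subsets, seed=None):
    # Seed NumPy's global RNG so repeated calls with the same seed match.
    if seed is not None:
        np.random.seed(seed)
    n_samples = X.shape[0]
    subsets = []
    for _ in range(n_subsets):
        # Draw n_samples row indices with replacement (a bootstrap sample).
        indices = np.random.choice(n_samples, size=n_samples, replace=True)
        # Integer-array indexing copies the selected rows into a new array.
        subsets.append(X[indices])
    return subsets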

Explanation

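  1. If a seed is provided, seed NumPy's global random number generator so the output is reproducible.
  2. Read the number of rows, n_samples, from X.shape[0].
  3. For each of the n_subsets subsets, draw n_samples row indices with replacement using np.random.choice.
  4. Index X with those indices to build the bootstrap subset; a given row may appear several times or not at all.
  5. Collect the subsets in a list and return it.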
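
The steps above can also be written without the Python loop. The following is a minimal sketch, not the reference solution: the function name bootstrap_subsets_vectorized is hypothetical, it uses a local np.random.default_rng generator instead of the global seed, and it returns one 3D array rather than a list. Because it uses a different generator, it will not reproduce the same draws as the solution above for a given seed.

```python
import numpy as np

def bootstrap_subsets_vectorized(X, n_subsets, seed=None):
    """Hypothetical variant: draw the indices for every subset in one call."""
    # Local generator, so NumPy's global random state is left untouched.
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    # One row of indices per subset, each value in [0, n_samples).
    indices = rng.integers(0, n_samples, size=(n_subsets, n_samples))
    # Integer-array indexing gives shape (n_subsets, n_samples, n_features).
    return X[indices]

X = np.arange(10).reshape(5, 2)
subsets = bootstrap_subsets_vectorized(X, n_subsets=3, seed=42)
print(subsets.shape)  # (3, 5, 2)
```

Here subsets[i] plays the role of the i-th list element in the loop version. Calling list(subsets) recovers a list of 2D arrays if the list form is required.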

Complexity

  • Time: O(n_subsets * n_samples * n_features), since X[indices] copies n_samples rows of n_features values for each subset
  • Space: O(n_subsets * n_samples * n_features) for storing all subsets
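
As a rough check of the space bound, the sketch below builds a few bootstrap subsets and compares their total size with the size of the original array. The array dimensions and variable names are illustrative assumptions, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 4))  # assumed sizes: n_samples=100, n_features=4
n_subsets = 8

# Same sampling idea as the solution: n_samples indices drawn with replacement.
subsets = [
    X[rng.integers(0, X.shape[0], size=X.shape[0])]
    for _ in range(n_subsets)
]

# Each subset is a full copy with the same shape as X, so memory grows
# linearly with n_subsets, n_samples, and n_features.
total_bytes = sum(s.nbytes for s in subsets)
shares_memory = any(np.shares_memory(s, X) for s in subsets)

print(total_bytes == n_subsets * X.nbytes)  # True
print(shares_memory)                        # False
```

Since integer-array indexing returns copies rather than views, none of the subsets share memory with X, which is why the space cost cannot drop below one full copy per subset with this approach.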