Divide Dataset Based on Feature Threshold

#31 · Machine Learning · Medium

Problem

Divide a dataset into two subsets based on a feature threshold. Given a dataset (2D NumPy array), a feature index, and a threshold value, split the data into rows where the feature value is greater than or equal to the threshold and rows where it is less.

Solution

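import numpy as np

def divide_on_feature(X, feature_index, threshold):
    # Column of feature values, one entry per row.
    feature = X[:, feature_index]
    # Boolean indexing keeps the rows whose mask entry is True.
    left = X[feature >= threshold]   # rows at or above the threshold
    right = X[feature < threshold]   # rows below the threshold
    return left, right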

Explanation

  1. Extract the column at feature_index from every row.
  2. Build two boolean masks over that column: one that is True where the feature value is >= threshold (the left split) and one that is True where it is < threshold (the right split).
  3. Use boolean indexing to select the corresponding rows.
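
The three steps can be traced on a small array. The sketch below restates the function so that it runs on its own; the sample array X and the threshold of 4 are illustrative values chosen for this example, not part of the original problem.

```python
import numpy as np

def divide_on_feature(X, feature_index, threshold):
    # Same logic as the solution, restated so this example is self-contained.
    column = X[:, feature_index]        # step 1: one feature value per row
    at_or_above = column >= threshold   # step 2: boolean masks, one entry per row
    below = column < threshold
    return X[at_or_above], X[below]     # step 3: boolean indexing selects rows

# Illustrative data (an assumption for this example): 4 rows, 2 features.
X = np.array([[1, 10],
              [5, 20],
              [3, 30],
              [7, 40]])

left, right = divide_on_feature(X, feature_index=0, threshold=4)
print(left)   # rows whose first feature is >= 4
print(right)  # rows whose first feature is < 4
```

Rows keep their original relative order within each subset, because boolean indexing walks the mask from the first row to the last.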

Complexity

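  • Time: O(n) to build the masks, where n is the number of rows, plus O(n · d) to copy the selected rows, where d is the number of columns
  • Space: O(n · d) for the two output arrays, since boolean indexing returns copies rather than views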
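
Since both comparisons scan the same column, one possible variant computes the mask once and negates it. This is a sketch of an alternative, not the solution above, and the name divide_on_feature_single_mask is hypothetical. It also behaves differently when the feature column contains NaN: with two separate comparisons a NaN row lands in neither subset, whereas with a negated mask it lands in the second subset.

```python
import numpy as np

def divide_on_feature_single_mask(X, feature_index, threshold):
    # Hypothetical variant: compare once, then reuse the negated mask.
    mask = X[:, feature_index] >= threshold
    return X[mask], X[~mask]

# Illustrative data (an assumption for this example), including one NaN.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [5.0, 30.0]])

left, right = divide_on_feature_single_mask(X, feature_index=0, threshold=3.0)

# NaN >= 3.0 is False, so the NaN row falls into the negated-mask subset.
print(left.shape, right.shape)

# Boolean indexing copies data, so the outputs do not share memory with X.
print(np.shares_memory(left, X), np.shares_memory(right, X))
```

Whether the NaN handling is acceptable depends on the data; if the feature column can contain NaN, the two-comparison form in the solution and this variant will return different second subsets.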