#355 · Data Preprocessing · Medium
Detect and remove outliers from a dataset using the Interquartile Range (IQR) method. For each feature, values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are considered outliers, where IQR = Q3 - Q1. Remove any row that has an outlier in any feature.
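import numpy as np

def remove_outliers_iqr(X: np.ndarray) -> np.ndarray:
    """Return the rows of X that contain no IQR outlier in any feature."""
    n, d = X.shape
    mask = np.ones(n, dtype=bool)  # True = keep the row
    for col in range(d):
        # Quartiles and IQR bounds for this feature.
        q1 = np.percentile(X[:, col], 25)
        q3 = np.percentile(X[:, col], 75)
        iqr = q3 - q1
        lower = q1 - 1.5 * iqr
        upper = q3 + 1.5 * iqr
        # Drop rows whose value in this column falls outside [lower, upper].
        mask &= (X[:, col] >= lower) & (X[:, col] <= upper)
    return X[mask]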
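
The per-column loop can also be written without an explicit loop. The following is a minimal sketch, assuming X is a 2-D float array with no NaN values. The function name `remove_outliers_iqr_vectorized` and the sample data are hypothetical and are not part of the original problem.

```python
import numpy as np

def remove_outliers_iqr_vectorized(X: np.ndarray) -> np.ndarray:
    # Hypothetical variant: compute both quartiles for every column in one call.
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Keep a row only if every feature lies within its column's bounds.
    mask = np.all((X >= lower) & (X <= upper), axis=1)
    return X[mask]

# Illustrative data (assumed): the last row has an extreme value in column 0.
X = np.array([
    [1.0, 10.0],
    [2.0, 11.0],
    [3.0, 12.0],
    [4.0, 13.0],
    [100.0, 14.0],
])
cleaned = remove_outliers_iqr_vectorized(X)
print(cleaned)
```

With NumPy's default linear interpolation, column 0 has Q1 = 2 and Q3 = 4, so its bounds are [-1, 7] and the row containing 100 is dropped. Column 1 has bounds [8, 16], so it removes no rows.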