Outlier Detection and Removal Using IQR Method

#355 · Data Preprocessing · Medium

Problem

Detect and remove outliers from a dataset using the Interquartile Range (IQR) method. For each feature, values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are considered outliers. Remove any row that has an outlier in any feature.
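
To make the rule concrete, here is a small worked example for a single feature. The data values are made up for illustration, and the quartiles assume NumPy's default linear interpolation; other quartile conventions can give slightly different fences.

```python
import numpy as np

# Illustrative data (not from the problem statement): one obvious outlier.
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

q1, q3 = np.percentile(values, [25, 75])  # 2.0 and 4.0
iqr = q3 - q1                             # 2.0
lower = q1 - 1.5 * iqr                    # -1.0
upper = q3 + 1.5 * iqr                    # 7.0

# True where a value falls outside [lower, upper].
is_outlier = (values < lower) | (values > upper)
print(is_outlier)  # only the last value, 100.0, is flagged
```

Only 100.0 lies outside the range [-1.0, 7.0], so a row containing it would be removed.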

Solution

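import numpy as np

def remove_outliers_iqr(X: np.ndarray) -> np.ndarray:
    """Drop every row of X that has an IQR outlier in at least one column."""
    n, d = X.shape
    mask = np.ones(n, dtype=bool)  # True = keep the row

    for col in range(d):
        # Quartiles of this feature (NumPy's default linear interpolation)
        q1 = np.percentile(X[:, col], 25)
        q3 = np.percentile(X[:, col], 75)
        iqr = q3 - q1
        # Fences: values outside [lower, upper] count as outliers
        lower = q1 - 1.5 * iqr
        upper = q3 + 1.5 * iqr
        # A row stays only if it is within the fences in every column so far
        mask &= (X[:, col] >= lower) & (X[:, col] <= upper)

    return X[mask]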

Explanation

  1. For each feature column, compute Q1 (25th percentile) and Q3 (75th percentile).
  2. Calculate the IQR as Q3 - Q1, then define the acceptable range as [Q1 - 1.5 × IQR, Q3 + 1.5 × IQR].
  3. Mark any row as an outlier if it falls outside the acceptable range in any feature.
  4. Return only the non-outlier rows.
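
The four steps above can also be sketched without an explicit loop by computing the quartiles along axis 0. This is an alternative formulation under the same rule, not the solution shown on this page; the function name remove_outliers_iqr_vectorized and the sample matrix are hypothetical.

```python
import numpy as np

def remove_outliers_iqr_vectorized(X: np.ndarray) -> np.ndarray:
    # Steps 1-2: per-column quartiles and fences, each of shape (d,).
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Step 3: a row is kept only if every feature lies within its fences.
    keep = np.all((X >= lower) & (X <= upper), axis=1)
    # Step 4: boolean indexing returns the retained rows.
    return X[keep]

# Illustrative data: row 4 is extreme in column 0, row 5 in column 1.
X = np.array([
    [1.0, 10.0],
    [2.0, 11.0],
    [3.0, 12.0],
    [4.0, 13.0],
    [100.0, 14.0],
    [5.0, 500.0],
])
cleaned = remove_outliers_iqr_vectorized(X)
print(cleaned.shape)  # (4, 2): the last two rows are dropped
```

Here the fences are [-1.5, 8.5] for the first column and [7.5, 17.5] for the second, so the rows containing 100.0 and 500.0 are both removed even though each is an outlier in only one feature.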

Complexity

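  • Time: O(n d log n) when each of the d columns is sorted to find its percentiles (O(n log n) per column); a selection-based percentile routine brings this down to O(n d) on average
  • Space: O(n) auxiliary for the boolean mask; the returned array is a copy of the retained rows, so the output itself can take up to O(n d)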
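
The time bound depends on how the percentiles are obtained. As a rough sketch, the snippet below finds Q1 and Q3 by selection with np.partition, which avoids a full sort, and compares them with np.percentile. The sample size of 101 is chosen on purpose so both percentiles land on exact indices and need no interpolation; the random data is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
col = rng.normal(size=101)  # illustrative feature column, n = 101

# With n = 101, (n - 1) * 0.25 = 25 and (n - 1) * 0.75 = 75 are whole
# numbers, so the quartiles are exactly the 25th and 75th order statistics.
k1, k3 = 25, 75

# np.partition places the k-th smallest values at index k without fully sorting.
part = np.partition(col, [k1, k3])
q1_sel, q3_sel = part[k1], part[k3]

# Reference values from np.percentile (default linear interpolation).
q1_ref, q3_ref = np.percentile(col, [25, 75])
print(np.isclose(q1_sel, q1_ref), np.isclose(q3_sel, q3_ref))
```

For sample sizes where the percentile falls between two indices, the two neighbouring order statistics would have to be selected and interpolated.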