← back

Distance Correlation for Measuring Metadata Dependence

#359 · Deep Learning · Medium

⊣ Solve on deep-ml.com

Problem

Implement distance correlation, a measure of dependence between two random variables that can detect nonlinear relationships (unlike Pearson correlation). Given two sample vectors X and Y, compute their distance correlation.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
import numpy as np

def distance_correlation(X: np.ndarray, Y: np.ndarray) -> float:
    n = len(X)
    X = X.reshape(-1, 1) if X.ndim == 1 else X
    Y = Y.reshape(-1, 1) if Y.ndim == 1 else Y

    # Compute pairwise distance matrices
    def pairwise_dist(Z):
        return np.sqrt(np.sum((Z[:, None] - Z[None, :]) ** 2, axis=-1))

    a = pairwise_dist(X)
    b = pairwise_dist(Y)

    # Double center the distance matrices
    def double_center(D):
        row_mean = D.mean(axis=1, keepdims=True)
        col_mean = D.mean(axis=0, keepdims=True)
        grand_mean = D.mean()
        return D - row_mean - col_mean + grand_mean

    A = double_center(a)
    B = double_center(b)

    # Compute distance covariance and variances
    dcov_xy = np.sqrt(np.maximum((A * B).sum() / (n * n), 0))
    dvar_x = np.sqrt(np.maximum((A * A).sum() / (n * n), 0))
    dvar_y = np.sqrt(np.maximum((B * B).sum() / (n * n), 0))

    if dvar_x * dvar_y == 0:
        return 0.0
    return dcov_xy / np.sqrt(dvar_x * dvar_y)

Explanation

  1. Compute pairwise Euclidean distance matrices for X and Y.
  2. Double-center each distance matrix by subtracting row means, column means, and adding back the grand mean.
  3. Distance covariance is the root of the mean of element-wise products of centered matrices.
  4. Distance correlation normalizes by the product of the individual distance standard deviations.

Complexity

  • Time: O(n^2 * d) for pairwise distances
  • Space: O(n^2) for the distance matrices