Calculate the Phi Coefficient

#95 · Statistics · Easy

Problem

Calculate the Phi coefficient between two binary variables. For a 2x2 contingency table it is equivalent to the Matthews correlation coefficient. It measures the association between the two variables and ranges from -1 to 1.

Solution

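def phi_coefficient(x: list[int], y: list[int]) -> float:
    # Cells of the 2x2 contingency table: nab counts pairs with x == a and y == b.
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)

    # Diagonal product minus off-diagonal product.
    numerator = (n11 * n00) - (n10 * n01)
    # Square root of the product of the four marginal totals.
    denominator = ((n11 + n10) * (n11 + n01) * (n00 + n10) * (n00 + n01)) ** 0.5

    # Phi is undefined when a marginal total is zero; return 0.0 by convention.
    if denominator == 0:
        return 0.0
    return round(numerator / denominator, 4)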

Explanation

  1. Build the 2x2 contingency table: count the four cells n11, n10, n01, n00, where nab is the number of observations with x = a and y = b.
  2. Numerator: (n11 * n00) - (n10 * n01) captures the difference between concordant and discordant pairs.
  3. Denominator: The square root of the product of the four marginal totals (the two row totals and the two column totals) normalizes the coefficient to [-1, 1].
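  4. A value of +1 indicates perfect agreement, -1 indicates perfect disagreement, and 0 indicates no association.
  5. If any marginal total is zero (one variable is constant), the denominator is zero and the coefficient is undefined; the code returns 0.0 in that case.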
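
The steps above can be checked by hand on a small example. The sketch below uses made-up data (not from the problem statement) and two hypothetical helpers, `phi_from_counts` and `pearson`. It also cross-checks the result against the Pearson correlation of the same 0/1 values, which coincides with the Phi coefficient for binary variables.

```python
from math import sqrt

def phi_from_counts(n11: int, n10: int, n01: int, n00: int) -> float:
    """Phi coefficient from the four cells of a 2x2 table (hypothetical helper)."""
    denom = sqrt((n11 + n10) * (n11 + n01) * (n00 + n10) * (n00 + n01))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

def pearson(x: list[int], y: list[int]) -> float:
    """Plain Pearson correlation, used here only as a cross-check."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

# Illustrative data: 8 observations.
x = [1, 1, 0, 0, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0, 1, 0]

# Step 1: the table has n11 = 3, n10 = 1, n01 = 1, n00 = 3.
# Step 2: numerator = 3*3 - 1*1 = 8.
# Step 3: denominator = sqrt(4 * 4 * 4 * 4) = 16.
phi = phi_from_counts(3, 1, 1, 3)
print(phi)            # 0.5
print(pearson(x, y))  # 0.5
```

A value of 0.5 reflects a moderate positive association: six of the eight pairs agree and two disagree.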

Complexity

  • Time: O(n) where n is the number of observations
  • Space: O(1)
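
The solution scans the zipped inputs four times, once per cell. A possible variant, sketched below, tallies all four cells in a single pass with `collections.Counter`; time stays O(n) and the counter holds at most four keys for valid input, so space stays O(1). The function name `phi_coefficient_single_pass` and the input validation are assumptions added for illustration, not part of the original solution, which silently ignores non-binary values and truncates to the shorter list.

```python
from collections import Counter
from math import sqrt

VALID_PAIRS = {(0, 0), (0, 1), (1, 0), (1, 1)}

def phi_coefficient_single_pass(x: list[int], y: list[int]) -> float:
    """Single-pass Phi coefficient (illustrative variant, hypothetical name)."""
    # Assumption: mismatched lengths are treated as an error.
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    # One pass over the data: count each (x, y) pair.
    counts = Counter(zip(x, y))

    # Assumption: anything other than 0/1 is rejected rather than skipped.
    if any(pair not in VALID_PAIRS for pair in counts):
        raise ValueError("x and y must contain only 0 and 1")

    n11, n10 = counts[(1, 1)], counts[(1, 0)]
    n01, n00 = counts[(0, 1)], counts[(0, 0)]

    numerator = n11 * n00 - n10 * n01
    denominator = sqrt((n11 + n10) * (n11 + n01) * (n00 + n10) * (n00 + n01))

    # Same convention as the solution: undefined cases return 0.0.
    if denominator == 0:
        return 0.0
    return round(numerator / denominator, 4)

print(phi_coefficient_single_pass([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0
print(phi_coefficient_single_pass([1, 1, 1], [1, 0, 1]))        # 0.0 (x is constant)
```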