Compute Arithmetic Intensity and Classify Bottleneck

#414 · Inference · Easy

Problem

Given the total FLOPs and total bytes of memory traffic for a particular operation, compute the arithmetic intensity (FLOPs per byte). Then, given the hardware's compute ceiling and memory bandwidth, classify the operation as compute-bound or memory-bound using the roofline model.

Solution

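def compute_arithmetic_intensity(
    flops: float,
    memory_bytes: float,
    peak_flops: float,
    peak_bandwidth: float
) -> dict:
    """Classify an operation as compute-bound or memory-bound (roofline model)."""
    # FLOPs per byte; an operation that moves no data has unbounded intensity.
    arithmetic_intensity = flops / memory_bytes if memory_bytes > 0 else float('inf')
    # Intensity at which the bandwidth limit meets the compute ceiling.
    ridge_point = peak_flops / peak_bandwidth
    if arithmetic_intensity >= ridge_point:
        # At or above the ridge point, the compute ceiling is the limit.
        bottleneck = "compute-bound"
        attainable_flops = peak_flops
    else:
        # Below the ridge point, memory bandwidth caps the achievable rate.
        bottleneck = "memory-bound"
        attainable_flops = arithmetic_intensity * peak_bandwidth
    return {
        "arithmetic_intensity": round(arithmetic_intensity, 4),
        "ridge_point": round(ridge_point, 4),
        "bottleneck": bottleneck,
        "attainable_flops": round(attainable_flops, 4)
    }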

Explanation

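  1. Arithmetic intensity = total FLOPs / total bytes moved, in FLOPs per byte. It measures how much computation is done per byte of memory traffic. If no bytes are moved, the code treats the intensity as infinite.
  2. The ridge point = peak compute / peak bandwidth. With peak compute in FLOP/s and bandwidth in bytes/s, it is also in FLOPs per byte. It is the arithmetic intensity at which the bandwidth limit equals the compute ceiling, so it marks the transition from memory-bound to compute-bound.
  3. If the operation's arithmetic intensity is below the ridge point, performance is limited by memory bandwidth (memory-bound). If it is at or above the ridge point, performance is limited by compute (compute-bound).
  4. The attainable performance, in FLOP/s, is min(peak_flops, arithmetic_intensity * peak_bandwidth). The two branches in the code compute exactly this minimum.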
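
The steps above can be checked on a small worked example. The sketch below is a compact, unrounded restatement of the same roofline logic, not the reference solution. The helper name `roofline`, the hardware figures (100 TFLOP/s peak compute, 1 TB/s bandwidth), and the two operations are hypothetical values chosen for illustration.

```python
def roofline(flops, memory_bytes, peak_flops, peak_bandwidth):
    """Return (intensity, ridge_point, attainable_flops_per_s, bottleneck).

    Hypothetical helper for illustration; assumes memory_bytes > 0
    and peak_bandwidth > 0.
    """
    intensity = flops / memory_bytes          # FLOPs per byte
    ridge = peak_flops / peak_bandwidth       # FLOPs per byte
    attainable = min(peak_flops, intensity * peak_bandwidth)
    bottleneck = "compute-bound" if intensity >= ridge else "memory-bound"
    return intensity, ridge, attainable, bottleneck


# Assumed hardware: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth,
# which puts the ridge point at 100 FLOPs per byte.
PEAK_FLOPS = 100e12
PEAK_BANDWIDTH = 1e12

# Operation A: 2e9 FLOPs over 1e9 bytes -> 2 FLOPs/byte, below the ridge.
op_a = roofline(2e9, 1e9, PEAK_FLOPS, PEAK_BANDWIDTH)

# Operation B: 4e11 FLOPs over 1e9 bytes -> 400 FLOPs/byte, above the ridge.
op_b = roofline(4e11, 1e9, PEAK_FLOPS, PEAK_BANDWIDTH)

print(op_a)  # memory-bound, capped at 2e12 FLOP/s by bandwidth
print(op_b)  # compute-bound, capped at the 1e14 FLOP/s compute ceiling
```

Under these assumed numbers, operation A can reach at most 2% of peak compute, while operation B saturates the compute ceiling. Increasing its intensity further would not raise its attainable rate.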

Complexity

  • Time: O(1)
  • Space: O(1)