#414 · Inference · Easy
Given the total FLOPs and total bytes of memory traffic for an operation, compute its arithmetic intensity (FLOPs per byte). Then, given the hardware's peak compute throughput (FLOP/s) and peak memory bandwidth (bytes/s), classify the operation as compute-bound or memory-bound using the roofline model.
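The quantities in the problem statement can be written compactly as follows. The symbols are assumptions introduced here for illustration and do not appear in the original: F is total FLOPs, B is total bytes moved, P is peak compute throughput, and β is peak memory bandwidth.

```latex
I = \frac{F}{B}, \qquad
I_{\text{ridge}} = \frac{P}{\beta}, \qquad
\text{attainable FLOP/s} = \min\left(P,\; I \cdot \beta\right)
```

An operation with I ≥ I_ridge sits under the flat compute roof (compute-bound); one with I < I_ridge sits under the sloped bandwidth roof (memory-bound).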
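def compute_arithmetic_intensity(
    flops: float,
    memory_bytes: float,
    peak_flops: float,
    peak_bandwidth: float
) -> dict:
    # Arithmetic intensity in FLOPs per byte; treat zero memory traffic as infinite intensity.
    arithmetic_intensity = flops / memory_bytes if memory_bytes > 0 else float('inf')
    # Ridge point: the intensity at which the bandwidth roof meets the compute roof.
    ridge_point = peak_flops / peak_bandwidth
    if arithmetic_intensity >= ridge_point:
        # At or beyond the ridge point, performance is capped by peak compute.
        bottleneck = "compute-bound"
        attainable_flops = peak_flops
    else:
        # Below the ridge point, performance is capped by memory bandwidth.
        bottleneck = "memory-bound"
        attainable_flops = arithmetic_intensity * peak_bandwidth
    return {
        "arithmetic_intensity": round(arithmetic_intensity, 4),
        "ridge_point": round(ridge_point, 4),
        "bottleneck": bottleneck,
        "attainable_flops": round(attainable_flops, 4)
    }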
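As a rough cross-check on the branch in the function above, the attainable throughput can also be expressed as a single min() over the two roofs. The sketch below is only an illustration: the helper name roofline_attainable is hypothetical, and the hardware numbers are made up to give a round ridge point of 100 FLOPs/byte, not taken from any real device.

```python
def roofline_attainable(flops, memory_bytes, peak_flops, peak_bandwidth):
    """Hypothetical helper: attainable FLOP/s as min(compute roof, bandwidth roof)."""
    # Same zero-traffic convention as the solution: infinite intensity.
    intensity = flops / memory_bytes if memory_bytes > 0 else float("inf")
    return min(peak_flops, intensity * peak_bandwidth)


# Made-up hardware figures (assumptions, for illustration only).
PEAK_FLOPS = 100e12      # 100 TFLOP/s compute ceiling
PEAK_BANDWIDTH = 1e12    # 1 TB/s memory bandwidth

# 2e9 FLOPs over 1e8 bytes: 20 FLOPs/byte, below the ridge point (memory-bound).
low_intensity = roofline_attainable(2e9, 1e8, PEAK_FLOPS, PEAK_BANDWIDTH)

# 4e12 FLOPs over 1e10 bytes: 400 FLOPs/byte, above the ridge point (compute-bound).
high_intensity = roofline_attainable(4e12, 1e10, PEAK_FLOPS, PEAK_BANDWIDTH)

print(low_intensity)
print(high_intensity)
```

The min() form and the if/else form give the same attainable value; the explicit branch is still useful when the bottleneck label itself is needed.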