Break-Even Pay-Per-Token API vs Dedicated GPU

#451 · Machine Learning · Medium

Problem

Determine the break-even point between a pay-per-token API and a dedicated GPU. Given the API's cost per token, the GPU's fixed hourly cost, and the GPU's throughput in tokens per second, compute the number of tokens per hour at which both options cost the same.

Solution

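def break_even_tokens(cost_per_token: float, gpu_hourly_cost: float, gpu_tokens_per_sec: float) -> float:
    """Return the hourly token volume at which the API cost equals the GPU's fixed hourly cost."""
    # gpu_tokens_per_sec does not change the break-even volume; it only bounds what the GPU can serve.
    return gpu_hourly_cost / cost_per_token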

Explanation

The API cost for N tokens is N * cost_per_token.
The GPU cost for one hour is the fixed gpu_hourly_cost, regardless of how many tokens are processed (up to its throughput limit).
At the break-even point the two costs are equal: N * cost_per_token = gpu_hourly_cost, so N = gpu_hourly_cost / cost_per_token.
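The gpu_tokens_per_sec parameter does not enter the formula. It can be used to check that the break-even volume is actually achievable: the GPU can produce at most gpu_tokens_per_sec * 3600 tokens per hour. If the break-even volume exceeds that capacity, a single GPU never reaches it, and the API is cheaper at every volume the GPU can serve.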
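The feasibility check described above can be sketched as follows. This is a minimal illustration, not part of the reference solution: the helper name break_even_report, the returned dictionary keys, and the prices used in the example call are all assumptions chosen for readability.

```python
def break_even_report(cost_per_token: float, gpu_hourly_cost: float, gpu_tokens_per_sec: float) -> dict:
    """Hypothetical helper: break-even volume plus a capacity check."""
    if cost_per_token <= 0:
        raise ValueError("cost_per_token must be positive")

    # Volume at which N * cost_per_token equals the fixed hourly GPU cost.
    break_even = gpu_hourly_cost / cost_per_token

    # Upper bound on what one GPU can produce in an hour.
    gpu_capacity = gpu_tokens_per_sec * 3600

    return {
        "break_even_tokens_per_hour": break_even,
        "gpu_capacity_tokens_per_hour": gpu_capacity,
        "achievable": break_even <= gpu_capacity,
    }


# Illustrative numbers only (assumed, not real pricing).
report = break_even_report(cost_per_token=0.00002, gpu_hourly_cost=2.0, gpu_tokens_per_sec=50)
slow_gpu_report = break_even_report(cost_per_token=0.00002, gpu_hourly_cost=2.0, gpu_tokens_per_sec=20)
```

With these assumed numbers the break-even volume is about 100,000 tokens per hour. The first GPU can produce 180,000 tokens per hour, so the break-even point is reachable. The second tops out at 72,000, so at those prices the API stays cheaper at any volume that GPU can serve.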

Complexity

Time: O(1)
Space: O(1)
