Decompose the end-to-end latency of an LLM inference request into its component parts: tokenization time, prefill (prompt processing) time, decode (token generation) time, and detokenization time. Given a list of stage timings, compute the total latency and the percentage contribution of each stage.
def decompose_latency(stage_times: dict[str, float]) -> dict:
total = sum(stage_times.values())
result = {"total_latency": round(total, 4)}
for stage, t in stage_times.items():
pct = (t / total * 100) if total > 0 else 0.0
result[stage] = {"time": round(t, 4), "percentage": round(pct, 2)}
return result