#411 · Machine Learning · Easy
⊣ Solve on deep-ml.comGiven a list of token timestamps from an LLM generation stream, compute three key latency metrics:
Return a dictionary with keys "ttft", "itl", and "tps", each rounded to 4 decimal places.
def compute_token_metrics(timestamps: list[float]) -> dict:
n = len(timestamps)
if n == 0:
return {"ttft": 0.0, "itl": 0.0, "tps": 0.0}
ttft = timestamps[0]
if n == 1:
return {"ttft": round(ttft, 4), "itl": 0.0, "tps": round(1.0 / ttft if ttft > 0 else 0.0, 4)}
total_time = timestamps[-1] - timestamps[0]
itl = total_time / (n - 1) if n > 1 else 0.0
total_gen_time = timestamps[-1]
tps = n / total_gen_time if total_gen_time > 0 else 0.0
return {
"ttft": round(ttft, 4),
"itl": round(itl, 4),
"tps": round(tps, 4)
}(n - 1) intervals.