Compute TTFT, ITL, and TPS from a Token Timestamp Stream

#411 · Machine Learning · Easy

Problem

Given a list of token timestamps from an LLM generation stream, where each timestamp is the time in seconds since the request started (time 0), compute three key latency metrics:

TTFT (Time To First Token): the time from the start until the first token is produced.
ITL (Inter-Token Latency): the average time between consecutive tokens after the first.
TPS (Tokens Per Second): total tokens generated divided by the total time from the request start to the last token.
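
In symbols, the three definitions can be sketched as follows. The notation is introduced here for illustration and is not part of the original statement: t_1 ≤ ... ≤ t_n are the n timestamps, assumed to be in seconds measured from a request start at time 0, and the ITL formula applies when n ≥ 2.

```latex
\mathrm{TTFT} = t_1, \qquad
\mathrm{ITL} = \frac{1}{n-1}\sum_{i=1}^{n-1}\left(t_{i+1} - t_i\right) = \frac{t_n - t_1}{n-1}, \qquad
\mathrm{TPS} = \frac{n}{t_n}
```

The sum of consecutive gaps telescopes, which is why ITL needs only the first and last timestamps.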

Return a dictionary with keys "ttft", "itl", and "tps", each rounded to 4 decimal places.

Solution

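def compute_token_metrics(timestamps: list[float]) -> dict:
    """Return TTFT, ITL, and TPS for timestamps measured from request start (time 0)."""
    n = len(timestamps)
    if n == 0:
        # No tokens were produced, so every metric is zero.
        return {"ttft": 0.0, "itl": 0.0, "tps": 0.0}
    # TTFT: delay from the request start to the first token.
    ttft = timestamps[0]
    # Total generation time: request start to the last token.
    total_gen_time = timestamps[-1]
    # ITL: first-to-last span divided by the (n - 1) gaps; 0.0 for a single token.
    itl = (timestamps[-1] - timestamps[0]) / (n - 1) if n > 1 else 0.0
    # TPS: guard against division by zero when the last timestamp is 0.
    tps = n / total_gen_time if total_gen_time > 0 else 0.0
    return {
        "ttft": round(ttft, 4),
        "itl": round(itl, 4),
        "tps": round(tps, 4),
    }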

Explanation

TTFT is simply the first timestamp, representing the delay before the first token appeared.
ITL is the average gap between consecutive tokens. We compute the total span from the first to the last token and divide by (n - 1) intervals.
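TPS divides the total number of tokens by the total elapsed time from the request start (time 0) to the last token.
Edge cases: an empty list returns 0.0 for all three metrics, a single token has an ITL of 0.0 because there are no gaps to average, and TPS is 0.0 whenever the last timestamp is 0, which avoids a division by zero.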
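
As a rough check of the ITL reasoning above, the sketch below averages the explicit gaps and compares the result with the first-to-last shortcut. The helper names (itl_from_gaps, itl_closed_form) and the sample stream are hypothetical, chosen only for illustration.

```python
def itl_from_gaps(timestamps: list[float]) -> float:
    """Average the explicit gaps between consecutive tokens (O(n))."""
    if len(timestamps) < 2:
        return 0.0  # a single token (or none) has no gaps to average
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps)


def itl_closed_form(timestamps: list[float]) -> float:
    """Same quantity via the telescoped sum: (last - first) / (n - 1)."""
    if len(timestamps) < 2:
        return 0.0
    return (timestamps[-1] - timestamps[0]) / (len(timestamps) - 1)


# Hypothetical stream: seconds since request start, one entry per token.
sample = [0.5, 0.75, 1.0, 1.25]

ttft = sample[0]                 # first token arrived 0.5 s after the request
tps = len(sample) / sample[-1]   # 4 tokens over 1.25 s

print(round(ttft, 4))                     # 0.5
print(round(itl_from_gaps(sample), 4))    # 0.25
print(round(itl_closed_form(sample), 4))  # 0.25
print(round(tps, 4))                      # 3.2
```

Both ITL computations agree because the intermediate timestamps cancel when the gaps are summed, so the shortcut gives the same average without walking the whole list.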

Complexity

Time: O(1) since we only look at the first and last timestamps
Space: O(1)
