TTS Concurrent Real-Time Stream Capacity

#445 · Machine Learning · Medium

Problem

Determine the maximum number of concurrent text-to-speech (TTS) streams a server can sustain in real time. Given per-stream compute cost (time to synthesize one second of audio), the number of available compute units (e.g., GPU cores or inference threads), and a target real-time constraint, compute the maximum concurrent stream count and the aggregate audio throughput.

Solution

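def tts_concurrent_capacity(
    synthesis_time_per_sec_audio: float,
    num_compute_units: int,
    target_rtf: float = 1.0
) -> dict:
    """Estimate how many TTS streams a server can sustain in real time."""
    if synthesis_time_per_sec_audio <= 0:
        return {"error": "synthesis time must be positive"}
    if num_compute_units < 0:
        return {"error": "number of compute units must not be negative"}
    if target_rtf <= 0:
        return {"error": "target RTF must be positive"}

    # Real-time factor (RTF) of one stream on one compute unit:
    # seconds of compute per second of audio produced.
    rtf_per_unit = synthesis_time_per_sec_audio

    # Streams one unit can serve while each stays within the target RTF.
    streams_per_unit = target_rtf / rtf_per_unit

    # Truncate, because a fractional stream cannot be served.
    max_concurrent = int(streams_per_unit * num_compute_units)

    # Each real-time stream delivers 1 second of audio per wall-clock second.
    aggregate_audio_per_sec = max_concurrent * 1.0

    # Fraction of the total compute time that the admitted streams consume.
    utilization = (max_concurrent * rtf_per_unit) / num_compute_units if num_compute_units > 0 else 0

    return {
        "max_concurrent_streams": max_concurrent,
        "aggregate_audio_sec_per_sec": round(aggregate_audio_per_sec, 2),
        "compute_utilization": round(min(utilization, 1.0), 4),
        "rtf_per_stream": round(rtf_per_unit, 4)
    }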

Explanation

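The real-time factor (RTF) per compute unit is the compute time needed to synthesize 1 second of audio. If it takes 0.25s to produce 1s of audio, the RTF is 0.25; any value below 1 means synthesis runs faster than playback.
Each compute unit can handle target_rtf / rtf_per_unit streams concurrently while maintaining the real-time target. With target_rtf = 1.0 and an RTF of 0.25, that is 4 streams per unit.
Multiply streams per unit by the number of compute units, then truncate to a whole number, to get the total concurrent capacity.
Aggregate throughput is the total seconds of audio produced per real-time second across all streams. Each real-time stream yields 1 second of audio per second, so this equals the stream count.
Compute utilization is the fraction of the total compute budget consumed: max_concurrent * rtf_per_unit / num_compute_units, capped at 1.0.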
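
The steps above can be traced with concrete numbers. The following is a minimal sketch, not the reference solution: capacity_sketch is a hypothetical helper introduced only for illustration, and the inputs (0.25s and 0.3s of compute per second of audio, 8 and 4 compute units) are assumed values.

```python
import math
from typing import Tuple


def capacity_sketch(
    synthesis_time_per_sec_audio: float,
    num_compute_units: int,
    target_rtf: float = 1.0,
) -> Tuple[int, float]:
    """Hypothetical helper: returns (max concurrent streams, utilization)."""
    # Streams one unit can interleave while staying within the RTF target.
    streams_per_unit = target_rtf / synthesis_time_per_sec_audio
    # Floor the total, since a fractional stream cannot be served.
    max_streams = math.floor(streams_per_unit * num_compute_units)
    # Fraction of raw compute time those streams consume.
    utilization = max_streams * synthesis_time_per_sec_audio / num_compute_units
    return max_streams, utilization


# Assumed inputs: RTF 0.25 on 8 units divides evenly, so all compute is used.
print(capacity_sketch(0.25, 8))

# Assumed inputs: RTF 0.3 on 4 units leaves a fractional stream unserved,
# so utilization falls slightly below 1.
streams, utilization = capacity_sketch(0.3, 4)
print(streams, round(utilization, 4))
```

In the first case 4 streams per unit times 8 units gives 32 streams at full utilization. In the second, 4 / 0.3 is about 13.33, so 13 streams fit and roughly 2.5% of the compute budget stays idle.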

Complexity

Time: O(1)
Space: O(1)
