#445 · Machine Learning · Medium
Determine the maximum number of concurrent text-to-speech (TTS) streams a server can sustain in real time. Given per-stream compute cost (time to synthesize one second of audio), the number of available compute units (e.g., GPU cores or inference threads), and a target real-time constraint, compute the maximum concurrent stream count and the aggregate audio throughput.
def tts_concurrent_capacity(
    synthesis_time_per_sec_audio: float,
    num_compute_units: int,
    target_rtf: float = 1.0
) -> dict:
    # Reject non-positive synthesis times; the division below needs a positive cost.
    if synthesis_time_per_sec_audio <= 0:
        return {"error": "synthesis time must be positive"}
    # Compute seconds per audio second is the real-time factor (RTF) of one stream.
    rtf_per_unit = synthesis_time_per_sec_audio
    # Streams a single unit can carry within the target RTF budget.
    streams_per_unit = target_rtf / rtf_per_unit
    # Truncate to whole streams across all units.
    max_concurrent = int(streams_per_unit * num_compute_units)
    # Each real-time stream yields one second of audio per wall-clock second.
    aggregate_audio_per_sec = max_concurrent * 1.0
    utilization = (max_concurrent * rtf_per_unit) / num_compute_units if num_compute_units > 0 else 0
    return {
        "max_concurrent_streams": max_concurrent,
        "aggregate_audio_sec_per_sec": round(aggregate_audio_per_sec, 2),
        "compute_utilization": round(min(utilization, 1.0), 4),
        "rtf_per_stream": round(rtf_per_unit, 4)
    }

Each compute unit can sustain target_rtf / rtf_per_unit streams concurrently while maintaining the real-time target.
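To make the arithmetic concrete, here is a minimal sketch that applies the same formula in a compact form. The helper name `capacity` and the example figures (0.25 s and 0.3 s of compute per audio second, 8 and 4 compute units) are assumptions chosen for illustration; they are not part of the problem statement.

```python
def capacity(synth_time: float, units: int, target_rtf: float = 1.0):
    """Hypothetical helper mirroring the formula: returns (max streams, utilization)."""
    streams_per_unit = target_rtf / synth_time
    max_streams = int(streams_per_unit * units)  # truncate to whole streams
    utilization = (max_streams * synth_time) / units if units > 0 else 0
    return max_streams, round(min(utilization, 1.0), 4)


# 0.25 s of compute per audio second: 4 streams per unit, so 8 units carry 32.
print(capacity(0.25, 8))  # (32, 1.0)

# 0.3 s of compute per audio second: 3.33 streams per unit; 4 units give 13.33,
# which truncates to 13 whole streams.
print(capacity(0.3, 4))   # (13, 0.975)
```

In the second case utilization stays below 1.0 because a fractional stream cannot be scheduled, so about 2.5% of the compute budget is left idle.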