#444 · Machine Learning · Medium
Compute the real-time factor (RTF) for an ASR (automatic speech recognition) system that processes audio by splitting it into overlapping chunks and transcribing them in parallel. Given the total audio duration, chunk size, overlap, number of parallel workers, and per-chunk processing time, calculate the RTF and the effective wall-clock time.
def asr_realtime_factor(
    audio_duration_sec: float,
    chunk_size_sec: float,
    overlap_sec: float,
    num_workers: int,
    per_chunk_process_sec: float
) -> dict:
    # Each new chunk starts `step` seconds after the previous one.
    step = chunk_size_sec - overlap_sec
    if step <= 0:
        return {"error": "overlap must be less than chunk_size"}
    # The first chunk covers chunk_size_sec; every further chunk adds `step`.
    num_chunks = 1
    covered = chunk_size_sec
    while covered < audio_duration_sec:
        covered += step
        num_chunks += 1
    # Ceiling division: each batch holds at most num_workers chunks.
    num_batches = (num_chunks + num_workers - 1) // num_workers
    # Chunks within a batch run in parallel, so a batch costs one chunk's time.
    wall_clock_sec = num_batches * per_chunk_process_sec
    rtf = wall_clock_sec / audio_duration_sec if audio_duration_sec > 0 else 0.0
    return {
        "num_chunks": num_chunks,
        "num_batches": num_batches,
        "wall_clock_sec": round(wall_clock_sec, 4),
        "rtf": round(rtf, 4)
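    }

The step between consecutive chunk starts is chunk_size - overlap. With num_workers parallel processors, the chunks are processed in ceil(num_chunks / num_workers) batches. The wall-clock time is the number of batches times the per-chunk processing time, and the RTF is that wall-clock time divided by the audio duration.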
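
The chunk count can also be written without the loop. The sketch below is an illustration only: the helper name rtf_closed_form and the sample inputs (60 s of audio, 10 s chunks, 2 s overlap, 4 workers, 1.5 s per chunk) are assumptions, not part of the original problem. It should agree with the loop version on inputs like these, although floating-point rounding could in principle make the two differ when the durations are not exactly representable.

```python
import math


def rtf_closed_form(audio_duration_sec, chunk_size_sec, overlap_sec,
                    num_workers, per_chunk_process_sec):
    """Hypothetical helper: loop-free variant of asr_realtime_factor."""
    step = chunk_size_sec - overlap_sec
    if step <= 0:
        return {"error": "overlap must be less than chunk_size"}
    # The first chunk covers chunk_size_sec; the rest of the audio
    # needs ceil(remaining / step) further chunks.
    remaining = max(0.0, audio_duration_sec - chunk_size_sec)
    num_chunks = 1 + math.ceil(remaining / step)
    # Each batch runs up to num_workers chunks in parallel.
    num_batches = math.ceil(num_chunks / num_workers)
    wall_clock_sec = num_batches * per_chunk_process_sec
    rtf = wall_clock_sec / audio_duration_sec if audio_duration_sec > 0 else 0.0
    return {
        "num_chunks": num_chunks,
        "num_batches": num_batches,
        "wall_clock_sec": round(wall_clock_sec, 4),
        "rtf": round(rtf, 4),
    }


# Assumed sample inputs: step = 8 s, so 1 + ceil(50 / 8) = 8 chunks,
# ceil(8 / 4) = 2 batches, 2 * 1.5 = 3.0 s wall clock, 3.0 / 60 = 0.05 RTF.
result = rtf_closed_form(60.0, 10.0, 2.0, 4, 1.5)
print(result)
```

An RTF below 1.0, as in this example, means the system transcribes faster than the audio plays.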