
Autoscaling Replica Simulator with SLA Tracking

#449 · Machine Learning · Medium


Problem

Build an autoscaling replica simulator with SLA tracking. Given a time series of request rates, a per-replica throughput capacity, scale-up and scale-down thresholds (utilization-based), scaling cooldown periods, and an SLA latency target, simulate the autoscaler and report per-interval replica counts, utilization, and SLA violations.

Solution

def autoscaling_simulator(
    request_rates: list[float],
    replica_capacity: float,
    scale_up_threshold: float,
    scale_down_threshold: float,
    cooldown_steps: int,
    min_replicas: int = 1,
    max_replicas: int = 100
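) -> dict:
    """Simulate a utilization-based autoscaler over a series of request rates.

    An interval counts as an SLA violation when the request rate exceeds the
    combined capacity of the current replicas. Scaling decisions made in one
    interval take effect from the next interval onward.
    """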
    replicas = min_replicas
    cooldown_remaining = 0
    history = []
    sla_violations = 0

    for t, rate in enumerate(request_rates):
        total_capacity = replicas * replica_capacity
        utilization = rate / total_capacity if total_capacity > 0 else 1.0

        violated = rate > total_capacity

        if violated:
            sla_violations += 1
            dropped = rate - total_capacity
        else:
            dropped = 0

        history.append({
            "step": t,
            "request_rate": rate,
            "replicas": replicas,
            "utilization": round(min(utilization, 1.0), 4),
            "sla_violated": violated,
            "dropped_requests": round(dropped, 2)
        })

        # During cooldown, record the interval but make no scaling decision.
        if cooldown_remaining > 0:
            cooldown_remaining -= 1
            continue

        if utilization > scale_up_threshold:
            # Smallest replica count that puts utilization strictly below the threshold.
            needed = int(rate / (replica_capacity * scale_up_threshold)) + 1
            new_replicas = min(needed, max_replicas)
            if new_replicas > replicas:
                replicas = new_replicas
                cooldown_remaining = cooldown_steps
        elif utilization < scale_down_threshold:
            # Scale-down also sizes against the scale-up threshold, so the
            # smaller fleet does not immediately trigger a scale-up.
            needed = int(rate / (replica_capacity * scale_up_threshold)) + 1
            new_replicas = max(needed, min_replicas)
            if new_replicas < replicas:
                replicas = new_replicas
                cooldown_remaining = cooldown_steps

    total_intervals = len(request_rates)
    violation_rate = sla_violations / total_intervals if total_intervals > 0 else 0.0

    return {
        "history": history,
        "total_sla_violations": sla_violations,
        "violation_rate": round(violation_rate, 4)
    }
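
The sizing expression used in both scaling branches can be looked at in isolation. The sketch below pulls it into a helper named replicas_needed, a hypothetical name that does not appear in the solution, and uses illustrative numbers (100 requests per replica, a 0.75 threshold) chosen only for the example.

```python
def replicas_needed(rate: float, replica_capacity: float, target_utilization: float) -> int:
    """Smallest replica count that keeps utilization strictly below the target.

    Hypothetical helper: it isolates the expression the solution uses inline,
    int(rate / (capacity * target)) + 1, and assumes a non-negative rate.
    """
    return int(rate / (replica_capacity * target_utilization)) + 1


# 250 / (100 * 0.75) = 3.33..., truncated to 3, plus 1 gives 4 replicas.
print(replicas_needed(250, 100, 0.75))  # 4 (utilization 250 / 400 = 0.625)

# Exact multiple: 150 / 75 = 2.0, so the result is 3 rather than 2,
# because 2 replicas would sit exactly at the 0.75 threshold.
print(replicas_needed(150, 100, 0.75))  # 3
```

The "+ 1" means an exact multiple still gets one extra replica, which matches the goal of landing below the threshold rather than on it.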

Explanation

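  1. At each time step, compute utilization as request_rate / (replicas * replica_capacity). The recorded value is capped at 1.0, but the uncapped value drives the scaling decision.
  2. If utilization exceeds the scale-up threshold, set the replica count to int(request_rate / (replica_capacity * scale_up_threshold)) + 1, capped at max_replicas. This is the smallest count that brings utilization strictly below the threshold.
  3. If utilization drops below the scale-down threshold, shrink to that same target count, never going below min_replicas. Sizing against the scale-up threshold keeps the smaller fleet from immediately triggering a scale-up.
  4. After any scaling event, enter a cooldown of cooldown_steps intervals during which no further scaling occurs, to prevent oscillation. A new replica count takes effect from the next interval, because each interval is recorded before the scaling decision is made.
  5. The function takes no explicit latency target; an SLA violation is recorded whenever the request rate exceeds total capacity, and the excess is reported as dropped requests. The violation rate (violations divided by total intervals) summarizes reliability over the simulation.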
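
To see how the cooldown delays a needed scale-up, the steps above can be traced on a small input. The sketch below is a condensed version of the same loop that records only the replica count and the violation flag; the name replica_trace and the sample rates are assumptions made for illustration, not part of the original solution.

```python
def replica_trace(rates, capacity, up, down, cooldown_steps,
                  min_replicas=1, max_replicas=100):
    """Return (replicas, sla_violated) per interval; a condensed sketch."""
    replicas, cooldown, trace = min_replicas, 0, []
    for rate in rates:
        total = replicas * capacity
        utilization = rate / total if total > 0 else 1.0
        # Record the interval first, so scaling applies from the next step.
        trace.append((replicas, rate > total))
        if cooldown > 0:
            cooldown -= 1  # scaling is blocked during cooldown
            continue
        if utilization > up:
            new = min(int(rate / (capacity * up)) + 1, max_replicas)
            if new > replicas:
                replicas, cooldown = new, cooldown_steps
        elif utilization < down:
            new = max(int(rate / (capacity * up)) + 1, min_replicas)
            if new < replicas:
                replicas, cooldown = new, cooldown_steps
    return trace


trace = replica_trace([50, 300, 600, 600, 40, 40], capacity=100,
                      up=0.75, down=0.25, cooldown_steps=1)
print([r for r, _ in trace])  # [1, 1, 5, 5, 9, 9]
print([v for _, v in trace])  # [False, True, True, True, False, False]
```

The jump to 600 requests at step 2 arrives while the cooldown from the first scale-up is still active, so the fleet stays at 5 replicas for one more interval and a further violation is recorded before it grows to 9.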

Complexity

  • Time: O(T) where T is the number of time intervals
  • Space: O(T) for the history