Multi-Instance GPU (MIG) Resource Allocation

#423 · Inference · Medium

Problem

NVIDIA Multi-Instance GPU (MIG) allows partitioning a single GPU into multiple isolated instances. Given a GPU's total compute units and memory, and a set of requested MIG profiles, determine if the allocation is feasible and compute the resource utilization.

Solution

def mig_allocation(
    total_sms: int,
    total_memory_gb: float,
    profiles: list[dict]
) -> dict:
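    """Check whether the requested MIG profiles fit on the GPU and report utilization.

    Each profile is a dict: {"name": str, "sms": int, "memory_gb": float, "count": int}.
    """
    # Aggregate demand across all requested instances.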
    total_sms_requested = sum(p["sms"] * p["count"] for p in profiles)
    total_mem_requested = sum(p["memory_gb"] * p["count"] for p in profiles)

    feasible = total_sms_requested <= total_sms and total_mem_requested <= total_memory_gb

    instances = []
    for p in profiles:
        for i in range(p["count"]):
            instances.append({
                "name": f"{p['name']}-{i}",
                "sms": p["sms"],
                "memory_gb": p["memory_gb"]
            })

    sm_utilization = total_sms_requested / total_sms * 100 if total_sms > 0 else 0
    mem_utilization = total_mem_requested / total_memory_gb * 100 if total_memory_gb > 0 else 0

    return {
        "feasible": feasible,
        "instances": instances,
        "total_instances": len(instances),
        "sm_utilization_pct": round(sm_utilization, 2),
        "memory_utilization_pct": round(mem_utilization, 2),
        "sms_remaining": total_sms - total_sms_requested,
        "memory_remaining_gb": round(total_memory_gb - total_mem_requested, 2)
    }

Explanation

  1. MIG partitions a GPU at the hardware level into isolated instances, each with a dedicated fraction of SMs (streaming multiprocessors) and memory.
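  2. Common MIG profiles on the A100 40 GB: 1g.5gb, 2g.10gb, 3g.20gb, 4g.20gb, and 7g.40gb (the full GPU). The leading number is the count of compute slices out of 7 (1g is 1/7 of the compute, 4g is 4/7), and the suffix is the instance's memory.
  3. The allocation is feasible only if the total requested SMs and the total requested memory each stay within the GPU's capacity. This is an aggregate check; it does not model the slice-placement constraints of real MIG hardware.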
  4. Each instance runs in full isolation with guaranteed resources and separate failure domains.
  5. Utilization percentages show how much of the GPU's resources are allocated versus left idle.
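
To make the steps above concrete, here is a minimal sketch of the same aggregate check applied to the A100 profile names from step 2. It assumes resources are counted in compute slices (out of 7) and gigabytes (out of 40) rather than raw SMs. `A100_PROFILES` and `fits` are hypothetical names used only for illustration, and the sketch ignores any placement rules the hardware enforces.

```python
# Hypothetical lookup table: profile name -> (compute slices out of 7, memory in GB).
# The values are read directly off the A100 40 GB profile names.
A100_PROFILES = {
    "1g.5gb": (1, 5.0),
    "2g.10gb": (2, 10.0),
    "3g.20gb": (3, 20.0),
    "4g.20gb": (4, 20.0),
    "7g.40gb": (7, 40.0),
}


def fits(requests: dict[str, int],
         total_slices: int = 7,
         total_memory_gb: float = 40.0) -> dict:
    """Aggregate capacity check: sum requested slices and memory, compare to totals."""
    slices = sum(A100_PROFILES[name][0] * count for name, count in requests.items())
    memory = sum(A100_PROFILES[name][1] * count for name, count in requests.items())
    return {
        "feasible": slices <= total_slices and memory <= total_memory_gb,
        "slice_utilization_pct": round(slices / total_slices * 100, 2),
        "memory_utilization_pct": round(memory / total_memory_gb * 100, 2),
    }


# 3 + 2 + 1 + 1 = 7 slices and 20 + 10 + 5 + 5 = 40 GB: exactly fills the GPU.
ok = fits({"3g.20gb": 1, "2g.10gb": 1, "1g.5gb": 2})

# 4 + 3 + 1 = 8 slices and 45 GB: exceeds both limits.
too_big = fits({"4g.20gb": 1, "3g.20gb": 1, "1g.5gb": 1})

print(ok)
print(too_big)
```

Utilization above 100% in the second result is the signal that the request oversubscribes the GPU; the solution's `sms_remaining` and `memory_remaining_gb` fields go negative in the same situation.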

Complexity

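  • Time: O(p + n), where p is the number of profile entries and n is the total number of instances created (the sum of all count values)
  • Space: O(n) for the list of instances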
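
The O(n) term comes entirely from materializing one dict per instance. If only feasibility and totals are needed, a variant can skip that list and run in O(p) time with O(1) extra space. The sketch below is one way to do that, not part of the original solution; `mig_summary` is a hypothetical name, and the SM and memory figures in the usage lines are illustrative inputs rather than hardware specifications.

```python
def mig_summary(total_sms: int, total_memory_gb: float, profiles: list[dict]) -> dict:
    """Aggregate-only check: one pass over the profile entries, no per-instance list."""
    sms_requested = 0
    mem_requested = 0.0
    instance_count = 0
    for p in profiles:
        sms_requested += p["sms"] * p["count"]
        mem_requested += p["memory_gb"] * p["count"]
        instance_count += p["count"]

    return {
        "feasible": sms_requested <= total_sms and mem_requested <= total_memory_gb,
        "total_instances": instance_count,
        "sms_remaining": total_sms - sms_requested,
        "memory_remaining_gb": round(total_memory_gb - mem_requested, 2),
    }


# Illustrative request: three small instances plus one medium instance.
summary = mig_summary(
    total_sms=98,
    total_memory_gb=40.0,
    profiles=[
        {"name": "small", "sms": 14, "memory_gb": 5.0, "count": 3},
        {"name": "medium", "sms": 28, "memory_gb": 10.0, "count": 1},
    ],
)
print(summary)
```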