Multi-Instance GPU (MIG) Resource Allocation

#423 · Inference · Medium

Problem

NVIDIA Multi-Instance GPU (MIG) partitions a single GPU into multiple isolated instances. Given a GPU's total streaming multiprocessor (SM) count and memory, and a set of requested MIG profiles, determine whether the allocation is feasible and compute the resulting resource utilization.

Solution

def mig_allocation(
    total_sms: int,
    total_memory_gb: float,
    profiles: list[dict]
) -> dict:
    """Check whether the requested MIG instances fit on the GPU and report utilization.

    Each profile: {"name": str, "sms": int, "memory_gb": float, "count": int}
    """
    total_sms_requested = sum(p["sms"] * p["count"] for p in profiles)
    total_mem_requested = sum(p["memory_gb"] * p["count"] for p in profiles)

    feasible = total_sms_requested <= total_sms and total_mem_requested <= total_memory_gb

    # Expand each profile into `count` named instances (built even if infeasible).
    instances = []
    for p in profiles:
        for i in range(p["count"]):
            instances.append({
                "name": f"{p['name']}-{i}",
                "sms": p["sms"],
                "memory_gb": p["memory_gb"]
            })

    # Utilization as a percentage of capacity; guard against division by zero.
    sm_utilization = total_sms_requested / total_sms * 100 if total_sms > 0 else 0
    mem_utilization = total_mem_requested / total_memory_gb * 100 if total_memory_gb > 0 else 0

    return {
        "feasible": feasible,
        "instances": instances,
        "total_instances": len(instances),
        "sm_utilization_pct": round(sm_utilization, 2),
        "memory_utilization_pct": round(mem_utilization, 2),
        "sms_remaining": total_sms - total_sms_requested,
        "memory_remaining_gb": round(total_memory_gb - total_mem_requested, 2)
    }
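The solution assumes well-formed input: a negative count, for example, would lower the requested totals and could make an oversized request look feasible. A minimal sketch of a pre-check is shown below. The helper name `validate_profiles` and its error messages are hypothetical and not part of the original solution.

```python
def validate_profiles(profiles: list[dict]) -> list[str]:
    """Return a list of problems found; an empty list means the input looks sane.

    Hypothetical helper: intended to run before the allocation check.
    """
    problems = []
    required = ("name", "sms", "memory_gb", "count")
    for idx, p in enumerate(profiles):
        # Report missing keys first, since the later checks need them.
        missing = [k for k in required if k not in p]
        if missing:
            problems.append(f"profile {idx}: missing keys {missing}")
            continue
        if p["count"] < 0:
            problems.append(f"profile {idx} ({p['name']}): count must be >= 0")
        if p["sms"] <= 0 or p["memory_gb"] <= 0:
            problems.append(
                f"profile {idx} ({p['name']}): sms and memory_gb must be positive"
            )
    return problems
```

One possible use is to call this first and skip the allocation check when the returned list is non-empty.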

Explanation

MIG partitions a GPU at the hardware level into isolated instances, each with a dedicated fraction of SMs (streaming multiprocessors) and memory.
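Common A100 40GB MIG profiles: 1g.5gb (1/7 GPU), 2g.10gb (2/7), 3g.20gb (3/7), 4g.20gb (4/7), 7g.40gb (full). The fractions refer to the share of compute slices.
The feasibility check requires that the total requested SMs and the total requested memory each stay within the GPU's capacity. It is a sum check only and does not model MIG's placement rules.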
Each instance runs in full isolation with guaranteed resources and separate failure domains.
Utilization percentages report the share of SMs and memory that the request allocates; the remainder is left idle, and a value above 100% indicates an infeasible request.
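The capacity check and utilization arithmetic can be worked through on concrete numbers. The sketch below assumes nominal A100 40GB figures: 14 SMs per compute slice (98 SMs across seven slices) and the memory sizes taken from the profile names. The names `A100_40GB_PROFILES` and `summarize` are hypothetical, and the figures are illustrative rather than authoritative.

```python
# Nominal A100 40GB figures, used here as illustrative assumptions.
SMS_PER_SLICE = 14               # assumed SMs per MIG compute slice
USABLE_SMS = 7 * SMS_PER_SLICE   # 98 SMs across the seven compute slices
TOTAL_MEMORY_GB = 40.0

# Hypothetical lookup table: profile name -> resources per instance.
A100_40GB_PROFILES = {
    "1g.5gb": {"sms": 1 * SMS_PER_SLICE, "memory_gb": 5.0},
    "2g.10gb": {"sms": 2 * SMS_PER_SLICE, "memory_gb": 10.0},
    "3g.20gb": {"sms": 3 * SMS_PER_SLICE, "memory_gb": 20.0},
    "4g.20gb": {"sms": 4 * SMS_PER_SLICE, "memory_gb": 20.0},
    "7g.40gb": {"sms": 7 * SMS_PER_SLICE, "memory_gb": 40.0},
}


def summarize(request: dict[str, int]) -> dict:
    """Sum the requested SMs and memory, then compare against capacity."""
    sms = sum(A100_40GB_PROFILES[name]["sms"] * n for name, n in request.items())
    mem = sum(A100_40GB_PROFILES[name]["memory_gb"] * n for name, n in request.items())
    return {
        "feasible": sms <= USABLE_SMS and mem <= TOTAL_MEMORY_GB,
        "sm_utilization_pct": round(sms / USABLE_SMS * 100, 2),
        "memory_utilization_pct": round(mem / TOTAL_MEMORY_GB * 100, 2),
    }


# 2 x 1g.5gb + 1 x 3g.20gb: 70 of 98 SMs and 30 of 40 GB, so it fits.
fits = summarize({"1g.5gb": 2, "3g.20gb": 1})

# 2 x 3g.20gb + 1 x 1g.5gb: exactly 98 SMs but 45 GB, so memory is the limit.
too_much_memory = summarize({"3g.20gb": 2, "1g.5gb": 1})
```

The second request shows why both conditions matter: the SM budget is met exactly, yet the request fails on memory. Real MIG hardware also restricts which profile combinations and positions are valid, which a sum check like this does not capture.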

Complexity

Time: O(p + n), where p is the number of profiles and n is the total number of instances created
Space: O(n)
