Embedding Quantization Quality via Cosine Similarity

#443 · Machine Learning · Medium

Problem

Measure the quality of embedding quantization by computing the cosine similarity between each original full-precision embedding and its quantized-then-dequantized version. Given a batch of embedding vectors and a target bit-width, apply symmetric uniform quantization with one scale per vector, then return the per-vector cosine similarities and their average.
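In symbols, for one embedding x ∈ ℝ^d and bit-width b ≥ 2, the task is roughly the following. The names q_max, s, and x̂ are notation introduced here; they correspond to qmax, scale, and the dequantized vector in the solution. If x = 0, the solution reports a similarity of 0.0 rather than leaving it undefined.

```latex
\begin{aligned}
q_{\max} &= 2^{b-1} - 1, \qquad s = \frac{\max_i |x_i|}{q_{\max}} \\
q_i &= \operatorname{clip}\!\left(\operatorname{round}(x_i / s),\, -q_{\max},\, q_{\max}\right), \qquad \hat{x}_i = s\, q_i \\
\cos(x, \hat{x}) &= \frac{x \cdot \hat{x}}{\lVert x \rVert\, \lVert \hat{x} \rVert}, \qquad \overline{\cos} = \frac{1}{n} \sum_{k=1}^{n} \cos\!\left(x^{(k)}, \hat{x}^{(k)}\right)
\end{aligned}
```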

Solution

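def embedding_quantization_quality(
    embeddings: list[list[float]],
    bit_width: int
) -> dict:
    import math

    # Symmetric quantization uses the integer range [-qmax, qmax] with
    # qmax = 2^(bit_width - 1) - 1. At bit_width = 1, qmax would be 0 and
    # computing the scale would divide by zero.
    if bit_width < 2:
        raise ValueError("bit_width must be at least 2")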

    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    def norm(a: list[float]) -> float:
        return math.sqrt(sum(x * x for x in a))

    def cosine_sim(a: list[float], b: list[float]) -> float:
        n_a = norm(a)
        n_b = norm(b)
        # Cosine similarity is undefined for a zero vector; report 0.0 by convention.
        if n_a == 0 or n_b == 0:
            return 0.0
        return dot(a, b) / (n_a * n_b)

    def quantize_dequantize(vec: list[float], bits: int) -> list[float]:
        if not vec:
            return vec
        max_abs = max(abs(x) for x in vec)
        if max_abs == 0:
            return vec[:]
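        qmax = (1 << (bits - 1)) - 1            # largest integer code, e.g. 127 at 8 bits
        scale = max_abs / qmax                  # one scale per vector: max |x| maps to qmax
        # Round to the nearest integer code and clamp to [-qmax, qmax].
        quantized = [max(-qmax, min(qmax, round(x / scale))) for x in vec]
        return [q * scale for q in quantized]   # dequantize back to floats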

    raw_sims = []
    for emb in embeddings:
        dequant = quantize_dequantize(emb, bit_width)
        raw_sims.append(cosine_sim(emb, dequant))

    # Average the unrounded values; round only for reporting.
    avg_sim = sum(raw_sims) / len(raw_sims) if raw_sims else 0.0
    similarities = [round(s, 6) for s in raw_sims]

    return {
        "per_vector_similarity": similarities,
        "average_similarity": round(avg_sim, 6)
    }

Explanation

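For symmetric uniform quantization, find the maximum absolute value max_abs of each vector and compute a per-vector scale factor: scale = max_abs / qmax, where qmax = 2^(bits-1) - 1 (for example, 127 at 8 bits). This requires bits ≥ 2, since 1 bit would give qmax = 0.
Quantize each element by dividing it by the scale, rounding to the nearest integer (Python's round breaks ties to even), and clamping the result to [-qmax, qmax].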
Dequantize by multiplying the integer values back by the scale factor.
Compute cosine similarity between the original and dequantized vectors: dot(a, b) / (||a|| * ||b||).
Higher bit-widths typically yield similarities closer to 1.0. Because cosine similarity ignores vector length, the metric measures how well each vector's direction survives quantization, not its magnitude.
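The steps above can be traced on a single vector. This is a standalone sketch that re-implements the same per-vector scheme as the solution; the input values are made up for illustration, and quantize_dequantize and cosine are local helpers written for this example, not a library API.

```python
import math


def quantize_dequantize(vec: list[float], bits: int) -> list[float]:
    # Symmetric uniform quantization with one scale per vector.
    max_abs = max((abs(x) for x in vec), default=0.0)
    if max_abs == 0:
        return list(vec)
    qmax = (1 << (bits - 1)) - 1          # 1 at 2 bits, 7 at 4 bits, 127 at 8 bits
    scale = max_abs / qmax                # largest |x| maps to qmax
    q = [max(-qmax, min(qmax, round(x / scale))) for x in vec]  # clamped integer codes
    return [v * scale for v in q]         # dequantize back to floats


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


vec = [0.9, -0.35, 0.12, -0.04]  # made-up values, not a real embedding
sims = {}
for bits in (2, 4, 8):
    rec = quantize_dequantize(vec, bits)
    sims[bits] = cosine(vec, rec)
    print(bits, [round(v, 4) for v in rec], round(sims[bits], 6))
```

On this input, 2 bits leaves only the codes -1, 0, and 1, so every component except the largest rounds to 0 and the similarity drops to roughly 0.92; at 4 and 8 bits the smaller components survive and the similarity rises above 0.99.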

Complexity

Time: O(n * d) where n is the number of embeddings and d is the embedding dimension
Space: O(n + d): O(d) for the dequantized copy of the current embedding (reused per embedding), plus O(n) for the list of per-vector similarities
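The per-vector list is what grows with n. If only the average is needed, a running sum keeps extra memory at O(d). The sketch below assumes that variant of the task; average_quantization_similarity is a hypothetical name, and it averages unrounded similarities.

```python
import math
from typing import Iterable


def average_quantization_similarity(
    embeddings: Iterable[list[float]], bit_width: int
) -> float:
    # Hypothetical helper: same symmetric per-vector scheme as the solution,
    # but only a running sum is kept, so extra memory is O(d) for one vector.
    if bit_width < 2:
        raise ValueError("bit_width must be at least 2")
    qmax = (1 << (bit_width - 1)) - 1
    total, count = 0.0, 0
    for emb in embeddings:
        max_abs = max((abs(x) for x in emb), default=0.0)
        if max_abs == 0:
            sim = 0.0  # same convention as the solution for zero or empty vectors
        else:
            scale = max_abs / qmax
            # Quantize, clamp to [-qmax, qmax], and dequantize in one pass.
            deq = [max(-qmax, min(qmax, round(x / scale))) * scale for x in emb]
            dot = sum(x * y for x, y in zip(emb, deq))
            n_a = math.sqrt(sum(x * x for x in emb))
            n_b = math.sqrt(sum(y * y for y in deq))
            sim = dot / (n_a * n_b) if n_b else 0.0
        total += sim
        count += 1
    return total / count if count else 0.0
```

Because it accepts any iterable, the embeddings can come from a generator that reads them lazily, so the full batch never has to be held in memory at once.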
