Graph

Floyd-Warshall (All-Pairs Shortest Path)

O(V³) DP where for each intermediate node k, update the shortest path between every pair (i, j) through k. Used when all-pairs shortest paths are needed and V is small.

Floyd-Warshall: All-Pairs Shortest Path

When You Need Every Pair

Sometimes you do not just need the shortest path from one source. You need shortest paths between every pair of nodes simultaneously. Given a network of cities, you might want to precompute the travel time between all pairs so that any query can be answered in $O(1)$ . Or you might need to find which city can reach the fewest other cities within a distance threshold. These are all-pairs shortest path problems, and Floyd-Warshall is the classic algorithm for solving them.

Floyd-Warshall vs. Running Dijkstra $V$ Times

An obvious approach to all-pairs shortest paths is to run Dijkstra from every node. With a binary heap, each run takes $O((V + E) \log V)$ , so $V$ runs take $O(V(V + E) \log V)$ . For dense graphs where $E$ is close to $V^2$ , this becomes $O(V^3 \log V)$ .

Floyd-Warshall runs in $O(V^3)$ with no log factor. For dense graphs, it is both simpler and faster. It also handles negative edge weights, which Dijkstra cannot. The trade-off is that Floyd-Warshall always takes $O(V^3)$ regardless of how sparse the graph is. For sparse graphs with non-negative weights, running Dijkstra $V$ times can be faster.

In practice, Floyd-Warshall shines when $V$ is small (up to around 400), the graph may have negative edges, or you want clean, simple code. For interview problems with small node counts asking about all-pairs distances, Floyd-Warshall is usually the right tool.

The Triple-Loop Idea

Floyd-Warshall uses dynamic programming. The key idea is to gradually expand the set of allowed intermediate nodes.

Define dist[i][j] as the shortest path from i to j using only nodes 0 through k-1 as intermediates. Initially (k = 0), no intermediate nodes are allowed, so dist[i][j] is simply the direct edge weight from i to j (or infinity if no direct edge exists, or 0 if i equals j).

When we introduce node k as a potential intermediate, we ask: is it cheaper to go from i to j through k? The path from i to j through k consists of the path from i to k and then from k to j. So we check whether dist[i][k] + dist[k][j] is less than dist[i][j] and update accordingly.

After considering all possible intermediate nodes (k from 0 to V-1), dist[i][j] contains the true shortest path between every pair.

Why k Must Be the Outer Loop

The most common mistake when implementing Floyd-Warshall is putting k in one of the inner loops instead of the outer one. This produces incorrect results.

The reason k must be outermost is that the recurrence depends on the subproblem structure. When we process intermediate node k, we need the distance values to already reflect the optimal paths using intermediates 0 through k-1. If k were an inner loop, we would be updating dist[i][j] for a fixed i across all intermediates in a single pass of i. This mixes up the subproblem boundaries: when we consider intermediate k for pair (i, j), the values dist[i][k] and dist[k][j] might not yet have been computed with all intermediates 0 through k-1 for those pairs.

With k as the outer loop, by the time we reach iteration k, the entire dist matrix correctly represents shortest paths using intermediates 0 through k-1. This guarantees that dist[i][k] and dist[k][j] are correct when we use them to update dist[i][j].

Think of it this way: each iteration of k is a complete "upgrade" of the distance matrix, incorporating one more possible waypoint. You must finish one upgrade before starting the next.

Walking Through a 4-Node Graph

Consider 4 nodes (0, 1, 2, 3) with these directed edges: 0 to 1 with weight 3, 0 to 2 with weight 6, 1 to 2 with weight 2, 2 to 3 with weight 1, 3 to 0 with weight 2, 1 to 3 with weight 8.

Initialize the distance matrix. Use 0 for diagonal entries, edge weights for direct edges, and infinity (shown as INF) otherwise.

Initial matrix:

Row 0: [0, 3, 6, INF]. Direct edges to 1 and 2 exist. Row 1: [INF, 0, 2, 8]. Direct edges to 2 and 3. Row 2: [INF, INF, 0, 1]. Direct edge to 3. Row 3: [2, INF, INF, 0]. Direct edge to 0.

Iteration k = 0 (allow node 0 as intermediate). For each pair (i, j), check if going through 0 is cheaper. The relevant paths through 0 require a finite dist[i][0] and dist[0][j]. Node 3 can reach 0 (dist = 2), so paths from 3 through 0 are considered. dist[3][1] = min(INF, dist[3][0] + dist[0][1]) = min(INF, 2 + 3) = 5. dist[3][2] = min(INF, 2 + 6) = 8.

After k = 0: Row 3 becomes [2, 5, 8, 0]. All other rows unchanged.

Iteration k = 1 (allow node 1 as intermediate). Node 0 can reach 1 (dist = 3). dist[0][2] = min(6, 3 + 2) = 5. dist[0][3] = min(INF, 3 + 8) = 11. Node 3 can reach 1 (dist = 5). dist[3][2] = min(8, 5 + 2) = 7. dist[3][3] is already 0, no change.

After k = 1: Row 0 becomes [0, 3, 5, 11]. Row 3 becomes [2, 5, 7, 0].

Iteration k = 2 (allow node 2 as intermediate). Paths through 2 use dist[i][2] + dist[2][j]. dist[2][3] = 1, so any node that can reach 2 can potentially reach 3 more cheaply. dist[0][3] = min(11, 5 + 1) = 6. dist[1][3] = min(8, 2 + 1) = 3. dist[3][3] = min(0, 7 + 1) = 0, no change.

After k = 2: Row 0 becomes [0, 3, 5, 6]. Row 1 becomes [INF, 0, 2, 3].

Iteration k = 3 (allow node 3 as intermediate). dist[3][0] = 2, so paths through 3 back to 0 are considered. dist[1][0] = min(INF, 3 + 2) = 5. dist[2][0] = min(INF, 1 + 2) = 3. dist[2][1] = min(INF, 1 + 5) = 6. Checking others: dist[0][0] = min(0, 6 + 2) = 0, no change.

Final matrix:

Row 0: [0, 3, 5, 6]. Row 1: [5, 0, 2, 3]. Row 2: [3, 6, 0, 1]. Row 3: [2, 5, 7, 0].

Every entry dist[i][j] now holds the shortest distance from node i to node j. For example, the shortest path from 1 to 0 is 1 -> 2 -> 3 -> 0 with cost 2 + 1 + 2 = 5.

Code: Floyd-Warshall

def floyd_warshall(n, edges):
    INF = float('inf')
    dist = [[INF] * n for _ in range(n)]

    # Distance from every node to itself is 0
    for i in range(n):
        dist[i][i] = 0

    # Initialize direct edge weights
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # handle parallel edges

    # k MUST be the outer loop
    for k in range(n):
        for i in range(n):
            if dist[i][k] == INF:
                continue  # optimization: skip if i can't reach k
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    return dist

The early continue when dist[i][k] is infinity is a practical optimization. If node i cannot reach intermediate node k, then routing through k cannot help for any destination j. This does not change the asymptotic complexity but avoids unnecessary additions and comparisons.

Note the min when initializing edge weights. If there are multiple edges between the same pair of nodes (parallel edges), we only want the lightest one.

Handling Negative Edges

Unlike Dijkstra, Floyd-Warshall correctly handles negative edge weights. The algorithm considers all possible intermediate paths and does not rely on the greedy assumption that closer nodes are finalized first. It will discover that a path through a negative edge offers a shorter route regardless of the order in which intermediates are considered.

However, Floyd-Warshall does not handle negative cycles, cycles whose total weight is negative. If a negative cycle exists, you can loop around it indefinitely and drive the distance to negative infinity. The algorithm will not detect this automatically during its main loop.

Detecting Negative Cycles

After running Floyd-Warshall, check the diagonal of the distance matrix. If dist[i][i] is negative for any node i, then node i lies on a negative cycle. In a graph without negative cycles, the shortest path from any node to itself is 0 (just stay put). A negative diagonal entry means the algorithm found a path from i back to i with negative total weight, which is exactly a negative cycle.

1

2

3

4

5

def has_negative_cycle(dist):
    for i in range(len(dist)):
        if dist[i][i] < 0:
            return True
    return False

If you need to identify which pairs of nodes are affected by negative cycles (i.e., have effectively negative-infinity distances), you can do a second pass: if dist[i][k] is finite and dist[k][j] is finite and dist[k][k] < 0, then the shortest path from i to j is negative infinity because you can detour through the negative cycle at k as many times as you like.

Transitive Closure Variant

A related problem is transitive closure: given a directed graph, determine for every pair (i, j) whether there exists a path from i to j (of any length). This is a reachability problem rather than a shortest-path problem.

Floyd-Warshall adapts to this by replacing arithmetic with logic. Instead of a distance matrix, maintain a boolean matrix reach[i][j] indicating whether i can reach j. Initialize it to True where a direct edge exists and on the diagonal. The update becomes: reach[i][j] = reach[i][j] OR (reach[i][k] AND reach[k][j]).

def transitive_closure(n, edges):
    reach = [[False] * n for _ in range(n)]
    for i in range(n):
        reach[i][i] = True
    for u, v in edges:
        reach[u][v] = True

    for k in range(n):
        for i in range(n):
            if not reach[i][k]:
                continue
            for j in range(n):
                if reach[k][j]:
                    reach[i][j] = True

    return reach

The structure is identical to Floyd-Warshall. The outer loop over k is still essential for correctness, for exactly the same reason as before.

Application: Evaluate Division

LeetCode 399 gives you a list of equations like a / b = 2.0, b / c = 3.0, and asked to evaluate queries like a / c or c / a. This is elegantly modeled as a weighted graph problem.

Each variable is a node. An equation a / b = v creates a directed edge from a to b with weight v, and a reverse edge from b to a with weight 1/v. The query a / c asks for the product of weights along a path from a to c in this graph.

Floyd-Warshall fits naturally, but with multiplication instead of addition. Initialize dist[a][b] = v for each equation. The update becomes dist[i][j] = dist[i][k] * dist[k][j] if both dist[i][k] and dist[k][j] are known. After running the algorithm, dist[a][c] directly gives the answer to a / c.

def calcEquation(equations, values, queries):
    # Map variables to indices
    variables = set()
    for a, b in equations:
        variables.add(a)
        variables.add(b)
    idx = {v: i for i, v in enumerate(variables)}
    n = len(variables)

    # Initialize distance matrix
    dist = [[float('inf')] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 1.0  # a / a = 1

    for (a, b), v in zip(equations, values):
        dist[idx[a]][idx[b]] = v
        dist[idx[b]][idx[a]] = 1.0 / v

    # Floyd-Warshall: propagate ratios through intermediates
    for k in range(n):
        for i in range(n):
            if dist[i][k] == float('inf'):
                continue
            for j in range(n):
                if dist[k][j] == float('inf'):
                    continue
                if dist[i][j] == float('inf'):
                    dist[i][j] = dist[i][k] * dist[k][j]

    # Answer queries
    results = []
    for a, b in queries:
        if a not in idx or b not in idx:
            results.append(-1.0)
        elif dist[idx[a]][idx[b]] == float('inf'):
            results.append(-1.0)
        else:
            results.append(dist[idx[a]][idx[b]])

    return results

In practice, this problem is more commonly solved with union-find or DFS/BFS since the number of variables is small and we only need specific pairs, not all pairs. But it illustrates how Floyd-Warshall generalizes: any operation that is associative and has an identity element can replace addition, and any compatible comparison can replace min.

Complexity Analysis

Floyd-Warshall has three nested loops, each iterating $V$ times, giving $O(V^3)$ time complexity. This is independent of the number of edges. Space complexity is $O(V^2)$ for the distance matrix.

The cubic time makes Floyd-Warshall practical only for small graphs, typically up to around 400 nodes. Beyond that, the $V^3$ term becomes prohibitive. For larger graphs, running Dijkstra from each node or using Johnson's algorithm (which combines Bellman-Ford reweighting with Dijkstra) is preferable.

Despite its simplicity, Floyd-Warshall is remarkably versatile. By changing the "relax" operation, it solves transitive closure, widest path (maximum bottleneck), and multiplicative path problems with the same elegant triple loop.

Recognizing this pattern

You're probably looking at Floyd-Warshall when:

The problem wants shortest distances between every pair of nodes, not just one source
The vertex count is small enough that $V^3$ is acceptable, typically V is at most a few hundred
You need transitive closure: which nodes can reach which other nodes
The problem asks for the maximum bottleneck path, multiplicative path, or another min-plus-style relaxation across all pairs

Common templates:

All-pairs shortest distance: initialize the matrix from edges, run the triple loop with $\min(d[i][j], d[i][k] + d[k][j])$ . Example: Find the City With the Smallest Number of Neighbors at a Threshold Distance.
Transitive closure: replace addition with logical AND and minimum with OR to compute reachability between every pair. Example: Course Schedule IV.
Maximum-probability path: swap min for max and addition for multiplication, useful when edges represent probabilities or reliabilities. Example: Path with Maximum Probability.
Dense-graph rerouting: precompute all-pairs distances, then answer many queries in $O(1)$ per query. Example: Count Subtrees With Max Distance Between Cities.