Graph

Topological Sort

Order nodes such that all dependencies come before dependents. Use Kahn's algorithm (BFS with in-degree counts) or DFS with a finished stack. Used for course scheduling, build order, and alien dictionary.

Topological Sort

Course Schedule

You are given n courses numbered 0 to n-1 and a list of prerequisites. Each prerequisite is a pair [a, b] meaning you must complete course b before course a. The question: is it possible to finish all the courses? And if so, in what order should you take them?

This is the Course Schedule problem (LeetCode 207 and 210), and it is the canonical introduction to topological sorting. The courses are nodes in a directed graph, and each prerequisite is a directed edge from the required course to the dependent one. Finishing all courses is possible if and only if there is no cycle in this graph. And the valid ordering is a topological sort of the graph.

What Topological Sort Means

A topological sort of a directed graph is a linear ordering of its vertices such that for every directed edge u to v, vertex u appears before v in the ordering. In the course analogy, every prerequisite course comes before the course that depends on it.

Not every directed graph has a topological ordering. If the graph contains a cycle, say A requires B, B requires C, and C requires A, then there is no way to order them linearly. You would need to take A before C, C before B, and B before A, which is a contradiction. Topological sort only works on directed acyclic graphs (DAGs).

When the graph is a DAG, there is always at least one valid topological ordering, and there may be many. The set of valid orderings depends on the structure of the graph.

Kahn's Algorithm (BFS-Based)

The most intuitive approach to topological sorting is Kahn's algorithm. The idea is to repeatedly find nodes that have no remaining prerequisites (in-degree 0), process them, and remove their outgoing edges. This peels the graph layer by layer from the "leaves" inward.

Here is the procedure:

Compute the in-degree of every node (how many edges point to it).
Add all nodes with in-degree 0 to a queue. These have no prerequisites and can be taken first.
While the queue is not empty, dequeue a node, add it to the result, and for each of its neighbors, decrement their in-degree by 1.
Whenever a neighbor's in-degree drops to 0, add it to the queue.
When the queue is empty, check whether the result contains all n nodes. If yes, the result is a valid topological ordering. If not, the graph has a cycle.

Walking Through a Concrete Example

Suppose there are 6 courses (0 through 5) with these prerequisites: to take course 2 you need course 0, to take course 3 you need courses 0 and 1, to take course 4 you need course 2, and to take course 5 you need courses 3 and 4.

The edges are: 0 -> 2, 0 -> 3, 1 -> 3, 2 -> 4, 3 -> 5, 4 -> 5.

We compute in-degrees. Course 0 has in-degree 0 (no prerequisites). Course 1 has in-degree 0. Course 2 has in-degree 1 (depends on 0). Course 3 has in-degree 2 (depends on 0 and 1). Course 4 has in-degree 1 (depends on 2). Course 5 has in-degree 2 (depends on 3 and 4).

The queue starts with [0, 1] since those have in-degree 0.

We dequeue 0 and add it to the result. We process its neighbors: course 2's in-degree drops from 1 to 0 (add to queue), course 3's in-degree drops from 2 to 1. Queue is now [1, 2]. Result: [0].

We dequeue 1 and add it to the result. Its neighbor is course 3, whose in-degree drops from 1 to 0 (add to queue). Queue: [2, 3]. Result: [0, 1].

We dequeue 2 and add it to the result. Its neighbor is course 4, whose in-degree drops from 1 to 0 (add to queue). Queue: [3, 4]. Result: [0, 1, 2].

We dequeue 3 and add it to the result. Its neighbor is course 5, whose in-degree drops from 2 to 1. Queue: [4]. Result: [0, 1, 2, 3].

We dequeue 4 and add it to the result. Its neighbor is course 5, whose in-degree drops from 1 to 0 (add to queue). Queue: [5]. Result: [0, 1, 2, 3, 4].

We dequeue 5 and add it to the result. It has no neighbors. Queue is empty. Result: [0, 1, 2, 3, 4, 5].

The result contains all 6 nodes, so this is a valid topological ordering. You take the courses in this order, and every prerequisite is satisfied before the course that needs it.

Code: Topological Sort (Kahn's Algorithm)

from collections import deque, defaultdict

def topological_sort(num_nodes, edges):
    graph = defaultdict(list)
    in_degree = [0] * num_nodes

    for u, v in edges:
        graph[u].append(v)
        in_degree[v] += 1

    queue = deque()
    for i in range(num_nodes):
        if in_degree[i] == 0:
            queue.append(i)

    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    if len(order) == num_nodes:
        return order
    else:
        return []  # cycle detected

Each edge is a pair (u, v) meaning u must come before v. We build an adjacency list and track in-degrees. The queue seeds with all zero-in-degree nodes. As we process each node, we "remove" its outgoing edges by decrementing neighbors' in-degrees. If all nodes end up in the result, we have a valid ordering.

Cycle Detection

The elegance of Kahn's algorithm is that cycle detection comes for free. If the graph has a cycle, the nodes in the cycle will never reach in-degree 0. They keep pointing at each other, so none of them ever enters the queue. When the algorithm finishes, the result will contain fewer nodes than the total. The difference tells you a cycle exists.

For the Course Schedule feasibility problem (can you finish all courses?), you simply run Kahn's algorithm and check whether the length of the result equals the number of courses.

1

2

3

4

def can_finish(num_courses, prerequisites):
    # prerequisites[i] = [a, b] means b -> a (must take b before a)
    order = topological_sort(num_courses, [(b, a) for a, b in prerequisites])
    return len(order) == num_courses

DFS-Based Topological Sort

There is an alternative approach using depth-first search. The insight is that in a DFS, a node finishes (all its descendants have been fully explored) before any of its ancestors finish. If you record nodes in the order they finish and then reverse that list, you get a valid topological ordering.

def topological_sort_dfs(num_nodes, edges):
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * num_nodes
    order = []
    has_cycle = False

    def dfs(node):
        nonlocal has_cycle
        if has_cycle:
            return
        color[node] = GRAY
        for neighbor in graph[node]:
            if color[neighbor] == GRAY:
                has_cycle = True
                return
            if color[neighbor] == WHITE:
                dfs(neighbor)
        color[node] = BLACK
        order.append(node)

    for i in range(num_nodes):
        if color[i] == WHITE:
            dfs(i)

    if has_cycle:
        return []
    order.reverse()
    return order

The three-color scheme (white for unvisited, gray for in-progress, black for finished) handles cycle detection. If during DFS you encounter a gray node, you have found a back edge, which means a cycle exists. After all DFS calls complete, the reversed finish-order list is the topological sort.

Kahn's vs DFS: When to Use Which

Kahn's algorithm is typically the better default choice. It is more intuitive, it naturally detects cycles, and the BFS-based processing is easy to reason about. It also adapts well to variations like finding the lexicographically smallest ordering (replace the queue with a min-heap).

The DFS approach is useful when you are already performing DFS for another reason, such as computing strongly connected components, or when the problem naturally lends itself to recursive exploration. Some people find the DFS approach more elegant, but it requires the three-color scheme to detect cycles, which adds complexity.

Course Schedule II: Return the Ordering

Course Schedule II (LeetCode 210) asks you to return one valid ordering rather than just checking feasibility. The solution is exactly Kahn's algorithm: run it, and if the result contains all courses, return the result. If not, return an empty list to indicate that no valid ordering exists.

def find_order(num_courses, prerequisites):
    graph = defaultdict(list)
    in_degree = [0] * num_courses

    for course, prereq in prerequisites:
        graph[prereq].append(course)
        in_degree[course] += 1

    queue = deque(i for i in range(num_courses) if in_degree[i] == 0)
    order = []

    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    return order if len(order) == num_courses else []

The only difference from the feasibility check is that we return the ordering itself instead of a boolean.

Alien Dictionary

LeetCode 269 presents a list of words sorted in an alien language's lexicographic order. You must derive the ordering of characters in that language. This is a topological sort problem in disguise.

The key observation is that you can extract ordering constraints by comparing adjacent words in the list. For two adjacent words, find the first position where they differ. If word1[i] is 'a' and word2[i] is 'c', then 'a' comes before 'c' in the alien alphabet. That gives you a directed edge from 'a' to 'c'.

After extracting all such edges from every pair of adjacent words, you have a directed graph on characters. A topological sort of this graph gives you the alien alphabet ordering. If the graph has a cycle, the input is inconsistent and no valid ordering exists.

One edge case to watch: if a longer word appears before its prefix (for example, "abc" before "ab"), the input is invalid because in any lexicographic order, a prefix always comes first.

def alien_order(words):
    graph = defaultdict(set)
    in_degree = {c: 0 for word in words for c in word}

    for i in range(len(words) - 1):
        w1, w2 = words[i], words[i + 1]
        min_len = min(len(w1), len(w2))
        if len(w1) > len(w2) and w1[:min_len] == w2[:min_len]:
            return ""  # invalid: longer word before its prefix
        for j in range(min_len):
            if w1[j] != w2[j]:
                if w2[j] not in graph[w1[j]]:
                    graph[w1[j]].add(w2[j])
                    in_degree[w2[j]] += 1
                break

    queue = deque(c for c in in_degree if in_degree[c] == 0)
    result = []

    while queue:
        c = queue.popleft()
        result.append(c)
        for neighbor in graph[c]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    if len(result) == len(in_degree):
        return "".join(result)
    return ""  # cycle detected

We use a set for each node's neighbors to avoid duplicate edges when the same character pair appears in multiple adjacent word comparisons. The rest is standard Kahn's algorithm.

Lexicographically Smallest Ordering

Some problems ask for the lexicographically smallest valid topological ordering. The modification to Kahn's algorithm is elegant: replace the FIFO queue with a min-heap (priority queue). Instead of processing zero-in-degree nodes in the order they were discovered, you always process the smallest one first.

import heapq

def smallest_topological_order(num_nodes, edges):
    graph = defaultdict(list)
    in_degree = [0] * num_nodes

    for u, v in edges:
        graph[u].append(v)
        in_degree[v] += 1

    heap = [i for i in range(num_nodes) if in_degree[i] == 0]
    heapq.heapify(heap)
    order = []

    while heap:
        node = heapq.heappop(heap)
        order.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                heapq.heappush(heap, neighbor)

    return order if len(order) == num_nodes else []

This increases the time complexity from $O(V + E)$ to $O(V \log V + E)$ because of the heap operations, but it guarantees that among all valid topological orderings, you get the lexicographically smallest one. The DFS-based approach cannot easily produce this variant, which is another advantage of Kahn's algorithm.

Complexity Analysis

Both Kahn's algorithm and the DFS-based approach run in $O(V + E)$ time, where V is the number of nodes and E is the number of edges. Each node is processed exactly once (dequeued in Kahn's, finished in DFS), and each edge is examined exactly once (when decrementing in-degrees or exploring neighbors).

The space complexity is $O(V + E)$ for storing the graph as an adjacency list, plus $O(V)$ for the in-degree array (Kahn's) or the color array (DFS). The queue or recursion stack holds at most V nodes.

For the Course Schedule problems, V is the number of courses and E is the number of prerequisites. For the Alien Dictionary, V is the number of distinct characters and E is the number of ordering constraints derived from adjacent words. In both cases, the algorithm is linear in the size of the input, which is optimal since you must at least read every edge to determine the ordering.

Recognizing this pattern

You're probably looking at topological sort when:

The problem mentions prerequisites, dependencies, or "do X before Y" constraints on a set of items
You need to produce an ordering of tasks, courses, or build steps that respects directed edges
The input is a DAG (or you need to detect whether a cycle prevents a valid ordering)
The problem hints at ordering characters or symbols from inequality constraints, like an alien alphabet

Common templates:

Kahn's BFS with in-degree queue: repeatedly pop nodes whose in-degree is zero and decrement neighbors. Example: Course Schedule II.
DFS with three-color cycle detection: color nodes white, gray, black to detect back edges while pushing finished nodes onto a stack. Example: Course Schedule.
Constraint extraction then topo sort: derive edges from pairwise relations, then run a standard topo sort. Example: Alien Dictionary.
Layered topo sort for parallel scheduling: process all in-degree-zero nodes at once to compute level counts or minimum rounds. Example: Parallel Courses.

Practice Problems (18)

⊣ 207 Course ScheduleMedium

⊣ 210 Course Schedule IIMedium

⊣ 310 Minimum Height TreesMedium

⊣ 329 Longest Increasing Path in a MatrixHard

⊣ 802 Find Eventual Safe StatesMedium

⊣ 851 Loud and RichMedium

⊣ 913 Cat and MouseHard

⊣ 1203 Sort Items by Groups Respecting DependenciesHard

⊣ 1462 Course Schedule IVMedium

⊣ 1591 Strange Printer IIHard

⊣ 1857 Largest Color Value in a Directed GraphHard

⊣ 2050 Parallel Courses IIIHard

⊣ 2115 Find All Possible Recipes from Given SuppliesMedium

⊣ 2127 Maximum Employees to Be Invited to a MeetingHard

⊣ 2192 All Ancestors of a Node in a Directed Acyclic GraphMedium

⊣ 2392 Build a Matrix With ConditionsHard

⊣ 2603 Collect Coins in a TreeHard

⊣ 3435 Frequencies of Shortest SupersequencesHard