Graph
Order nodes such that all dependencies come before dependents. Use Kahn's algorithm (BFS with in-degree counts) or DFS with a finished stack. Used for course scheduling, build order, and alien dictionary.
You are given n courses numbered 0 to n-1 and a list of prerequisites. Each prerequisite is a pair [a, b] meaning you must complete course b before course a. The question: is it possible to finish all the courses? And if so, in what order should you take them?
This is the Course Schedule problem (LeetCode 207 and 210), and it is the canonical introduction to topological sorting. The courses are nodes in a directed graph, and each prerequisite is a directed edge from the required course to the dependent one. Finishing all courses is possible if and only if there is no cycle in this graph. And the valid ordering is a topological sort of the graph.
A topological sort of a directed graph is a linear ordering of its vertices such that for every directed edge u to v, vertex u appears before v in the ordering. In the course analogy, every prerequisite course comes before the course that depends on it.
Not every directed graph has a topological ordering. If the graph contains a cycle, say A requires B, B requires C, and C requires A, then there is no way to order them linearly. You would need to take A before C, C before B, and B before A, which is a contradiction. Topological sort only works on directed acyclic graphs (DAGs).
When the graph is a DAG, there is always at least one valid topological ordering, and there may be many. The set of valid orderings depends on the structure of the graph.
The most intuitive approach to topological sorting is Kahn's algorithm. The idea is to repeatedly find nodes that have no remaining prerequisites (in-degree 0), process them, and remove their outgoing edges. This peels the graph layer by layer from the "leaves" inward.
Here is the procedure:
Suppose there are 6 courses (0 through 5) with these prerequisites: to take course 2 you need course 0, to take course 3 you need courses 0 and 1, to take course 4 you need course 2, and to take course 5 you need courses 3 and 4.
The edges are: 0 -> 2, 0 -> 3, 1 -> 3, 2 -> 4, 3 -> 5, 4 -> 5.
We compute in-degrees. Course 0 has in-degree 0 (no prerequisites). Course 1 has in-degree 0. Course 2 has in-degree 1 (depends on 0). Course 3 has in-degree 2 (depends on 0 and 1). Course 4 has in-degree 1 (depends on 2). Course 5 has in-degree 2 (depends on 3 and 4).
The queue starts with [0, 1] since those have in-degree 0.
We dequeue 0 and add it to the result. We process its neighbors: course 2's in-degree drops from 1 to 0 (add to queue), course 3's in-degree drops from 2 to 1. Queue is now [1, 2]. Result: [0].
We dequeue 1 and add it to the result. Its neighbor is course 3, whose in-degree drops from 1 to 0 (add to queue). Queue: [2, 3]. Result: [0, 1].
We dequeue 2 and add it to the result. Its neighbor is course 4, whose in-degree drops from 1 to 0 (add to queue). Queue: [3, 4]. Result: [0, 1, 2].
We dequeue 3 and add it to the result. Its neighbor is course 5, whose in-degree drops from 2 to 1. Queue: [4]. Result: [0, 1, 2, 3].
We dequeue 4 and add it to the result. Its neighbor is course 5, whose in-degree drops from 1 to 0 (add to queue). Queue: [5]. Result: [0, 1, 2, 3, 4].
We dequeue 5 and add it to the result. It has no neighbors. Queue is empty. Result: [0, 1, 2, 3, 4, 5].
The result contains all 6 nodes, so this is a valid topological ordering. You take the courses in this order, and every prerequisite is satisfied before the course that needs it.
from collections import deque, defaultdict
def topological_sort(num_nodes, edges):
graph = defaultdict(list)
in_degree = [0] * num_nodes
for u, v in edges:
graph[u].append(v)
in_degree[v] += 1
queue = deque()
for i in range(num_nodes):
if in_degree[i] == 0:
queue.append(i)
order = []
while queue:
node = queue.popleft()
order.append(node)
for neighbor in graph[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor)
if len(order) == num_nodes:
return order
else:
return [] # cycle detectedEach edge is a pair (u, v) meaning u must come before v. We build an adjacency list and track in-degrees. The queue seeds with all zero-in-degree nodes. As we process each node, we "remove" its outgoing edges by decrementing neighbors' in-degrees. If all nodes end up in the result, we have a valid ordering.
The elegance of Kahn's algorithm is that cycle detection comes for free. If the graph has a cycle, the nodes in the cycle will never reach in-degree 0. They keep pointing at each other, so none of them ever enters the queue. When the algorithm finishes, the result will contain fewer nodes than the total. The difference tells you a cycle exists.
For the Course Schedule feasibility problem (can you finish all courses?), you simply run Kahn's algorithm and check whether the length of the result equals the number of courses.
def can_finish(num_courses, prerequisites):
# prerequisites[i] = [a, b] means b -> a (must take b before a)
order = topological_sort(num_courses, [(b, a) for a, b in prerequisites])
return len(order) == num_coursesThere is an alternative approach using depth-first search. The insight is that in a DFS, a node finishes (all its descendants have been fully explored) before any of its ancestors finish. If you record nodes in the order they finish and then reverse that list, you get a valid topological ordering.
def topological_sort_dfs(num_nodes, edges):
graph = defaultdict(list)
for u, v in edges:
graph[u].append(v)
WHITE, GRAY, BLACK = 0, 1, 2
color = [WHITE] * num_nodes
order = []
has_cycle = False
def dfs(node):
nonlocal has_cycle
if has_cycle:
return
color[node] = GRAY
for neighbor in graph[node]:
if color[neighbor] == GRAY:
has_cycle = True
return
if color[neighbor] == WHITE:
dfs(neighbor)
color[node] = BLACK
order.append(node)
for i in range(num_nodes):
if color[i] == WHITE:
dfs(i)
if has_cycle:
return []
order.reverse()
return orderThe three-color scheme (white for unvisited, gray for in-progress, black for finished) handles cycle detection. If during DFS you encounter a gray node, you have found a back edge, which means a cycle exists. After all DFS calls complete, the reversed finish-order list is the topological sort.
Kahn's algorithm is typically the better default choice. It is more intuitive, it naturally detects cycles, and the BFS-based processing is easy to reason about. It also adapts well to variations like finding the lexicographically smallest ordering (replace the queue with a min-heap).
The DFS approach is useful when you are already performing DFS for another reason, such as computing strongly connected components, or when the problem naturally lends itself to recursive exploration. Some people find the DFS approach more elegant, but it requires the three-color scheme to detect cycles, which adds complexity.
Course Schedule II (LeetCode 210) asks you to return one valid ordering rather than just checking feasibility. The solution is exactly Kahn's algorithm: run it, and if the result contains all courses, return the result. If not, return an empty list to indicate that no valid ordering exists.
def find_order(num_courses, prerequisites):
graph = defaultdict(list)
in_degree = [0] * num_courses
for course, prereq in prerequisites:
graph[prereq].append(course)
in_degree[course] += 1
queue = deque(i for i in range(num_courses) if in_degree[i] == 0)
order = []
while queue:
node = queue.popleft()
order.append(node)
for neighbor in graph[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor)
return order if len(order) == num_courses else []The only difference from the feasibility check is that we return the ordering itself instead of a boolean.
LeetCode 269 presents a list of words sorted in an alien language's lexicographic order. You must derive the ordering of characters in that language. This is a topological sort problem in disguise.
The key observation is that you can extract ordering constraints by comparing adjacent words in the list. For two adjacent words, find the first position where they differ. If word1[i] is 'a' and word2[i] is 'c', then 'a' comes before 'c' in the alien alphabet. That gives you a directed edge from 'a' to 'c'.
After extracting all such edges from every pair of adjacent words, you have a directed graph on characters. A topological sort of this graph gives you the alien alphabet ordering. If the graph has a cycle, the input is inconsistent and no valid ordering exists.
One edge case to watch: if a longer word appears before its prefix (for example, "abc" before "ab"), the input is invalid because in any lexicographic order, a prefix always comes first.
def alien_order(words):
graph = defaultdict(set)
in_degree = {c: 0 for word in words for c in word}
for i in range(len(words) - 1):
w1, w2 = words[i], words[i + 1]
min_len = min(len(w1), len(w2))
if len(w1) > len(w2) and w1[:min_len] == w2[:min_len]:
return "" # invalid: longer word before its prefix
for j in range(min_len):
if w1[j] != w2[j]:
if w2[j] not in graph[w1[j]]:
graph[w1[j]].add(w2[j])
in_degree[w2[j]] += 1
break
queue = deque(c for c in in_degree if in_degree[c] == 0)
result = []
while queue:
c = queue.popleft()
result.append(c)
for neighbor in graph[c]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor)
if len(result) == len(in_degree):
return "".join(result)
return "" # cycle detectedWe use a set for each node's neighbors to avoid duplicate edges when the same character pair appears in multiple adjacent word comparisons. The rest is standard Kahn's algorithm.
Some problems ask for the lexicographically smallest valid topological ordering. The modification to Kahn's algorithm is elegant: replace the FIFO queue with a min-heap (priority queue). Instead of processing zero-in-degree nodes in the order they were discovered, you always process the smallest one first.
import heapq
def smallest_topological_order(num_nodes, edges):
graph = defaultdict(list)
in_degree = [0] * num_nodes
for u, v in edges:
graph[u].append(v)
in_degree[v] += 1
heap = [i for i in range(num_nodes) if in_degree[i] == 0]
heapq.heapify(heap)
order = []
while heap:
node = heapq.heappop(heap)
order.append(node)
for neighbor in graph[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
heapq.heappush(heap, neighbor)
return order if len(order) == num_nodes else []This increases the time complexity from to because of the heap operations, but it guarantees that among all valid topological orderings, you get the lexicographically smallest one. The DFS-based approach cannot easily produce this variant, which is another advantage of Kahn's algorithm.
Both Kahn's algorithm and the DFS-based approach run in time, where V is the number of nodes and E is the number of edges. Each node is processed exactly once (dequeued in Kahn's, finished in DFS), and each edge is examined exactly once (when decrementing in-degrees or exploring neighbors).
The space complexity is for storing the graph as an adjacency list, plus for the in-degree array (Kahn's) or the color array (DFS). The queue or recursion stack holds at most V nodes.
For the Course Schedule problems, V is the number of courses and E is the number of prerequisites. For the Alien Dictionary, V is the number of distinct characters and E is the number of ordering constraints derived from adjacent words. In both cases, the algorithm is linear in the size of the input, which is optimal since you must at least read every edge to determine the ordering.
You're probably looking at topological sort when:
Common templates:
Practice Problems (18)