Graph
Traverse the graph from each unvisited node to find all connected components. Mark nodes visited to avoid revisiting. Works for number of islands, counting provinces, and detecting connected regions.
LeetCode 200 asks you to work with an m-by-n grid of characters where '1' represents land and '0' represents water. An island is a group of adjacent land cells connected horizontally or vertically (not diagonally). The task is to count how many islands exist in the grid.
Consider this small grid:
1 1 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 1
There are three islands here. The four 1s in the top-left corner are all connected and form one island. The lone 1 in the middle of row 2 is surrounded by water on all sides, so it forms its own island. The two 1s in the bottom-right are adjacent and form the third.
This is fundamentally a connected components problem. Each island is a connected component in a graph where every land cell is a node and edges connect horizontally or vertically adjacent land cells. Counting islands means counting connected components.
The algorithm is simple: scan the grid from top-left to bottom-right. Every time you encounter a '1' that has not been visited, you have discovered a new island. Immediately launch a depth-first search from that cell, marking every connected '1' as visited (or overwriting it with '0' so you never count it again). After the DFS completes, increment your island count. Continue scanning. When you have processed every cell, the count is your answer.
Let us trace through the grid above. We start at cell (0,0), which is '1'. We launch DFS. The search visits (0,0), moves right to (0,1), then down to (1,1), then left to (1,0). Each of these cells gets marked as '0'. There are no more unvisited '1' neighbors, so DFS returns. Island count becomes 1.
We continue scanning. Cells (0,2) through (0,4) are '0', so we skip them. Same for (1,2) through (1,4). We reach (2,0) and (2,1), both '0'. Then (2,2) is '1'. We launch DFS again. This cell has no unvisited land neighbors, so DFS marks just this one cell and returns. Island count becomes 2.
We keep scanning. Eventually we reach (3,3), which is '1'. DFS visits (3,3) and moves right to (3,4). Both marked. Island count becomes 3. The remaining cells are all water. Final answer: 3.
def numIslands(grid):
if not grid:
return 0
rows, cols = len(grid), len(grid[0])
count = 0
def dfs(r, c):
if r < 0 or r >= rows or c < 0 or c >= cols:
return
if grid[r][c] != '1':
return
grid[r][c] = '0' # mark visited by sinking the land
dfs(r + 1, c)
dfs(r - 1, c)
dfs(r, c + 1)
dfs(r, c - 1)
for r in range(rows):
for c in range(cols):
if grid[r][c] == '1':
dfs(r, c)
count += 1
return countThe DFS function checks bounds and whether the current cell is land. If so, it marks the cell as '0' (effectively sinking it) and recurses in all four directions. The outer loop scans every cell; each time it finds surviving land, it triggers a flood-fill and bumps the count.
The same idea works with breadth-first search. Instead of recursion, you use a queue. When you find an unvisited '1', you add it to the queue and process neighbors level by level.
from collections import deque
def numIslands_bfs(grid):
if not grid:
return 0
rows, cols = len(grid), len(grid[0])
count = 0
for r in range(rows):
for c in range(cols):
if grid[r][c] == '1':
count += 1
grid[r][c] = '0'
queue = deque([(r, c)])
while queue:
row, col = queue.popleft()
for dr, dc in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
nr, nc = row + dr, col + dc
if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1':
grid[nr][nc] = '0'
queue.append((nr, nc))
return countThe output is identical. The difference is mechanical: BFS explores cells in expanding wavefronts from the starting cell, while DFS dives deep along one path before backtracking.
For pure connected-components counting, it does not matter which you choose. DFS is often simpler to write because recursion handles the stack for you. BFS requires explicitly managing a queue, which adds a few lines of code but nothing conceptually harder.
The choice matters when the problem asks for something beyond reachability. BFS naturally discovers nodes in order of their distance from the source, which makes it the right tool when you need shortest paths in unweighted graphs. DFS does not provide distance information without extra bookkeeping.
As a rule of thumb: use DFS when you only care about reachability or component membership. Use BFS when you need shortest distances or level-by-level processing.
Grids are a special case where edges are implicit (each cell connects to its four neighbors). In a general graph represented as an adjacency list, as in LeetCode 323, the same pattern applies. You iterate over all nodes, and for each unvisited node, you launch DFS or BFS to discover its entire component.
def count_components(n, edges):
graph = [[] for _ in range(n)]
for u, v in edges:
graph[u].append(v)
graph[v].append(u)
visited = [False] * n
count = 0
def dfs(node):
visited[node] = True
for neighbor in graph[node]:
if not visited[neighbor]:
dfs(neighbor)
for i in range(n):
if not visited[i]:
dfs(i)
count += 1
return countThis handles undirected graphs. For each node from 0 to n-1, if it has not been visited, we start a DFS that marks all reachable nodes. The number of times we start a new DFS equals the number of connected components.
In the Number of Islands solution, we modified the input grid directly, changing '1' to '0' to mark visited cells. This avoids allocating a separate visited set and saves space. But it destroys the input, which may not be acceptable if the caller needs the grid later.
The alternative is to keep a visited set (or a boolean matrix of the same dimensions as the grid) and check it before processing each cell. This preserves the input at the cost of extra memory. In interviews, it is worth asking the interviewer whether mutating the input is acceptable. If the grid is enormous and memory is tight, in-place marking is preferred. If the grid will be reused, a visited set is safer.
For general graphs, a visited set (or array) is the standard approach since there is no grid to overwrite.
LeetCode 695 extends this idea: find the size of the largest connected component. The modification is minimal: during each DFS/BFS traversal, count how many nodes you visit. Track the maximum across all components.
def largest_island(grid):
if not grid:
return 0
rows, cols = len(grid), len(grid[0])
max_size = 0
def dfs(r, c):
if r < 0 or r >= rows or c < 0 or c >= cols:
return 0
if grid[r][c] != '1':
return 0
grid[r][c] = '0'
return 1 + dfs(r + 1, c) + dfs(r - 1, c) + dfs(r, c + 1) + dfs(r, c - 1)
for r in range(rows):
for c in range(cols):
if grid[r][c] == '1':
size = dfs(r, c)
max_size = max(max_size, size)
return max_sizeEach call to DFS now returns 1 (for the current cell) plus the sizes returned by its four recursive calls. The total is the size of the island rooted at the starting cell.
Consider LeetCode 733: flood fill is perhaps the simplest application of DFS/BFS on a grid. Given a starting pixel and a new color, change the starting pixel and all connected pixels of the same original color to the new color. This is exactly the paint bucket tool in image editors.
def flood_fill(image, sr, sc, new_color):
original = image[sr][sc]
if original == new_color:
return image
rows, cols = len(image), len(image[0])
def dfs(r, c):
if r < 0 or r >= rows or c < 0 or c >= cols:
return
if image[r][c] != original:
return
image[r][c] = new_color
dfs(r + 1, c)
dfs(r - 1, c)
dfs(r, c + 1)
dfs(r, c - 1)
dfs(sr, sc)
return imageThe early return when original == new_color is important. Without it, the DFS would loop infinitely because marking a cell with the new color would not distinguish it from unvisited cells of the original color (since they would be the same).
LeetCode 133 presents a variation: you are given a reference to a node in a connected undirected graph and must return a deep copy of the entire graph. Each node has a value and a list of neighbors. The tricky part is that the graph may contain cycles, so a naive recursive copy would loop forever.
The solution is to maintain a hash map from original nodes to their clones. When you visit a node for the first time, create its clone, store it in the map, then recurse into each neighbor. If a neighbor already exists in the map, you have already cloned it, so just wire the existing clone into the neighbor list instead of recursing. This map serves double duty: it tracks visited nodes and lets you look up clones in O(1).
def cloneGraph(node):
if not node:
return None
clones = {}
def dfs(n):
if n in clones:
return clones[n]
copy = Node(n.val)
clones[n] = copy
for neighbor in n.neighbors:
copy.neighbors.append(dfs(neighbor))
return copy
return dfs(node)The map clones maps each original node to its clone. When DFS reaches a node already in the map, it returns the existing clone immediately. This handles cycles: the clone is inserted into the map before its neighbors are processed, so a back-edge will find it and stop.
For LeetCode 130, you are given an m-by-n board of 'X' and 'O', and must capture all regions of 'O' that are completely surrounded by 'X'. A region is surrounded if none of its cells touch the border of the board. Captured regions are flipped to 'X'.
The key insight is to work backwards: instead of checking every 'O' region to see if it touches a border, start from the border. Any 'O' on an edge of the board, and every 'O' connected to it, is safe and should not be flipped. Everything else gets captured.
def solve(board):
if not board:
return
rows, cols = len(board), len(board[0])
def dfs(r, c):
if r < 0 or r >= rows or c < 0 or c >= cols:
return
if board[r][c] != 'O':
return
board[r][c] = 'S' # mark as safe
dfs(r + 1, c)
dfs(r - 1, c)
dfs(r, c + 1)
dfs(r, c - 1)
# mark all border-connected O's as safe
for r in range(rows):
dfs(r, 0)
dfs(r, cols - 1)
for c in range(cols):
dfs(0, c)
dfs(rows - 1, c)
# flip: S -> O (safe), O -> X (captured)
for r in range(rows):
for c in range(cols):
if board[r][c] == 'O':
board[r][c] = 'X'
elif board[r][c] == 'S':
board[r][c] = 'O'Three passes: first, DFS from every border 'O' to mark its connected component as safe ('S'). Second, any remaining 'O' is surrounded, so flip it to 'X'. Third, restore the safe cells back to 'O'.
LeetCode 417 adds a twist: you are given an m-by-n matrix of heights where water can flow from a cell to any adjacent cell with equal or lower height. The Pacific ocean borders the top and left edges; the Atlantic borders the bottom and right edges. Find all cells from which water can reach both oceans.
The brute-force approach, running a search from every cell to see if it reaches both coasts, is expensive. The trick is to reverse the direction: start from the ocean and flow uphill. Run a DFS/BFS from every cell on the Pacific border (top row and left column), marking all cells reachable by flowing to equal-or-higher neighbors. Do the same from the Atlantic border. The answer is the intersection of the two reachable sets.
def pacificAtlantic(heights):
if not heights:
return []
rows, cols = len(heights), len(heights[0])
pacific = set()
atlantic = set()
def dfs(r, c, reachable):
reachable.add((r, c))
for dr, dc in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
nr, nc = r + dr, c + dc
if (0 <= nr < rows and 0 <= nc < cols
and (nr, nc) not in reachable
and heights[nr][nc] >= heights[r][c]):
dfs(nr, nc, reachable)
for c in range(cols):
dfs(0, c, pacific) # top edge -> Pacific
dfs(rows - 1, c, atlantic) # bottom edge -> Atlantic
for r in range(rows):
dfs(r, 0, pacific) # left edge -> Pacific
dfs(r, cols - 1, atlantic) # right edge -> Atlantic
return list(pacific & atlantic)Each DFS walks uphill from the ocean border. A cell is added to pacific if water from that cell could eventually flow down to the Pacific, and likewise for atlantic. The final intersection gives cells that can reach both.
LeetCode 827 takes this further: you are given an n-by-n binary grid and may flip at most one 0 to 1. Find the size of the largest island after the flip. This extends the basic connected-components pattern with a clever preprocessing step.
First, label every island with a unique ID and record its size. Then, for each 0 cell, look at its four neighbors. Collect the distinct island IDs among them and sum their sizes, plus one for the flipped cell itself. The maximum across all 0 cells (and across all original island sizes, in case no flip helps) is the answer.
def largestIsland(grid):
n = len(grid)
island_id = 2 # start labeling from 2 to avoid confusion with 0/1
sizes = {}
def dfs(r, c, uid):
if r < 0 or r >= n or c < 0 or c >= n:
return 0
if grid[r][c] != 1:
return 0
grid[r][c] = uid
return (1 + dfs(r + 1, c, uid) + dfs(r - 1, c, uid)
+ dfs(r, c + 1, uid) + dfs(r, c - 1, uid))
# label each island and record its size
for r in range(n):
for c in range(n):
if grid[r][c] == 1:
sizes[island_id] = dfs(r, c, island_id)
island_id += 1
if not sizes:
return 1 # all zeros, flip one
best = max(sizes.values())
# try flipping each 0
for r in range(n):
for c in range(n):
if grid[r][c] == 0:
seen = set()
total = 1
for dr, dc in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
nr, nc = r + dr, c + dc
if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] > 1:
uid = grid[nr][nc]
if uid not in seen:
seen.add(uid)
total += sizes[uid]
best = max(best, total)
return bestThe two-phase approach avoids repeated DFS. Phase one is standard connected-components with size tracking. Phase two iterates over water cells and uses the precomputed sizes to evaluate each flip in O(1). Using a set of IDs prevents double-counting when the same island borders the flipped cell from multiple directions.
Some problems ask not for components but for simple reachability: can you reach a target from a source? Or they ask you to enumerate all paths.
Keys and Rooms (LC 841) is a straightforward reachability check. You start in room 0; each room contains keys to other rooms. The question is whether you can visit every room. This is a plain DFS from node 0 on a directed graph.
def canVisitAllRooms(rooms):
visited = set()
def dfs(room):
visited.add(room)
for key in rooms[room]:
if key not in visited:
dfs(key)
dfs(0)
return len(visited) == len(rooms)All Paths From Source to Target (LC 797) asks for every path from node 0 to node n-1 in a directed acyclic graph. Since the graph is a DAG, there are no cycles, so you can use simple backtracking without a visited set.
def allPathsSourceTarget(graph):
target = len(graph) - 1
result = []
def dfs(node, path):
if node == target:
result.append(path[:])
return
for neighbor in graph[node]:
path.append(neighbor)
dfs(neighbor, path)
path.pop()
dfs(0, [0])
return resultThe backtracking pattern appends a neighbor to the current path, recurses, then pops it off. Each time we reach the target, we snapshot the path. Because the graph is acyclic, every recursive branch terminates, and we enumerate all possible routes.
For a graph with V vertices and E edges, both DFS and BFS visit every vertex once and examine every edge once. The time complexity is and the space complexity is for the visited set plus for the recursion stack (DFS) or queue (BFS).
For an m-by-n grid, V = m n and E is at most 4 m * n (each cell has up to four neighbors), so the time complexity simplifies to . The space complexity is in the worst case for the recursion stack or queue, which occurs when the entire grid is land and DFS recurses through every cell. If you use in-place marking, you save the visited set space but still need the stack/queue space.
The key takeaway is that connected-components problems are linear in the size of the input. You look at each node and each edge exactly once. There is no way to do better, since you must at least read the entire input to determine the answer.
You're probably looking at DFS/BFS for connected components or reachability when:
Common templates:
Practice Problems (76)