Graph

DFS / BFS: Connected Components & Reachability

Traverse the graph from each unvisited node to find all connected components. Mark nodes visited to avoid revisiting. Works for number of islands, counting provinces, and detecting connected regions.

DFS/BFS: Connected Components and Reachability

Number of Islands

LeetCode 200 asks you to work with an m-by-n grid of characters where '1' represents land and '0' represents water. An island is a group of adjacent land cells connected horizontally or vertically (not diagonally). The task is to count how many islands exist in the grid.

Consider this small grid:

1 1 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 1

There are three islands here. The four 1s in the top-left corner are all connected and form one island. The lone 1 in the middle of row 2 is surrounded by water on all sides, so it forms its own island. The two 1s in the bottom-right are adjacent and form the third.

This is fundamentally a connected components problem. Each island is a connected component in a graph where every land cell is a node and edges connect horizontally or vertically adjacent land cells. Counting islands means counting connected components.

DFS Flood-Fill Approach

The algorithm is simple: scan the grid from top-left to bottom-right. Every time you encounter a '1' that has not been visited, you have discovered a new island. Immediately launch a depth-first search from that cell, marking every connected '1' as visited (or overwriting it with '0' so you never count it again). After the DFS completes, increment your island count. Continue scanning. When you have processed every cell, the count is your answer.

Let us trace through the grid above. We start at cell (0,0), which is '1'. We launch DFS. The search visits (0,0), moves right to (0,1), then down to (1,1), then left to (1,0). Each of these cells gets marked as '0'. There are no more unvisited '1' neighbors, so DFS returns. Island count becomes 1.

We continue scanning. Cells (0,2) through (0,4) are '0', so we skip them. Same for (1,2) through (1,4). We reach (2,0) and (2,1), both '0'. Then (2,2) is '1'. We launch DFS again. This cell has no unvisited land neighbors, so DFS marks just this one cell and returns. Island count becomes 2.

We keep scanning. Eventually we reach (3,3), which is '1'. DFS visits (3,3) and moves right to (3,4). Both marked. Island count becomes 3. The remaining cells are all water. Final answer: 3.

Code: Number of Islands

def numIslands(grid):
    if not grid:
        return 0

    rows, cols = len(grid), len(grid[0])
    count = 0

    def dfs(r, c):
        if r < 0 or r >= rows or c < 0 or c >= cols:
            return
        if grid[r][c] != '1':
            return
        grid[r][c] = '0'  # mark visited by sinking the land
        dfs(r + 1, c)
        dfs(r - 1, c)
        dfs(r, c + 1)
        dfs(r, c - 1)

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                dfs(r, c)
                count += 1

    return count

The DFS function checks bounds and whether the current cell is land. If so, it marks the cell as '0' (effectively sinking it) and recurses in all four directions. The outer loop scans every cell; each time it finds surviving land, it triggers a flood-fill and bumps the count.

BFS Alternative

The same idea works with breadth-first search. Instead of recursion, you use a queue. When you find an unvisited '1', you add it to the queue and process neighbors level by level.

from collections import deque

def numIslands_bfs(grid):
    if not grid:
        return 0

    rows, cols = len(grid), len(grid[0])
    count = 0

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                count += 1
                grid[r][c] = '0'
                queue = deque([(r, c)])
                while queue:
                    row, col = queue.popleft()
                    for dr, dc in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
                        nr, nc = row + dr, col + dc
                        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1':
                            grid[nr][nc] = '0'
                            queue.append((nr, nc))

    return count

The output is identical. The difference is mechanical: BFS explores cells in expanding wavefronts from the starting cell, while DFS dives deep along one path before backtracking.

When to Use DFS vs BFS

For pure connected-components counting, it does not matter which you choose. DFS is often simpler to write because recursion handles the stack for you. BFS requires explicitly managing a queue, which adds a few lines of code but nothing conceptually harder.

The choice matters when the problem asks for something beyond reachability. BFS naturally discovers nodes in order of their distance from the source, which makes it the right tool when you need shortest paths in unweighted graphs. DFS does not provide distance information without extra bookkeeping.

As a rule of thumb: use DFS when you only care about reachability or component membership. Use BFS when you need shortest distances or level-by-level processing.

Connected Components in a General Graph

Grids are a special case where edges are implicit (each cell connects to its four neighbors). In a general graph represented as an adjacency list, as in LeetCode 323, the same pattern applies. You iterate over all nodes, and for each unvisited node, you launch DFS or BFS to discover its entire component.

def count_components(n, edges):
    graph = [[] for _ in range(n)]
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)

    visited = [False] * n
    count = 0

    def dfs(node):
        visited[node] = True
        for neighbor in graph[node]:
            if not visited[neighbor]:
                dfs(neighbor)

    for i in range(n):
        if not visited[i]:
            dfs(i)
            count += 1

    return count

This handles undirected graphs. For each node from 0 to n-1, if it has not been visited, we start a DFS that marks all reachable nodes. The number of times we start a new DFS equals the number of connected components.

In-Place Marking vs Visited Set

In the Number of Islands solution, we modified the input grid directly, changing '1' to '0' to mark visited cells. This avoids allocating a separate visited set and saves $O(m \cdot n)$ space. But it destroys the input, which may not be acceptable if the caller needs the grid later.

The alternative is to keep a visited set (or a boolean matrix of the same dimensions as the grid) and check it before processing each cell. This preserves the input at the cost of extra memory. In interviews, it is worth asking the interviewer whether mutating the input is acceptable. If the grid is enormous and memory is tight, in-place marking is preferred. If the grid will be reused, a visited set is safer.

For general graphs, a visited set (or array) is the standard approach since there is no grid to overwrite.

Largest Component / Largest Island

LeetCode 695 extends this idea: find the size of the largest connected component. The modification is minimal: during each DFS/BFS traversal, count how many nodes you visit. Track the maximum across all components.

def largest_island(grid):
    if not grid:
        return 0

    rows, cols = len(grid), len(grid[0])
    max_size = 0

    def dfs(r, c):
        if r < 0 or r >= rows or c < 0 or c >= cols:
            return 0
        if grid[r][c] != '1':
            return 0
        grid[r][c] = '0'
        return 1 + dfs(r + 1, c) + dfs(r - 1, c) + dfs(r, c + 1) + dfs(r, c - 1)

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                size = dfs(r, c)
                max_size = max(max_size, size)

    return max_size

Each call to DFS now returns 1 (for the current cell) plus the sizes returned by its four recursive calls. The total is the size of the island rooted at the starting cell.

Flood Fill (Paint Bucket)

Consider LeetCode 733: flood fill is perhaps the simplest application of DFS/BFS on a grid. Given a starting pixel and a new color, change the starting pixel and all connected pixels of the same original color to the new color. This is exactly the paint bucket tool in image editors.

def flood_fill(image, sr, sc, new_color):
    original = image[sr][sc]
    if original == new_color:
        return image

    rows, cols = len(image), len(image[0])

    def dfs(r, c):
        if r < 0 or r >= rows or c < 0 or c >= cols:
            return
        if image[r][c] != original:
            return
        image[r][c] = new_color
        dfs(r + 1, c)
        dfs(r - 1, c)
        dfs(r, c + 1)
        dfs(r, c - 1)

    dfs(sr, sc)
    return image

The early return when original == new_color is important. Without it, the DFS would loop infinitely because marking a cell with the new color would not distinguish it from unvisited cells of the original color (since they would be the same).

Clone Graph

LeetCode 133 presents a variation: you are given a reference to a node in a connected undirected graph and must return a deep copy of the entire graph. Each node has a value and a list of neighbors. The tricky part is that the graph may contain cycles, so a naive recursive copy would loop forever.

The solution is to maintain a hash map from original nodes to their clones. When you visit a node for the first time, create its clone, store it in the map, then recurse into each neighbor. If a neighbor already exists in the map, you have already cloned it, so just wire the existing clone into the neighbor list instead of recursing. This map serves double duty: it tracks visited nodes and lets you look up clones in O(1).

def cloneGraph(node):
    if not node:
        return None

    clones = {}

    def dfs(n):
        if n in clones:
            return clones[n]
        copy = Node(n.val)
        clones[n] = copy
        for neighbor in n.neighbors:
            copy.neighbors.append(dfs(neighbor))
        return copy

    return dfs(node)

The map clones maps each original node to its clone. When DFS reaches a node already in the map, it returns the existing clone immediately. This handles cycles: the clone is inserted into the map before its neighbors are processed, so a back-edge will find it and stop.

Surrounded Regions

For LeetCode 130, you are given an m-by-n board of 'X' and 'O', and must capture all regions of 'O' that are completely surrounded by 'X'. A region is surrounded if none of its cells touch the border of the board. Captured regions are flipped to 'X'.

The key insight is to work backwards: instead of checking every 'O' region to see if it touches a border, start from the border. Any 'O' on an edge of the board, and every 'O' connected to it, is safe and should not be flipped. Everything else gets captured.

def solve(board):
    if not board:
        return

    rows, cols = len(board), len(board[0])

    def dfs(r, c):
        if r < 0 or r >= rows or c < 0 or c >= cols:
            return
        if board[r][c] != 'O':
            return
        board[r][c] = 'S'  # mark as safe
        dfs(r + 1, c)
        dfs(r - 1, c)
        dfs(r, c + 1)
        dfs(r, c - 1)

    # mark all border-connected O's as safe
    for r in range(rows):
        dfs(r, 0)
        dfs(r, cols - 1)
    for c in range(cols):
        dfs(0, c)
        dfs(rows - 1, c)

    # flip: S -> O (safe), O -> X (captured)
    for r in range(rows):
        for c in range(cols):
            if board[r][c] == 'O':
                board[r][c] = 'X'
            elif board[r][c] == 'S':
                board[r][c] = 'O'

Three passes: first, DFS from every border 'O' to mark its connected component as safe ('S'). Second, any remaining 'O' is surrounded, so flip it to 'X'. Third, restore the safe cells back to 'O'.

Pacific Atlantic Water Flow

LeetCode 417 adds a twist: you are given an m-by-n matrix of heights where water can flow from a cell to any adjacent cell with equal or lower height. The Pacific ocean borders the top and left edges; the Atlantic borders the bottom and right edges. Find all cells from which water can reach both oceans.

The brute-force approach, running a search from every cell to see if it reaches both coasts, is expensive. The trick is to reverse the direction: start from the ocean and flow uphill. Run a DFS/BFS from every cell on the Pacific border (top row and left column), marking all cells reachable by flowing to equal-or-higher neighbors. Do the same from the Atlantic border. The answer is the intersection of the two reachable sets.

def pacificAtlantic(heights):
    if not heights:
        return []

    rows, cols = len(heights), len(heights[0])
    pacific = set()
    atlantic = set()

    def dfs(r, c, reachable):
        reachable.add((r, c))
        for dr, dc in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in reachable
                    and heights[nr][nc] >= heights[r][c]):
                dfs(nr, nc, reachable)

    for c in range(cols):
        dfs(0, c, pacific)       # top edge -> Pacific
        dfs(rows - 1, c, atlantic)  # bottom edge -> Atlantic
    for r in range(rows):
        dfs(r, 0, pacific)       # left edge -> Pacific
        dfs(r, cols - 1, atlantic)  # right edge -> Atlantic

    return list(pacific & atlantic)

Each DFS walks uphill from the ocean border. A cell is added to pacific if water from that cell could eventually flow down to the Pacific, and likewise for atlantic. The final intersection gives cells that can reach both.

Making a Large Island

LeetCode 827 takes this further: you are given an n-by-n binary grid and may flip at most one 0 to 1. Find the size of the largest island after the flip. This extends the basic connected-components pattern with a clever preprocessing step.

First, label every island with a unique ID and record its size. Then, for each 0 cell, look at its four neighbors. Collect the distinct island IDs among them and sum their sizes, plus one for the flipped cell itself. The maximum across all 0 cells (and across all original island sizes, in case no flip helps) is the answer.

def largestIsland(grid):
    n = len(grid)
    island_id = 2  # start labeling from 2 to avoid confusion with 0/1
    sizes = {}

    def dfs(r, c, uid):
        if r < 0 or r >= n or c < 0 or c >= n:
            return 0
        if grid[r][c] != 1:
            return 0
        grid[r][c] = uid
        return (1 + dfs(r + 1, c, uid) + dfs(r - 1, c, uid)
                  + dfs(r, c + 1, uid) + dfs(r, c - 1, uid))

    # label each island and record its size
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 1:
                sizes[island_id] = dfs(r, c, island_id)
                island_id += 1

    if not sizes:
        return 1  # all zeros, flip one

    best = max(sizes.values())

    # try flipping each 0
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 0:
                seen = set()
                total = 1
                for dr, dc in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] > 1:
                        uid = grid[nr][nc]
                        if uid not in seen:
                            seen.add(uid)
                            total += sizes[uid]
                best = max(best, total)

    return best

The two-phase approach avoids repeated DFS. Phase one is standard connected-components with size tracking. Phase two iterates over water cells and uses the precomputed sizes to evaluate each flip in O(1). Using a set of IDs prevents double-counting when the same island borders the flipped cell from multiple directions.

Reachability and Path Enumeration

Some problems ask not for components but for simple reachability: can you reach a target from a source? Or they ask you to enumerate all paths.

Keys and Rooms (LC 841) is a straightforward reachability check. You start in room 0; each room contains keys to other rooms. The question is whether you can visit every room. This is a plain DFS from node 0 on a directed graph.

def canVisitAllRooms(rooms):
    visited = set()

    def dfs(room):
        visited.add(room)
        for key in rooms[room]:
            if key not in visited:
                dfs(key)

    dfs(0)
    return len(visited) == len(rooms)

All Paths From Source to Target (LC 797) asks for every path from node 0 to node n-1 in a directed acyclic graph. Since the graph is a DAG, there are no cycles, so you can use simple backtracking without a visited set.

def allPathsSourceTarget(graph):
    target = len(graph) - 1
    result = []

    def dfs(node, path):
        if node == target:
            result.append(path[:])
            return
        for neighbor in graph[node]:
            path.append(neighbor)
            dfs(neighbor, path)
            path.pop()

    dfs(0, [0])
    return result

The backtracking pattern appends a neighbor to the current path, recurses, then pops it off. Each time we reach the target, we snapshot the path. Because the graph is acyclic, every recursive branch terminates, and we enumerate all possible routes.

Complexity Analysis

For a graph with V vertices and E edges, both DFS and BFS visit every vertex once and examine every edge once. The time complexity is $O(V + E)$ and the space complexity is $O(V)$ for the visited set plus $O(V)$ for the recursion stack (DFS) or queue (BFS).

For an m-by-n grid, V = m n and E is at most 4 m * n (each cell has up to four neighbors), so the time complexity simplifies to $O(m \cdot n)$ . The space complexity is $O(m \cdot n)$ in the worst case for the recursion stack or queue, which occurs when the entire grid is land and DFS recurses through every cell. If you use in-place marking, you save the visited set space but still need the stack/queue space.

The key takeaway is that connected-components problems are linear in the size of the input. You look at each node and each edge exactly once. There is no way to do better, since you must at least read the entire input to determine the answer.

Recognizing this pattern

You're probably looking at DFS/BFS for connected components or reachability when:

The problem asks you to count islands, groups, regions, or connected clusters in a grid or graph
You need to clone, color, or label every node in a graph by traversing from a starting point
The question is "can we reach X from Y" on an unweighted graph and you don't care about path length
The input is a grid of 0s and 1s, or an adjacency list with no edge weights, and you need to explore everything

Common templates:

Grid flood fill: recurse into the four neighbors of each cell while marking visited. Example: Number of Islands.
Graph component counting: loop over all nodes, run DFS/BFS from each unvisited node, increment a counter. Example: Number of Provinces.
Reachability check: DFS from the source and check whether the target was visited, or whether the visited set covers every node. Example: Keys and Rooms.
Graph clone with hash map: DFS while maintaining an original-to-copy map so cycles do not cause infinite recursion. Example: Clone Graph.
Path enumeration on a DAG: backtracking DFS that records every route from source to sink. Example: All Paths From Source to Target.