Simple analysis on Nested-loop, Merge Sort and Hash Join
February 01, 2020
There are 3 common database join methods (there are more, but lets start with these):
- Nested-loop
- Merge Sort
- Hash Join
Here I am trying to do a simple comparison analysis on these various methods.
Let us suppose we have 2 tables A
& B
, with primary keys A.id
and B.id
respectively. Tables A
and B
has a one-to-many relationship, where table A
has A.b_id
as a Foreign Key to B
. Also let table A
has m rows, and B
has n rows. We shall also assume we have some sort of B-tree index on A.id
and B.id
.
-
Case 1:
SELECT * FROM A JOIN B on A.b_id = B.id
-
Nested-loop Join
Plan: It first scan through A, and on each row, search the index on B.
Cost: O(m x log(n))
-
Merge Sort (A is presorted by A.b_id)
E.g. A has clustered index on A.b_id (likely if A has one-to-one relationship with B).
Plan: A simply merge with B.
Cost: O(m + n)
-
Merge Sort (A is NOT presorted by A.b_id)
Plan: It first sort A by A.b_id, then Merge with B.
Cost: O(m x log(m) + m + n) = O(m x log(m))
-
Hash Join
Plan: It first scan through A to create a Hash, then it loop through B and probe the created Hash. The cost of accessing the Hash should be of O(1) in most cases, unless there is heavy collisions.
Cost: O(m + n + m) = O(2m + n)
-
-
Case 2:
SELECT * FROM A JOIN B on A.b_id = B.id WHERE A.id =?
-
Nested-loop Join
Plan: It first does index seek through A, and on each row, search the index on B.
Cost: O(log(m) + log(n))
-
Merge Sort
Plan: It first does index seek through A, sort resulting set by A.b_id (this case, nothing, as we have only 1 row), then Merge with B.
Cost: O(log(m) + n)
-
Hash Join
Plan: It first does index seek through A, scan through the resulting set to create a Hash on A.b_id (again cost minimal in this case as we have only 1 row), then it loop through B and probe the created Hash.
Cost: O(log(m) + n)
-
For one row result, Nested-loop rocks. For large pre-sorted dataset on the join keys, Merge Sort rocks. For any other large dataset, Hash Join rocks.
Hope this helps!!