Programming lesson
Empirical Analysis of Insertion Sort vs Merge Sort: A Hands-On Guide for ECM1414
Learn how to implement insertion sort and merge sort in Python, count comparisons for complexity analysis, and compare empirical results with theoretical expectations. Perfect for ECM1414 Data Structures and Algorithms students.
Introduction: Why Sorting Algorithms Matter in 2026
Sorting is a fundamental operation in computer science, powering everything from search engines to social media feeds. In 2026, as AI-driven apps process billions of data points in real-time, understanding the efficiency of sorting algorithms is more relevant than ever. This tutorial guides you through implementing insertion sort and merge sort, counting comparisons to analyze their computational complexity, and interpreting results in the context of the ECM1414 Data Structures and Algorithms assignment.
Setting Up the Comparison Counter
To measure complexity empirically, we need a global counter that tracks every comparison made during sorting. We define a function lessThan(x, y) that increments a global variable comparisons each time it is called. This approach mirrors real-world performance profiling in high-stakes environments like financial trading systems, where every microsecond counts.
comparisons = 0
def lessThan(x, y):
global comparisons
comparisons += 1
return x < yUse lessThan instead of the direct < operator in all sorting functions to ensure accurate counting.
Insertion Sort: The Classic Linear Approach
Insertion sort builds a sorted list by repeatedly inserting the next element into the correct position within the already-sorted portion. Its worst-case complexity is O(n²), making it inefficient for large datasets but simple to implement. Think of it like organizing a playlist by hand—you take one song at a time and place it among the already sorted tracks.
Implementing insert(item, list)
This function inserts an item into a sorted list using linear search to find the correct position.
def insert(item, lst):
i = 0
while i < len(lst) and lessThan(lst[i], item):
i += 1
lst.insert(i, item)Implementing insertSort(list)
This function sorts a list by repeatedly calling insert for each element.
def insertSort(lst):
sorted_list = []
for item in lst:
insert(item, sorted_list)
return sorted_listMerge Sort: The Divide-and-Conquer Champion
Merge sort splits the list into halves, recursively sorts each half, and merges the sorted halves. Its worst-case complexity is O(n log n), which scales much better for large n. This algorithm is analogous to the way modern search engines process query results—breaking down large problems into manageable chunks.
Implementing split(list, list, list)
This function splits a list into two halves.
def split(lst, left, right):
mid = len(lst) // 2
left.extend(lst[:mid])
right.extend(lst[mid:])Implementing merge(list, list)
This function merges two sorted lists into one sorted list.
def merge(left, right):
result = []
i = j = 0
while i < len(left) and j < len(right):
if lessThan(left[i], right[j]):
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
result.extend(left[i:])
result.extend(right[j:])
return resultImplementing mergeSort(list)
def mergeSort(lst):
if len(lst) <= 1:
return lst
left = []
right = []
split(lst, left, right)
left = mergeSort(left)
right = mergeSort(right)
return merge(left, right)Generating Random Lists
We need a function to generate random lists of a given length n, with integers in the range [-10n, 10n]. This ensures a wide spread of values, mimicking real-world data variability.
import random
def randomList(n):
return [random.randint(-10*n, 10*n) for _ in range(n)]Running the Demo: listdemo(n)
This function demonstrates both sorting algorithms on the same random list and prints the comparison counts. For the assignment, run this for n = 25, 50, 75, 100 and include the outputs.
def listdemo(n):
global comparisons
lst = randomList(n)
print("Original list:", lst)
comparisons = 0
sorted_insert = insertSort(lst.copy())
print("Insert sort comparisons:", comparisons)
comparisons = 0
sorted_merge = mergeSort(lst.copy())
print("Merge sort comparisons:", comparisons)Data Collection and Tabulation
For n = 200, 400, 600, 800, 1000, generate 5 random lists each and record the comparison counts. Also test on already sorted lists (best-case) and reverse-sorted lists (worst-case). Tabulate the average counts. This data will help you estimate the coefficients a and b in the complexity functions an² and bn log₂ n.
Example Table Structure
| n | Insert Sort (Random) | Merge Sort (Random) | Insert Sorted | Insert Reverse | Merge Sorted | Merge Reverse |
|---|---|---|---|---|---|---|
| 200 | ... | ... | ... | ... | ... | ... |
| 400 | ... | ... | ... | ... | ... | ... |
Plotting the Results
Use a graphing tool (e.g., matplotlib in Python) to plot n on the x-axis and average comparisons on the y-axis. Distinguish between insertion sort and merge sort with different colors or markers. Overlay the best-case and worst-case data for both algorithms. The resulting graph will visually confirm the theoretical O(n²) vs O(n log n) growth.
Estimating Coefficients a and b
From your data, compute c_n / n² for insertion sort to estimate a, and c_n / (n log₂ n) for merge sort to estimate b. Tabulate these values to see if they remain roughly constant. For example, if insertion sort on a random list of n=1000 yields 500,000 comparisons, then a ≈ 500,000 / 1,000,000 = 0.5. Similarly, if merge sort yields 10,000 comparisons, then b ≈ 10,000 / (1000 * log₂(1000)) ≈ 10,000 / (1000 * 9.97) ≈ 1.003.
Commenting on Results
Your commentary should address:
- Theoretical vs empirical: How well do your results match the expected O(n²) and O(n log n)? Discuss any discrepancies.
- Best and worst cases: For insertion sort, the best case (already sorted) should yield O(n) comparisons, while worst case (reverse sorted) yields O(n²). For merge sort, all cases are O(n log n). Does your data support this?
- Choice of complexity measure: We used comparison count. How does this relate to actual runtime? In practice, other operations (e.g., list insertions) may affect performance.
- Lessons learned: This exercise reinforces why algorithm selection matters. In 2026, with data sizes exploding, choosing the right sorting algorithm can mean the difference between a responsive app and a laggy one. For example, AI recommendation systems rely on efficient sorting to deliver personalized content instantly.
Conclusion
By implementing insertion sort and merge sort and counting comparisons, you gain a concrete understanding of algorithmic complexity. This hands-on approach prepares you for real-world optimization challenges—whether you're building the next viral app or analyzing financial data. Remember, the best algorithm depends on the context: insertion sort shines on nearly sorted data, while merge sort provides consistent performance for large datasets. Happy coding!