Insertion Sort vs Merge Sort: Empirical Complexity Analysis

Introduction: Why Sorting Algorithms Matter in 2026

Sorting is a fundamental operation in computer science, powering everything from search engines to social media feeds. In 2026, as AI-driven apps process billions of data points in real-time, understanding the efficiency of sorting algorithms is more relevant than ever. This tutorial guides you through implementing insertion sort and merge sort, counting comparisons to analyze their computational complexity, and interpreting results in the context of the ECM1414 Data Structures and Algorithms assignment.

Setting Up the Comparison Counter

To measure complexity empirically, we need a global counter that tracks every comparison made during sorting. We define a function lessThan(x, y) that increments a global variable comparisons each time it is called. This approach mirrors real-world performance profiling in high-stakes environments like financial trading systems, where every microsecond counts.

comparisons = 0

def lessThan(x, y):
    global comparisons
    comparisons += 1
    return x < y

Use lessThan instead of the direct < operator in all sorting functions to ensure accurate counting.

Insertion Sort: The Classic Linear Approach

Insertion sort builds a sorted list by repeatedly inserting the next element into the correct position within the already-sorted portion. Its worst-case complexity is O(n²), making it inefficient for large datasets but simple to implement. Think of it like organizing a playlist by hand—you take one song at a time and place it among the already sorted tracks.

Implementing `insert(item, list)`

This function inserts an item into a sorted list using linear search to find the correct position.

def insert(item, lst):
    i = 0
    while i < len(lst) and lessThan(lst[i], item):
        i += 1
    lst.insert(i, item)

Implementing `insertSort(list)`

This function sorts a list by repeatedly calling insert for each element.

def insertSort(lst):
    sorted_list = []
    for item in lst:
        insert(item, sorted_list)
    return sorted_list

Merge Sort: The Divide-and-Conquer Champion

Merge sort splits the list into halves, recursively sorts each half, and merges the sorted halves. Its worst-case complexity is O(n log n), which scales much better for large n. This algorithm is analogous to the way modern search engines process query results—breaking down large problems into manageable chunks.

Implementing `split(list, list, list)`

This function splits a list into two halves.

def split(lst, left, right):
    mid = len(lst) // 2
    left.extend(lst[:mid])
    right.extend(lst[mid:])

Implementing `merge(list, list)`

This function merges two sorted lists into one sorted list.

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if lessThan(left[i], right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

Implementing `mergeSort(list)`

def mergeSort(lst):
    if len(lst) <= 1:
        return lst
    left = []
    right = []
    split(lst, left, right)
    left = mergeSort(left)
    right = mergeSort(right)
    return merge(left, right)

Generating Random Lists

We need a function to generate random lists of a given length n, with integers in the range [-10n, 10n]. This ensures a wide spread of values, mimicking real-world data variability.

import random

def randomList(n):
    return [random.randint(-10*n, 10*n) for _ in range(n)]

Running the Demo: `listdemo(n)`

This function demonstrates both sorting algorithms on the same random list and prints the comparison counts. For the assignment, run this for n = 25, 50, 75, 100 and include the outputs.

def listdemo(n):
    global comparisons
    lst = randomList(n)
    print("Original list:", lst)
    
    comparisons = 0
    sorted_insert = insertSort(lst.copy())
    print("Insert sort comparisons:", comparisons)
    
    comparisons = 0
    sorted_merge = mergeSort(lst.copy())
    print("Merge sort comparisons:", comparisons)

Data Collection and Tabulation

For n = 200, 400, 600, 800, 1000, generate 5 random lists each and record the comparison counts. Also test on already sorted lists (best-case) and reverse-sorted lists (worst-case). Tabulate the average counts. This data will help you estimate the coefficients a and b in the complexity functions an² and bn log₂ n.

Example Table Structure

n	Insert Sort (Random)	Merge Sort (Random)	Insert Sorted	Insert Reverse	Merge Sorted	Merge Reverse
200	...	...	...	...	...	...
400	...	...	...	...	...	...

Plotting the Results

Use a graphing tool (e.g., matplotlib in Python) to plot n on the x-axis and average comparisons on the y-axis. Distinguish between insertion sort and merge sort with different colors or markers. Overlay the best-case and worst-case data for both algorithms. The resulting graph will visually confirm the theoretical O(n²) vs O(n log n) growth.

Estimating Coefficients a and b

From your data, compute c_n / n² for insertion sort to estimate a, and c_n / (n log₂ n) for merge sort to estimate b. Tabulate these values to see if they remain roughly constant. For example, if insertion sort on a random list of n=1000 yields 500,000 comparisons, then a ≈ 500,000 / 1,000,000 = 0.5. Similarly, if merge sort yields 10,000 comparisons, then b ≈ 10,000 / (1000 * log₂(1000)) ≈ 10,000 / (1000 * 9.97) ≈ 1.003.

Commenting on Results

Your commentary should address:

Theoretical vs empirical: How well do your results match the expected O(n²) and O(n log n)? Discuss any discrepancies.
Best and worst cases: For insertion sort, the best case (already sorted) should yield O(n) comparisons, while worst case (reverse sorted) yields O(n²). For merge sort, all cases are O(n log n). Does your data support this?
Choice of complexity measure: We used comparison count. How does this relate to actual runtime? In practice, other operations (e.g., list insertions) may affect performance.
Lessons learned: This exercise reinforces why algorithm selection matters. In 2026, with data sizes exploding, choosing the right sorting algorithm can mean the difference between a responsive app and a laggy one. For example, AI recommendation systems rely on efficient sorting to deliver personalized content instantly.

Conclusion

By implementing insertion sort and merge sort and counting comparisons, you gain a concrete understanding of algorithmic complexity. This hands-on approach prepares you for real-world optimization challenges—whether you're building the next viral app or analyzing financial data. Remember, the best algorithm depends on the context: insertion sort shines on nearly sorted data, while merge sort provides consistent performance for large datasets. Happy coding!