Searching and Sorting

Gregory M. Kapfhammer

March 24, 2025

What are the purposes of searching and sorting?

  • Searching finds a specific item in a collection
  • Sorting orders the items in a collection
  • Recursive and iterative algorithms are possible
  • Basic building blocks of many algorithms
  • One technique can be used to implement another!

Linear and tail recursion

  • Linear Recursion:

    • Each function makes at most one recursive call
    • Binary search is linearly recursive (it makes one call per level)
    • Keeps total calls to \(O(\log n)\) for binary search
  • Tail Recursion:

    • Recursive call is the last operation in the function
    • Compilers can optimize tail calls into loops, thus avoiding stack overflow (CPython does not perform this optimization)
    • Often more memory efficient than non-tail recursion
  • What is the evidence for linear and/or tail recursion?

  • If you see tail recursion, try to make an iterative solution, as in the sketch below!
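
As a concrete illustration, here is a minimal sketch (using hypothetical names like bsearch_recursive) of a tail-recursive binary search and the loop it can be converted into:

def bsearch_recursive(L, item, left=0, right=None):
    # tail-recursive binary search: the recursive call is the
    # last operation, so it can be rewritten as a loop
    if right is None:
        right = len(L)
    if left >= right:
        return False
    mid = (left + right) // 2
    if L[mid] == item:
        return True
    elif L[mid] < item:
        return bsearch_recursive(L, item, mid + 1, right)
    else:
        return bsearch_recursive(L, item, left, mid)

def bsearch_iterative(L, item):
    # the same algorithm with each tail call replaced by an
    # update to left or right inside a while loop
    left, right = 0, len(L)
    while left < right:
        mid = (left + right) // 2
        if L[mid] == item:
            return True
        elif L[mid] < item:
            left = mid + 1
        else:
            right = mid
    return False

print(bsearch_recursive([1, 3, 5, 7, 9], 7))
print(bsearch_iterative([1, 3, 5, 7, 9], 4))
True
False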

Iterative versus recursive searching

Empirically comparing the performance of iterative and recursive searching

  1. Implement both versions of the algorithm
    • Create a recursive implementation that uses function calls
    • Create an iterative implementation using loops
  2. Create benchmarking harness
    • Use time.perf_counter() for precise measurement
    • Implement a function like timetrials() that runs multiple trials (see the sketch after this list)
  3. Design controlled experiments
    • Use a doubling experiment with appropriate input sizes
    • Generate consistent test data for both implementations
  4. Collect and analyze data
    • Record execution times for each implementation
    • Calculate doubling ratios to determine time complexity
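
A minimal harness combining these steps might look like the following sketch, which assumes a search function such as the bsearch_iterative one sketched earlier (the trial counts and input sizes are illustrative):

import time

def timetrials(func, n, trials=10):
    # run func on a sorted list of size n several times and
    # return the average execution time in seconds
    total = 0
    for _ in range(trials):
        data = list(range(n))
        start = time.perf_counter()
        func(data, n // 2)
        total += time.perf_counter() - start
    return total / trials

# doubling experiment: double n each round and inspect the
# ratio of successive times to estimate the time complexity
for n in [1000, 2000, 4000, 8000]:
    print(n, timetrials(bsearch_iterative, n))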

Let’s explore several sorting algorithms!

  • Quadratic time sorting algorithms are easy to implement
  • Various sorting algorithms have quadratic time complexity
  • Mergesort and quicksort are efficient yet harder to build
  • Python’s built-in sort uses its own algorithm called Timsort
  • The divide and conquer paradigm is useful for sorting

Ascending and descending sort

flowchart LR
    subgraph Original["Original Array"]
        direction LR
        O1[30] --- O2[5] --- O3[100] --- O4[17] --- O5[82]
        style Original fill:#f5f5f5,stroke:#333
    end
    
    subgraph Ascending["Ascending Order"]
        direction LR
        A1[5] --- A2[17] --- A3[30] --- A4[82] --- A5[100]
        style Ascending fill:#f5f5f5,stroke:#333
    end
    
    subgraph Descending["Descending Order"]
        direction LR
        D1[100] --- D2[82] --- D3[30] --- D4[17] --- D5[5]
        style Descending fill:#f5f5f5,stroke:#333
    end

    AscendingSort["Ascending Sort"]
    DescendingSort["Descending Sort"]
    
    Original --> AscendingSort --> Ascending
    Original --> DescendingSort --> Descending

    style AscendingSort fill:#e6e6e6,stroke:#333,font-size:12px
    style DescendingSort fill:#e6e6e6,stroke:#333,font-size:12px

  • Ascending: Elements arranged from smallest to largest
  • Descending: Elements arranged from largest to smallest
  • Python’s built-in sorting functions support both:
    • Ascending: sorted(array) or array.sort()
    • Descending: sorted(array, reverse=True) or array.sort(reverse=True)

Detecting a sorted list

def issorted(L):
    for i in range(len(L)-1):
        if L[i]>L[i+1]:
            return False
    return True

A = [1,2,3,4,5]
print(A, "is sorted:", issorted(A))

B = [1,4,5,7,2]
print(B, "is sorted:", issorted(B))
[1, 2, 3, 4, 5] is sorted: True
[1, 4, 5, 7, 2] is sorted: False

How do we know if a list is sorted?

  • Confirm that adjacent elements are in the correct order
  • Use a single for loop to compare adjacent elements in L

All-pairs issorted function

def issorted_allpairs(L):
    for i in range(len(L)-1):
        for j in range(i+1, len(L)):
            if L[j] < L[i]:
                return False
    return True

A = [1,2,3,4,5]
print(A, "is sorted:", issorted_allpairs(A))

B = [1,4,5,7,2]
print(B, "is sorted:", issorted_allpairs(B))
[1, 2, 3, 4, 5] is sorted: True
[1, 4, 5, 7, 2] is sorted: False
  • Use a double for loop to compare all pairs of elements in L
  • The issorted_allpairs function has a time complexity of \(O(n^2)\)
  • The issorted function has a time complexity of \(O(n)\)

Bubble sort algorithm

def bubblesort(L):
    for _ in range(len(L)-1):
        for i in range(len(L)-1):
            if L[i]>L[i+1]:
                L[i], L[i+1] = L[i+1], L[i]

data = [30,100000,54,26,93,17,77,31,44,55,20]
print("Is the data sorted?", issorted(data))
print(data)
bubblesort(data)
print("Is the data sorted?", issorted(data))
print(data)
Is the data sorted? False
[30, 100000, 54, 26, 93, 17, 77, 31, 44, 55, 20]
Is the data sorted? True
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]
  • Use a double for loop to order all of the elements in L
  • The bubblesort function has a time complexity of \(O(n^2)\)

Stopping early with bubble sort

def bubblesort_stopearly(L):
    keepgoing = True
    while keepgoing:
        keepgoing = False
        for i in range(len(L)-1):
            if L[i]>L[i+1]:
                L[i], L[i+1] = L[i+1], L[i]
                keepgoing = True

data = [30,100000,54,26,93,17,77,31,44,55,20]
bubblesort_stopearly(data)
print(data)
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]
  • Use a while loop containing a for loop to order elements in L

  • Although it may consume less time for some lists, bubblesort_stopearly still has a time complexity of \(O(n^2)\)

Implementing selection sort

def selectionsort(L):
    n = len(L)
    for i in range(n-1):
        # find the largest element among the unsorted prefix
        max_index = 0
        for index in range(n - i):
            if L[index] > L[max_index]:
                max_index = index
        # move it into its final position at the end of the prefix
        L[n-i-1], L[max_index] = L[max_index], L[n-i-1]

data = [30,100000,54,26,93,17,77,31,44,55,20]
selectionsort(data)
print(data)
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]
  • Key invariant: after \(i\) runs, the largest \(i\) elements are in their final positions

  • Find maximum element and swap it with last unsorted element

  • Yet, the selectionsort function still has a time complexity of \(O(n^2)\)

Implementing insertion sort

def insertionsort(L):
    n = len(L)
    for i in range(n):
        # bubble L[n-i-1] to the right until it joins the
        # already-sorted suffix of the list
        j = n - i - 1
        while j < n - 1 and L[j]>L[j+1]:
            L[j], L[j+1] = L[j+1], L[j]
            j+=1

data = [30,100000,54,26,93,17,77,31,44,55,20]
insertionsort(data)
print(data)
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]
  • Key invariant: after \(i\) runs, the last \(i\) elements are in sorted order

  • May be faster if the list is already sorted (or nearly sorted)

  • Yet, the insertionsort function still has a time complexity of \(O(n^2)\)

Sorting in Python

X = [3,1,5]
Y = sorted(X)
print("X:", X)
print("Y:", Y)

X.sort()
print("X:", X)
print("Y:", Y)
X: [3, 1, 5]
Y: [1, 3, 5]
X: [1, 3, 5]
Y: [1, 3, 5]
  • Two main functions to sort a list: sort() and sorted()
  • sort orders the list in place and sorted returns a new sorted list
  • Note that calling sorted does not change the contents of X

Let’s design and implement some faster sorting algorithms!

  • Divide and conquer algorithms:
    • Step 1: divide the problem into 2 or more pieces
    • Step 2: conquer the problem by solving the pieces
    • Step 3: combine the solutions to the parts into an overall solution

Implementing mergesort

def mergesort(L):
    if len(L) < 2:
        return
    mid = len(L) // 2
    A = L[:mid]
    B = L[mid:]
    mergesort(A); mergesort(B)
    merge(A, B, L)

def merge(A, B, L):
    i = 0
    j = 0
    # repeatedly copy the smaller front element into L
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            L[i+j] = A[i]
            i = i + 1
        else:
            L[i+j] = B[j]
            j = j + 1
    # one of A or B is exhausted; append the remainder
    L[i+j:] = A[i:] + B[j:]

data = [30,100000,54,26,93,17,77,31,44,55,20]
mergesort(data)
print(data)
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]

Implementing quicksort

from random import randrange

def quicksort(L):
    _quicksort(L, 0, len(L))

def _quicksort(L, left, right):
    if right - left > 1:    
        mid = partition(L, left, right)
        _quicksort(L, left, mid)
        _quicksort(L, mid+1, right)

def partition(L, left, right):
    # choose a random pivot and move it out of the way
    pivot = randrange(left, right)
    L[pivot], L[right-1] = L[right-1], L[pivot]
    i, j, pivot = left, right - 2, right - 1
    while i < j:
        # scan from both ends, swapping out-of-place pairs
        while L[i] < L[pivot]:
            i += 1
        while i < j and L[j] >= L[pivot]:
            j -= 1
        if i < j:
            L[i], L[j] = L[j], L[i]
    # place the pivot between the two partitions
    if L[pivot] <= L[i]:
        L[pivot], L[i] = L[i], L[pivot]
        pivot = i
    return pivot

data = [30,100000,54,26,93,17,77,31,44,55,20]
quicksort(data)
print(data)
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]

Deep dive into quicksort

Quicksort time complexity

  • Worst-case time complexity: \(O(n^2)\)
  • Worst-case occurs when pivot is smallest or largest element
  • Random pivot selection helps avoid worst-case on average

Time complexity breakdown

  • partition(L, left, right): \(O(n)\)
  • quicksort(L): Defined by _quicksort
  • _quicksort(L, left, right):
    • Average case: \(O(n \times \log n)\)
    • Worst case: \(O(n^2)\)
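
These bounds follow from the underlying recurrences: a well-balanced random pivot splits the list roughly in half, while a worst-case pivot peels off only one element per call:

\[
T(n) = 2T(n/2) + O(n) \Rightarrow T(n) = O(n \times \log n)
\]

\[
T(n) = T(n-1) + O(n) \Rightarrow T(n) = O(n^2)
\]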

Compare mergesort and quicksort

Both algorithms use divide and conquer!

  • mergesort:
    • Stable sort maintains the relative order of equal elements
    • Worst-case time complexity is \(O(n \times \log n)\)
    • Requires additional space for merging intermediate sub-lists
  • quicksort:
    • Not a stable sort and thus relative order not preserved
    • Worst-case time complexity is \(O(n^2)\), but often faster in practice
    • In-place sorting means it does not require additional space

Exploring properties of sorting

  • Sorting algorithms can be stable or unstable:
    • Stable: Equal elements maintain their relative order
    • Unstable: Equal elements may change their relative order
  • Stability is most important when values have “satellite” data (see the example after this list)
  • Sorting algorithms can be in-place or not:
    • In-place: Algorithm only requires constant extra space
    • Not in-place: Algorithm requires non-constant space
  • Sorting algorithms may also be comparison-based or not
  • Question: Can we further improve sorting performance?
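
Because Python’s sorted is stable, a short example can demonstrate equal keys keeping their original order when each value carries satellite data:

# records with "satellite" data in the form (name, grade)
records = [("Ava", 90), ("Ben", 85), ("Cal", 90), ("Dee", 85)]

# sort by grade only; sorted is stable, so students with
# equal grades keep their original relative order
by_grade = sorted(records, key=lambda record: record[1])
print(by_grade)
[('Ben', 85), ('Dee', 85), ('Ava', 90), ('Cal', 90)]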

Improving the efficiency of sorting requires new assumptions and different approaches

  • Comparison-based sorting is at best \(O(n \times \log n)\)
  • Non-comparison-based sorting algorithms can be faster!

Counting sort algorithm

  • Non-comparison-based sorting with \(O(n + k)\) time complexity
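
As a minimal sketch, counting sort for non-negative integers can be written as follows; this version tallies bare values, while a stable variant uses prefix sums to place full records:

def countingsort(L):
    # assumes L holds non-negative integers; k is the largest
    # value, so the tally array needs k + 1 slots
    if not L:
        return L
    k = max(L)
    counts = [0] * (k + 1)
    for value in L:
        counts[value] += 1
    # rebuild the sorted list from the tallies
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result

print(countingsort([4, 2, 2, 8, 3, 3, 1]))
[1, 2, 2, 3, 3, 4, 8]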

Radix sort algorithm

  • Non-comparison-based sorting with \(O(d \times (n + k))\) time complexity
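
A least-significant-digit radix sort can be sketched with a stable bucketing pass per digit, here in base 10 so that \(k = 10\):

def radixsort(L):
    # sorts non-negative integers digit by digit, from least
    # to most significant; each pass is stable
    if not L:
        return L
    place = 1
    while place <= max(L):
        buckets = [[] for _ in range(10)]
        for value in L:
            buckets[(value // place) % 10].append(value)
        # concatenating the buckets in order keeps the pass stable
        L = [value for bucket in buckets for value in bucket]
        place *= 10
    return L

print(radixsort([170, 45, 75, 90, 802, 24, 2, 66]))
[2, 24, 45, 66, 75, 90, 170, 802]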

Recapping count and radix sort

  • Counting sort:
    • Non-comparison sorting with \(O(n + k)\) time complexity
    • Efficient when range of input values (\(k\)) is not much larger than \(n\)
    • Requires extra space proportional to the range of input values
    • A stable sort that preserves relative order of equal elements
    • Counts occurrences of each element, reconstructs sorted array
  • Radix sort:
    • Non-comparison-based sorting with \(O(d \times (n + k))\) time complexity
    • Uses counting sort approach as a subroutine for each digit position
    • Sorts numbers digit by digit, from least to most significant

What if sorting only needs to process lists containing values in a very restricted range?

  • Sort decimals between a restrictive minimum and maximum
  • There is often a need to sort probabilities between 0 and 1

Bucket sort algorithm

  • Assumes that values are uniformly distributed between 0 and 1
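
A minimal bucket sort sketch under that uniformity assumption places each value in bucket \(\lfloor value \times n \rfloor\), sorts the small buckets, and concatenates them:

def bucketsort(L):
    # assumes uniform floats in [0, 1); with about one value
    # per bucket, sorting each bucket is cheap on average
    n = len(L)
    if n == 0:
        return L
    buckets = [[] for _ in range(n)]
    for value in L:
        buckets[int(value * n)].append(value)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result

print(bucketsort([0.42, 0.32, 0.23, 0.52, 0.25, 0.47]))
[0.23, 0.25, 0.32, 0.42, 0.47, 0.52]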

What if we want a list’s \(k\)th smallest element?

  • Quick selection finds the \(k\)th smallest element in an unsorted array
  • Wow, much more efficient than sorting the entire array first!
  • Average time complexity: \(O(n)\) versus \(O(n \times \log n)\) for sorting
  • Leverages the same partitioning strategy as quicksort function
  • Enables various types of statistical analysis without expensive sorting

Recursive quick selection

def quickselect(L, k):
    # returns the kth smallest element (k is 1-indexed)
    return _quickselect(L, k, 0, len(L))

def _quickselect(L, k, left, right):
    pivot = partition(L, left, right)
    if k <= pivot:
        return _quickselect(L, k, left, pivot)
    elif k == pivot + 1:
        return L[pivot]
    else:
        return _quickselect(L, k, pivot + 1, right)

data = [30,100000,54,26,93,17,77,31,44,55,20]
selection_one = quickselect(data, 1)
selection_three = quickselect(data, 3)
selection_five = quickselect(data, 5)
quicksort(data)
print(selection_one, selection_three, selection_five, "for", data)
17 26 31 for [17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]

Iterative quick selection

def quickselect(L, k):
    left, right = 0, len(L)
    while left < right:
        pivot = partition(L, left, right)
        if k <= pivot:
            right = pivot
        elif k == pivot + 1:
            return L[pivot]
        else:
            left = pivot + 1
    return L[left]

data = [30,100000,54,26,93,17,77,31,44,55,20]
selection_one = quickselect(data, 1)
selection_three = quickselect(data, 3)
selection_five = quickselect(data, 5)
quicksort(data)
print(selection_one, selection_three, selection_five, "for", data)
17 26 31 for [17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]

Reviewing divide and conquer

  • Binary Search
    • Recursive step takes constant time
    • Makes a single recursive call on smaller list
    • Time complexity: \(O(\log n)\)
  • Sorting Algorithms
    • Running time is linear plus recursive call cost
    • The sublists at each level have total length \(O(n)\)
    • Time complexity: Recursion depth of \(O(\log n)\) means \(O(n \times \log n)\)
  • Quick Selection
    • Running time is linear plus the cost of one recursive call
    • Time complexity: \(O(n)\)
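
The \(O(n)\) bound for quick selection arises because each step recurses into only one side of the partition, so the expected work forms a geometric series:

\[
T(n) = T(n/2) + O(n) = O\left(n + \frac{n}{2} + \frac{n}{4} + \cdots\right) = O(n)
\]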

Searching and sorting: Quick recap

  • Search algorithms
    • Binary search is \(O(\log n)\) but needs sorted input
    • Recursive and iterative functions offer similar speed
    • Tail recursion can be converted to iteration for better efficiency
  • Basic sorting algorithms
    • Simple algorithms (bubble, selection, insertion) are \(O(n^2)\)
    • Each maintains different invariants during execution
    • Suitable for small datasets or nearly-sorted lists

Sorting and selection: Quick recap

  • Advanced sorting algorithms
    • Divide-and-conquer algorithms (mergesort, quicksort) achieve \(O(n \times \log n)\) average time, which is provably optimal for comparison-based sorting
    • Non-comparison sorts (counting, radix, bucket) can beat \(O(n \log n)\) under specific assumptions about input data
  • Fast selection algorithms

    • Quickselect finds the \(k\)th smallest element in \(O(n)\) average time without fully sorting the list, making it ideal when only an order statistic is needed
    • Shares its partitioning strategy with quicksort but recurses into only one side of each partition, making it faster although more specialized