A document from MCS 275 Spring 2023, instructor David Dumas.

MCS 275 Spring 2023 Worksheet 7

  • Course instructor: David Dumas

Topics

The main topic of this worksheet is sorting, and the recursive algorithms for sorting we covered in Lectures 13 and 14.

Resources

These things might be helpful while working on the problems. Remember that for worksheets, we don't strictly limit what resources you can consult, so these are only suggestions.

1. Merge sorted stacks

Recall that a stack is a data structure that mimics a physical stack of items, where the only available operations are to remove the top item ("pop") or add a new item to the top ("push").

In Python, you can simulate a stack using a list by limiting yourself to only calling the methods

  • .pop() for the stack pop operation, and
  • .append(x) for the stack push operation

In this way, the end of the list becomes the top of the stack.
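For instance, a minimal illustration of this convention (with made-up values, just to show the push/pop discipline) might look like:

stack = []          # empty stack
stack.append(4)     # push 4
stack.append(7)     # push 7
stack.append(1)     # push 1
print(stack)        # [4, 7, 1] -- the last element, 1, is the top of the stack
print(stack.pop())  # 1 (the top item is removed and returned)
print(stack)        # [4, 7]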

In mergesort, the main step was to create a function that can merge two sorted lists. We made a version of this that uses indexing by integers. However, the algorithm for merging two sorted lists only ever needs to look at the "next" item from each of the lists, meaning it can also be implemented using stacks.

Make a function merge_sorted_stacks(A,B,S) that takes two stacks A and B whose elements are in sorted order, with the top of each stack being the smallest element, and an initially empty stack S. The function should merge the two stacks into a single reverse-sorted stack S. It can destroy A and B as it does so.

Remember, A, B, and S will actually be Python list objects. The only thing that makes them stacks is that you won't use any methods or operations on them except .pop() and .append(x).

For example, merge_sorted_stacks should function as follows:

In [8]:
# Example with numbers
# A list of numbers is a sorted stack if it is in descending order
# meaning the top of stack (last element of the list) is the smallest.
A = [5,3,1]
B = [6,4,3,2,0]
S = []
merge_sorted_stacks(A,B,S)
S  # will be a reverse-sorted stack: the top (last element) is the largest element
Out[8]:
[0, 1, 2, 3, 3, 4, 5, 6]
In [9]:
# Example with strings
# A list of strings is a sorted stack if it is in reverse alphabetical order
# meaning the top of stack (last element of the list) is the earliest in 
# the Python string order
S = []
merge_sorted_stacks(
    ["zebra","kangaroo","aardvark"],
    ["newt","asp"],
    S)
S
Out[9]:
['aardvark', 'asp', 'kangaroo', 'newt', 'zebra']
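Here is one possible sketch of merge_sorted_stacks, consistent with the examples above. It is only a suggestion, not the official solution, and it assumes that testing whether a stack is empty (while A and B:) is permitted alongside .pop() and .append(x). Since peeking at a top element would require indexing, the sketch compares tops by popping both stacks and pushing the unused element back.

def merge_sorted_stacks(A, B, S):
    """Merge sorted stacks A and B (top = smallest) into S (top = largest).
    A and B are emptied in the process."""
    while A and B:
        a = A.pop()
        b = B.pop()
        if a <= b:
            S.append(a)   # a is the smallest remaining element overall
            B.append(b)   # b goes back; it may be needed in a later comparison
        else:
            S.append(b)
            A.append(a)
    # At most one of A, B still has elements; move them onto S in order.
    while A:
        S.append(A.pop())
    while B:
        S.append(B.pop())

Each pass through the first loop moves exactly one element onto S, so the merge takes time proportional to the total number of elements.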

2. Quicksort with other pivot strategies

The quicksort implementation we discussed in lecture uses the last element of the list as a pivot. Let's explore other options.

Make a new version of quicksort that takes an optional argument pivot_strategy with default value "last", and implement the following behaviors for these other values:

  • "first" - Always use the first element as the pivot
  • "middle" - Use an element as close as possible to the middle of the list as the pivot
  • "random" - Select the pivot position at random

(Of course, quicksort is always operating on the part of the list between start and end, so all instances of "first", "last", etc., are understood relative to that part. For example, "first" means index start.)

Test your modified version of quicksort to confirm that it works properly with each strategy.

Don't forget that the pivot_strategy argument needs to be propagated to the recursive calls!
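Here is one possible sketch of such a modification. It is not the lecture implementation verbatim (the signature and the partition scheme shown here are assumptions); the main idea is to pick a pivot position according to pivot_strategy and swap that element to the end of the range, after which the partition step can treat the last element as the pivot, just as in the original version.

import random

def quicksort(L, start=0, end=None, pivot_strategy="last"):
    """Sort L[start:end] in place using quicksort."""
    if end is None:
        end = len(L)
    if end - start <= 1:
        return  # 0 or 1 elements: nothing to do
    m = partition(L, start, end, pivot_strategy)
    quicksort(L, start, m, pivot_strategy)      # strategy propagated!
    quicksort(L, m + 1, end, pivot_strategy)    # strategy propagated!

def partition(L, start, end, pivot_strategy):
    """Partition L[start:end] around a pivot chosen by pivot_strategy;
    return the final index of the pivot."""
    # Choose the pivot position, then move it to the end of the range.
    if pivot_strategy == "first":
        k = start
    elif pivot_strategy == "middle":
        k = (start + end - 1) // 2
    elif pivot_strategy == "random":
        k = random.randrange(start, end)
    else:  # "last"
        k = end - 1
    L[k], L[end - 1] = L[end - 1], L[k]
    pivot = L[end - 1]
    # Partition with the pivot now sitting at index end-1.
    dst = start
    for src in range(start, end - 1):
        if L[src] < pivot:
            L[src], L[dst] = L[dst], L[src]
            dst += 1
    L[dst], L[end - 1] = L[end - 1], L[dst]
    return dst

A quick check that it behaves as expected with one of the new strategies:

data = [7, 2, 9, 4, 4, 1]
quicksort(data, pivot_strategy="middle")
print(data)  # [1, 2, 4, 4, 7, 9]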

3. Pathological test data generator

We discussed in class that quicksort with last element pivots behaves poorly (quadratic time) on sorted input. I mentioned that "nearly sorted input" also creates problems. But the phrase "nearly sorted" is not precise, and could have many interpretations.

Write the following functions to generate lists that are far from randomly ordered, for testing purposes (one possible sketch of all three appears after the list):

  • nearly_sorted1(n,frac=0.1) - Returns a list of length n. That list should contain consecutive integers, except that int(frac*n) of them are replaced with random values chosen from the range -2*n to 3*n. That is, fraction frac of the positions are chosen randomly from a wider range, but the rest are linearly increasing.
  • nearly_sorted2(n) - Returns a list of length n. The difference between any entry and the one before it is a number randomly chosen from these options: [-1,0,1,2,3,4,5]. Since most of these values are positive, the list "mostly increases" and so could be considered "nearly sorted". Unlike nearly_sorted1, this one doesn't produce lists where an entry is very far from the ones on either side of it.
  • nearly_sorted3(n,k=3) - Returns a list of length n obtained from list(range(n)) as follows: First, that list is broken into k pieces of similar size. Then, those pieces are reassembled in a random order. For example, if n=10 and k=2 you might get a return value like [5,6,7,8,9,0,1,2,3,4] because pieces [0,1,2,3,4] and [5,6,7,8,9] were reassembled in the opposite order. Since the resulting list is likely to have long sublists that are in increasing order, it could be considered "nearly sorted".
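Here is one possible sketch of the three generators. The specifications leave a few details open (for example, exactly how nearly_sorted3 distributes leftover elements among its k pieces), so treat the choices below as suggestions.

import random

def nearly_sorted1(n, frac=0.1):
    """Consecutive integers 0..n-1, except int(frac*n) positions are replaced
    by random values from the range -2*n to 3*n."""
    L = list(range(n))
    for i in random.sample(range(n), int(frac * n)):
        L[i] = random.randint(-2 * n, 3 * n)
    return L

def nearly_sorted2(n):
    """Each entry differs from the previous one by a step chosen at random
    from [-1,0,1,2,3,4,5], so the list mostly increases."""
    L = []
    x = 0
    for _ in range(n):
        L.append(x)
        x += random.choice([-1, 0, 1, 2, 3, 4, 5])
    return L

def nearly_sorted3(n, k=3):
    """Split list(range(n)) into k pieces of similar size and reassemble
    them in a random order."""
    size = n // k
    pieces = [list(range(i * size, (i + 1) * size)) for i in range(k - 1)]
    pieces.append(list(range((k - 1) * size, n)))  # last piece takes any leftovers
    random.shuffle(pieces)
    return [x for piece in pieces for x in piece]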

4. Quicksort stress test

Do timing studies of quicksort with different pivot strategies on the types of nearly sorted lists generated in the previous problem, for n=1000, n=10_000, and n=100_000. Do you see a clear difference relative to a randomly shuffled list? Is there a difference between the strategies? Between the types of nearly sorted input?
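One possible way to organize such a study is sketched below. It assumes quicksort (problem 2) and the nearly_sortedN generators (problem 3) are already defined; shuffled is a small helper introduced here only to provide the randomly shuffled baseline. Note that the pathological cases recurse very deeply, so the recursion limit must be raised, and even then the slowest combinations at n=100_000 may take a very long time or exhaust the interpreter stack, so you may want to skip or interrupt them.

import random
import sys
import time

sys.setrecursionlimit(100_000)  # deep recursion expected in the worst cases

def shuffled(n):
    """A randomly ordered list of length n, as a baseline for comparison."""
    L = list(range(n))
    random.shuffle(L)
    return L

generators = [shuffled, nearly_sorted1, nearly_sorted2, nearly_sorted3]
strategies = ["last", "first", "middle", "random"]

for n in [1000, 10_000, 100_000]:
    for gen in generators:
        data = gen(n)
        for strategy in strategies:
            L = list(data)  # each strategy sorts a copy of the same input
            t0 = time.time()
            quicksort(L, pivot_strategy=strategy)
            elapsed = time.time() - t0
            print(f"n={n:>7}  {gen.__name__:>14}  {strategy:>6}  {elapsed:.3f}s")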