A document from MCS 275 Spring 2021, instructor David Dumas. You can also get the notebook file.

Worksheet 7

MCS 275 Spring 2021 - David Dumas

Topics

The main topics of this worksheet are:

  • Recursion with backtracking, particularly using mazes
  • Mergesort and quicksort algorithms

The main references for these topics are:

Instructions

  • The worksheet has two parts (Mazes and Sorting). The problems about the maze module allow variations and extensions that could probably occupy an entire discussion. However, please make sure to get some practice with the sorting material, too.

I. Mazes

1. Learn a little more about maze.py

This problem involves reading and learning more about the module maze we discussed in Lecture 15. You'll use the things you learn about here in subsequent problems.

In Lecture 15 we discussed a minimal way of using the module:

  • Create a Maze object (e.g. with M = maze.Maze(xsize=7,ysize=7)
  • Set some squares to blocked (using M.set_blocked for individual squares, and/or M.apply_border which makes all of the edge squares blocked)
  • Set the start and goal (M.start and M.goal) locations
  • Use a recursive backtracking function to find a path from the start to the goal The function created in the last step can be found in solvemaze.py.

The module maze has several other features:

Saving a maze as SVG file

Maze objects have a method save_svg(filename) that writes a graphical representation of the maze to a file in the SVG format. This is a vector graphics format (scalable, not pixel-based) that can be viewed in most web browsers.

This method also accepts an optional argument highlight, which if given should be a list of (x,y) pairs that indicate squares in the maze that should be higlighted in light blue. For example,

mymaze.save_svg("cool.svg",highlight=[ (2,1), (3,1), (4,1) ])

will save the maze to cool.svg, and highlight three squares in blue.

The highlight argument is provided as a way to indicate the solution of a maze, or any other feature you might choose to display.

Saving a maze as a PNG file

For this feature, you need to install a module that isn't in Python's standard library. The module PIL is from a package called Pillow, and is used to work with image files. The following command should install it:

python3 -m pip install pillow

After installing Pillow, you can save a maze object as a PNG file (a bitmap graphics format supported by nearly every program that deals with images) using the save_png method. Its arguments are:

save_png(filename,scale=1,highlight=[])

The argument scale is a positive integer controlling the width of each maze square in pixels. The default, scale=1, will result in a tiny image file where each pixel is a maze square. To get a reasonable size output image, try scale=10 or scale=30.

Random Mazes

The class PrimRandomMaze inherits from Maze and automatically generates a random maze that has exactly one path from the start to the goal. Its constructor takes only two arguments

PrimRandomMaze(xsize,ysize)

and both xsize and ysize must be odd numbers. This random maze generator uses Prim's algorithm, which is known for producing mazes of moderate difficulty, with many short dead ends.

2. Basic solved maze image

Make a Python script that imports maze and solvemaze and uses them to do the following:

  • Accept one command line argument, an odd integer n
  • Generate a random maze of size n by n
  • Solve the maze by recursize backtracking
  • Write the solved maze (with solution highlighted) to a SVG file

Test the program, loading the SVG file it produces in a web browser.

Also try the same but with a PNG file as the output image type.

If everything works, your images should look a bit like this.

2.5 Solution movie

If you complete problem 2 quickly and want something to do before moving on, try this: Modify the solver script so that it saves every path ever considered by solvemaze() (at any point in the recursion) to a separate image file. Give them names like path001.png, path002.png, etc.. Run this on a maze of moderate size (e.g. 9x9) and flip through the images in your platform's image viewer to make a crude animation of the recursive process.

(The suggested way to do this is to create a global variable to store paths that have been considered, and have each call to solvemaze append the path it was given to that list.)

3. Simple custom mazes

Instead of using the PrimRandomMaze class to generate a maze, write your own subclass of Maze that creates a maze in which start and goal are set, the border is fully blocked, and so that it is possible to get from start to goal. The constructor should create some obstacles between the start and goal, to make the maze more interesting.

For example, you might make a class that simply places a large rectangle of blocked squares in the middle of the maze, so that a solution must either go along the top and right, or bottom and left. Or you might make some walls coming out of the sides, so that a solution needs to turn back and forth several times.

The key characteristics you are looking for are:

  • Ability to generate a maze of a given size (specified as arguments to the constructor)
  • Certainty that the maze always has a solution

It's OK for the constructor to decide a size is too small or is otherwise unacceptable, and raise an exception. But to keep things interesting, it should allow arbitrarily large mazes.

Write a script that instantiates your class and then uses solvemaze to find a solution. Save the solved maze to a SVG file.

4. Modifying maze solver

Make a blank n by n maze and block the border (only). Set the start to (1,1) and the goal to (n-2,n-2).

There are many paths from the start to the goal. Which one will solvemaze find in this case, and why? Test your hypothesis.

The particular path it finds is distinguished by the order in which the recursive algorithm tries the neighbors of the current square. Make a modified version of solvemaze that changes this order in each of the following ways:

  • The next square to add to the path is selected at random, from among the options available
  • The path tries to move in a straight line, before considering any movement that would constitute a turn
  • Of the next steps under consideration, choose the one closest to the goal "as the crow flies" (in terms of Euclidean plane distance)

Trying these on a Prim random maze won't be very interesting, because it has only one solution. But trying it on a fully open maze should show different results. Save a few to image files.

5. All dead ends

Let's say a square in a maze is a dead end if there is exactly one free neighbor.

Write a recursive function and an iterative function to take a maze and identify all of its dead ends. They should be returned as a list of (x,y) pairs.

The recursive function should be a small modification of solvemaze() which---if you look carefully---already notices when it has reached a dead end. However, you'll need to make changes so that it doesn't stop when it gets to the goal, but keeps going until every part of the maze has been explored.

Make a script that will generate a random maze and then highlight all of its dead ends, saving the result to an image file.

II. Sorting

6. Merge sorted stacks

Recall that a stack is a data structure that mimics a physical stack of items, where the only available operations are to remove the top item (pop) or add a new item to the top (push).

In Python, you can simulate a stack using a list by limiting yourself to only calling the methods

  • .pop() for the stack pop operation, and
  • .append(x) for the stack push operation In this way, the end of the list becomes the top of the stack.

In mergesort, the main step was to create a function that can merge two sorted lists. We made a version of this that uses indexing by integers. However, the algorithm for merging two sorted lists only ever needs to look at the "next" item from each of the lists, meaning it can also be implemented using stacks.

Make a function merge_sorted_stacks(A,B,S) that takes two stacks A and B whose elements are in sorted order, with the top of each stack being the smallest element, and an initially empty stack S. The function should merge the two stacks into a single reverse-sorted stack S. It can destroy A and B as it does so.

For example, it should function as follows:

In [8]:
# Example with numbers
# A list of numbers is a sorted stack if it is in descending order
# meaning the top of stack (last element of the list) is the smallest.
A = [5,3,1]
B = [6,4,3,2,0]
S = []
merge_sorted_stacks(A,B,S)
S  # will be a reverse sorted stack: top will be largest element
Out[8]:
[0, 1, 2, 3, 3, 4, 5, 6]
In [9]:
# Example with strings
# A list of strings is a sorted stack if it is in reverse alphabetical order
# meaning the top of stack (last element of the list) is the earliest in 
# the Python string order
S = []
merge_sorted_stacks(
    ["zebra","kangaroo","aardvark"],
    ["newt","asp"],
    S)
S
Out[9]:
['aardvark', 'asp', 'kangaroo', 'newt', 'zebra']

7. Mergesort and quicksort timing comparison

Which of the sorthing algorithms we implemented in lecture (contained in mergesort.py and quicksort.py) is faster for long lists of integers? What about short lists?

To investigate this, make a program that does the following things for N = 10, 100, 1_000, 10_000, 100_000, and 1_000_000:

  • Make a list L of N random integers between 1 and 10*N
  • Make a copy of that list, e.g. with L2 = [ x for x in L ]. Note that L2 = L will not work, because that would simply make L and L2 two names that refer to the same list.
  • Sort L with quicksort, making note of how long it takes
  • Sort L2 with mergesort, making note of how long it takes
  • Check that L and L2 are now equal, raising an exception if they are not. (This isn't strictly necessary, but it adds a consistency check that would probably detect if either sorting algorithm was broken.)

Then, display the results as a table. You can use whatever formatting you think would be most helpful, e.g.

N=1000  t_merge=0.1219s t_quick=0.4158s

What does this show? Is one clearly faster? By what ratio? Does the ratio change very much with N?

If you finish this quickly, there are lots of avenues for improvement and extension, such as:

  • Since there is a random element to this timing measurement, you cannot expect to see the same results every time. Upgrade the program to automatically do several runs at each value of N and average the results.
  • Add another column that shows the time for Python's build-in .sort() method on the same list.
  • Make a separate comparison script which uses a list that is almost sorted; you can do this by starting with a random list, sorting it, and then choosing 5% of the elements to swap into random other positions. Does this make a difference?
  • Challenge: Instead of using integers, make a custom subclass of int that adds an artificial 0.1s delay to any comparison by overriding the methods __lt__(self, other), __le__(self, other), __eq__(self, other), __ne__(self, other), __gt__(self, other), __ge__(self, other). Now, swapping elements is fast but comparing them is slow. Does this change the comparison substantially?

(The worksheet solutions won't cover these extensions.)

8. Quicksort with middle element pivot

The quicksort implementation we discussed in lecture uses the last element of the list as a pivot. However, this choice is an internal one made by the partition function. As long as partition partitions the list and returns the index of the pivot, it can use any pivot it likes.

Make a new version of partition that selects the element closest to the middle of the list as possible. (Of course, it is operating on a slice of the list between start and end, so it will need to look at the middle of that part.)

Hint: Maybe you can just pre-process the list so that the partition function from lecture, which uses the last element of the list as a pivot, does the right thing...

Test your modified version of quicksort to confirm that it works properly.