A document from MCS 275 Spring 2023, instructor David Dumas. You can also get the notebook file.

MCS 275 Spring 2023 Project 3 - DIY Dictionary

  • Course instructor: David Dumas

Instructions

Deadline is 6pm central on Friday 17 March 2023.

Collaboration policy and academic honesty

This project must be completed individually. We want to evaluate you on the basis of what you can do by consulting the resources listed below, and without help from anyone other than course staff. Seeking or giving aid on this assignment is prohibited (except for asking course staff); doing so constitutes academic misconduct which can have serious consequences. If you are unsure about whether something is allowed, ask. The course syllabus contains more information about the course and university policies regarding academic honesty.

Topic

In this project you'll build a module that is similar to the work we did in trees.py and integerset.py, and which implements a partial replacement for Python's dict type using a binary search tree as its backing data structure.

Resources you are allowed to consult

  • Documents and videos posted to the course web page
  • Any of the optional textbooks listed on the course web page
  • Course sample code

Ask if you are unsure whether a resource falls under one of these categories, or if you think I missed an essential resource.

What to do if you are stuck

Contact the instructor or your TA by email, in office hours, or on discord.

Quick summary

Read this section to quickly get an idea of what the project is about. But understand that the precise specifications in the next section must also be read.

Previous work: set and IntegerSet

In lectures 16-18 of MCS 275, we built classes for binary trees, binary search trees (BST), and then we created a class IntegerSet that is similar to Python's built-in set type (but which has fewer features and additional restrictions on elements). The final structure of that work was:

  • In trees.py:
    • Class Node representing a binary tree where each node stores a key
    • Class BST (a subclass of Node) representing a possibly empty BST supporting insertion and search.
  • In integerset.py:
    • Class IntegerSet for storing an unordered collection of distinct integers, which uses a BST as its main data structure.

You don't need to know anything about Node or BST to use IntegerSet. The use of trees is entirely "behind the scenes".

If you have code that uses Python's set type, then you can use IntegerSet as a drop-in replacement as long as:

  • The only objects you store in the set are integers
  • The only methods you use of the set class are add(...) and __contains__(...) (through the in operator)

This project: dict and Dictionary

In this project you will do something similar for Python's dict type. That is, you will build a partial replacement called Dictionary that uses a variation on a BST for storage. It will have fewer methods and more restrictions than Python's dict type.

When you're done, you'll have the following classes and module:

  • In proj3.py:
    • Class KVNode representing a binary tree where each node stores two values: a key and a value.
    • Class KVBST (a subclass of KVNode) representing a possibly empty BST that supports searching by key and various other operations.
    • Class Dictionary for storing an unordered collection of key-value pairs, which uses a KVBST as its main data structure.

When you're done, you will have a class Dictionary that would serve as a drop-in replacement for Python's dict in any program that:

  • Only uses dictionary keys that can be compared using >, <, and == (e.g. all integers, or all strings)
  • Only uses the following features of the dictionary:
    • add new key-value pair
    • change the value associated to a key
    • retrieve the value associated to a key

In addition to creating these classes, the project specification below also asks you to write some test code for the new Dictionary class.

A few things to keep in mind

Before we go into the details of the code you need to write, I want to emphasize a few things about the relation between this project and our previous work in the lecture examples:

  1. You're writing a single module that contains all three classes. That's different from the lecture example which was split between two modules (trees and integerset).
  2. You can (and probably should) adapt code from trees.py and integerset.py for this project. Just make sure anything you adapt it is truly and fully updated, with no inaccurate comments, no parts that are irrelevant or unused, etc..
  3. This project will not directly use or import trees or integerset modules. This is about making a bunch of new classes that have some similarities to the ones in those modules, but are different enough that importing is not helpful.
  4. You shouldn't use dict to store any key-value pairs in this project. The whole point is to build a partial replacement for dict from scratch. Similarly, when making IntegerSet we didn't use Python's set type.
  5. Dictionary does not impose any type restrictions directly. Unlike IntegerSet, this class doesn't require integer keys, and in general shouldn't do any checking of types.

Specifications of the proj3 module you must write

Download the starter pack:

It contains a file proj3.py with three empty class definitions and a placeholder where module test code would go. Edit that file and submit it as your project. Make it so that the contents follow the specifications in this section.

Class KVNode

Represents a binary tree with left child, right child, and parent attributes, as well as data attributes key and value at each node. The main difference from the Node class from lecture is the presence of the additional value attribute.

Constructor

  • __init__(self,key=None, value=None, left=None, right=None, parent=None) - Stores all the arguments as instance attributes. (It is expected that left, right, and parent are either None or are instances of KVNode.)

Required instance attributes

All initialized from the constructor arguments.

  • left
  • right
  • parent
  • key
  • value

Non-constructor methods

  • keys(self) - Returns a list of key attributes seen during an inorder traversal of the tree.
  • values(self) - Returns a list of value attributes seen during an inorder traversal of the tree.
  • items(self) - Returns a list of (key,value) pairs seen during an inorder traversal of the tree.
  • nodes(self) - Returns a list of the node objects seen during an inorder traversal of the tree.
  • __str__(self) - Returns the a string like <57:'onion'> where 57 is replaced by repr() applied to the key attribute and 'onion' by repr() applied to the value attribute of this node.
  • __repr__(self) - Returns the same thing as __str__.

Class KVBST, a subclass of KVNode

Represents a binary search tree in which a collection of key-value pairs are ordered by key. That is, any key in the left subtree is less than or equal to the key of this node, and any key in the right subtree is greater than the key of this node. In particular every key is expected to be comparable to all of the other keys in this tree (though there is no explicit checking for this).

The value attributes are present at each node but we don't care about their types, how they compare to one another, or even whether they can be compared.

As a special case, if an instance haskey attribute equal to None then the binary search tree is considered to be empty, and the first key-value pair that is inserted will live in this node.

Constructor

  • Inherited from KVNode, not changed.

Required instance attributes

  • Inherited from KVNode, not changed.

Non-constructor methods not inherited from superclass

  • insert(self,k,v) - Find a suitable place to add a KVBST node to this tree that has key k and value v, maintaining the search tree property.
  • search(self,k) - Find a node in this tree that has key equal to k. If no such node exists, return None. Otherwise, return the KVBST object of the node that was located.
    • Must use a binary tree search, not something less efficient like generating a list of all nodes and searching that list.

Class Dictionary (not a subclass of anything)

Represents a partial replacement for dict that uses a KVBST to store its data. Supports assigning values to keys, retrieving values by key, and checking whether a key is present. All keys must be comparable to one another.

Constructor

  • __init__(self,starting_items=None) - Creates a new empty KVBST and stores as an instance attribute T. If starting_items is not None, assume it is a dict object or something that behaves similarly (can iterate over it and get keys; can index it to get values; has a .items() method). Any key-value pair present in starting_items is added to this Dictionary. This provides a way to instantiate Dictionary and have it immediately populated with keys and values, rather than starting empty.

Required instance attributes

  • T - an instance of KVBST that is used to store the key-value pairs. Is private, i.e. meant to be used by methods of this class but not accessed directly by users of the class.

Non-constructor methods

  • __contains(self,k)__ - If there is a node in T with key k, return True. Otherwise, return False.

    • This is the method that gets called by a statement such as "foo" in D, if D is a Dictionary instance. That would translate to method call D.__contains__("foo").
  • __getitem__(self,k)__ - If there is a node in T with key k, return the value attribute of that node. Otherwise, raise a KeyError.

    • This is the method that gets called by a statement such as D["foo"], if D is a Dictionary instance. That would translate to method call D.__getitem__("foo").
  • __setitem__(self,k,v)__ - If there is a node in T with key k, change its value attribute to v. If there is no such node, add one that has key k and value v.

    • This is the method that gets called by a statement such as D["foo"] = 47, if D is a Dictionary instance. That would translate to method call D.__setitem__("foo",47).
  • __str__(self) - Return a string of this form

Dictionary({5: 'hello', 11: 'goodbye', 29: False, 58: 8619215})

where the part inside the {} characters is a series of comma-separated key: value pairs with keys in increasing order. Each key and value should be shown as the string returned by repr() applied to that attribute.

  • __repr__(self) - Returns the same thing as __str__.

REQUIRED TEST CODE

In addition to defining the classes, you need to write some code that tests and demonstrates the behavior of Dictionary.

But that test code must not run when proj3 is imported as a module.

The starter pack has a section that looks like this:

In [1]:
if __name__=="__main__":
    # PUT TEST CODE HERE
    pass

This conditional checks whether proj3.py is being run as a script (e.g. with python3 proj3.py). If it is, the code inside the conditional runs. So that's where the test code will go.

Specifically, add statements that demonstrate and test the Dictionary class as follows:

  1. Create and print an empty Dictionary
  2. Create and print a nonempty Dictionary
  3. Test a Dictionary for the values associated with several keys, all of which are present. Print the key-value pairs.
  4. Test a Dictionary for the values associated with several keys, none of which are present. That should raise an exception; make sure it does, and catch it using a try: ... except: block so it doesn't interrupt the test code.
  5. Make an empty Dictionary, assign a value to a key in that Dictionary, then give the same key a new value. Print the Dictionary after each step.
  6. Make an empty Dictionary and add 10,000 key-value pairs, where the keys and values are integers randomly selected from the range 0 to 999,999 (inclusive). Print a message like "Adding 10000 key-value pairs" beforehand, and "Done" afterward, but don't print anything during the process.

All such test code needs to be in one big indented block under if __name__=="__main__":.

This test code should mean that running

python3 proj3.py

in the terminal will provide a nice quick test of the Dictionary class, but opening the Python REPL and running

import proj3

will not result in any test code running.

If you want to put code that tests the KVNode and KVBST classes, you can do so. However, please put such tests inside the same conditional and before the Dictionary tests.

IMPORTANT RULES YOUR CODE MUST FOLLOW

Your project must follow the rules in the MCS 275 coding standards document. In addition:

  1. The only file you submit should be proj3.py.
  2. Have methods in proj3.py call one another if/when it is appropriate, rather than needlessly duplicating code.
  3. Do not include any commented-out code. Use comments only for explanatory text.
  4. Do not import any modules in proj3.py except those in Python's standard library, as needed.
  5. Do not use dict to store key-value pairs in class Dictionary. Use a KVBST instead.
  6. When imported as a module proj3.py must not do anything other than define classes.

Examples of behavior of the classes KVNode, KVBST, and Dictionary

This section contains some cells of Python code that use a solution to Project 3.

Setup

In [3]:
from proj3 import KVNode, KVBST, Dictionary

KVNode

In [4]:
# Create a node
a = KVNode(key=57,value="onion")
a
Out[4]:
<57:'onion'>
In [5]:
# Create another node
b = KVNode(key="The ultimate volleyball/programming vacation weekend", value=191458)
b
Out[5]:
<'The ultimate volleyball/programming vacation weekend':191458>
In [6]:
# Make one node a child of the other
# Their keys are not comparable, but that's OK for `KVNode`
a.left = b
b.parent = a
In [7]:
# Now we should have
#       a
#      / \
#     b  None
#    / \
# None None
print(a)
print("\t",a.left)
print("\t",a.right)
print("-"*80)
print(b)
print("\t",b.left)
print("\t",b.right)
<57:'onion'>
	 <'The ultimate volleyball/programming vacation weekend':191458>
	 None
--------------------------------------------------------------------------------
<'The ultimate volleyball/programming vacation weekend':191458>
	 None
	 None

KVBST

In [8]:
# Make an empty KVBST
T = KVBST()
T
Out[8]:
<None:None>
In [9]:
# Add some key/value pairs; these have string keys, and strings are comparable, so it's OK
T.insert("kitten",88)
T.insert("chinchilla",75)
T.insert("mole rat",204)
In [10]:
# Notice the left child has alphabetically earlier key
# and the right child has alphabetically later key
print(T)
print("\t",T.left)
print("\t",T.right)
<'kitten':88>
	 <'chinchilla':75>
	 <'mole rat':204>
In [11]:
# No other children present
assert T.left.left == None
assert T.left.right == None
assert T.right.left == None
assert T.right.right == None
In [12]:
# Check parent relations
assert T.right.parent == T
assert T.left.parent == T

Dictionary

In [13]:
# Make an empty dictionary
D = Dictionary()
D
Out[13]:
Dictionary({})
In [14]:
# Add some stuff
D[5] = "haberdasher"
D[9] = "oxygen"
D[2] = "priority"
In [15]:
D
Out[15]:
Dictionary({2: 'priority', 5: 'haberdasher', 9: 'oxygen'})
In [16]:
# Change some stuff
D[9] = "argon"  # modify
D[11] = "transformers lunchbox" # add
In [17]:
D
Out[17]:
Dictionary({2: 'priority', 5: 'haberdasher', 9: 'argon', 11: 'transformers lunchbox'})
In [18]:
# What's the root node of the underlying tree?
D.T
Out[18]:
<5:'haberdasher'>
In [19]:
# Check membership as key
8 in D
Out[19]:
False
In [20]:
# Check membership as key
9 in D
Out[20]:
True
In [21]:
# Retrieve value associated to key
D[2]
Out[21]:
'priority'
In [22]:
# Attempt to retrieve value for key that isn't there
try:
    D[88461713]
except Exception as e:
    print("That generated an exception of type",e.__class__.__name__)    
That generated an exception of type KeyError
In [23]:
# Attempt to get a key that isn't comparable to the ones already there
# will generate an exception because comparison is attempted
# (I catch the exception to avoid printing the solution code
# as part of the traceback!)
try:
    D["reconsidering"]  # str not comparable to int
except Exception as e:
    print("That generated an exception of type",e.__class__.__name__)
That generated an exception of type TypeError
In [24]:
# Make a Dictionary that starts with some items in it
# by passing `starting_items`
E = Dictionary({3:17, 2:25, 6:0, 5:0, 1:42})
In [25]:
E
Out[25]:
Dictionary({1: 42, 2: 25, 3: 17, 5: 0, 6: 0})
In [26]:
E[2]
Out[26]:
25
In [27]:
try:
    E[4]
except Exception as e:
    print("That generated an exception of type",e.__class__.__name__)
That generated an exception of type KeyError
In [28]:
E[2]=10000000000
In [29]:
E[2]
Out[29]:
10000000000

How your project will be graded

Autograder: 35 points

The autograder tests your module and grades it based on its behavior. The test code in proj3.py won't be tested by the autograder.

Manual review: 15 points

I will review your code and look for adherence to the coding standards and other rules given above. I will also check that the test code in proj3.py does what was specified above.

Revision history

  • 2023-03-05 Initial publication